
Outlook interviews Chen Xiaohong, academician of the Chinese Academy of Engineering and Director of Xiangjiang Laboratory: Building a hub for innovative applications of large AI models

2023-12-05

On December 2, the Xinhua News Agency client published an interview with Chen Xiaohong, academician of the Chinese Academy of Engineering, Director of Xiangjiang Laboratory, and Secretary of the CPC HUTB Committee, carried in Outlook news magazine. The full text follows:

◇In the future, multimodal large models will further integrate and mutually promote technologies such as search engines, knowledge graphs, adversarial gaming, and brain cognition. They will evolve towards a more intelligent and versatile direction to address increasingly complex and diverse environments, scenarios, and tasks.

◇Large models based on deep neural networks, being black-box models, still have some blind spots and weaknesses in aspects such as the emergent capabilities and scale laws of large language models, knowledge representation, logical reasoning abilities, generalization, and scene learning capabilities of multimodal large models. Relevant technologies are still in need of continuous breakthroughs.

◇Actively advance the construction of cloud computing platforms, build a new type of infrastructure centered on computing power networks, and strengthen the integration of computing power and networks to provide robust computing resources and corresponding services.

Written by Zhang Yujie from Outlook Weekly

Currently, numerous technology companies are expanding their business around large artificial intelligence (AI) models.

"Large models are exerting unprecedented influence.Looking at the trends, the development of large models can catalyze general artificial intelligence, leading the way for intelligent innovation across various industries." Chen Xiaohong, academician of the Chinese Academy of Engineering, Director of Xiangjiang Laboratory, and Secretary of CPC HUTB Committee, told the reporter from the "Outlook" news magazine.

After undergoing extensive training with large-scale data, large models can adapt to a variety of tasks, characterized by large parameter sizes, extensive training data requirements, and high computing power consumption. Despite rapid technological advancements, they still face constraints such as poor reliability, dependence on training data, weak causal reasoning abilities, and high deployment costs. Additionally, they encounter challenges in identifying suitable real-world applications.

In recent years, under the leadership of Chen Xiaohong, the team has achieved a series of original and systematic results in the field of advanced computing and artificial intelligence. They have also actively explored large model technologies. "We will promote development with a focus on building a solid foundation, encouraging applications, and ensuring security. We will adopt a systematic engineering approach for technological breakthroughs," said Chen Xiaohong.


At the Baidu World 2023 venue, attendees try out Baidu's "ERNIE Bot" in the exhibition area (photographed on October 17, 2023).

Photo by Zhang Manzi / Outlook

Grasping the development ecology and trends of AI large models

Outlook: What has been the developmental journey of AI large models?

Chen: In terms of development speed, superintelligence is arriving faster than we might have imagined. AI technology has now entered the era of large models and become the focus of global innovation.

Since 2006, when breakthroughs in training deep neural networks opened a crucial optimization pathway, the deep-learning research paradigm of AI has progressed from small data to big data, from small models to large models, and from specialized to general-purpose applications. At the end of 2022, the language model ChatGPT, supported by "large models + big data + high computing power", demonstrated the capability to handle tasks across multiple scenarios, purposes, and disciplines. Such large model technologies are now applied extensively in fields such as economics and law, sparking a global wave of large model development.

The development of large model technology has gone through architectural evolution, changes in training methods, and efficient model adaptation. Moreover, it is transitioning from single-modality language models to large models that integrate language, vision, audio, and other modalities. Data indicate that over the past five years roughly 45 globally recognized large models with tens of billions of parameters have emerged.

China has a solid foundation in the field of large models, with strong demand and a broad market. In recent years, domestically developed large models have advanced at an accelerating pace. Within six months, ERNIE Bot iterated to version 4.0, with significant improvements in understanding, generation, logic, and memory. Data show that at least 130 companies in China are developing large model products; 78 of them focus on general-purpose large models, more than 10 of which have over 10 billion parameters, and there are nearly 80 large models with over 1 billion parameters. China ranks in the top tier globally in the number of large models.

Xiangjiang Laboratory, as a high-level scientific and technological innovation platform focusing on advanced computing and artificial intelligence, is currently concentrating its efforts on actively addressing challenges in the large model field. It aims to launch the industry-specific "Xuanyuan" large model as soon as possible, empowering high-quality development in industries such as smart transportation, intelligent manufacturing, smart healthcare, and the metaverse.

Outlook: How would you assess the development trends of large models in terms of technological innovation and practical applications?

Chen: Currently, the development path of large models, progressing from "infrastructure – underlying technology – general-purpose foundation – vertical applications", is becoming increasingly clear. The ecosystem of large model technology is flourishing, with open-source services and an open ecosystem as the mainstream trend. Thanks to domestic and international open platforms for large models, open-source models, frameworks, tools, and publicly available datasets, large model technology is evolving rapidly.

In terms of large model service platforms, the main trend is opening up to individual users and extending to commercial applications. For example, users can access various deep learning models and complete downstream tasks through service platforms such as the OpenAI API.
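As a minimal sketch of this pattern, the snippet below calls such a service through the official openai Python SDK; it assumes the SDK is installed and an API key is set in the OPENAI_API_KEY environment variable, and the model name is illustrative.

```python
# Minimal sketch of calling a large-model service platform's API.
# Assumes the `openai` Python SDK is installed and the OPENAI_API_KEY
# environment variable is set; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the main uses of large language models."},
    ],
)
print(response.choices[0].message.content)
```

Exposing models behind such a uniform prompt-in, text-out interface is what lets downstream tasks be built without touching the model itself.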

In the open-source ecosystem of large models, thanks to the effective support of open-source frameworks, the training of large-scale models is becoming increasingly mature. For instance, PaddlePaddle, as a domestic deep learning platform, integrates core deep learning frameworks, foundational model libraries, end-to-end development suites, and tool components. It plays a crucial role in distributed training for models in various fields such as natural language processing and computer vision.
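To illustrate, here is a minimal sketch of data-parallel training with PaddlePaddle; the toy model, data, and hyperparameters are placeholders, and the script assumes it is launched through paddle.distributed.launch.

```python
# Minimal sketch of data-parallel training with PaddlePaddle.
# Launch with: python -m paddle.distributed.launch train.py
# The toy model and random data stand in for a real large model and loader.
import paddle
import paddle.nn as nn

def train():
    paddle.distributed.init_parallel_env()           # set up the process group
    model = paddle.DataParallel(nn.Linear(128, 10))  # wrap the model for data parallelism
    opt = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())

    for _ in range(10):                              # toy training loop
        x = paddle.randn([32, 128])                  # stand-in for a real data loader
        y = paddle.randint(0, 10, [32])
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()                              # gradients are synchronized across devices
        opt.step()
        opt.clear_grad()

if __name__ == "__main__":
    train()
```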

The integration of large model technology with the real economy is accelerating, and its application scenarios are extremely diverse. For example, combining large models with education can make teaching more intelligent and personalized; large models in healthcare can empower the entire diagnosis and treatment process in medical institutions; and large models in entertainment can make human-machine interaction more engaging. It is in deep integration with industries such as education, healthcare, and media arts that the capability boundaries of general large models keep expanding, transforming how human society produces and lives.

In summary, the upstream development ecology of large models, encompassing software, hardware, and data resources, is showing strong momentum, while the downstream ecology of applications and scenarios is flourishing; large models are fast becoming a key support for the intelligent upgrading of all industries. In the future, multimodal large models will further integrate and mutually promote technologies such as search engines, knowledge graphs, adversarial gaming, and brain cognition, evolving in a more intelligent and versatile direction to address increasingly complex and diverse environments, scenarios, and tasks. We need to seize this important opportunity and accelerate the pace at which large models empower various industries.

Understanding the risks and challenges in the development of AI large models

Outlook: What challenges does the current development of AI large models still face?

Chen: With the increasingly widespread deployment and application of large models, the associated risks and challenges cannot be ignored and require high attention.

Firstly, interpretability is still insufficient. Large models based on deep neural networks, being black-box models, still have some blind spots and weaknesses in aspects such as the emergent capabilities and scale laws of large language models, knowledge representation, logical reasoning abilities, generalization, and scene learning capabilities of multimodal large models. Relevant technologies are still in need of continuous breakthroughs.

Secondly, there is room for improvement in reliability assurance. Language models trained on massive datasets often exhibit issues with factual accuracy and timeliness, and the content they synthesize cannot yet be reliably assessed. Additionally, large models may absorb and reflect inappropriate, biased, or discriminatory content present in their training data, producing outputs that include hate speech, prejudice, discrimination, and misleading information.

Thirdly, there is an urgent need to address high deployment costs and insufficient transferability. Large models with massive parameter and data scales have substantial computational and power consumption requirements for training and inference, leading to high application costs. Moreover, there are issues with latency in on-device inference, limiting their practical application. Additionally, the effectiveness of large models relies on the scenarios covered by the training data, depending on the data scale, breadth, quality, and precision. Due to insufficient data for complex scenarios and inadequate precision, large models face challenges related to specific scenario applicability and generalization.
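A back-of-envelope calculation makes the deployment-cost point concrete: merely holding a model's weights in memory scales linearly with parameter count and numeric precision. The parameter counts below are illustrative round numbers, not specific products.

```python
# Back-of-envelope sketch of deployment cost: approximate memory needed
# just to hold a model's weights at different numeric precisions.
# Excludes activations and KV cache, which add further overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory in GB for the weights alone."""
    return n_params * BYTES_PER_PARAM[precision] / 1024**3

for n in (1e9, 10e9, 100e9):  # 1B, 10B, 100B parameters
    row = ", ".join(f"{p}: {weight_memory_gb(n, p):7.1f} GB" for p in BYTES_PER_PARAM)
    print(f"{n/1e9:5.0f}B params -> {row}")
```

The arithmetic shows why a 100-billion-parameter model is out of reach for on-device inference without aggressive compression such as quantization.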

Fourthly, there is a need to strengthen security and privacy protection. Various attack methods, such as data poisoning attacks, model theft attacks, adversarial sample attacks, instruction attacks, and backdoor attacks, pose risks to the deployment of applications related to large models. Simultaneously, during the training process of large models, various types of sensitive and private data may be encoded into the model parameters, raising the possibility of privacy data leakage through prompted information.
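To make one of these threat classes concrete, below is a minimal sketch of the classic fast gradient sign method for crafting an adversarial sample, written in PyTorch with a toy classifier as a stand-in; it illustrates the general mechanism rather than an attack on any particular large model.

```python
# Minimal sketch of an adversarial-sample attack (fast gradient sign method).
# The classifier is a toy stand-in; real attacks target deployed models.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 784, requires_grad=True)  # stand-in for a real input
y = torch.tensor([3])                       # its true label
epsilon = 0.05                              # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()

# Perturb the input in the direction that increases the loss.
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

# The perturbation is small, but it can flip the model's prediction.
print("original:", model(x).argmax(dim=1).item(),
      "adversarial:", model(x_adv).argmax(dim=1).item())
```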

Fifthly, there is a need to be vigilant about accompanying technological risks. When language models are combined with technologies such as speech synthesis, image and video generation, they can create multimedia content, including audio and video, that is difficult for humans to distinguish from reality. This capability may be misused to generate false information, maliciously guide behavior, provoke public opinion attacks, and pose a threat to national security.

Driving AI large models to better empower various industries

Outlook: How to promote the high-quality development of the economy and society through large model services?

Chen: Efforts should urgently be made to advance the research and development of large model technology, strengthen the data foundation of vertical industries, and at the same time tighten supervision of the risks associated with large models, giving full weight to both the technical and social attributes of AI. I have the following specific recommendations:

Firstly, promote the self-reliance and controllability of the large model technology stack. The first step is to strengthen macro planning and top-level design and formulate a development outline for large models. Strengthen the intellectual property portfolio around core large model technologies. Actively establish an industry development alliance spanning upstream and downstream enterprises in chip manufacturing, cloud computing, the internet, and applications. Support a collaborative model for large model research and development involving industry, academia, and research institutions, and encourage relevant enterprises to undertake digital transformation and upgrading based on large models. The second step is to strengthen original technical innovation in large models and build a large model software and hardware ecosystem. Enhance the self-reliance and controllability of the foundational software required for large model development, encourage enterprises and institutions to rely more on domestic deep learning frameworks for large model training and inference, and guide domestic chip manufacturers to adapt and jointly optimize with large models on the basis of domestic frameworks.

Secondly, address the shortage of computing power in the large model training process. The first step is to support and promote research and innovation in distributed computing technology to enhance the scalability and efficiency of computing power, and to facilitate the creation of computing clusters that deliver larger-scale computing capabilities. The second step is to actively promote the construction of cloud computing platforms, build a new type of infrastructure centered on computing power networks, and strengthen the integration of computing power and networks to provide robust computing resources and corresponding services. Implement incentive measures such as funding support, tax benefits, and intellectual property protection to encourage enterprises and research institutions to invest in and develop technologies and facilities related to large model computing power, establishing a robust foundation for the development of large models.

Thirdly, drive technological advances to enhance the security of large models. The first step is to design a grading and classification system for large model systems, research potential vulnerabilities in their integrated applications, and defend in a targeted manner against attacks involving information gathering, intrusion, content manipulation, fraud, malicious software, and degraded service availability. Develop security alignment and assessment technologies for large models, advance security enhancement technologies, improve the security of training data, optimize security alignment training algorithms, strengthen the robustness and anti-interference capabilities of AI, continually enhance transparency, interpretability, reliability, and controllability, and progressively achieve auditability, supervision, traceability, and trustworthiness. The second step is to strengthen risk assessment and prevention for the development of AI large models. Implement inclusive, prudent, classified, and graded supervision of generative AI services, and enhance security supervision measures for large models. Integrate ethics into the entire lifecycle of AI and establish an AI ethics governance standard system, ensuring that the design and training of models strictly adhere to ethical guidelines.

Fourthly, establish compliance standards and evaluation platforms for large models. The first step is to establish compliance standards and development guidelines for AI that comprehensively cover safety requirements in the development, training, and deployment of large models. Formulate corresponding safety standards and guidelines, focusing on AI safety terminology, AI safety reference frameworks, and basic AI safety principles and requirements. Construct a method system for assessing the capabilities of large models, covering transparency and legality in data collection and use, privacy protection measures, and principles for handling sensitive topics and content, to ensure that the development and application of large models comply with ethical and legal requirements. The second step is to establish a scientifically effective evaluation platform and formulate a set of standards and methodologies for evaluating large models in the Chinese-language context, clarifying aspects of the evaluation process such as data preparation, assessment metrics, and testing methods. With unified standards, the platform can run diverse evaluation tasks tailored to different fields and applications, ensuring accurate assessment of the performance and effectiveness of various models.
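Schematically, such an evaluation platform reduces to fixed data preparation, a declared metric, and a uniform model interface. The sketch below is a hypothetical illustration: the benchmark items, the exact-match metric, and the run_model interface are placeholders, not an existing standard.

```python
# Schematic sketch of a benchmark-style evaluation loop: fixed data
# preparation, a declared metric, and a uniform model interface.
# All names here (run_model, the sample items) are hypothetical placeholders.
from typing import Callable

benchmark = [  # data preparation: fixed prompts with reference answers
    {"prompt": "法国的首都是哪里？", "reference": "巴黎"},
    {"prompt": "水的化学式是什么？", "reference": "H2O"},
]

def exact_match(prediction: str, reference: str) -> float:
    """Assessment metric: 1.0 if the reference appears in the prediction."""
    return 1.0 if reference in prediction else 0.0

def evaluate(run_model: Callable[[str], str]) -> float:
    """Testing method: run every item through the model and average the metric."""
    scores = [exact_match(run_model(item["prompt"]), item["reference"])
              for item in benchmark]
    return sum(scores) / len(scores)

# Usage: any model exposed as a prompt -> text function can be scored uniformly.
print(evaluate(lambda prompt: "巴黎" if "首都" in prompt else "不知道"))
```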

Fifthly, establish a collaborative mechanism to drive the development of large models. The first step is to explore the establishment of a regular cooperation mechanism between the academic and business sectors, encourage universities and companies to set up joint research centers, laboratories, or collaborative projects, promote data sharing and collaborative research between the two, enhance the understanding and analysis of large model training data, help the academic community better understand the characteristics and potential risks of large models, and assist the business sector in further improving the security of algorithms and models. The second step is to promote the cultivation and exchange of outstanding talents by establishing joint Ph.D. training programs, allowing researchers to visit companies on-site, and other means to facilitate the training and exchange of top talents between universities and businesses.
