Natural language processing

Language processing mode
This entry is reviewed by the "Science China" science encyclopedia entry compilation and application work project.
Natural language processing (NLP) is an important direction within computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. NLP is an interdisciplinary science that combines linguistics, computer science, and mathematics. Because its subject matter is natural language, the language people use every day, it is closely related to linguistics, but there are important differences. NLP is not the general study of natural language; it aims to develop computer systems, and especially software systems, that can communicate effectively in natural language. It is therefore a part of computer science [1] .
Natural language processing is mainly used in machine translation, public opinion monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, speech recognition, Chinese OCR and so on [2] .
Chinese name
Natural language processing [1]
Foreign name
natural language processing [2]
Field of application
Computers, artificial intelligence
Abbreviated form
NLP [1]

Introduction

Language is the essential characteristic that distinguishes humans from other animals. Of all living things, only human beings have the ability to use language. Many forms of human intelligence are closely tied to language: human logical thinking takes the form of language, and most human knowledge is recorded and handed down in spoken and written language. Language processing is therefore an important, even core, part of artificial intelligence.
Communicating with computers in natural language is something people have long sought, because it has both obvious practical significance and important theoretical significance. People could use computers in the language they are most accustomed to, without spending a great deal of time and energy learning computer languages that are neither natural nor familiar; and such work could deepen our understanding of the mechanisms of human language ability and intelligence.
Natural language processing refers to the technology of interacting with machines using the natural language that humans use to communicate. Processing makes natural language readable and understandable by computers. Research on natural language processing began with the exploration of machine translation. Although natural language processing involves operations at several levels, including phonology, grammar, semantics, and pragmatics, its basic task is, in short, to segment the corpus being processed into words, using ontology dictionaries, word-frequency statistics, contextual semantic analysis, and similar means, so as to form semantically rich lexical units, with the smallest item that carries a part of speech as the unit. [3]
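The dictionary-based word segmentation mentioned above can be illustrated with a minimal sketch. It uses forward maximum matching, one simple dictionary-driven technique; the source does not name a specific algorithm, and the function name, the tiny dictionary, and the `max_word_len` parameter are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text by repeatedly taking the longest dictionary word at the cursor."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real system would load a large lexicon.
toy_dictionary = {"自然", "语言", "自然语言", "处理", "技术"}
print(forward_max_match("自然语言处理技术", toy_dictionary))
# ['自然语言', '处理', '技术']
```

Greedy matching of this kind cannot resolve ambiguous segmentations on its own, which is why the text also mentions word-frequency statistics and contextual analysis.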
Natural language processing (NLP) is a discipline that uses computer technology to analyze, understand, and process natural language. It treats the computer as a powerful tool for language research: with computer support, language information is studied quantitatively, and descriptions of language are produced that both humans and computers can use. It comprises two parts, natural language understanding (NLU) and natural language generation (NLG). It is a typical interdisciplinary field, involving linguistics, computer science, mathematics, cognitive science, logic, and more, and it focuses on the interaction between computers and human (natural) language. At different times, or with different emphases, the computer processing of natural language has also been called natural language understanding (NLU), human language technology (HLT), computational linguistics, quantitative linguistics, and mathematical linguistics [1] .
Realizing natural language communication between humans and machines means that the computer can both understand the meaning of a natural language text and express a given intention or idea in natural language text. The former is called natural language understanding and the latter natural language generation; natural language processing therefore generally includes both. Historically, more research was devoted to natural language understanding than to natural language generation, but that has changed.
Neither natural language understanding nor natural language generation is as simple as people originally thought; both are very difficult. Given the current state of theory and technology, a general-purpose, high-quality natural language processing system remains a long-term goal. For specific applications, however, practical systems with considerable natural language processing capability have appeared; some have been commercialized, and some have even begun to be industrialized. Typical examples include natural language interfaces to multilingual databases and expert systems, various machine translation systems, full-text information retrieval systems, and automatic summarization systems.
Natural language processing, that is, realizing natural language communication between humans and machines, or realizing natural language understanding and generation, is very difficult. The root cause is the wide variety of ambiguity that exists at every level of natural language text and dialogue.
There is a many-to-many relationship between the forms (strings) of a natural language and their meanings. This is in fact part of the appeal of natural language. From the standpoint of computer processing, however, ambiguity must be resolved, and some argue that disambiguation is the central problem of natural language understanding: converting potentially ambiguous natural language input into an unambiguous internal representation in the computer.
Resolving such widespread ambiguity requires a great deal of knowledge and reasoning, which poses great difficulties for linguistics-based and knowledge-based approaches. Natural language processing research in which these approaches were the mainstream has achieved much in theory and methods over the past decades, but its results in building systems that can process large-scale real text have not been remarkable. Most of the systems developed are small-scale research demonstration systems.
The current problems are twofold. On the one hand, grammar has so far been limited to analyzing isolated sentences, and there is little systematic research on how context and the conversational setting constrain and affect a sentence. As a result, there are no clear rules for analyzing ambiguity, ellipsis, pronoun reference, or the different meanings that the same sentence takes on different occasions or from different speakers; these problems must be solved step by step by strengthening research in pragmatics. On the other hand, people understand a sentence not through grammar alone but by drawing on a large amount of relevant knowledge, including everyday knowledge and specialized knowledge, not all of which can be stored in a computer. A text understanding system can therefore be built only within a limited range of vocabulary, sentence patterns, and specific topics; only when computer storage capacity and speed have greatly increased can that range be suitably expanded.
These problems are the main difficulties in applying natural language understanding to machine translation, and they are one reason why the translation quality of today's machine translation systems is still far from the ideal; yet translation quality is the key to the success of a machine translation system. The Chinese mathematician and linguist Professor Zhou Haizhong pointed out in the classic paper "Fifty Years of Machine Translation" that improving the quality of machine translation first requires solving problems of language itself rather than problems of program design, and that relying on a handful of programs alone certainly cannot improve machine translation quality. He added that, as long as humans do not yet understand how the brain performs fuzzy recognition and logical judgment of language, machine translation cannot reach the standard of "faithfulness, expressiveness, and elegance".

History of development

One of the earliest research efforts in natural language understanding was machine translation [4] . In 1949 the American Warren Weaver first proposed a design for machine translation [1] . The development of natural language processing can be divided into three main stages.
Early natural language processing
The first stage (1960s to 1980s): rule-based systems were built for lexical, syntactic, and semantic analysis, question answering, chat, and machine translation. Their advantage is that rules can draw on human introspective knowledge, do not depend on data, and allow a quick start. Their problem is insufficient coverage: the results resembled toy systems, and rule management and scalability were never solved. [5]
Statistical natural language processing
The second stage (from the 1990s): statistical machine learning (ML) became popular, and many NLP tasks began to be handled with statistical methods. The main idea is to build a machine learning system on manually defined features and to use labeled data to determine its parameters through training. At run time, the learned parameters are used to decode the input and produce the output. Machine translation and search engines both succeeded with statistical methods. [5]
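The train-then-decode workflow described for this stage can be sketched with a tiny Naive Bayes text classifier. This is only one illustrative statistical method, not one the source singles out; the feature function, the four labeled examples, and all names are assumptions.

```python
import math
from collections import Counter, defaultdict


def extract_features(text):
    # Manually defined features: a lowercase bag of words.
    return text.lower().split()


def train(labeled_examples):
    """Estimate counts (the model parameters) from labeled data."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_examples:
        label_counts[label] += 1
        for word in extract_features(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def decode(text, model):
    """Pick the label with the highest log-probability under the learned counts."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in label_counts.items():
        score = math.log(count / total)
        # Add-one smoothing so unseen (label, word) pairs do not zero out the score.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in extract_features(text):
            if word in vocabulary:
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled data, invented for this sketch.
model = train([
    ("good great film", "pos"),
    ("great acting", "pos"),
    ("bad boring film", "neg"),
    ("boring plot", "neg"),
])
print(decode("great film", model))   # pos
print(decode("boring film", model))  # neg
```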
Neural network natural language processing
The third stage (after 2008): deep learning began to show its power in speech and image processing, and NLP researchers turned their attention to it. At first, deep learning was used to compute features or construct new ones, whose effect was then evaluated within the existing statistical learning framework. For example, search engines added deep learning to compute the similarity between search terms and documents and so improve search relevance. Since 2014, researchers have tried to train models end to end directly with deep learning. Progress has been made in machine translation, question answering, reading comprehension, and other fields, and deep learning has boomed. [5]

Concept and technology

Information Extraction (IE)
Information extraction is the process of extracting the unstructured information embedded in text and converting it into structured data. It extracts relations between named entities from a natural language corpus and is thus a deeper line of research built on named entity recognition. The process has three main steps: first, the unstructured data is processed automatically; second, targeted information is extracted from the text; and finally, the extracted information is given a structured representation. The most basic task of information extraction is named entity recognition, and its core is the extraction of entity relations. [6]
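As a rough illustration of turning unstructured text into structured relations, the sketch below uses a single hand-written regular expression. Practical systems rely on trained entity and relation models; the pattern, the `founded_by` relation name, and the example companies are all invented for illustration.

```python
import re

# Hypothetical pattern: "<Organization> was founded by <Person Name>".
PATTERN = re.compile(
    r"([A-Z][a-zA-Z]+) was founded by ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def extract_relations(text):
    """Return one structured record per pattern match found in the text."""
    return [
        {"subject": m.group(1), "relation": "founded_by", "object": m.group(2)}
        for m in PATTERN.finditer(text)
    ]


text = "Acme was founded by Jane Doe. Globex was founded by John Smith in 1990."
for record in extract_relations(text):
    print(record)
```

Each record pairs two recognized entities with the relation between them, which mirrors the "entity recognition first, relation extraction second" ordering in the paragraph above.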
Automatic summarization
Automatic summarization is an information compression technique in which a computer automatically extracts information from a text according to certain rules and assembles it into a short summary. It has two goals: to keep the language brief and to preserve the important information. [6]
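One simple "rule" of the kind described is to keep the sentences whose words occur most often in the document. The sketch below assumes English text, uses no stop-word list, and scores sentences by average word frequency; these are illustrative simplifications rather than a method taken from the source.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Extractive summary: keep the highest-scoring sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


doc = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(doc, max_sentences=1))
# Cats chase mice and cats climb trees.
```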
Speech recognition technology
Speech recognition technology lets a machine convert a speech signal into the corresponding text or commands through recognition and understanding; in other words, it lets the machine understand human speech. Its goal is to convert the lexical content of human speech into computer-readable data. To do so, continuous speech must first be decomposed into units such as words and phonemes, and a set of rules for understanding semantics must be established. The processing pipeline consists of front-end noise reduction, speech segmentation and framing, feature extraction, and state matching. The framework can be divided into three parts: the acoustic model, the language model, and decoding. [7]
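The framing step in this pipeline can be sketched as cutting the sample sequence into overlapping windows and computing one value per window. The frame and hop lengths here are arbitrary, and short-time energy stands in for the richer acoustic features a real recognizer would extract; both choices are assumptions for illustration.

```python
def frame_signal(samples, frame_length, hop_length):
    """Split a sequence of samples into overlapping frames of equal length."""
    frames = []
    start = 0
    while start + frame_length <= len(samples):
        frames.append(samples[start:start + frame_length])
        start += hop_length
    return frames


def short_time_energy(frame):
    """A very simple per-frame feature: the mean squared amplitude."""
    return sum(x * x for x in frame) / len(frame)


signal = list(range(10))  # stand-in for audio samples
frames = frame_signal(signal, frame_length=4, hop_length=2)
print(len(frames))                             # 4
print([short_time_energy(f) for f in frames])
```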
Transformer model
The Transformer model was first proposed by a Google team in 2017. It is an attention-based model designed to speed up deep learning. The model consists of a stack of encoders and a stack of decoders: the encoder processes input of any length and produces a representation of it, and the decoder converts that representation into target words. The Transformer uses the attention mechanism to take into account the relationships between a word and all the other words and so generate a new representation of each word. Its advantage is that attention can capture the relationships between all the words in a sentence regardless of how far apart they are. The model abandons the convention that an encoder-decoder model must be combined with a convolutional neural network (CNN) or a recurrent neural network, and uses a structure built entirely on attention in place of the LSTM, which reduces computation and improves parallel efficiency without compromising the final experimental results. The model nevertheless has drawbacks. First, its computational cost is high; second, it makes only weak use of position information and has difficulty capturing long-distance information. [8]
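The way attention builds a new representation of each word from all the other words can be sketched as scaled dot-product attention in plain Python. The sketch omits the learned projection matrices, multiple heads, and positional encoding of the full model, and the toy vectors are invented.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # New representation: weighted average of all value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Self-attention: the same toy word vectors act as queries, keys, and values.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(words, words, words):
    print([round(x, 3) for x in row])
```

Every output row mixes information from all positions in one step, which is what allows the computation to run in parallel across the sentence.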
Natural language processing technology based on traditional machine learning
Natural language processing can be decomposed into multiple subtasks. Traditional machine learning methods such as the support vector machine (SVM), the Markov model, and the conditional random field (CRF) can be applied to these subtasks and further improve the accuracy of the results. In practice, however, they still have the following shortcomings: (1) The performance of a traditionally trained model depends heavily on the quality of the training set, and the training set must be labeled manually, which reduces training efficiency. (2) A training set used with a traditional machine learning model yields very different results in different domains, which weakens the applicability of the training and exposes the drawbacks of a single learning method. Making a training set applicable to several domains requires a great deal of manual annotation. (3) For higher-order, more abstract natural language, the relevant features cannot be labeled manually, so traditional machine learning can only learn pre-established rules and cannot learn complex language features beyond them. [9]
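As an example of a Markov-model method applied to one subtask, the sketch below runs Viterbi decoding on a hidden Markov model for part-of-speech tagging. The two tags and every probability are hypothetical numbers chosen for illustration; in practice they would be estimated from a manually labeled training set, which is exactly the dependence noted in point (1).

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    if not observations:
        return []
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical parameters, not taken from any corpus.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit_p = {
    "NOUN": {"dogs": 0.5, "cats": 0.4, "chase": 0.1},
    "VERB": {"chase": 0.9, "dogs": 0.05, "cats": 0.05},
}
print(viterbi(["dogs", "chase", "cats"], states, start_p, trans_p, emit_p))
# ['NOUN', 'VERB', 'NOUN']
```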
Natural language processing technology based on deep learning
Deep learning is a branch of machine learning. In natural language processing, deep learning models such as convolutional neural networks and recurrent neural networks learn from generated word vectors to classify and understand natural language. Compared with traditional machine learning, natural language processing based on deep learning has the following advantages: (1) Starting from vectorized words or sentences, deep learning can keep learning language features and master higher-level, more abstract ones, meeting the needs of natural language processing tasks that would otherwise demand a great deal of feature engineering. (2) Deep learning does not require experts to define the training set by hand; it learns high-level features automatically through neural networks. [9]
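Once words are represented as vectors, relatedness between words can be measured numerically, which is what lets a model work with language features without hand-written rules. The sketch below compares hypothetical three-dimensional word vectors with cosine similarity; real word vectors are learned from data and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical vectors, invented for illustration only.
word_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.95],
}
print(cosine_similarity(word_vectors["king"], word_vectors["queen"]))
print(cosine_similarity(word_vectors["king"], word_vectors["apple"]))
```

With these toy values the first similarity is much higher than the second, reflecting that the first pair of vectors points in nearly the same direction.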

Technical difficulty

Effective definition of content
In everyday language, the words in a sentence do not usually exist in isolation; all the words in a discourse must relate to one another to express the intended meaning. Once a specific sentence is formed, the words come to constrain one another. Without such effective constraint, the content becomes ambiguous and cannot be properly understood. For example, take the sentence "he secretly went out to play behind his mother and sister's back." If the role of the word "and" is not pinned down, the sentence can be read either as saying that neither his mother nor his sister knows he went out, or that his mother does not know he went out with his sister.
Disambiguation and vagueness
Words and sentences often carry several meanings in different situations, which easily gives rise to vague concepts or divergent interpretations. For example, the phrase "mountains and rivers" has several meanings: it can describe the natural environment, express a relationship between two parties, or even describe the beauty of a piece of music. Natural language processing therefore needs to use the surrounding context to pin the content down, remove ambiguity and vagueness, and express the true meaning. [10]
Defective or irregular input
Examples include foreign or regional accents in speech processing, and spelling, grammatical, or optical character recognition (OCR) errors in text processing.
Speech acts and planning
Sentences often do not mean only what they literally say. For example, a good response to "Can you pass the salt?" is to pass the salt. In most contexts "yes" would be a poor answer, although "no" or "it's too far out of reach" would be acceptable. Similarly, if a course was not offered last year, then for the question "How many students failed this course last year?" it is better to answer "This course was not offered last year" than "No one failed."

Related technologies


Computer science

The original purpose of natural language processing was to realize natural language dialogue between humans and computers, and the computer as a party to the dialogue is a prerequisite of the very concept. People have long hoped that robots would be applied in daily life and become an important productive force driving social development, and in particular that robots would have "human intelligence". As an important part of artificial intelligence, natural language processing plays a landmark role in making robots truly intelligent. In recent years, large gains in data storage capacity and processing speed have made it possible to process massive data, apply probability and statistics, discover linguistic regularities, and uncover internal relationships. [11]

Internet technology

The Internet has made the dissemination of information more convenient. New media built on Internet technology have become the main channels of information dissemination, and network chat software has added new means of communication. All of this supplies abundant resources for statistics-based natural language processing. Open-source platforms built on Internet technology are also an important way for researchers to obtain research resources. [11]

Machine learning method

Machine learning is a multidisciplinary field that uses data and experience to improve computer algorithms and optimize computer performance. Its roots can be traced back to the method of least squares (early 19th century) and Markov chains (early 20th century), but its real development began in the 1950s. After passing through stages centered on "learning without knowledge", on system descriptions based on graph and logical structures, and on multi-concept learning combined with various applications, it entered its fourth stage in the mid-1980s, a stage of renewal in which computers truly begin to become intelligent. [11]
The use of semi-supervised or unsupervised machine learning to process massive amounts of natural language also tracks the development of machine learning, which can be roughly divided into two stages: traditional machine learning based on linear models over discrete representations, and deep learning based on nonlinear models over continuous representations. [11]
Deep learning is an algorithm for automatic learning by computer, built from an input layer, hidden layers, and an output layer. The input layer receives the large amount of data supplied by researchers and is what the algorithm processes. The number of hidden layers is chosen by the experimenter; in the hidden layers the algorithm marks the data, finds regularities, and establishes relations between feature points. The output layer yields the result available to researchers. Generally speaking, the more data the input layer receives and the more hidden layers there are, the better the data can be discriminated, but the amount and difficulty of computation also increase. As the latest driving force for natural language processing, machine learning offers unprecedented advantages: [11]
(1) It overcomes the sparsity of manually labeled language features: deep learning can use distributed vectors to classify words, so that part-of-speech tags, word-sense tags, dependency relations, and so on can be labeled effectively; [11]
(2) It overcomes the incompleteness of manual language annotation: manual annotation is prone to omissions because of the heavy workload, and an efficient computer can greatly reduce such mistakes; [11]
(3) It overcomes the heavy computation and long running time of traditional machine learning algorithms: deep learning uses matrix operations to greatly reduce the amount of computation. [11]
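The input layer, hidden layers, and output layer described above can be sketched as a forward pass through a small fully connected network. The weights are hand-picked rather than learned, and the ReLU and sigmoid activations are assumptions; the source does not specify any of them.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus a bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, layers):
    """Pass x through the hidden layers (ReLU), then the output layer (sigmoid)."""
    hidden = x
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    out_weights, out_biases = layers[-1]
    return [sigmoid(v) for v in dense(hidden, out_weights, out_biases)]


# Hypothetical weights: one hidden layer with two units, one output unit.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
print(forward([2.0, 1.0], layers))
```

Adding more `(weights, biases)` pairs before the last entry deepens the network, and each extra layer adds another round of multiply-and-sum work, which is the growth in computation the text mentions.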

Tools and platforms

NLTK [12] : a comprehensive Python-based NLP library.
StanfordNLP [13] : an NLP algorithm library commonly used in academia.
Chinese NLP tools: THULAC [14] , LTP from HIT [15] , and jieba word segmentation [16] .

Research hotspot

Pre-training technique
The essence of pre-training is that model parameters are no longer randomly initialized but are first trained through a language model. The current approach to NLP tasks is pre-training followed by fine-tuning. Pre-training has greatly improved results on NLP tasks, and pre-trained language models have multiplied, from the original Word2vec and GloVe to ULMFiT, a universal language model for text classification, and ELMo. At present the best pre-trained language models are built on the Transformer model. Proposed by Vaswani et al., this model is based entirely on self-attention and is currently the best feature extractor in NLP: it supports parallel computation and can also capture long-distance feature dependencies. [17]
The most influential pre-trained language model at present is BERT, a deep bidirectional language model based on Transformer. BERT consists of a multi-layer bidirectional Transformer encoder and comes in two sizes: the base version has 12 Transformer layers, each with 12 self-attention heads and a hidden size of 768; the large version has 24 Transformer layers, each with 16 self-attention heads and a hidden size of 1,024. This suggests that deep, narrow models do better than shallow, wide ones. BERT performs very well on many tasks, such as machine translation, text classification, text similarity, and reading comprehension. The BERT model is trained in two ways: (1) by masking words and predicting them; (2) by predicting the next sentence. [17]
A general-purpose language model is trained with these two methods, and downstream tasks such as text classification and machine translation are then handled by fine-tuning. Compared with earlier pre-trained models, BERT captures truly bidirectional contextual semantics. BERT also has disadvantages, however. During training, the heavy use of [MASK] tokens affects model quality, and because only 15% of the tokens in each batch are predicted, BERT converges slowly. In addition, because the pre-training process is inconsistent with the generation process, BERT performs poorly on natural language generation tasks, and it cannot handle document-level NLP tasks, being suited only to sentence-level and paragraph-level tasks. [17]
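The word-masking training method, with 15% of the tokens selected for prediction, can be sketched as follows. This simplified version always substitutes a `[MASK]` token; the original BERT recipe also sometimes keeps the selected token or swaps in a random one, which is omitted here. The function name, the `seed` parameter, and the example sentence are assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace about mask_rate of the tokens with [MASK]; return inputs and targets."""
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    labels = {}  # position -> original token the model must predict
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, labels


sentence = ("natural language processing lets computers read text and "
            "answer questions about what the text says in plain words today").split()
masked, labels = mask_tokens(sentence)
print(masked)
print(labels)
```

Only the positions recorded in `labels` contribute to the training loss, which is why a model trained this way sees relatively few prediction targets per batch.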
XLNet is a generalized autoregressive language model based on Transformer-XL. The Transformer has two disadvantages: (1) the maximum dependency distance between characters is limited by the input length; (2) when the input text exceeds 512 characters, it is split into segments that are each trained separately from scratch, which lowers training efficiency and hurts model performance. To address these two shortcomings, Transformer-XL introduces two remedies: the segment recurrence mechanism and relative positional encoding. Transformer-XL is faster at test time and can capture longer contexts. [17]
Unsupervised representation learning has achieved great success in NLP. Under this philosophy, many researchers have explored different unsupervised pre-training objectives, of which autoregressive language modeling and autoencoding language modeling are the two most successful. XLNet is a generalized autoregressive method that combines the autoregressive and autoencoding approaches. Instead of the fixed forward or backward factorization order of traditional autoregressive models, XLNet uses random permutations of the order in which words are predicted to predict the word likely to appear at a given position. This approach lets each position in a sentence learn contextual information from all other positions and builds bidirectional semantics that capture context better. Because XLNet uses Transformer-XL, it performs better, especially on tasks involving long text sequences. [17]
Both the BERT and XLNet language models perform very well on English corpora but less well on Chinese corpora. ERNIE is a language model trained on Chinese corpora. It is a knowledge-enhanced semantic representation model that performs very well on Chinese NLP tasks such as language inference, semantic similarity, named entity recognition, and text classification. When processing Chinese corpora, ERNIE learns complete semantic representations of larger semantic units by modeling the Chinese characters to be predicted. The core of the ERNIE model is built from Transformer. The model structure consists mainly of two modules: the lower module, the T-Encoder, is mainly responsible for capturing basic lexical and syntactic information from the input tokens, and the upper module, the K-Encoder, is responsible for integrating knowledge information into the textual information obtained from the lower layer, so that the heterogeneous information of tokens and entities can be represented in a unified feature space. [17]
Graph neural network technology
Research on graph neural networks focuses mainly on propagating and aggregating information from adjacent nodes; the concept of the graph neural network was itself inspired by the convolutional neural network in deep learning [18] . Graph neural networks play a very important role in applying deep learning to non-Euclidean data. In particular, the interpretability of graph structures, as in traditional Bayesian causal networks, is of great research significance for framing problems of inference and explainable causality in deep neural networks. How to use deep learning to analyze and reason over graph-structured data has attracted a great deal of research and attention. [18]
The general inference process of a graph neural network can be described by the following subprocesses: pre-representation of graph nodes, graph node sampling, subgraph extraction, subgraph feature fusion, and generation and training of the graph neural network. The specific steps are: [18]
STEP1 Pre-representation of graph nodes: embed each node in the graph using a graph embedding method; [18]
STEP2 Graph node sampling: sample positive and negative examples for each node, or for existing node pairs, in the graph; [18]
STEP3 Subgraph extraction: extract the neighbors of each node in the graph to build n-order subgraphs, where n denotes the neighbors of the n-th layer, thus forming a general subgraph structure; [18]
STEP4 Subgraph feature fusion: perform local or global feature extraction on each subgraph fed into the neural network; [18]
STEP5 Generate and train the graph neural network: define the number of layers and the input and output parameters of the network, and train the network on the graph data. [18]
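The subgraph extraction in STEP3 can be sketched with a breadth-first search that collects every node within n hops of a center node. The adjacency-list format, the assumption of an undirected graph with comparable node names, and the function name are illustrative choices, not details given in the source.

```python
from collections import deque


def n_hop_subgraph(adjacency, center, n):
    """Return the nodes within n hops of center and the edges among them."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == n:
            continue  # do not expand beyond the n-th layer of neighbors
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    nodes = set(distance)
    # Keep each undirected edge once, as an ordered pair (u, v) with u < v.
    edges = {
        (u, v)
        for u in nodes
        for v in adjacency.get(u, ())
        if v in nodes and u < v
    }
    return nodes, edges


# Hypothetical undirected graph: a-b, b-c, c-d, a-e.
graph = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": ["a"]}
print(n_hop_subgraph(graph, "a", 1))
print(n_hop_subgraph(graph, "a", 2))
```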
1. Graph convolutional neural network
The popularity of deep learning is closely tied to the wide applicability of convolutional neural networks. Among graph neural networks, the graph convolutional neural network has been studied the longest and has produced the most results. In terms of feature space, graph convolutional neural networks fall into two types: frequency-domain and spatial-domain. [18]
The frequency-domain approach builds on graph signal processing: the convolutional layer of the graph neural network is defined as a filter, that is, the filter removes noise from the signal to yield the classification result for the input signal. In practice it can only handle undirected graphs that carry no information on their edges. The graph on which the input signal lives is described by a Laplacian matrix that admits an eigendecomposition. The eigendecomposition of the normalized Laplacian can be written in a general form whose diagonal matrix Λ consists of the eigenvalues λ_i arranged in order. [18]
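The decomposition referred to above is commonly written as follows. The symbols are assumptions drawn from standard spectral graph theory rather than from the source: A is the adjacency matrix, D the diagonal degree matrix, U the matrix of eigenvectors, x the input signal on the nodes, and g_θ the filter.

```latex
\[
L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top},
\qquad
\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)
\]
\[
g_\theta \star x = U \, g_\theta(\Lambda) \, U^{\top} x
\]
```

Under this formulation, filtering means transforming the signal into the eigenbasis with U^T, rescaling each frequency component by g_θ(λ_i), and transforming back with U.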
2. Space-based graph convolutional neural network
Just as convolutional neural networks in deep learning convolve over image pixels, space-based graph convolutional neural networks compute a convolution between a central node and its neighboring nodes to express the passing and aggregation of information among neighbors, and the result serves as the node's new representation in the feature domain. [18]
ACL 2020, the annual meeting of the Association for Computational Linguistics and a top international conference in natural language processing, announced its paper acceptance results. Three papers from the "Joint Laboratory of Language and Knowledge Computing" of the Institute of Automation of the Chinese Academy of Sciences were accepted, reporting breakthroughs in automatic information extraction from medical dialogues, automatic coding for the International Classification of Diseases (ICD), and the interpretability of automatic ICD coding. [20]
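The neighbor aggregation just described can be sketched as one round of mean aggregation, in which each node's new representation is the average of its own feature vector and those of its neighbors. Learned weight matrices and nonlinearities are left out, and the graph and features are invented for illustration.

```python
def aggregate(features, adjacency):
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in adjacency.get(node, ())]
        # Average the group dimension by dimension.
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Hypothetical graph: node "a" is connected to "b" and "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(aggregate(features, adjacency))
```

Applying the function repeatedly lets information travel further, since after k rounds a node's representation depends on nodes up to k hops away.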

Future outlook

The field of natural language processing has always been dominated by two research methods, rule-based and statistic-based, which have successively encountered bottlenecks. It is difficult for rule-based and traditional machine learning methods to make greater breakthroughs after reaching a certain stage, and it is not until the improvement of computing power and data storage that the development of natural language processing is greatly promoted. Breakthroughs in speech recognition have made deep learning technology very popular. Machine translation has also made great progress, and Google Translate is now taking machine translation to a new level with deep neural network technology, even if it is not up to human translation standards, it is enough to handle most of the needs. Information extraction has also become more intelligent, better able to understand complex sentence structures and relationships between entities, and extract the correct facts. Deep learning promotes the progress of natural language processing tasks, and natural language processing tasks also provide broad application prospects for deep learning, making people invest more in algorithm design. Advances in artificial intelligence will continue to promote the development of natural language processing, but also make natural language processing face the following challenges: [19]
1) Better algorithms. Of the three elements of AI development (data, computing power, and algorithms), algorithm design is the one most relevant to NLP researchers. Deep learning has shown strong advantages on many tasks, but the soundness of backpropagation has recently been questioned. Deep learning accomplishes narrow tasks by means of big data; it relies on induction, and its learning efficiency is relatively low. Whether one can instead start from small data, analyze the underlying principles, and accomplish multiple tasks through deduction is a direction well worth pursuing in future research. [19]
2) In-depth analysis of language. Although deep learning has greatly improved the performance of natural language processing, the field is a science of language technology rather than a search for the best machine learning method, and its core problems remain linguistic. Future work on language must also attend to semantic understanding. Starting from large-scale web data, in-depth semantic analysis combined with linguistic theory can uncover the laws of semantic generation and understanding, reveal the hidden patterns behind the data, extend and improve existing knowledge models, and make semantic representation more accurate. Language understanding requires a combination of reason and experience: reason is a priori, while experience extends knowledge, so world knowledge and linguistic theory must be fully used to guide advanced technology toward understanding semantics. Distributed word vectors capture part of the semantic information, and richer semantics can be expressed by combining word vectors in different ways, but the semantic potential of word vectors has not yet been fully exploited. A key task for future research is to mine the patterns of semantic representation in language and to express semantics completely and accurately in a formal language that computers can understand. [19]
3) The intersection of multiple disciplines. Understanding semantics requires finding a suitable model. In exploring such models, researchers should draw fully on findings from the philosophy of language, cognitive science, and brain science, and examine how semantics is generated and understood from a cognitive perspective, which may yield a better model of language understanding. In today's climate of scientific and technological innovation, interdisciplinary work can better promote the development of natural language processing. [19]
Deep learning has brought a major technological breakthrough to natural language processing, and its widespread application has greatly changed people's daily lives. Combined with cognitive science and linguistics, deep learning may become more powerful still, helping to solve the problem of semantic understanding and bring about true "intelligence." [19]
Despite the great success of deep learning on a variety of NLP tasks, many research difficulties must still be overcome before it can be used on a large scale. The larger a deep neural network model is, the longer it takes to train, so reducing model size while maintaining performance is one direction for future research. In addition, deep neural network models have poor interpretability and have made little progress in natural language generation. Nevertheless, as deep learning research continues to deepen, the field of NLP can be expected to produce more research results and further development in the near future. [17]