Authors:
(1) Hanqing ZHAO, College of Traditional Chinese Medicine, Hebei University, funded by the National Natural Science Foundation of China (No. 82004503) and the Science and Technology Project of Hebei Education Department (BJK2024108), Corresponding Author ([email protected]);
(2) Yuehan LI, College of Traditional Chinese Medicine, Hebei University.
Table of Links
2. Materials and Methods
2.1 Experimental Data and 2.2 Conditional Random Fields Model
2.3 TF-IDF algorithm and 2.4 Dependency Parser Based on Neural Network
3 Experimental results
3.1 Results of word segmentation and entity recognition
3.2 Visualization results of related entity vocabulary map
3.3 Results of dependency parsing
2.3 TF-IDF algorithm
TF-IDF (Term Frequency-Inverse Document Frequency) is a common weighting technique in information retrieval and data mining, often used to extract keywords from articles. It is a statistical method for evaluating how important a term is to a document within a document set or corpus. Term Frequency (TF) measures how often a term occurs in a document; a term that appears frequently in a document is likely to be important to that document. The formulas are as follows:
Term Frequency (TF) = (number of times the term appears in the document) / (total number of terms in the document)

Inverse Document Frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the term + 1))

TF-IDF = TF × IDF
The importance of a term is therefore directly proportional to how often it appears in a document and inversely proportional to the number of documents in the corpus that contain it. This weighting effectively suppresses the influence of common words on keyword extraction and strengthens the correlation between the extracted keywords and the article.
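To make the computation concrete, here is a minimal Python sketch of the TF-IDF weighting defined above. The toy corpus, tokenization, and function name are illustrative assumptions, not the paper's actual pipeline.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights for each term in a tokenized corpus.

    `documents` is a list of token lists, standing in for the
    segmented texts described in Section 2.1 (hypothetical input).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total)                    # TF
            * math.log(n_docs / (df[term] + 1))      # IDF, with the +1 smoothing from the formula above
            for term, count in counts.items()
        })
    return weights

# Toy example: "ginseng" occurs in only one document, so it
# receives the highest weight; ubiquitous terms score low.
docs = [["ginseng", "tonifies", "qi"],
        ["qi", "and", "blood"],
        ["blood", "and", "qi"]]
for doc_weights in tf_idf(docs):
    print(sorted(doc_weights.items(), key=lambda kv: -kv[1]))
```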
2.4 Dependency Parser Based on Neural Network
Dependency parsing helps us understand the meaning of text. Grammatical parsing is an important part of language understanding: the goal is to analyze the grammatical structure of a sentence and represent it in an interpretable form, usually a tree. Dependency grammar holds that words stand in head-dependent relationships: if a word modifies another word in a sentence, the modifier is called the dependent word and the modified word is called the dominant word.
Neural-network-based dependency parsing converts the sequence of words in a sentence into a graph structure by analyzing the grammatical relationships within the sentence. Common grammatical relations include the verb-object relation, left-adjunct relation, right-adjunct relation, coordinate relation, attribute-head relation, and subject-verb relation. Dependency grammar is a commonly used grammatical formalism: a dependency arc connects two words in a sentence that stand in a grammatical relationship, and the arcs together form a syntactic dependency tree. The tree is built with a stack: starting from the root node, the words held in a buffer are pushed onto the stack one by one using three transitions: Shift, Left-Reduce, and Right-Reduce. The neural network consists of three layers: an input layer, a hidden layer, and an output (softmax) layer. The model follows the 2014 paper "A Fast and Accurate Dependency Parser using Neural Networks" by Danqi Chen and Christopher D. Manning. In this study, HanLP is used to implement the neural-network-based dependency parser.
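As a rough illustration of this step, the sketch below obtains a dependency parse through HanLP's Python wrapper (pyhanlp) and prints each word with its head and the relation on the connecting arc. The example sentence and this particular call are assumptions made for illustration, not the paper's reported configuration.

```python
from pyhanlp import HanLP

# Hypothetical TCM sentence: "Ginseng greatly tonifies primordial qi".
sentence = "人参大补元气"

# Parse into a dependency tree (CoNLL-style structure).
tree = HanLP.parseDependency(sentence)

# Print each word, the grammatical relation on its dependency arc
# (subject-verb, verb-object, etc.), and its dominant (head) word.
for word in tree.iterator():
    print("%s --(%s)--> %s" % (word.LEMMA, word.DEPREL, word.HEAD.LEMMA))
```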
This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.