Large-Scale Chinese Language Corpus for Syntactic Analysis
May 30, 2019

This resource is the result of ACLR's research project "Syntactic and Semantic Analysis and Application". The project leader is Xun Endong, chief expert of ACLR and professor at BLCU.

The corpus mainly comprises matching resources and parataxis diagram resources for Chinese syntactic and semantic analysis. The matching resources supply structural and semantic information for symbolic computation and rule-based analysis. The annotated parataxis diagram resources serve as a dataset for the JParser platform, where they are used to evaluate and validate analysis results.

The research group carried out a large-scale, in-depth, multi-granular language knowledge project for Chinese syntactic and semantic analysis.

1. Annotation of Parataxis Diagram. The team launched the Chinese Parataxis Diagram annotation task to verify the soundness of the proposed Parataxis Diagram of Chinese syntax and semantics and to provide verification data for subsequent work. So far, they have drawn up Parataxis Diagram annotation specifications, built auxiliary annotation software, and annotated about 10,000 sentences with diagrams.

2. Annotation of the internal structure of Chinese verbs. For multi-character Chinese verbs, the internal structure is annotated, including structure type, core word, and whether the components can be used separately. This work has been completed, with approximately 20,000 verbs annotated.

3. Annotation of Chinese "block dependence" structure. In each Chinese sentence, syntactic component blocks, inter-sentence cohesive blocks, and auxiliary blocks are distinguished, and the basic skeleton of the sentence is presented through the block sequence. Approximately 600,000 clauses have been annotated.

4. Annotation of Chinese collocation. A large-scale investigation of block collocations in Chinese big data was conducted, yielding a high-quality collocation library with more than 9 million examples.
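
To make the annotation layers in items 2 and 3 more concrete, the following is a minimal Python sketch of how such records could be represented. It is not the project's actual data format: the class names, field names, label strings, and the example segmentation are all hypothetical, chosen only for illustration. Treating the sentence skeleton as the ordered syntactic component blocks is likewise just one possible reading of the description above.

```python
from dataclasses import dataclass
from enum import Enum


class BlockKind(Enum):
    # Hypothetical labels for the three block types named in item 3.
    SYNTACTIC = "syntactic"
    COHESIVE = "cohesive"
    AUXILIARY = "auxiliary"


@dataclass(frozen=True)
class VerbAnnotation:
    # Hypothetical record for item 2 (internal structure of a verb).
    verb: str
    structure_type: str  # illustrative label, e.g. "verb-object"
    core: str            # core word inside the verb
    separable: bool      # whether the components can be used separately


@dataclass(frozen=True)
class Block:
    # Hypothetical record for one block in an annotated clause.
    text: str
    kind: BlockKind


def skeleton(blocks):
    """Return the texts of the syntactic component blocks, in order."""
    return [b.text for b in blocks if b.kind is BlockKind.SYNTACTIC]


# Illustrative data only; not taken from the corpus.
example_verb = VerbAnnotation(
    verb="洗澡", structure_type="verb-object", core="洗", separable=True
)

example_clause = [
    Block("但是", BlockKind.COHESIVE),
    Block("他", BlockKind.SYNTACTIC),
    Block("去", BlockKind.SYNTACTIC),
    Block("学校", BlockKind.SYNTACTIC),
    Block("了", BlockKind.AUXILIARY),
]

print(skeleton(example_clause))
```

In this sketch the cohesive and auxiliary blocks are kept in the clause but filtered out when the skeleton is computed, so the full block sequence remains available for other uses.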

For more details, see:

http://yuyanziyuan.blcu.edu.cn/info/1066/2532.htm