World Language Resources Database for Language Identification-Beijing Language and Culture University Language Resource High-Quality Innovation Center

May 31, 2019

This resource is the output of a sub-project of ACLR's research project Language Recognition Theory and Methodological Research on Language Statistics. The project lead is Prof. Ran Qibin, Honorary Researcher of ACLR and Professor at Nankai University.

The database consists of the phonetic forms of the 40 most stable items from the Swadesh wordlist. Each entry records the language name, the wlsfam, wlsgen, e, and hh codes, longitude and latitude, speaker population, the WALS code, the ISO 639-3 code, and other fields. The phonetic forms of the 40 items are transcribed in ASJP code for ease of computer processing. The main purpose of the database is to calculate distances between the world's languages using ASJP (Automated Similarity Judgment Program), and on that basis to study the classification of the world's languages, their historical and genetic relationships, phonetic correspondences, language origins, language migration rates, and related questions. The database can be processed with R and Python, and can be linked to other world language databases such as Glottolog and WALS.

The project has built an ASJP-format database containing 9,788 doculects, the largest of its kind in the world so far. It greatly expands the data available to the Chinese academic community on the world's languages, on the languages of China, and especially on doculects of Chinese dialects.

Based on a comprehensive calculation of LDND (Levenshtein Distance Normalized Divided) values, the project derived four numerical ranges that can be used to distinguish what are traditionally called "languages of different families", "languages of the same family but different groups", "languages of the same family and the same group", and "variants of the same dialect". These ranges provide objective and effective indicators for determining the identity of languages and their variants and the relationships among them.
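
As an illustration of how an LDND value can be obtained from two word lists, the following is a minimal Python sketch, not the project's own code. It assumes the usual definition: LDN is the Levenshtein distance between two words divided by the length of the longer word, and LDND is the mean LDN over word pairs with the same meaning divided by the mean LDN over word pairs with different meanings. The function names, the dictionary layout, and the toy word lists are hypothetical; each character is treated as one symbol, which ignores ASJP modifier symbols and synonym handling.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalized by the length of the longer word."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(list_a: dict, list_b: dict) -> float:
    """LDND between two word lists given as {meaning: transcription}.

    Only meanings attested in both lists are compared.
    """
    shared = sorted(set(list_a) & set(list_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared meanings")
    # Mean LDN over pairs of words with the same meaning.
    same = [ldn(list_a[m], list_b[m]) for m in shared]
    # Mean LDN over pairs of words with different meanings (chance baseline).
    diff = [ldn(list_a[m], list_b[n])
            for m, n in product(shared, repeat=2) if m != n]
    baseline = sum(diff) / len(diff)
    if baseline == 0:
        raise ValueError("baseline distance is zero; LDND is undefined")
    return (sum(same) / len(same)) / baseline


if __name__ == "__main__":
    # Invented toy strings, not entries from the database.
    doculect_a = {"m1": "ab", "m2": "cd"}
    doculect_b = {"m1": "ab", "m2": "ce"}
    print(ldnd(doculect_a, doculect_b))
```

Dividing by the different-meaning baseline is intended to correct for chance similarity caused by overlapping sound inventories, so values near 0 indicate very close doculects and values near or above 1 indicate no detectable lexical similarity. The four numerical ranges mentioned above would be applied to the value returned by a function like `ldnd`; the ranges themselves are not reproduced here.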