The main goal of this project is to build a multi-modal Chinese interlanguage speech corpus for intelligent pronunciation teaching. Intelligent pronunciation teaching technology automatically evaluates students' pronunciation, detects pronunciation problems, and gives feedback in multi-modal forms such as audio and video, directing correction and guiding students toward accurate pronunciation. The specific tasks of this project break down into two parts: the construction of three speech corpora and research on two annotation methods. The three speech corpora are the Large-scale Chinese Interlanguage Pronunciation Subset, the Subset for Auditory Perception of Main Chinese Phonetic Categories by Foreign Students with Different Mother Tongues, and the Multi-modal Chinese Interlanguage Pronunciation Subset. The two annotation methods are an automatic or semi-automatic annotation method for the large-scale interlanguage speech dataset, and annotation methods for supra-segmental features such as tone and rhythm.
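As a rough illustration of the kind of record a semi-automatic annotation pipeline might produce, the sketch below builds a phone-level segment tier and a tone tier from assumed forced-aligner output. The interval schema, the tier names, and the pinyin-plus-tone-digit labels are all illustrative assumptions, not this project's actual annotation format:

```python
from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # interval onset in seconds
    end: float    # interval offset in seconds
    label: str    # tier-specific label


def to_tiers(alignment):
    """Split hypothetical aligner output, given as (start, end, syllable+tone-digit)
    tuples like (0.0, 0.35, "ma1"), into a segmental tier and a tone tier.
    Separate tiers are one common layout for supra-segmental annotation."""
    segments, tones = [], []
    for start, end, label in alignment:
        syllable, tone = label[:-1], label[-1]
        segments.append(Interval(start, end, syllable))
        tones.append(Interval(start, end, f"T{tone}"))
    return {"segments": segments, "tones": tones}


# Hypothetical aligner output for a learner's two-syllable utterance
tiers = to_tiers([(0.00, 0.35, "ma1"), (0.35, 0.70, "ma5")])
```

Keeping segmental and supra-segmental labels on separate time-aligned tiers lets annotators correct tone errors independently of segment boundaries during the semi-automatic pass.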