RASC863: A Chinese Speech Corpus with Four Regional Accents
Author affiliations: Institute of Linguistics, Chinese Academy of Social Sciences; Institute of Computing Technology, Chinese Academy of Sciences
Conference date: 2004
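Subject classification: 0501 [Literature - Chinese Language and Literature]; 050103 [Literature - Chinese Linguistics and Philology]; 05 [Literature]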
Abstract: This paper introduces RASC863 (Regional Accented Speech Corpus, funded by the National 863 Project), a Chinese speech corpus covering four regional accents: Shanghai (Wu), Guangzhou (Yue), Chongqing (Southwestern Mandarin) and Xiamen (Min). The corpus consists of spontaneous speech, read speech and selected dialectal words. For the spontaneous speech, each speaker chose a topic, either freely or from a prepared sheet of 160 topics, and gave a 4-5 minute spontaneous speech on it. In addition, each speaker answered 15 questions spontaneously. The read speech consists of 2200 automatically selected, phonetically balanced sentences and 460 sentences frequently used in daily life. For each dialectal region, we prepared words that are frequently used in daily life and differ from Standard Chinese, and each speaker was asked to read 15 dialectal words. In total, 800 speakers (200 from each region, balanced in age, sex and educational background) were recruited for the project.