A COMPLEMENTARY APPROACH TO COMPUTER-AIDED TRANSCRIPTION: SYNERGY OF STATISTICAL-BASED AND KNOWLEDGE DISCOVERY PARADIGMS
Author affiliation: Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong SAR, China
Conference: 6th International Conference on Spoken Language Processing
Conference date: 2000
Keywords: Speech to Text; Statistical Modelling; Cantonese; Chinese
Abstract: The recent implementation of legal bilingualism necessitates the development of a Chinese Computer-Aided Transcription (CAT) system to produce Chinese transcripts of court proceedings conducted in Cantonese. The transcription system converts transcription shorthand codes into Chinese text, i.e., from a phonetic to a textual representation of the language. Both Cantonese and Mandarin Chinese have many homophonous characters, so the main challenge lies in resolving the severe ambiguity of this conversion. An N-gram statistical model is incorporated to estimate the most probable character string during conversion, and domain-specific corpora have been compiled to support the statistical computation. With additional enhancement features, the CAT system delivers a transcription accuracy of 96%. An intelligent error detection tool is built into the system to facilitate manual correction of the remaining errors. Using a decision tree algorithm and a range of textual and linguistic attributes, the tool can effectively alert users to possible errors.
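Illustrative sketch (not from the paper): the abstract does not state the N-gram order, the smoothing method, or the format of the shorthand codes. The following Python fragment therefore only illustrates the general idea of picking the most probable character string among homophone candidates, assuming a bigram model with add-alpha smoothing and a Viterbi search. The Jyutping-style syllable keys, the candidate table, and the counts are all hypothetical toy data.

```python
import math

# Hypothetical toy data: phonetic code -> homophonous candidate characters.
CANDIDATES = {
    "faat3": ["法", "發"],
    "ting4": ["庭", "停"],
}

# Hypothetical bigram counts, standing in for a domain-specific corpus.
BIGRAM_COUNTS = {
    ("<s>", "法"): 8, ("<s>", "發"): 6,
    ("法", "庭"): 9, ("法", "停"): 1,
    ("發", "庭"): 1, ("發", "停"): 2,
}


def bigram_logprob(prev, cur, vocab_size, alpha=1.0):
    """Add-alpha smoothed log P(cur | prev) (smoothing choice is an assumption)."""
    total = sum(c for (p, _), c in BIGRAM_COUNTS.items() if p == prev)
    count = BIGRAM_COUNTS.get((prev, cur), 0)
    return math.log((count + alpha) / (total + alpha * vocab_size))


def decode(syllables):
    """Viterbi search for the most probable character string."""
    vocab_size = len({ch for chars in CANDIDATES.values() for ch in chars})
    # best[ch] = (log probability, path) of the best path ending in ch.
    best = {"<s>": (0.0, [])}
    for syl in syllables:
        nxt = {}
        for ch in CANDIDATES[syl]:
            score, path = max(
                ((s + bigram_logprob(prev, ch, vocab_size), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[ch] = (score, path + [ch])
        best = nxt
    return "".join(max(best.values(), key=lambda t: t[0])[1])


print(decode(["faat3", "ting4"]))
```

With these toy counts the decoder prefers 法庭 ("court") over the other homophone combinations, because the bigram (法, 庭) dominates the hypothetical corpus.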