Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n-Gram Language Model

Authors: WANG Xuyang, ZHANG Pengyuan, NA Xingyu, PAN Jielin, YAN Yonghong

Affiliations: The Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences; Xinjiang Laboratory of Minority Speech and Language Information Processing, Chinese Academy of Sciences

Publication: Chinese Journal of Electronics (电子学报(英文))

Year, volume, issue: 2017, Vol. 26, No. 6

Pages: 1239-1244

Subject classification: 0711 [Science - Systems Science]; 07 [Science]

Funding: Supported by the National Natural Science Foundation of China (No. 11461141004, No. 61271426, No. 11504406, No. 11590770, No. 11590771, No. 11590772, No. 11590773, No. 11590774), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA06030100, No. XDA06030500, No. XDA06040603), the National 863 Program (No. 2015AA016306), the National 973 Program (No. 2013CB329302), and the Key Science and Technology Project of the Xinjiang Uygur Autonomous Region (No. 201230118-3).

Keywords: Spoken term detection (STD); Language model (LM); Out-of-vocabulary (OOV) words

Abstract: In this paper, a hierarchical n-gram language model (LM) combining words and characters is explored to improve the detection of out-of-vocabulary (OOV) words in Mandarin spoken term detection (STD). The hierarchical LM is based on a word-level LM, with a character-level LM estimating the probabilities of OOV words in a class-based way. The region of the sentence being decoded that contains OOV words is detected with the help of the word-level LM, and the probabilities of the OOV words are derived from the character-level LM. The proposed approach is implemented on a dynamic decoder and evaluated in terms of actual term weighted value (ATWV) on two Mandarin data sets. Experimental results show that a relative improvement of more than 10% in OOV word detection is achieved on both sets, while the detection of in-vocabulary (IV) words is barely affected.
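
The class-based combination described in the abstract can be illustrated with a minimal sketch. It assumes bigram models, a single `<OOV>` class token in the word-level LM, and a fixed probability floor for unseen events; none of these details are given in the abstract, and the paper's actual n-gram order, smoothing, and decoder integration may differ. All names (`WORD_BIGRAM`, `CHAR_BIGRAM`, `hierarchical_logprob`, and so on) and all probabilities are hypothetical toy values, not taken from the paper.

```python
import math

# Toy word-level bigram LM. Every OOV word is represented by the class token <OOV>.
# All probabilities are made-up illustrative numbers (assumption, not from the paper).
WORD_BIGRAM = {
    ("<s>", "我"): 0.5,
    ("我", "去"): 0.4,
    ("我", "<OOV>"): 0.1,
    ("<OOV>", "</s>"): 0.3,
}

# Toy character-level bigram LM, used only to score the characters inside an OOV word.
CHAR_BIGRAM = {
    ("<w>", "北"): 0.2,
    ("北", "京"): 0.5,
    ("京", "</w>"): 0.4,
}

VOCAB = {"<s>", "</s>", "我", "去"}  # in-vocabulary (IV) words plus sentence boundaries
FLOOR = 1e-6                          # crude floor for unseen n-grams (assumption)


def to_class(word):
    """Map a word to itself if it is in the vocabulary, otherwise to <OOV>."""
    return word if word in VOCAB else "<OOV>"


def char_logprob(word):
    """Log-probability of the character sequence of a word under the character LM."""
    chars = ["<w>"] + list(word) + ["</w>"]
    return sum(math.log(CHAR_BIGRAM.get(pair, FLOOR)) for pair in zip(chars, chars[1:]))


def hierarchical_logprob(prev, word):
    """log P(word | prev): word-level class probability, plus the
    character-level probability when the word falls into the <OOV> class."""
    logp = math.log(WORD_BIGRAM.get((to_class(prev), to_class(word)), FLOOR))
    if to_class(word) == "<OOV>":
        logp += char_logprob(word)
    return logp


def sentence_logprob(words):
    """Log-probability of a sentence given as a list of words without boundary tokens."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(hierarchical_logprob(p, w) for p, w in zip(seq, seq[1:]))
```

For example, `hierarchical_logprob("我", "北京")` scores the OOV word 北京 as the word-level probability of `<OOV>` following 我 multiplied by the character-level probability of 北京, whereas `hierarchical_logprob("我", "去")` uses the word-level LM alone, which is why IV words are scored exactly as they would be without the character-level model.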
