A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control

Authors: Peiyuan Jiang, Weijun Pan, Jian Zhang, Teng Wang, Junxiang Huang

Affiliations: College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618307, China; East China Air Traffic Management Bureau, Xiamen Air Traffic Management Station, Xiamen 361015, China

Publication: Computers, Materials & Continua

Year, volume, issue: 2023, Vol. 77, No. 10

Pages: 911-940

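Subject classification: 081203 [Engineering: Computer Application Technology]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science may be conferred)]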

Funding: This study was co-supported by the National Key R&D Program of China (No. 2021YFF0603904), the National Natural Science Foundation of China (U1733203), and the Safety Capacity Building Project of the Civil Aviation Administration of China (TM2019-16-1/3).

Keywords: Air traffic control; automatic speech recognition; conformer; robustness evaluation; T5 error correction model

Abstract: This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) *** paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition *** tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw *** the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and output *** Text-To-Text Transfer Transformer (T5) language model is also introduced to handle particular pronunciations and code-mixing issues, providing more accurate and concise textual output for downstream *** enhance the model’s robustness, transfer learning and data augmentation techniques are utilized in the training *** model’s performance is optimized by performing hyperparameter tuning, such as adjusting the number of attention heads, encoder layers, and the weights of the loss *** experimental results demonstrate the significant contributions of data augmentation, hyperparameter tuning, and error correction models to the overall model *** the Our ATC Corpus dataset, the proposed model achieves a Character Error Rate (CER) of 3.44%, representing a 3.64% improvement compared to the baseline ***, the effectiveness of the proposed model is validated on two publicly available *** the AISHELL-1 dataset, the CCAT model achieves a CER of 3.42%, showcasing a 1.23% improvement over the baseline ***, on the LibriSpeech dataset, the CCAT model achieves a Word Error Rate (WER) of 5.27%, demonstrating a performance improvement of 7.67% compared to the baseline ***, this paper proposes an evaluation criterion for assessing the robustness of ATC speech recognit
