A Robust Conformer-Based Speech Recognition Model for Mandarin Air Traffic Control
Author affiliations: College of Air Traffic Management, Civil Aviation Flight University of China, Deyang 618307, China; East China Air Traffic Management Bureau, Xiamen Air Traffic Management Station, Xiamen 361015, China
Publication: Computers, Materials & Continua (English-language journal)
Year/Volume/Issue: 2023, Vol. 77, No. 10
Pages: 911-940
Core indexing:
Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in engineering or science may be awarded)]
Funding: This study was co-supported by the National Key R&D Program of China (No. 2021YFF0603904), the National Natural Science Foundation of China (U1733203), and the Safety Capacity Building Project of Civil Aviation Administration of China (TM2019-16-1/3)
Keywords: Air traffic control; automatic speech recognition; conformer; robustness evaluation; T5 error correction model
Abstract: This study aims to address the deviation in downstream tasks caused by inaccurate recognition results when applying Automatic Speech Recognition (ASR) technology in the Air Traffic Control (ATC) *** paper presents a novel cascaded model architecture, namely Conformer-CTC/Attention-T5 (CCAT), to build a highly accurate and robust ATC speech recognition *** tackle the challenges posed by noise and fast speech rate in ATC, the Conformer model is employed to extract robust and discriminative speech representations from raw *** the decoding side, the Attention mechanism is integrated to facilitate precise alignment between input features and output *** Text-To-Text Transfer Transformer (T5) language model is also introduced to handle particular pronunciations and code-mixing issues, providing more accurate and concise textual output for downstream *** enhance the model’s robustness, transfer learning and data augmentation techniques are utilized in the training *** model’s performance is optimized by hyperparameter tuning, such as adjusting the number of attention heads, encoder layers, and the weights of the loss *** experimental results demonstrate the significant contributions of data augmentation, hyperparameter tuning, and error correction models to the overall model *** the Our ATC Corpus dataset, the proposed model achieves a Character Error Rate (CER) of 3.44%, representing a 3.64% improvement compared to the baseline ***, the effectiveness of the proposed model is validated on two publicly available *** the AISHELL-1 dataset, the CCAT model achieves a CER of 3.42%, showcasing a 1.23% improvement over the baseline ***, on the LibriSpeech dataset, the CCAT model achieves a Word Error Rate (WER) of 5.27%, demonstrating a performance improvement of 7.67% compared to the baseline ***, this paper proposes an evaluation criterion for assessing the robustness of ATC speech recognit
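
Note on the metrics: the abstract reports results as Character Error Rate (CER) and Word Error Rate (WER). The record does not say how the authors computed them. The sketch below assumes the conventional definition, edit distance between reference and hypothesis divided by reference length. The function names (`edit_distance`, `error_rate`, `cer`, `wer`) and the choice to strip spaces before computing CER are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (assumed metric basis)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens reaches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference, hypothesis):
    """Character error rate; spaces are dropped (an assumption of this sketch)."""
    return error_rate(list(reference.replace(" ", "")),
                      list(hypothesis.replace(" ", "")))


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())
```

For Mandarin transcripts, CER is the usual choice because text is not whitespace-segmented into words; WER is the usual choice for English corpora such as LibriSpeech.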