A non-intrusive speech quality evaluation algorithm combining auxiliary target learning and convolutional recurrent network

Authors: TANG Guichen; LIANG Ruiyu; KONG Fanliu; XIE Yue; JU Mengjie

Affiliations: School of Information and Communication Engineering, Nanjing Institute of Technology, Nanjing 211167; School of Information Science and Engineering, Southeast University, Nanjing 210096

Publication: Chinese Journal of Acoustics (声学学报(英文版))

Year, volume, issue: 2023, Vol. 42, No. 2

Pages: 235-250

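Subject classification: 0711 [Science - Systems Science]; 07 [Science]; 08 [Engineering]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 0811 [Engineering - Control Science and Engineering]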

Funding: Supported by the National Key Research and Development Program of China (2020YFC2004002, 2020YFC2004003), the National Natural Science Foundation of China (62001215), and the Scientific Research Fund Project of Nanjing Institute of Technology (CKJC202001).

Topics: network; speech; algorithm

Abstract: Objective evaluation of speech quality can replace expensive manual scoring, but current objective metrics usually require a clean reference signal, which is difficult to obtain in many practical acoustic systems. A non-intrusive speech quality evaluation algorithm combining auxiliary target learning and a convolutional recurrent network (CRN) is proposed. Bark frequency cepstral coefficients (BFCCs), which are based on human-like auditory filters, are used as the input of the CRN to effectively reduce network complexity. First, a convolutional neural network (CNN) extracts frame-level features from the BFCCs. Then, bidirectional long short-term memory (BiLSTM) networks model long-term temporal dependencies and sequence characteristics in the frame-level features. Finally, a self-attention mechanism is introduced into the CRN to adaptively extract useful information from the frame-level features, which is then aggregated into sentence-level features and mapped to the final objective score. In addition, a multi-task training strategy is adopted, with voice activity detection (VAD) introduced as an auxiliary learning target to improve the performance of the algorithm. Experiments on public databases show that, compared with other non-intrusive algorithms, the proposed algorithm correlates better with the mean opinion score (MOS). Moreover, it has a small parameter size and generalizes well to the distorted speech database with MOS released by ITU-T P.808, with accuracy close to that of the perceptual evaluation of speech quality (PESQ).
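The two ideas in the abstract that are easiest to show in isolation are the aggregation of frame-level features into a sentence-level vector and the multi-task objective with VAD as an auxiliary target. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it substitutes a simple softmax attention pooling for the paper's self-attention module, and the loss form (squared error on MOS plus a weighted binary cross-entropy on per-frame VAD labels), the function names, and the `aux_weight` parameter are all hypothetical choices that the abstract does not specify.

```python
import math

def attention_pool(frame_feats, frame_scores):
    """Softmax-weighted average of frame-level feature vectors.

    Simplified stand-in for attention-based aggregation (assumption):
    frame_scores holds one relevance score per frame.
    """
    m = max(frame_scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_feats[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_feats))
        for d in range(dim)
    ]
    return pooled, weights

def mse(pred, target):
    """Squared error between predicted and true MOS for one utterance."""
    return (pred - target) ** 2

def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over per-frame VAD predictions."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def multitask_loss(mos_pred, mos_true, vad_probs, vad_labels, aux_weight=0.5):
    """Main MOS loss plus weighted auxiliary VAD loss.

    aux_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    return mse(mos_pred, mos_true) + aux_weight * bce(vad_probs, vad_labels)

# Toy usage: three frames with two features each and equal attention scores,
# so the pooled vector reduces to the plain mean of the frames.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
pooled, weights = attention_pool(frames, [0.0, 0.0, 0.0])
loss = multitask_loss(3.2, 3.0, [0.9, 0.1, 0.8], [1, 0, 1])
print(pooled, weights, loss)
```

In this form the auxiliary term only shapes training; at inference time the VAD output would simply be ignored and the MOS prediction used on its own.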
