End-to-End Paired Ambisonic-Binaural Audio Rendering

Authors: Yin Zhu; Qiuqiang Kong; Junjie Shi; Shilei Liu; Xuzhou Ye; Ju-Chiang Wang; Hongming Shan; Junping Zhang

Affiliations: Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University, Shanghai 200433, China; Beijing ByteDance Technology Co., Ltd., Shanghai 201102, China; Chinese University of Hong Kong, Hong Kong, China; IEEE; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433; Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200031, China

Publication: IEEE/CAA Journal of Automatica Sinica (自动化学报, English edition)

Year, volume, issue: 2024, Vol. 11, No. 2

Pages: 502-513

Subject classification: 0711 [Science - Systems Science]; 07 [Science]; 0811 [Engineering - Control Science and Engineering]

Funding: Supported in part by the National Natural Science Foundation of China (62176059, 62101136)

Keywords: Ambisonic; attention; binaural rendering; neural network

Abstract: Binaural rendering is of great interest to virtual reality and immersive media. Although humans naturally use their two ears to perceive the spatial information contained in sounds, binaural rendering is a challenging task for machines, since describing a sound field often requires multiple channels and even metadata about the sound sources. In addition, the perceived sound varies from person to person even in the same sound field. Previous methods generally rely on individual-dependent head-related transfer function (HRTF) datasets and on optimization algorithms that act on HRTFs. In practical applications, existing methods have two major drawbacks. The first is a high personalization cost, as traditional methods achieve personalization by measuring HRTFs. The second is insufficient accuracy, because the optimization goal of traditional methods is to retain the part of the information that matters most perceptually at the cost of discarding the rest. It is therefore desirable to develop novel techniques that achieve personalization and accuracy at a low cost. To this end, we focus on the binaural rendering of ambisonic signals and propose 1) a channel-shared encoder and channel-compared attention integrated into neural networks and 2) a loss function quantifying interaural level differences to deal with spatial information. To verify the proposed method, we collect and release the first paired ambisonic-binaural dataset and introduce three metrics to evaluate the content information and spatial information accuracy of end-to-end methods. Extensive experimental results on the collected dataset demonstrate the superior performance of the proposed method and the shortcomings of previous methods.
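
The abstract mentions a loss function that quantifies interaural level differences (ILD) but does not give its definition. As a rough illustration only, the sketch below assumes a common textbook form: the frame-wise energy ratio between the left and right channels in dB, compared between a predicted and a target binaural signal with a mean absolute error. The function names `ild_db` and `ild_loss`, the frame and hop sizes, and the L1 comparison are all assumptions made for this example, not the paper's formulation.

```python
import numpy as np


def ild_db(binaural, frame_len=1024, hop=512, eps=1e-8):
    """Frame-wise interaural level difference in dB.

    `binaural` is an array of shape (2, n_samples): row 0 is the left ear,
    row 1 the right ear. Framing parameters are illustrative assumptions.
    """
    left, right = binaural
    n_frames = 1 + (len(left) - frame_len) // hop
    ild = np.empty(n_frames)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        # eps keeps the ratio finite for silent frames
        e_left = np.sum(left[seg] ** 2) + eps
        e_right = np.sum(right[seg] ** 2) + eps
        ild[i] = 10.0 * np.log10(e_left / e_right)
    return ild


def ild_loss(predicted, target, **kwargs):
    """Mean absolute ILD error in dB (hypothetical loss, not the paper's)."""
    diff = ild_db(predicted, **kwargs) - ild_db(target, **kwargs)
    return float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.standard_normal(4096)
    target = np.stack([left, 0.5 * left])  # right ear at half amplitude
    swapped = target[::-1]                 # ears exchanged
    print(ild_loss(target, target))
    print(ild_loss(swapped, target))
```

Under these assumptions, a signal whose right channel is the left channel at half amplitude has an ILD of about 6 dB in every frame, and exchanging the two ears flips its sign, which is the kind of spatial error a waveform-only loss may not penalize directly.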
