Frame Length Dependency for Fundamental Frequency Extraction in Noisy Speech
Author affiliations: Department of Information and Communication Technology, Comilla University, Cumilla, Bangladesh; Department of Computer Science and Engineering, Bangladesh Army International University of Science and Technology (BAIUST), Cumilla, Bangladesh; Department of Information Technology, University of Information Technology and Sciences (UITS), Dhaka, Bangladesh
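Publication: Journal of Signal and Information Processing
Year, volume, issue: 2024, Vol. 15, No. 1
Pages: 1-17
Subject classification: 0809 [Engineering: Electronic Science and Technology (engineering or science degrees may be conferred)]; 08 [Engineering]
Topics: Pitch Estimation; Fundamental Frequency; BaNa; ACF; Frame Length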
Abstract: The fundamental frequency plays a significant part in how the pitch of a sound is understood and perceived, and pitch is a basic attribute used in numerous speech-related tasks. Several algorithms have been developed for fundamental frequency extraction; which one to use depends on the signal's characteristics and the surrounding noise. An algorithm's noise resistance is therefore critical for precise fundamental frequency estimation. Nonetheless, many state-of-the-art algorithms struggle to achieve satisfactory results on noisy speech recordings with low signal-to-noise ratio (SNR) values. In addition, most recent techniques use different frame lengths for pitch extraction. From this point of view, this research considers different frame lengths for fundamental frequency extraction from male and female speech signals. It also analyzes the frame length dependency of the speech signal to understand which frame length is more suitable and effective for male and for female speech specifically. To validate the idea, we use the conventional autocorrelation function (ACF) and the state-of-the-art BaNa method. The study puts forward an idea intended to work better for speech processing applications on noisy speech. The experimental results indicate which frame length is more appropriate for male and female speech signals in noisy environments.
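To make the frame-length question concrete, the following is a minimal sketch of conventional ACF-based pitch estimation applied to frames of different lengths. It is not the authors' experimental setup. The synthetic harmonic tones (120 Hz and 220 Hz standing in for male-like and female-like voices), the 16 kHz sampling rate, the 0 dB white noise, the 50-400 Hz search range, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def harmonic_tone(f0, fs, n_samples, n_harmonics=3):
    """Synthetic voiced-like signal: harmonics of f0 with 1/k amplitudes (assumed model)."""
    return [
        sum(math.cos(2 * math.pi * k * f0 * n / fs) / k
            for k in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def add_white_noise(x, snr_db, rng):
    """Add Gaussian white noise scaled so the signal-to-noise ratio equals snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    p_signal = sum(v * v for v in x) / len(x)
    p_noise = sum(v * v for v in noise) / len(noise)
    scale = math.sqrt(p_signal / (10 ** (snr_db / 10) * p_noise))
    return [s + scale * v for s, v in zip(x, noise)]


def acf_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from the highest autocorrelation peak.

    The search range fmin..fmax is an assumed, typical speech range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]          # remove DC offset
    lag_min = int(fs / fmax)               # shortest candidate period
    lag_max = min(int(fs / fmin), n - 1)   # longest period the frame can hold
    if lag_max <= lag_min:
        raise ValueError("frame too short for the requested F0 range")

    def acf(lag):
        # Unnormalized autocorrelation at one lag
        return sum(x[i] * x[i + lag] for i in range(n - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=acf)
    return fs / best_lag


if __name__ == "__main__":
    fs = 16000
    rng = random.Random(0)
    for label, f0 in (("male-like", 120.0), ("female-like", 220.0)):
        noisy = add_white_noise(harmonic_tone(f0, fs, fs // 5), 0.0, rng)
        for frame_ms in (20, 40, 60):
            n = fs * frame_ms // 1000
            estimate = acf_pitch(noisy[:n], fs)
            print(f"{label}, {frame_ms} ms frame: {estimate:.1f} Hz")
```

A frame has to span at least about two pitch periods for the autocorrelation peak at the pitch lag to be measurable, so lower-pitched voices need longer frames than higher-pitched ones. That is one plausible reason for the frame-length dependency the abstract describes. Real speech and the BaNa method are outside the scope of this sketch.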