Exploring Sequential Feature Selection in Deep Bi-LSTM Models for Speech Emotion Recognition
Author affiliations: Computer Science Department, Future Academy-Higher Future Institute for Specialized Technological Studies, Cairo 12622, Egypt; Department of Computer Science and Information, College of Science at Zulfi, Majmaah University, P.O. Box 66, Al-Majmaah 11952, Saudi Arabia; Preparatory Institute for Engineering Studies of Gafsa, Zarroug, Gafsa 2112, Tunisia; Computers and Systems Department, Electronics Research Institute, Cairo 12622, Egypt
Publication: Computers, Materials & Continua (in English)
Year/Volume/Issue: 2024, Vol. 78, No. 2
Pages: 2689-2719
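Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degree may be awarded)]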
Funding: Majmaah University (MU), grant R-2023-757
Keywords: Artificial intelligence application; multi features sequential selection; speech emotion recognition; deep Bi-LSTM
Abstract: Machine Learning (ML) algorithms play a pivotal role in Speech Emotion Recognition (SER), although they encounter a formidable obstacle in accurately discerning a speaker’s emotional *** examination of the emotional states of speakers holds significant importance in a range of real-time applications, including but not limited to virtual reality, human-robot interaction, emergency centers, and human behavior *** identifying emotions in the SER process relies on extracting relevant information from audio *** studies on SER have predominantly utilized short-time characteristics such as Mel Frequency Cepstral Coefficients (MFCCs) due to their ability to capture the periodic nature of audio signals *** these traits may improve their ability to perceive and interpret emotional depictions appropriately, MFCCs have some *** this study aims to tackle the aforementioned issue by systematically picking multiple audio cues, enhancing the classifier model’s efficacy in accurately discerning human *** utilized dataset is taken from the EMO-DB database; input speech is preprocessed using a 2D Convolutional Neural Network (CNN), which applies convolutional operations to spectrograms, as they afford a visual representation of the way the audio signal's frequency content changes over *** next step is the spectrogram data normalization, which is crucial for Neural Network (NN) training as it aids in faster *** the five auditory features MFCCs, Chroma, Mel-Spectrogram, Contrast, and Tonnetz are extracted from the spectrogram *** aim of feature selection is to retain only dominant features by excluding the irrelevant *** this paper, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) techniques were employed for multiple audio cues features ***, the feature sets composed from the hybrid feature extraction methods are fed into the deep Bidirectional Long Short
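The abstract names Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) over five feature groups. The following is a minimal sketch of the two greedy procedures in general, not the paper's implementation. The names `FEATURES`, `toy_score`, and both function names are hypothetical, and `toy_score` is a made-up stand-in for whatever validation metric a real classifier would supply; its numbers are illustrative only and are not results from the paper.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical labels for the five feature groups listed in the abstract.
FEATURES = ["mfcc", "chroma", "mel", "contrast", "tonnetz"]

ScoreFn = Callable[[FrozenSet[str]], float]


def toy_score(subset: FrozenSet[str]) -> float:
    """Made-up scoring function (an assumption, not data from the paper).

    In practice this would train and validate a classifier on the subset.
    Integer weights are used so comparisons are exact.
    """
    weights = {"mfcc": 50, "mel": 40, "chroma": 20, "contrast": 15, "tonnetz": 5}
    total = sum(weights[name] for name in subset)
    # Illustrative redundancy penalty when two overlapping groups co-occur.
    if "mfcc" in subset and "mel" in subset:
        total -= 30
    return total


def sequential_forward_selection(
    features: Sequence[str], score: ScoreFn, k: int
) -> List[str]:
    """Start empty; repeatedly add the feature that yields the best subset."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


def sequential_backward_selection(
    features: Sequence[str], score: ScoreFn, k: int
) -> List[str]:
    """Start full; repeatedly drop the feature whose removal hurts least."""
    selected = list(features)
    while len(selected) > k:
        drop = max(
            selected,
            key=lambda f: score(frozenset(x for x in selected if x != f)),
        )
        selected.remove(drop)
    return selected


if __name__ == "__main__":
    print(sequential_forward_selection(FEATURES, toy_score, 3))
    print(sequential_backward_selection(FEATURES, toy_score, 3))
```

Both procedures are greedy: each evaluates only one-feature changes per step, so neither is guaranteed to find the best subset of a given size, and the two can disagree on real data.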