Wake-Up-Word Feature Extraction on FPGA
Author affiliation: Electrical & Computer Engineering Department, Florida Institute of Technology, Melbourne, USA
Publication: World Journal of Engineering and Technology
Year, volume, issue: 2014, Vol. 2, No. 1
Pages: 1-12
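Subject classification: 0810 [Engineering - Information and Communication Engineering]; 08 [Engineering]; 081001 [Engineering - Communication and Information Systems]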
Keywords: Speech Recognition System; Feature Extraction; Mel-Frequency Cepstral Coefficients; Linear Predictive Coding Coefficients; Enhanced Mel-Frequency Cepstral Coefficients; Hidden Markov Models; Field-Programmable Gate Arrays
Abstract: The Wake-Up-Word Speech Recognition (WUW-SR) task is computationally very demanding, particularly the feature extraction stage, whose output is decoded with the corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR system. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture for a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. This paper presents the details of converting the three sets of spectrograms (MFCC, LPC, and ENH_MFCC) to their equivalent features. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR system is shown in Figure 1. The three sets of speech features are extracted at the front-end. These extracted features are then compressed and transmitted over a dedicated channel to the server, where they are subsequently decoded.