Re-Distributing Facial Features for Engagement Prediction with ModernTCN
Author Affiliations: College of Information and Artificial Intelligence, Nanchang Institute of Science and Technology, Nanchang 330108, China; School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China; School of Electronic Information Engineering, Wuhan Donghu University, Wuhan 430212, China
Publication: Computers, Materials and Continua
Year/Volume/Issue: 2024, Vol. 81, No. 1
Pages: 369-391
Subject Classification: 08 [Engineering]; 0812 [Engineering - Computer Science and Technology (degrees awardable in Engineering or Science)]
Funding: National Natural Science Foundation of China (NSFC) (62367006); Graduate Innovative Fund of Wuhan Institute of Technology (CX2023551)
Keywords: Engagement prediction; re-distributing facial features; spatiotemporal network; temporal convolutional network
Abstract: Automatically detecting learners' engagement levels helps to develop more effective online teaching and assessment programs, allowing teachers to provide timely feedback and make personalized adjustments based on students' needs to enhance teaching effectiveness. Traditional approaches rely mainly on single-frame multimodal facial spatial information, neglecting temporal emotional and behavioural features, and their accuracy suffers under significant pose variations. Additionally, convolutional padding can erode feature maps, weakening the representational capacity of the extracted features. To address these issues, we propose a hybrid neural network architecture, the redistributing facial features and temporal convolutional network (RefEIP). This network consists of three key components: first, the spatial attention mechanism large kernel attention (LKA) automatically captures local facial patches and mitigates the effects of pose variations; second, the feature organization and weight distribution (FOWD) module redistributes feature weights, eliminating the impact of white features and enhancing the representation of facial feature maps; third, the modern temporal convolutional network (ModernTCN) module analyses temporal changes across video frames to detect engagement levels. We also constructed a near-infrared engagement video dataset (NEVD) to better validate the efficiency of the RefEIP network. Through extensive experiments and in-depth studies, we evaluated these methods on NEVD and the Dataset for Affective States in E-Environments (DAiSEE), achieving an accuracy of 90.8% on NEVD and 61.2% on DAiSEE in the four-class classification task, indicating significant advantages in addressing engagement video analysis problems. © 2024 The Authors. Published by Tech Science Press.
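For readers who want a concrete picture of the pipeline the abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: the LKA block follows the published large kernel attention design (5x5 depthwise conv, 7x7 depthwise dilated conv, 1x1 pointwise conv); the FOWD module is approximated by a hypothetical squeeze-and-excitation-style channel re-weighting, since its exact design is only given in the paper; and the temporal stage is a simplified ModernTCN-style block (large-kernel depthwise 1D convolution over frames plus a pointwise feed-forward). All dimensions, kernel sizes, and names other than LKA and ModernTCN are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LKA(nn.Module):
    """Large kernel attention: decomposed large-kernel convolution used as a
    multiplicative spatial attention map over local facial patches."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        self.dw_dilated = nn.Conv2d(dim, dim, 7, padding=9, groups=dim, dilation=3)
        self.pw = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = self.pw(self.dw_dilated(self.dw(x)))
        return x * attn  # re-weight spatial locations of the feature map

class FOWD(nn.Module):
    """Hypothetical stand-in for feature organization and weight distribution:
    a learned channel re-weighting to suppress uninformative (e.g., padded)
    features. The actual FOWD design is described in the paper."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x).unsqueeze(-1).unsqueeze(-1)
        return x * w  # redistribute weights across feature channels

class TemporalBlock(nn.Module):
    """ModernTCN-style block: large-kernel depthwise 1D conv across frames
    plus a pointwise feed-forward, both with residual connections."""
    def __init__(self, dim, kernel_size=13):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)
        self.ffn = nn.Sequential(nn.Conv1d(dim, 2 * dim, 1), nn.GELU(),
                                 nn.Conv1d(2 * dim, dim, 1))

    def forward(self, x):          # x: (batch, dim, frames)
        x = x + self.dwconv(x)
        return x + self.ffn(x)

class RefEIPSketch(nn.Module):
    """End-to-end sketch: per-frame spatial features -> LKA -> FOWD ->
    temporal modeling -> four-way engagement classification."""
    def __init__(self, dim=64, num_classes=4):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, 7, stride=4, padding=3)
        self.lka, self.fowd = LKA(dim), FOWD(dim)
        self.temporal = TemporalBlock(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):      # video: (batch, frames, 3, H, W)
        b, t = video.shape[:2]
        x = video.flatten(0, 1)                  # fold frames into the batch
        x = self.fowd(self.lka(self.stem(x)))    # spatial attention + re-weighting
        x = x.mean(dim=(2, 3)).view(b, t, -1)    # one feature vector per frame
        x = self.temporal(x.transpose(1, 2))     # convolve across time
        return self.head(x.mean(dim=2))          # pool over frames -> logits

logits = RefEIPSketch()(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 4])
```

The depthwise large-kernel 1D convolution gives the temporal stage a wide receptive field over frames at low cost, which is the motivation behind ModernTCN that the abstract leans on; the four output logits correspond to the four engagement classes evaluated on NEVD and DAiSEE.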