Deep-BERT: Transfer Learning for Classifying Multilingual Offensive Texts on Social Media
Author affiliations: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh; School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu, Japan; Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh; Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, Bangladesh
Publication: Computer Systems Science & Engineering
Year, volume, issue: 2023, Vol. 44, No. 2
Pages: 1775-1791
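Subject classification: 0502 [Literature - Foreign Languages and Literatures]; 050201 [Literature - English Language and Literature]; 05 [Literature]
Keywords: offensive text classification; deep convolutional neural network (DCNN); bidirectional encoder representations from transformers (BERT); natural language processing (NLP)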
Abstract: Offensive messages on social media have recently been frequently used to harass and criticize *** recent studies, many promising algorithms have been developed to identify offensive *** algorithms analyze text in a unidirectional manner, whereas a bidirectional method can maximize performance and capture semantic and contextual information in *** addition, there are many separate models for identifying monolingual or multilingual offensive texts, but only a few models can detect both monolingual and multilingual offensive *** this study, a detection system has been developed for both monolingual and multilingual offensive texts by combining a deep convolutional neural network with bidirectional encoder representations from transformers (Deep-BERT) to identify offensive posts on social media that are used to harass *** paper explores a variety of ways to deal with multilingualism, including collaborative multilingual and translation-based ***, Deep-BERT is tested on Bengali and English datasets with different pre-trained bidirectional encoder representations from transformers (BERT) word-embedding techniques, and the proposed Deep-BERT was found to outperform all existing offensive text classification algorithms, reaching an accuracy of 91.83%. The proposed model is a state-of-the-art model that can classify both monolingual and multilingual offensive texts.
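The abstract describes Deep-BERT only at a high level, as a deep convolutional neural network combined with BERT; the exact layer configuration is not given in this record. The following is a minimal sketch of one common way to put a convolutional head on top of contextual token embeddings, assuming a text-CNN design: each filter slides over consecutive token vectors, ReLU and max-over-time pooling reduce it to one feature, and a logistic output yields a probability that the text is offensive. The token vectors are toy numbers standing in for BERT outputs, and the names `conv1d_max_pool` and `classify` and all weights are hypothetical, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def conv1d_max_pool(embeddings: List[Vector], kernel: List[Vector], bias: float) -> float:
    """Slide one filter (width = len(kernel)) over the token sequence,
    apply ReLU, then max-pool over time to a single feature value.

    Assumption: `embeddings` would be per-token contextual vectors from a
    BERT encoder; here any list of equal-length vectors works.
    """
    width = len(kernel)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for start in range(len(embeddings) - width + 1):
        activation = bias
        for offset in range(width):
            token = embeddings[start + offset]
            # Dot product between this kernel row and the token vector
            activation += sum(w * x for w, x in zip(kernel[offset], token))
        best = max(best, max(0.0, activation))  # ReLU, then running max
    return best


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def classify(
    embeddings: List[Vector],
    filters: List[Tuple[List[Vector], float]],
    output_weights: List[float],
    output_bias: float,
) -> float:
    """Hypothetical CNN head: one pooled feature per filter, then a
    logistic layer giving P(offensive). Not the paper's exact model."""
    features = [conv1d_max_pool(embeddings, kernel, b) for kernel, b in filters]
    logit = output_bias + sum(w * f for w, f in zip(output_weights, features))
    return sigmoid(logit)


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for a 3-token input (illustrative only)
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    filters = [
        ([[1.0, 0.0], [0.0, 1.0]], 0.0),  # width-2 filter (bigram-like)
        ([[-1.0, -1.0]], 0.5),            # width-1 filter (unigram-like)
    ]
    print(classify(tokens, filters, [1.0, 1.0], -2.0))
```

In this toy input the width-2 filter pools to 2.0 and the width-1 filter is suppressed by ReLU to 0.0, so the logit is 0 and the printed probability is 0.5. Filters of several widths are the usual design choice in text CNNs because each width responds to n-gram-like patterns of a different length in the contextual embeddings.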