Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

Authors: Zhengyan Zhang, Guangxuan Xiao, Yongwei Li, Tian Lv, Fanchao Qi, Zhiyuan Liu, Yasheng Wang, Xin Jiang, Maosong Sun

Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China; Beijing National Research Center for Information Science and Technology, Beijing 100084, China; Huawei Noah's Ark Laboratory, Hong Kong 999077, China

Publication: Machine Intelligence Research

Year/Volume/Issue: 2023, Vol. 20, No. 2

Pages: 180-193

Subject classification: 0710 [Science: Biology]; 0831 [Engineering: Biomedical Engineering (degrees in engineering, science, or medicine)]; 12 [Management]; 1201 [Management: Management Science and Engineering (degrees in management or engineering)]; 08 [Engineering]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 080203 [Engineering: Mechanical Design and Theory]; 0802 [Engineering: Mechanical Engineering]; 0835 [Engineering: Software Engineering]; 0836 [Engineering: Bioengineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in engineering or science)]

Funding: Supported by the National Key Research and Development Program of China (No. 2020AAA0106500) and the National Natural Science Foundation of China (NSFC No. 62236004).

Keywords: pre-trained language models; backdoor attacks; transformers; natural language processing (NLP); computer vision (CV)

Abstract: The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost of pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely the neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA can effectively control the predictions for trigger-embedded instances with different trigger patterns. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique for resisting NeuBA by omitting backdoored neurons.
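
The abstract describes NeuBA's core mechanism: during additional pre-training, the attacker pulls the output representation of any trigger-embedded sample toward a predefined target vector, and gives paired triggers contrastive (e.g., sign-flipped) targets so that, after fine-tuning, different triggers tend to map to different labels. The following is a minimal PyTorch-style sketch of that objective under stated assumptions; the `encoder` interface, the token-level `insert_trigger` helper, and the MSE regression term are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

HIDDEN_SIZE = 768  # e.g., the output dimension of a BERT-base [CLS] vector


def insert_trigger(token_ids: torch.Tensor, trigger_id: int) -> torch.Tensor:
    """Toy trigger insertion for text (hypothetical helper): overwrite the
    first non-special token position of each sequence with a trigger token."""
    poisoned = token_ids.clone()
    poisoned[:, 1] = trigger_id
    return poisoned


# Predefined contrastive target vectors: paired triggers get sign-flipped
# targets, making it likely they land on different downstream labels.
base_target = torch.randn(HIDDEN_SIZE).sign()
trigger_targets = [base_target, -base_target]


def neuba_loss(encoder, clean_token_ids, trigger_ids, pretrain_loss):
    """Total loss = normal pre-training loss + one backdoor term per trigger.

    Each backdoor term regresses the output representations of
    trigger-embedded samples onto that trigger's predefined target vector.
    """
    total = pretrain_loss
    for trigger_id, target in zip(trigger_ids, trigger_targets):
        poisoned = insert_trigger(clean_token_ids, trigger_id)
        reps = encoder(poisoned)  # shape: [batch, HIDDEN_SIZE]
        total = total + F.mse_loss(reps, target.expand_as(reps))
    return total
```

Because fine-tuning barely moves the backdoored parameters, a downstream classifier built on the poisoned encoder maps each trigger's fixed target vector to some fixed label, which is how the attack controls predictions without knowing the task in advance.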
