Learning Scalable Task Assignment with Imperative-Priori Conflict Resolution in Multi-UAV Adversarial Swarm Defense Problem
Author affiliations: School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 201804, China; School of Automation, Beijing Institute of Technology, Beijing 100081, China; National Key Laboratory of Autonomous Intelligent Unmanned Systems, Beijing 100081, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, Shanghai 201804, China.
Publication: Journal of Systems Science & Complexity (系统科学与复杂性学报(英文版))
Year, volume, issue: 2024, Vol. 37, No. 1
Pages: 369-388
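Subject classification: 12 [Management]; 1201 [Management - Management Science and Engineering (degrees in Management or Engineering may be conferred)]; 081104 [Engineering - Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 082503 [Engineering - Aerospace Manufacturing Engineering]; 0835 [Engineering - Software Engineering]; 0825 [Engineering - Aeronautical and Astronautical Science and Technology]; 0811 [Engineering - Control Science and Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in Engineering or Science may be conferred)]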
Funding: Supported in part by the National Natural Science Foundation of China Basic Science Research Center Program under Grant No. 62088101, the National Natural Science Foundation of China under Grant Nos. 7217117 and 92367101, the Aeronautical Science Foundation of China under Grant No. 2023Z066038001, the Shanghai Municipal Science and Technology Major Project under Grant No. 2021SHZDZX0100, and the Chinese Academy of Engineering Strategic Research and Consulting Program under Grant No. 2023-XZ-65.
Keywords: Conflict resolution; reinforcement learning; scalability; task assignment.
Abstract: The multi-UAV adversary swarm defense (MUASD) problem is to defend a static base against an adversary UAV swarm by a defensive UAV *** the problem into task assignment and low-level interception strategies is a widely used ***-based approaches for task assignment are a promising *** studies on learning-based methods generally assume decentralized decision-making architecture, which is not beneficial for conflict *** contrast, centralized decision-making architecture is beneficial for conflict resolution while it is often detrimental to *** achieve scalability and conflict resolution simultaneously, inspired by a self-attention-based task assignment method for the sensor target coverage problem, a scalable centralized assignment method based on the self-attention mechanism together with a defender-attacker pairwise observation preprocessing (DAP-SelfAtt) is ***, an imperative-priori conflict resolution (IPCR) mechanism is proposed to achieve conflict-free ***, the IPCR mechanism is parallelized to enable efficient *** validate the algorithm, a variant of the proximal policy optimization (PPO) algorithm is employed for training in scenarios of various *** experimental results show that the proposed algorithm not only achieves conflict-free task assignment but also maintains scalability, and significantly improves the success rate of defense.