A policy gradient algorithm integrating long and short-term rewards for soft continuum arm control

Authors: DONG Xiang, ZHANG Jing, CHENG Long, XU WenJun, SU Hang, MEI Tao

Affiliations: School of Electrical Engineering and Automation, Anhui University, Hefei 230601, China; State Key Laboratory for Control and Management of Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Robotics Research Center, Peng Cheng Laboratory, Shenzhen 518055, China

Publication: Science China Technological Sciences (Chinese title: 中国科学(技术科学英文版))

Year, volume, issue: 2022, Vol. 65, No. 10

Pages: 2409-2419

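Subject classification: 08 [Engineering]; 080202 [Engineering - Mechatronic Engineering]; 0805 [Engineering - Materials Science and Engineering (engineering or science degrees may be conferred)]; 0802 [Engineering - Mechanical Engineering]; 080201 [Engineering - Mechanical Manufacturing and Automation]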

Funding: Partially supported by the National Key Research and Development Project Monitoring and Prevention of Major Natural Disasters Special Program (Grant No. 2020YFC1512202) and the Anhui University Cooperative Innovation Project (Grant No. GXXT-2019-003).

Keywords: soft arm control; Cosserat rod; deep reinforcement learning; policy gradient algorithm; high sample complexity

Abstract: Owing to its superior safety and flexibility, the soft continuum arm is widely applicable in industrial production and daily life. Reinforcement learning is a powerful technique for soft arm continuous control because it can learn an effective control policy without a known system model. However, it often suffers from high sample complexity and requires large amounts of training data, which limits its effectiveness in soft arm control. To overcome this issue, this paper proposes an improved policy gradient method, policy gradient integrating long and short-term rewards (PGLS). The short-term rewards provide more dynamics-aware exploration directions for policy learning and improve the algorithm's exploration efficiency. PGLS can be integrated into existing policy gradient algorithms such as deep deterministic policy gradient (DDPG). The overall control framework is implemented and demonstrated in a dynamics simulation environment. Simulation results show that the approach can effectively control the soft arm to reach and track targets. Compared with DDPG and other model-free reinforcement learning algorithms, PGLS substantially improves convergence speed and performance. In addition, a fluid-driven soft manipulator is designed and fabricated, which will allow PGLS to be verified in real-world experiments in future work.
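The abstract does not give the PGLS equations, so the following is only a rough illustration under stated assumptions. It shows the standard DDPG critic target, y = r + gamma * (1 - done) * Q'(s', mu'(s')), which PGLS is said to plug into, together with a hypothetical actor objective that adds a weighted short-term reward estimate to the critic's long-term value. The function names, the weight `lam`, and the use of an estimated immediate reward as the "short-term" term are assumptions for illustration, not the paper's method.

```python
from typing import List, Sequence


def ddpg_critic_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Standard DDPG critic (TD) targets: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q_values` are assumed to come from the target critic evaluated at the
    target actor's action for each next state; `dones` are 1.0 for terminal steps.
    """
    return [r + gamma * (1.0 - d) * q for r, q, d in zip(rewards, next_q_values, dones)]


def blended_actor_objective(
    q_values: Sequence[float],
    short_term_rewards: Sequence[float],
    lam: float = 0.5,
) -> float:
    """HYPOTHETICAL actor objective (to be maximized), NOT the paper's PGLS formula.

    Batch mean of Q(s, mu(s)) (long-term value from the critic) plus `lam` times an
    estimated immediate reward r_hat(s, mu(s)) (an assumed short-term signal).
    """
    if not q_values or len(q_values) != len(short_term_rewards):
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(q + lam * r for q, r in zip(q_values, short_term_rewards))
    return total / len(q_values)


# Usage on a toy batch of two transitions (the second one is terminal).
targets = ddpg_critic_targets([1.0, 0.5], [2.0, 3.0], [0.0, 1.0], gamma=0.5)
print(targets)  # [2.0, 0.5]
print(blended_actor_objective([1.0, 3.0], [2.0, -2.0], lam=0.5))  # 2.0
```

In an actual DDPG implementation these values would be batched tensors, and the actor would be trained by minimizing the negative of such an objective with gradients flowing through the critic; setting `lam = 0` recovers the plain DDPG actor objective.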
