Search results - Nantong Library

Journal of Shanghai Jiaotong University (Science), 2024, Vol. 29, No. 4, pp. 646-655

Authors: DONG Yubo, CUI Tao, ZHOU Yufan, SONG Xun, ZHU Yue, DONG Peng. Affiliations: School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240, China; Beijing Institute of Electronic System Engineering, Beijing 100854, China

Multi-agent reinforcement learning has recently been applied to solve pursuit ***, it suffers from a large number of time steps per training episode, thus always struggling to converge effectively, resulting in low rewards and an inability for agents to learn *** paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address the convergence problem mentioned *** ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long ***, we eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions in the traditional 2D polar coordinates observation *** results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by enhancing agents' policy scores of the *** ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.

Keywords: multi-agent reinforcement learning; deep reinforcement learning (DRL); long episode; reward function
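
The abstract does not give the two reward formulas. Purely as an illustration of what a "segmented" reward can look like, the sketch below switches between a dense distance-progress term and an added proximity bonus depending on the pursuer-evader distance, with a terminal capture bonus. All names, thresholds and weights (`segmented_reward`, `switch_dist`, `capture_dist`, `capture_bonus`) are hypothetical assumptions, not the authors' design.

```python
import math


def dense_progress(prev_dist: float, dist: float) -> float:
    """Distance closed since the previous step (positive when the pursuer approaches)."""
    return prev_dist - dist


def proximity_bonus(dist: float, scale: float = 1.0) -> float:
    """Smooth bonus in (0, 1] that grows as the pursuer nears the evader."""
    return math.exp(-dist / scale)


def segmented_reward(prev_dist: float, dist: float,
                     switch_dist: float = 5.0,
                     capture_dist: float = 0.5,
                     capture_bonus: float = 10.0) -> float:
    """Hypothetical segmented reward (thresholds and weights are assumptions).

    - dist <= capture_dist: terminal capture bonus only.
    - capture_dist < dist <= switch_dist: progress term plus proximity bonus.
    - dist > switch_dist: progress term only, so every step of a long
      episode still carries a dense learning signal.
    """
    if dist <= capture_dist:
        return capture_bonus
    reward = dense_progress(prev_dist, dist)
    if dist <= switch_dist:
        reward += proximity_bonus(dist)
    return reward
```

For example, `segmented_reward(10.0, 9.0)` lies in the far segment and returns only the 1.0 of progress, while `segmented_reward(4.0, 3.0)` adds the proximity bonus on top of the same progress.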

Collaborative Pushing and Grasping of Tightly Stacked Objects via Deep Reinforcement Learning

IEEE/CAA Journal of Automatica Sinica, 2022, Vol. 9, No. 1, pp. 135-145

Authors: Yuxiang Yang, Zhihao Ni, Mingyu Gao, Jing Zhang, Dacheng Tao. Affiliations: School of Electronics and Information, Hangzhou Dianzi University, Hangzhou, and also with Zhejiang Provincial Key Laboratory of Equipment Electronics, Hangzhou 310018, China; School of Computer Science, Faculty of Engineering, University of Sydney, Darlington, NSW 2006, Australia; JD Explore Academy, ***, Beijing 101111, China

Directly grasping tightly stacked objects may cause collisions and result in failures, degrading the functionality of robotic *** by the observation that first pushing objects to a state of mutual separation and then grasping them individually can effectively increase the success rate, we devise a novel deep Q-learning framework to achieve collaborative pushing and ***, an efficient non-maximum suppression policy (PolicyNMS) is proposed to dynamically evaluate pushing and grasping actions by enforcing a suppression constraint on unreasonable ***, a novel data-driven pushing reward network called PR-Net is designed to effectively assess the degree of separation or aggregation between *** benchmark the proposed method, we establish a common household items dataset (CHID) in both simulation and real *** trained using simulation data only, experimental results validate that our method generalizes well to real scenarios and achieves a 97% grasp success rate at a fast speed for object separation in the real-world environment.

Keywords: Convolutional neural network; deep Q-learning (DQN); reward function; robotic grasping; robotic pushing

Detecting Icing on the Blades of a Wind Turbine Using a Deep Neural Network

Computer Modeling in Engineering & Sciences, 2023, Vol. 134, No. 2, pp. 767-782

Authors: Tingshun Li, Jiaohui Xu, Zesan Liu, Dadi Wang, Wen Tan. Affiliations: School of Control and Computer Engineering, North China Electric Power University, Beijing 102206, China; R&D, State Grid Information & Telecommunication Group Co., Beijing 102211, China

The blades of wind turbines located at high latitudes are often covered with ice in late autumn and winter, which affects their capacity for power generation as well as their *** identifying the icing of the blades of wind turbines in remote areas is thus important, and a general model is needed to this *** paper proposes a universal model based on a Deep Neural Network (DNN) that uses data from the Supervisory Control and Data Acquisition (SCADA) *** datasets from SCADA are first preprocessed through undersampling, that is, they are labeled, normalized, and *** features of icing of the blades of a turbine identified in previous studies are then used to extract training data from the training dataset. A middle feature is proposed to show how a given feature is correlated with icing on the *** indicators for the model, including a reward function, are also designed to assess its predictive ***, the most suitable model is used to predict the testing data, and values of the reward function and the predictive accuracy of the model are *** proposed method can be used to relate continuously transferred features with a binary status of icing of the blades of the turbine by using variables of the middle *** results here show that an integrated indicator system is superior to a single indicator of accuracy when evaluating the prediction model.

Keywords: DNN; predicting blade icing; SCADA data; wind power; reward function
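
The abstract argues that an integrated indicator system, including a reward function, evaluates the icing classifier better than accuracy alone. The paper's indicator definitions are not given; the sketch below assumes a simple cost-sensitive reward with hypothetical per-outcome values (missed icing, `fn`, penalized hardest) to show how two predictors with identical accuracy can score differently.

```python
from typing import Dict, Sequence

# Hypothetical per-outcome rewards (assumption): a missed icing event (fn)
# is penalized far more than a false alarm (fp).
DEFAULT_REWARDS: Dict[str, float] = {"tp": 1.0, "tn": 1.0, "fp": -1.0, "fn": -5.0}


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count binary outcomes, with label 1 meaning 'blade iced'."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    c = confusion_counts(y_true, y_pred)
    return (c["tp"] + c["tn"]) / max(1, sum(c.values()))


def reward_score(y_true: Sequence[int], y_pred: Sequence[int],
                 rewards: Dict[str, float] = DEFAULT_REWARDS) -> float:
    """Sum of per-outcome rewards over all predictions."""
    c = confusion_counts(y_true, y_pred)
    return sum(rewards[k] * n for k, n in c.items())
```

With `y_true = [1, 1, 0, 0, 0, 0]`, the predictions `[1, 0, 0, 0, 0, 1]` and `[1, 1, 1, 1, 0, 0]` both reach 4/6 accuracy, but the first misses an icing event and receives a lower reward score.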

An enhanced eco-driving strategy based on reinforcement learning for connected electric vehicles: cooperative velocity and lane-changing control

Journal of Intelligent and Connected Vehicles, 2022, Vol. 5, No. 3, pp. 316-332

Authors: Haitao Ding, Wei Li, Nan Xu, Jianwei Zhang. Affiliation: State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun, China

Purpose: This study aims to propose an enhanced eco-driving strategy based on reinforcement learning (RL) to alleviate the mileage anxiety of electric vehicles (EVs) in the connected ***/methodology/approach: In this paper, an enhanced eco-driving control strategy based on an advanced RL algorithm in hybrid action space (EEDC-HRL) is proposed for connected *** EEDC-HRL simultaneously controls longitudinal velocity and lateral lane-changing maneuvers to achieve more potential ***, this study redesigns an all-purpose and efficient-training reward function with the aim of saving energy on the premise of ensuring other driving ***: To illustrate the performance of the EEDC-HRL, the controlled EV was trained and tested in various traffic flow *** experimental results demonstrate that the proposed technique can effectively improve energy efficiency without sacrificing travel efficiency, comfort, safety and lane-changing performance in different traffic flow ***/value: In light of the aforementioned discussion, the contributions of this paper are *** enhanced eco-driving strategy based on an advanced RL algorithm in hybrid action space (EEDC-HRL) is proposed to jointly optimize longitudinal velocity and lateral lane-changing for connected EVs. A full-scale reward function consisting of multiple sub-rewards with a safety control constraint is redesigned to achieve eco-driving while ensuring other driving performance.

Keywords: Ecological driving; Electric vehicles; Reinforcement learning in hybrid action space; Velocity and lane-changing control; reward function
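
The abstract describes a full-scale reward built from multiple sub-rewards plus a safety control constraint. A minimal sketch of that general pattern follows: a weighted sum of energy, speed-tracking, comfort and lane-change terms, overridden by a fixed penalty when a safety condition is violated. The `StepInfo` fields, the weights, and the time-headway threshold are all hypothetical assumptions, not the paper's values.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    energy_kwh: float     # battery energy consumed during the step
    speed: float          # m/s
    target_speed: float   # m/s, desired cruising speed
    jerk: float           # m/s^3, longitudinal jerk
    time_headway: float   # s, gap to the leading vehicle divided by own speed
    lane_changed: bool


# Hypothetical weights and safety threshold (assumptions, not from the paper).
W_ENERGY, W_SPEED, W_COMFORT, W_LANE = 10.0, 0.1, 0.01, 0.2
MIN_HEADWAY = 1.0         # s
SAFETY_PENALTY = -10.0


def eco_reward(s: StepInfo) -> float:
    """Weighted sum of sub-rewards; a safety violation overrides everything."""
    if s.time_headway < MIN_HEADWAY:
        return SAFETY_PENALTY
    r_energy = -W_ENERGY * s.energy_kwh                  # energy saving
    r_speed = -W_SPEED * abs(s.speed - s.target_speed)   # travel efficiency
    r_comfort = -W_COMFORT * s.jerk ** 2                 # ride comfort
    r_lane = -W_LANE if s.lane_changed else 0.0          # discourage needless lane changes
    return r_energy + r_speed + r_comfort + r_lane
```

Treating safety as a hard override rather than one more weighted term keeps the agent from trading a small energy gain against an unsafe gap; whether the paper does exactly this is not stated in the abstract.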

Heterogeneous Network Selection Optimization Algorithm Based on a Markov Decision Model

China Communications, 2020, Vol. 17, No. 2, pp. 40-53

Authors: Jianli Xie, Wenjuan Gao, Cuiran Li. Affiliation: School of Electronic and Information Engineering, Lanzhou Jiaotong University

A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network *** the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated by using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved by using the genetic algorithm and simulated annealing (GA-SA); thus, users can seamlessly switch to the network with the best long-term expected reward *** results show that the proposed algorithm has good convergence performance and can guarantee that users with different service types obtain satisfactory expected total reward values and experience few network handoffs.

Keywords: heterogeneous wireless networks; Markov decision process; reward function; genetic algorithm; simulated annealing
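
The abstract computes network attribute weights with the analytic hierarchy process. The sketch below shows the standard AHP steps, assuming the usual eigenvector method (via power iteration) and Saaty's consistency ratio; the example comparison matrix and the weighted-sum `network_score` over normalized QoS attributes are illustrative assumptions, not the paper's data or exact reward.

```python
from typing import List

Matrix = List[List[float]]

# Saaty's random consistency index for matrix sizes n = 1..9.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45]


def ahp_weights(a: Matrix, iters: int = 100) -> List[float]:
    """Principal eigenvector of a pairwise comparison matrix, normalized to sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iters):  # power iteration
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def consistency_ratio(a: Matrix, w: List[float]) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); CR < 0.1 is usually accepted."""
    n = len(a)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n - 1]


def network_score(weights: List[float], attrs: List[float]) -> float:
    """Hypothetical per-network reward: weighted sum of QoS attributes normalized to [0, 1]."""
    return sum(w * x for w, x in zip(weights, attrs))


# Example: bandwidth judged twice as important as delay and as price (assumed judgments).
comparisons = [[1.0, 2.0, 2.0],
               [0.5, 1.0, 1.0],
               [0.5, 1.0, 1.0]]
weights = ahp_weights(comparisons)
```

Because the example matrix is perfectly consistent, the weights come out as 0.5, 0.25 and 0.25 and the consistency ratio is zero; real expert judgments usually give a small positive ratio.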

A survey of inverse reinforcement learning techniques

International Journal of Intelligent Computing and Cybernetics, 2012, Vol. 5, No. 3, pp. 293-311

Authors: Shao Zhifei, Er Meng Joo. Affiliation: School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore

Purpose: The purpose of this paper is to provide an overview of the theoretical background and applications of inverse reinforcement learning (IRL). Design/methodology/approach: Reinforcement learning (RL) techniques provide a powerful solution for sequential decision-making problems under *** uses an agent equipped with a reward function to find a policy through interactions with a dynamic ***, one major assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, needs to be provided *** practice, the reward function can be very hard to specify and exhausting to tune for large and complex problems, and this inspires the development of IRL, an extension of RL, which directly tackles this problem by learning the reward function through expert *** this paper, the original IRL algorithms and their close variants, as well as their recent advances, are reviewed and ***: This paper can serve as an introductory guide to the fundamental theory and developments, as well as the applications, of ***/value: This paper surveys the theories and applications of IRL, the latest development of RL; such a survey has not been done so far.

Keywords: Inverse reinforcement learning; reward function; Reinforcement learning; Artificial intelligence; Learning methods
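
A common assumption in classic IRL algorithms (for example, apprenticeship learning via feature-expectation matching) is a reward linear in state features, R(s) = w·φ(s). The expected discounted return of a policy is then w·μ, where μ is its discounted feature expectation, so matching the expert's μ matches the expert's return for every such w. A minimal sketch of estimating μ from demonstration trajectories follows; the feature map `phi` and the trajectories are illustrative placeholders, not taken from the survey.

```python
from typing import Callable, List, Sequence

State = Sequence[float]


def feature_expectations(trajectories: List[List[State]],
                         phi: Callable[[State], List[float]],
                         gamma: float = 0.9) -> List[float]:
    """Empirical discounted feature expectations.

    mu = (1/m) * sum over trajectories i, sum over steps t of gamma^t * phi(s_t)
    """
    mu: List[float] = []
    for traj in trajectories:
        for t, s in enumerate(traj):
            f = phi(s)
            if not mu:
                mu = [0.0] * len(f)
            for k, fk in enumerate(f):
                mu[k] += (gamma ** t) * fk
    m = len(trajectories)
    return [x / m for x in mu]


def linear_return(w: List[float], mu: List[float]) -> float:
    """Expected discounted return w . mu under the linear-reward assumption."""
    return sum(wk * mk for wk, mk in zip(w, mu))


# Two toy expert trajectories over 1-D states; features are (position, bias).
expert = [[(1.0,), (2.0,)], [(0.0,), (0.0,)]]
mu_expert = feature_expectations(expert, lambda s: [s[0], 1.0], gamma=0.5)
```

Here `mu_expert` evaluates to `[1.0, 1.5]`; an IRL loop would then search for weights `w` under which the expert's `linear_return` exceeds that of candidate policies.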

Optimization of the Ice Storage Air Conditioning System Operation Based on Deep Reinforcement Learning

40th Chinese Control Conference

Authors: Mingte Li, Fei Xia, Lin Xia. Affiliation: College of Automation Engineering, Shanghai University of Electric Power

With the aim of obtaining a control strategy for the room temperature and economic cost of an ice storage air conditioning system in a small office building in Shanghai, this paper models the ice storage air conditioning system as a Markov decision process and adopts deep reinforcement learning algorithms to optimize its *** order to avoid the problems of excessive dimensionality and overestimation of the value function in reinforcement learning, the DDQN (Double Deep Q-Network) algorithm with a dual neural network structure is adopted to optimize the operation of the ice storage air-conditioning *** at overcoming the slow convergence of the DDQN algorithm, the action space of DDQN is taken into consideration in this ***, the appropriate action set is selected according to the convergence speed of different ***, an exponential function is introduced into the reward *** on the exponential function, the reward function can adjust the penalty value according to the difference between the expected room temperature and the actual room temperature, thus speeding up the convergence of the DDQN ***, Python is used to model and simulate the building and the ice storage air conditioning *** simulation results show that the DDQN control method proposed in this paper reduces both the operating cost and the proportion of uncomfortable time, with better control performance.

Keywords: DDQN; Ice storage air-conditioning; Optimization operation; reward function
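
The abstract says the reward uses an exponential function so that the penalty grows with the gap between the expected and actual room temperature. The exact formula is not given; the sketch below assumes a penalty of the form exp(k·excess) - 1 added to the electricity cost, with a hypothetical comfort deadband. The constants `K_TEMP`, `C_TEMP`, `C_COST` and `deadband` are illustrative assumptions.

```python
import math

# Hypothetical constants (assumptions): the abstract only states that the
# penalty grows exponentially with the temperature deviation.
K_TEMP = 0.5   # 1/degC, growth rate of the comfort penalty
C_TEMP = 1.0   # scale of the comfort penalty
C_COST = 1.0   # weight on the electricity cost of the step


def ice_storage_reward(t_room: float, t_set: float, cost: float,
                       deadband: float = 1.0) -> float:
    """Negative reward: electricity cost plus an exponential comfort penalty.

    Inside the (assumed) deadband the comfort penalty is zero; outside it the
    penalty is C_TEMP * (exp(K_TEMP * excess) - 1), so large deviations are
    punished much harder than small ones.
    """
    excess = max(0.0, abs(t_room - t_set) - deadband)
    comfort_penalty = C_TEMP * (math.exp(K_TEMP * excess) - 1.0)
    return -(C_COST * cost + comfort_penalty)
```

A 3 °C deviation with the default 1 °C deadband gives a comfort penalty of e - 1 (about 1.72), while a 5 °C deviation gives e² - 1 (about 6.39); this steepening is what lets the penalty "adjust" to the size of the temperature error.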

Manipulator Control Method Based on Deep Reinforcement Learning

32nd Chinese Control and Decision Conference

Authors: Rui Zeng, Manlu Liu, Junjun Zhang, Xinmao Li, Qijie Zhou, Yuanchen Jiang. Affiliation: Special Environment Robot Technology Key Laboratory of Sichuan Province, Southwest University of Science and Technology

Robotic arms have transformed the manufacturing industry and have been used for scientific exploration in environments inaccessible to humans. Existing manipulator control methods based on deep reinforcement learning usually discretize the action space or consider only planar manipulators, which greatly limits the tasks the manipulator can accomplish. In this paper, we propose a control method based on the Deep Deterministic Policy Gradient (DDPG) algorithm for a 6-degree-of-freedom manipulator to reach an object position in three-dimensional space. This paper designs two types of reward functions and introduces the manipulability index into the algorithm. The manipulability index evaluates the flexibility of the robotic arm in its workspace, and the algorithm uses it to optimize the joint pose with which the arm reaches the object position. Comparing the algorithms based on the two reward functions on a purpose-built simulation platform verifies the effectiveness of the DDPG algorithm, and with the manipulability index the 6-degree-of-freedom manipulator reaches the object position with a more flexible posture.

Keywords: Deep reinforcement learning; Manipulator; reward function; Joint pose
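
The manipulability index referred to above is commonly Yoshikawa's measure w = sqrt(det(J Jᵀ)), where J is the manipulator Jacobian; the abstract does not say which variant the authors use, so this is an assumption. To stay short, the sketch computes it for a planar serial arm (J has two rows, so J Jᵀ is 2x2) rather than the paper's 6-DOF arm; the formula is the same with a 6-row Jacobian. The link lengths and joint angles are illustrative.

```python
import math
from typing import List


def planar_jacobian(lengths: List[float], q: List[float]) -> List[List[float]]:
    """2 x n position Jacobian of a planar serial arm with revolute joints."""
    n = len(q)
    theta, acc = [], 0.0
    for qi in q:               # absolute angle of each link
        acc += qi
        theta.append(acc)
    jx = [-sum(lengths[k] * math.sin(theta[k]) for k in range(j, n)) for j in range(n)]
    jy = [sum(lengths[k] * math.cos(theta[k]) for k in range(j, n)) for j in range(n)]
    return [jx, jy]


def manipulability(jac: List[List[float]]) -> float:
    """Yoshikawa's index w = sqrt(det(J J^T)) for a Jacobian with two rows."""
    a, b = jac
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    ab = sum(x * y for x, y in zip(a, b))
    det = aa * bb - ab * ab
    return math.sqrt(max(0.0, det))  # clamp tiny negative round-off near singularities
```

For a two-link arm this reduces to l1·l2·|sin q2|, so the index vanishes when the arm is fully stretched (a singular pose) and peaks with the elbow at 90°. A reward could add a term such as `lam * w` to favor flexible poses; the weight `lam` is hypothetical.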

Reinforcement Learning Control for Robot Arm Grasping Based on Improved DDPG

40th Chinese Control Conference

Authors: Guangjun Qi, Yuan Li. Affiliation: School of Automation, Beijing Institute of Technology

Although traditional robot arm grasping control achieves high control accuracy, this comes at the cost of high-precision hardware, and it lacks *** order to achieve high control accuracy and flexibility on a relatively inexpensive robot *** paper proposes an improved DDPG (Deep Deterministic Policy Gradient) reinforcement learning algorithm to control the gripping of a robot ***, we build a simulation environment for a six-DOF (six-degree-of-freedom) manipulator with a gripper in ROS (Robot Operating System). Then, to address the shortcomings of traditional DDPG rewards, we design a composite reward *** at the problem of low sampling efficiency in the free exploration of the robot arm, a batch of teaching (demonstration) data was added to the experience replay pool to improve learning *** simulation experiment results show that under the same number of episodes of *** improved DDPG grasping control algorithm significantly improves the grasping success *** grasping success rate after all improvements reaches 70%, compared with 36% for the unimproved DDPG.

Keywords: DDPG; reward function; Demonstration; Six-DOF Arm Robot
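
The abstract adds a batch of demonstration data to the experience replay pool to improve sampling efficiency. How the two kinds of data are mixed is not stated; the sketch below assumes one common arrangement, in which demonstrations are kept permanently and each minibatch draws a fixed fraction from them. The class name `DemoReplayBuffer` and the `demo_fraction` parameter are hypothetical.

```python
import random
from collections import deque
from typing import Any, List, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class DemoReplayBuffer:
    """Replay pool seeded with demonstration transitions (hypothetical design).

    Demonstrations are never evicted; agent transitions live in a bounded
    FIFO. Each minibatch draws about `demo_fraction` of its samples from the
    demonstrations and the rest from the agent's own experience.
    """

    def __init__(self, demos: List[Transition], capacity: int = 100_000,
                 demo_fraction: float = 0.25, seed: int = 0) -> None:
        self.demos = list(demos)
        self.agent: deque = deque(maxlen=capacity)
        self.demo_fraction = demo_fraction
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self.agent.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        n_demo = min(len(self.demos), round(batch_size * self.demo_fraction))
        n_agent = min(len(self.agent), batch_size - n_demo)
        batch = self.rng.sample(self.demos, n_demo)
        batch += self.rng.sample(list(self.agent), n_agent)
        self.rng.shuffle(batch)
        return batch
```

Early in training, when the agent's own transitions are scarce, the demonstrations supply most of the useful grasping experience; as the FIFO fills, the fixed fraction keeps them from being drowned out.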

Deep Reinforcement Learning Approach for Flocking Control of Multi-agents

40th Chinese Control Conference

Authors: Han Zhang, Jin Cheng. Affiliation: University of Jinan

This paper addresses the learning of flocking behaviors with the multi-agent deep deterministic policy gradient algorithm. Unlike non-intelligent algorithms, the agents continually update their strategies by learning from the experience of random exploration, so as to obtain the optimal action. The algorithm gives the multi-agent system both the decision-making ability of reinforcement learning and the data-processing ability of deep learning. An artificial potential energy function is designed to evaluate the reward of the aggregation posture. An Actor-Critic framework is adopted to improve the parameter-updating mechanism of the neural network. Simulations illustrate the flocking control performance of the learned behavior, and the results show that the multi-agent flocking behavior is as desired.

Keywords: Multi-agents; Deep Deterministic Policy Gradient; Flocking Control; reward function; Actor-Critic
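
The abstract uses an artificial potential energy function to score the aggregation posture. Its form is not given; the sketch below assumes a simple pairwise potential (r - d)² / r that is zero at a desired spacing d, grows sharply as agents crowd together, and grows again as they drift apart. The reward is the negative total potential over all pairs. The potential, the spacing `d`, and the function names are hypothetical; summing over all pairs is a crude cohesion proxy, not a lattice-optimal design.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def pair_potential(r: float, d: float = 1.0) -> float:
    """Hypothetical pairwise potential with its minimum (zero) at spacing d.

    Blows up as r -> 0 (collision risk) and grows roughly linearly for r >> d
    (loss of cohesion).
    """
    return (r - d) ** 2 / max(r, 1e-6)


def flocking_reward(positions: List[Point], d: float = 1.0) -> float:
    """Negative total potential over all agent pairs; 0 is the best attainable."""
    total = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            total += pair_potential(math.dist(positions[i], positions[j]), d)
    return -total
```

Two agents exactly `d` apart receive reward 0, while pairs that are too close (0.5 apart gives -0.5) or too far (3 apart gives about -1.33) are penalized, which is the qualitative shape an aggregation reward needs.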