Search results - Nantong Library (南通市图书馆)

Tongfang Journal Database (同方期刊数据库)

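A dynamic treatment strategy generation model combining Dead-ends and offline supervised actor-critic (融合Dead-ends和离线监督actor-critic的动态治疗策略生成模型)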

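计算机科学 (Computer Science), 2024, vol. 51, no. 7, pp. 80-88

Authors: Yang Shasha (杨莎莎), Yu Yaxin (于亚新), Wang Yueru (王跃茹), Xu Jingming (许晶铭), Wei Yangjie (魏阳杰), Li Xinhua (李新华). School of Computer Science and Engineering, Northeastern University, Shenyang 110169; Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education (Northeastern University), Shenyang 110169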

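Reinforcement learning depends little on a mathematical model and uses experience to build and optimize models, which makes it well suited to learning dynamic treatment strategies. Existing studies nevertheless have the following problems: 1) they pursue policy optimality without considering risk, so the learned policy carries a certain degree of risk; 2) they ignore distribution shift, so the learned policy differs completely from the physicians' policy; 3) they ignore the patient's historical observations and treatment history, so the patient state is poorly captured and the optimal policy cannot be learned. To address these problems, the paper proposes DOSAC-DTR, a dynamic treatment strategy generation model that combines Dead-ends with offline supervised actor-critic. First, to account for the risk of the treatment actions recommended by the learned policy, the Dead-ends concept is integrated into the actor-critic framework. Second, to mitigate distribution shift, physician supervision is integrated into the actor-critic framework, minimizing the gap between the learned policy and the physicians' policy while maximizing the expected return. Finally, to obtain a state representation that contains the patient's key historical information, an LSTM-based encoder-decoder model is used to model the patient's historical observations and treatment history. Experimental results show that DOSAC-DTR outperforms the baseline methods, achieving a lower estimated mortality rate and a higher Jaccard coefficient.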

Keywords: dynamic treatment strategy; Dead-ends; actor-critic; state representation

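Research on a tracking and inspection control system for rebar threaded ends based on actor-critic adaptive PID (基于actor-critic自适应PID的钢筋套丝头跟踪检测控制系统研究)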

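工业控制计算机 (Industrial Control Computer), 2024, vol. 37, no. 2, pp. 75-77

Authors: Qin Tianwei (秦天为), Feng Yunjian (冯云剑). School of Automation, Southeast University, Nanjing, Jiangsu 210096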

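To keep pace with the production line without disrupting production, and thereby better automate the quality inspection and dimensional measurement of rebar threaded ends, the authors design a rebar threaded-end inspection and tracking system based on a synchronous-belt linear guide. They also propose an actor-critic based adaptive PID control method, which uses reinforcement learning to adjust the proportional, integral, and derivative parameters of the PID controller automatically according to environmental feedback. The response performance metrics of this method and of other PID control methods are tested and analyzed. The experimental results show that the method achieves high-precision, fast-response tracking photography and ensures high-precision quality inspection of threaded ends.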

Keywords: rebar threaded-end inspection; tracking photography; adaptive PID control; actor-critic
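
As a rough illustration of the kind of controller described in the abstract above, the sketch below shows a discrete PID controller whose proportional, integral, and derivative gains can be overwritten between steps. It is not the authors' implementation: the class name `AdaptivePID`, the `set_gains` hook, the time step, and the toy first-order plant in `simulate` are all assumptions made for illustration. In an actor-critic setup, the actor would presumably be the component that supplies new gains through such a hook.

```python
from dataclasses import dataclass


@dataclass
class AdaptivePID:
    """Discrete PID controller with externally adjustable gains (hypothetical sketch)."""
    kp: float
    ki: float
    kd: float
    dt: float = 0.01          # assumed sampling period in seconds
    _integral: float = 0.0    # running integral of the error
    _prev_error: float = 0.0  # error from the previous step

    def set_gains(self, kp, ki, kd):
        # Hook where a learned policy could write new gains (assumption).
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, setpoint, measurement):
        # Standard PID law: u = kp*e + ki*integral(e) + kd*de/dt
        error = setpoint - measurement
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def simulate(pid, setpoint=1.0, steps=2000):
    """Track a constant setpoint with a toy plant x' = -x + u (assumed, not from the paper)."""
    x = 0.0
    for _ in range(steps):
        u = pid.step(setpoint, x)
        x += (-x + u) * pid.dt  # forward-Euler integration of the plant
    return x
```

Calling `simulate(AdaptivePID(kp=2.0, ki=1.0, kd=0.0))` returns the plant output after the run; with integral action it should settle close to the setpoint.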

An actor-critic based learning method for decision-making and planning of autonomous vehicles

Science China (Technological Sciences), 2021, vol. 64, no. 5, pp. 984-994

Authors: XU Can, ZHAO WanZhong, CHEN QingYun, WANG ChunYan. Department of Vehicle Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

To improve the agility and applicability of trajectory planning algorithms for autonomous vehicles, this paper proposes a novel actor-critic based learning method for decision-making and planning in complex multi-vehicle traffic. The method plans the vehicle's path and speed jointly, which makes the trajectory more flexible. First, the generation of the planned trajectory from the decided action is described by the end-point of the trajectory. Then, the actor-critic based learning method is built to learn an optimal policy for the decision process; it updates the policy using the gradient of the current policy's advantage. In this process, features of real traffic are extracted from the time headway (TH) and the speed distribution. The reward function is built from safety, efficiency, and driving comfort. Furthermore, to give the policy network better convergence, it is split into two modules: a lane-changing network and a lane-keeping network, which decide the optimal end-point of the path and speed candidates respectively. Finally, a curved overtaking scenario and an interaction process with a human driver are used to illustrate the feasibility and superiority of the method. The results show that the proposed method has better real-time performance and produces a more continuous and smoother coupled trajectory than the existing rule-based method.

Keywords: trajectory planning; decision-making; actor-critic; feature extraction; autonomous driving

A Sample-Efficient actor-critic Algorithm for Recommendation Diversification

Chinese Journal of Electronics, 2020, vol. 29, no. 1, pp. 89-96

Authors: LI Shuang, YAN Yanghui, REN Ju, ZHOU Yuezhi, ZHANG Yaoxue. Department of Computer Science and Technology, Tsinghua University

Diversifying recommendation results both satisfies a user's existing interests and explores novel information needs. A recently proposed Monte-Carlo based reinforcement learning method suffers from sample inefficiency and large variance, and even fails to perform well in large action spaces. To solve these problems, we propose a novel actor-critic reinforcement learning algorithm for recommendation diversification. The actor acts as the ranking policy, while the introduced critic predicts the expected future rewards of each candidate action. The critic target is updated by the full Bellman equation, and the actor network is optimized using the expected gradient over the whole action space. To further stabilize and improve performance, we also add a policy-filtered critic supervision loss. Experiments on the MovieLens dataset demonstrate the effectiveness of our approach over multiple competitive methods.

Keywords: Recommender system; Diversity; Reinforcement learning; actor-critic

A Novel Heterogeneous actor-critic Algorithm with Recent Emphasizing Replay Memory

International Journal of Automation and Computing, 2021, vol. 18, no. 4, pp. 619-631

Authors: Bao Xi, Rui Wang, Ying-Hao Cai, Tao Lu, Shuo Wang. State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China

Reinforcement learning (RL) algorithms have been demonstrated to solve a variety of continuous control tasks. However, the training efficiency and performance of such methods limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages the exploration of a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to deal with sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in training efficiency and performance, which validates its effectiveness.

Keywords: Reinforcement learning (RL); actor-critic; experience replay; training efficiency; manipulation skill learning
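
The abstract above does not say how recent transitions are emphasized. One simple scheme, assumed here purely for illustration and not taken from the paper, is to sample stored transitions with a probability that decays geometrically with their age. The class name `RecencyReplayBuffer` and the `decay` parameter are hypothetical.

```python
import random
from collections import deque


class RecencyReplayBuffer:
    """Replay buffer that favors newer transitions (illustrative assumption, not the paper's scheme)."""

    def __init__(self, capacity, decay=0.999):
        self.buffer = deque(maxlen=capacity)  # oldest items are evicted first
        self.decay = decay                    # per-step weight decay with age

    def add(self, transition):
        self.buffer.append(transition)

    def __len__(self):
        return len(self.buffer)

    def sample(self, batch_size, rng=random):
        items = list(self.buffer)
        n = len(items)
        # The newest item has age 0 and weight 1; older items get decay**age.
        weights = [self.decay ** (n - 1 - i) for i in range(n)]
        # Sampling is with replacement, proportional to the weights.
        return rng.choices(items, weights=weights, k=batch_size)
```

Setting `decay=1.0` recovers uniform sampling, so the parameter interpolates between an ordinary replay buffer and one that draws almost only from the latest experience.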

Learning continuous coupled multi-controller coefficients based on actor-critic algorithm for lower-limb exoskeleton

Science China (Information Sciences), 2021, vol. 64, no. 5, pp. 230-232

Authors: Guangkui SONG, Rui HUANG, Hong CHENG, Jing QIU, Qiming CHENG, Shuai FAN. Center for Robotics, University of Electronic Science and Technology of China; School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China

Dear editor, Human-powered lower exoskeletons are widely studied by academia and industry with regard to human locomotion and strength augmentation. Technological developments have boosted the use of machine learning t...

Keywords: Interactive Learning; actor-critic; Reinforcement Learning; Physical Human-Robot Interaction; Lower-limb Exoskeleton; Human-powered Augmentation; Continuous Domain; learning continuous coefficient

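A multi-agent reinforcement learning algorithm based on improved DDPG under the actor-critic framework (actor-critic框架下一种基于改进DDPG的多智能体强化学习算法)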

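控制与决策 (Control and Decision), 2021, vol. 36, no. 1, pp. 75-82

Authors: Chen Liang (陈亮), Liang Chen (梁宸), Zhang Jingyi (张景异), Liu Yunting (刘韵婷). School of Automation and Electrical Engineering, Shenyang Ligong University, Shenyang 110159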

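Real-world artificial intelligence applications usually require multiple agents to work together, and effective communication and coordination among artificial agents is an indispensable step toward artificial general intelligence. Using a self-developed virtual environment for police officer training as the test scenario, tasks are set that require squads of agents of different unit types to cooperate with or compete against one another. To keep the communication scheme effective and scalable, a mixed DDPG (Mi-DDPG) algorithm is proposed. First, a bidirectional recurrent neural network (BRNN) is added to the actor network as an information exchange layer among agents of the same unit type. Then, information about agents of other unit types is added to the critic network to learn multi-agent cooperative policies. In addition, to ease the training burden, a centralized-training, decentralized-execution framework is adopted, and the Q function in the critic network is modularized. In experiments comparing Mi-DDPG with other algorithms across different scenarios, Mi-DDPG shows clear improvements in convergence speed and task completion, and has potential value for real-world application.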

Keywords: reinforcement learning; deep learning; multi-agent; RNN; DDPG; actor-critic

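Research on task offloading and resource allocation algorithms based on the actor-critic framework in mobile edge computing (移动边缘计算中基于actor-critic框架的任务卸载和资源分配算法研究)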

Author: Zhang Jie (张杰), Chang'an University

Degree level: Master's

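Mobile edge computing (MEC), an emerging computing paradigm, deploys servers at the network edge so that cloud functions sit closer to user devices, offering a new approach for resource-constrained devices to handle computation-intensive and latency-sensitive tasks. However, factors such as the time-varying nature of computing and communication resources in MEC and the randomness of user devices make it difficult to generate an optimal computation offloading policy. Moreover, most existing studies consider a multi-user, single-edge-server scenario, whose models are simplistic and of limited practicality when describing complex real-world edge networks. To address these problems, the thesis designs computation offloading and resource allocation strategies for a multi-user, multi-edge-server scenario under a three-tier network architecture consisting of cloud, edge, and user layers. The main research content is summarized as follows:

(1) In the multi-user, multi-edge-server scenario, the binary task offloading decision and resource allocation problem is studied. The thesis builds a computation cost function from factors such as computing resources, task execution latency, and energy consumption, and formulates a multi-objective joint optimization model under the relevant constraints. The model is an NP-hard mixed-integer nonlinear programming problem that is difficult to solve directly. An algorithm combining the ant colony algorithm with deep deterministic policy gradient is therefore proposed (Heuristic Ant Colony System-Deep Deterministic Policy Gradient, HAS-DDPG). It transforms the objective into a bilevel optimization problem: the upper level generates computation offloading decisions by combining a priority-ordering technique with the ant colony algorithm, and the lower level, given the upper-level decisions, allocates resources optimally using deep deterministic policy gradient combined with prioritized experience replay. Experimental results show that HAS-DDPG performs better in binary offloading than the other baseline algorithms.

(2) In an edge computing scenario with multiple competing edge servers, the partial offloading decision problem for computation-intensive applications is more complex. The problem is characterized by large input data volumes for computing tasks and by the dependencies that usually exist among the multiple tasks in an application. The thesis therefore first uses a directed acyclic graph to describe the dependencies among tasks and derives the scheduling order of tasks from the application's latest execution time. It then establishes the task computation cost function of user devices and builds a multi-objective joint optimization model under the relevant constraints. A Markov decision process is established on this model, with the constraints converted into system states that are explored continually. Finally, an improved algorithm that combines prioritized experience replay with SAC (Soft actor-critic), named MEC Prioritized Experience Replay Soft actor-critic (MP-SAC), is used to solve for the optimal offloading decisions and resource allocation that minimize the cost function. Simulation experiments show that MP-SAC offers better stability and solution efficiency than the other baseline algorithms.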

Keywords: mobile edge computing; computation offloading; resource allocation; policy gradient algorithm; actor-critic

A natural actor-critic algorithm based on batch recursive least squares (基于批量递归最小二乘的自然actor-critic算法)

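浙江大学学报（工学版） (Journal of Zhejiang University (Engineering Science)), 2015, vol. 49, no. 7, pp. 1335-1342

Authors: Wang Guofang (王国芳), Fang Zhou (方舟), Li Ping (李平). School of Aeronautics and Astronautics, Zhejiang University, Hangzhou, Zhejiang 310027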

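To reduce the online computational burden incurred when an agent in the actor-critic architecture estimates the natural gradient by least squares, and to improve real-time performance, a new learning algorithm, NAC-BRLS, is proposed. The algorithm estimates the natural gradient in the critic using batch recursive least squares and optimistically updates the policy according to the estimated gradient. Introducing batch recursive least squares lets the agent freely adjust, according to its own computing capacity, the amount of data processed in each batch, that is, the amount of data used in each policy evaluation. This allows a trade-off between fully optimistic and partially optimistic updates and greatly improves the flexibility of the NAC-LSTD algorithm. Mountain Car simulation experiments show that, compared with NAC-LSTD, NAC-BRLS markedly reduces the agent's average per-step computational burden while maintaining a certain level of convergence performance.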

Keywords: natural gradient; actor-critic; batch update; recursive least squares