Development of a Bias Compensating Q-Learning Controller for a Multi-Zone HVAC Facility

Authors: Syed Ali Asad Rizvi; Amanda J. Pertzborn; Zongli Lin

Affiliations: Department of Electrical and Computer Engineering, Tennessee Technological University, Cookeville, TN 38505, USA; Mechanical Systems and Controls Group, Building Energy and Environment Division, National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899, USA; Charles L. Brown Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904-4743, USA

Publication: IEEE/CAA Journal of Automatica Sinica (自动化学报(英文版))

Year, volume, issue: 2023, Vol. 10, No. 8

Pages: 1704-1715

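Subject classification: 08 [Engineering]; 081404 [Engineering: Heating, Gas Supply, Ventilation and Air-Conditioning Engineering]; 0835 [Engineering: Software Engineering]; 0802 [Engineering: Mechanical Engineering]; 0814 [Engineering: Civil Engineering]; 080201 [Engineering: Mechanical Manufacturing and Automation]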

Funding: supported in part by NIST (70NANB18H161)

Keywords: HVAC control; optimal tracking; Q-learning; reinforcement learning (RL)

Abstract: We present the development of a bias compensating reinforcement learning (RL) algorithm that optimizes thermal comfort (by minimizing tracking error) and control utilization (by penalizing setpoint deviations) in a multi-zone heating, ventilation, and air-conditioning (HVAC) lab facility subject to unmeasurable disturbances and unknown ***. It is shown that the presence of unmeasurable disturbance results in an inconsistent learning equation in traditional RL controllers, leading to parameter estimation bias (even with integral action support) and, in the extreme case, the divergence of the learning ***. We demonstrate this issue by applying the popular Q-learning algorithm to linear quadratic regulation (LQR) of a multi-zone HVAC environment and showing that, even with integral support, the algorithm exhibits a bias issue during the learning phase when the HVAC disturbance is unmeasurable due to unknown heat gains, occupancy variations, light sources, and outside weather ***. To address this difficulty, we present a bias compensating learning equation that learns a lumped bias term as a result of disturbances (and possibly other sources) in conjunction with the optimal control ***. Results show that the proposed scheme not only recovers the bias-free optimal control parameters but also does so without explicitly learning the dynamic model or estimating the disturbances, demonstrating the effectiveness of the algorithm in addressing the above challenges.
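
The bias mechanism described in the abstract can be illustrated with a small toy problem. The sketch below is not the paper's algorithm or its learning equation: it assumes a hypothetical scalar zone model `x_next = a*x + b*u + d` with a constant unmeasured load `d`, a fixed feedback gain `K`, a discount factor `gamma`, and randomly sampled state and input pairs, all chosen only for illustration. It evaluates the Q-function of the fixed policy by least squares on the Bellman equation, once with a purely quadratic parameterization and once with extra linear and constant terms that play the role of a lumped bias.

```python
import numpy as np

# Hypothetical scalar zone model: x_next = a*x + b*u + d.
# d is a constant disturbance that the learner never measures.
a, b, d = 0.9, 0.5, 1.0
q, r_u = 1.0, 1.0        # stage cost: q*x^2 + r_u*u^2
K, gamma = 0.5, 0.9      # fixed policy u = -K*x and discount factor (assumed values)


def features(x, u, with_bias):
    """Quadratic Q-function features, optionally augmented with lumped bias terms."""
    cols = [x * x, x * u, u * u]
    if with_bias:
        cols += [x, u, np.ones_like(x)]
    return np.stack(cols, axis=1)


def fit_q(x, u, with_bias):
    """Least-squares solve of Q(x,u) = cost + gamma*Q(x_next, -K*x_next)."""
    x_next = a * x + b * u + d          # the disturbance acts, but is not a regressor
    u_next = -K * x_next
    cost = q * x**2 + r_u * u**2
    A = features(x, u, with_bias) - gamma * features(x_next, u_next, with_bias)
    theta, *_ = np.linalg.lstsq(A, cost, rcond=None)
    return theta[:3]                    # keep only the quadratic coefficients


rng = np.random.default_rng(0)
x = rng.normal(size=500)
u = rng.normal(size=500)

# Disturbance-free quadratic coefficients of the policy's Q-function, in closed form.
a_cl = a - b * K
p = (q + r_u * K**2) / (1.0 - gamma * a_cl**2)
true_quad = np.array([q + gamma * p * a**2, 2 * gamma * p * a * b, r_u + gamma * p * b**2])

naive_quad = fit_q(x, u, with_bias=False)        # inconsistent equation, biased estimate
compensated_quad = fit_q(x, u, with_bias=True)   # lumped bias terms absorb the disturbance

print("disturbance-free:", true_quad)
print("naive fit:       ", naive_quad)
print("compensated fit: ", compensated_quad)
```

In this toy setting the disturbance only changes the linear and constant parts of the Q-function, so a fit that includes those terms recovers the disturbance-free quadratic coefficients without estimating `d` or the model parameters, while the purely quadratic fit forces the unmodelled offset into the quadratic coefficients.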
