基于在线强化学习的冷连轧厚度-张力协同控制优化

王平, 陈上, 吴海丹, 赵东利, 张东务, 李国栋, 孙杰

精密成形工程 ›› 2025, Vol. 17 ›› Issue (11) : 160-169. DOI: 10.3969/j.issn.1674-6457.2025.11.015
先进材料智能成形技术


  • 王平2, 陈上1, 吴海丹2, 赵东利2, 张东务2, 李国栋2, 孙杰1,*

Intelligent Optimization of Coordinated Thickness-tension Control in Cold Tandem Rolling via Online Reinforcement Learning

  • WANG Ping2, CHEN Shang1, WU Haidan2, ZHAO Dongli2, ZHANG Dongwu2, LI Guodong2, SUN Jie1,*

摘要

目的 探索一种适用于冷连轧厚度-张力协同控制的智能控制方法,以提高厚度和张力的调节精度及抗扰动能力。方法 利用历史工艺数据,通过子空间辨识方法建立了冷连轧过程的动力学模型;基于在线A3C强化学习算法设计了厚度-张力协同控制器,使其能够在与环境的持续交互中动态优化控制策略;以辊缝和辊速为控制输入、厚度和张力为输出量,构建特定奖励函数引导策略优化,并在模拟偏离正常值的初始状态下对控制器性能进行验证。结果 强化学习控制器能够在厚度偏差与张力偏差存在时于6步内恢复至正常范围。与传统PID控制相比,厚度偏差控制在设定值的0.5%以内,张力偏差控制在3%以内,厚度调节精度提升约3倍,张力抗扰动能力提升约5倍。控制曲线平稳,稳态误差低,展现出较强的快速收敛和抗扰动能力。结论 所提出的基于在线A3C强化学习的厚度-张力协同控制方法显著优于传统PID控制,能够实现厚度与张力的高精度、低波动调节,具有在冷连轧智能控制中应用的潜力和工程实用性。

Abstract

The work aims to explore an intelligent control method for coordinated thickness-tension control in cold tandem rolling, so as to improve the regulation accuracy and disturbance rejection of both thickness and tension. Historical process data were used to establish a dynamic model of the cold tandem rolling process via subspace identification. Based on the online A3C reinforcement learning algorithm, a thickness-tension coordinated controller was designed to continuously optimize its control policy through interaction with the environment. With roll gap and roll speed as control inputs, and thickness and tension as outputs, a reward function was constructed to guide policy optimization, and the controller's performance was validated under initial states deviating from normal values. Experimental results showed that the reinforcement learning controller quickly restored thickness and tension to the normal range by the 5th sampling point. Compared with the conventional PID controller, its thickness deviation was maintained within 0.5% of the setpoint and its tension deviation within 3%; thickness regulation accuracy was improved approximately threefold, and tension disturbance rejection approximately fivefold. The control curves were smooth and the steady-state errors were low, demonstrating fast convergence and strong disturbance rejection. In conclusion, the proposed online A3C reinforcement-learning-based thickness-tension coordinated control method significantly outperforms conventional PID control, achieves high-precision, low-fluctuation regulation of thickness and tension, and shows promising potential for practical application in intelligent cold tandem rolling.
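As a rough illustration of the control loop the abstract describes — a subspace-identified linear plant with roll gap and roll speed as inputs, thickness and tension as outputs, and a reward that penalizes output deviations — the sketch below uses made-up model matrices, gains, and reward weights, and a simple proportional policy standing in for the learned A3C actor. None of these numerical values come from the paper.

```python
import numpy as np

# Illustrative linear state-space model of the rolling stand, of the
# form a subspace identification would produce: x' = A x + B u, y = C x.
# All matrices here are assumptions, not identified from plant data.
A = np.array([[0.9, 0.05],
              [0.0, 0.95]])
B = np.array([[0.1, 0.0],
              [0.0, 0.08]])   # inputs: roll gap increment, roll speed increment
C = np.eye(2)                 # outputs: thickness deviation, tension deviation

def step(x, u):
    """One sampling step of the identified plant model."""
    x_next = A @ x + B @ u
    return x_next, C @ x_next

def reward(y, y_ref, u, w_err=1.0, w_u=0.01):
    """Penalize thickness/tension deviation and large control moves."""
    err = y - y_ref
    return -(w_err * float(err @ err) + w_u * float(u @ u))

def policy(y, y_ref, gain=0.5):
    """Crude proportional stand-in for the trained A3C actor."""
    return -gain * (y - y_ref)

y_ref = np.zeros(2)           # target: zero deviation from setpoints
x = np.array([1.0, 1.0])      # initial state deviating from normal values
for _ in range(10):
    u = policy(C @ x, y_ref)
    x, y = step(x, u)         # deviations shrink toward zero each step
```

In the paper's setting, the proportional `policy` would be replaced by the A3C actor network, trained online by maximizing the discounted sum of this kind of deviation-penalizing reward.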

关键词

厚度张力 / 带钢冷连轧 / 强化学习 / A3C算法 / 厚度张力控制

Key words

thickness and tension / cold tandem rolling of strip steel / reinforcement learning / A3C algorithm / thickness-tension control

引用本文

王平, 陈上, 吴海丹, 赵东利, 张东务, 李国栋, 孙杰. 基于在线强化学习的冷连轧厚度-张力协同控制优化[J]. 精密成形工程, 2025, 17(11): 160-169. https://doi.org/10.3969/j.issn.1674-6457.2025.11.015
WANG Ping, CHEN Shang, WU Haidan, ZHAO Dongli, ZHANG Dongwu, LI Guodong, SUN Jie. Intelligent Optimization of Coordinated Thickness-tension Control in Cold Tandem Rolling via Online Reinforcement Learning[J]. Journal of Netshape Forming Engineering, 2025, 17(11): 160-169. https://doi.org/10.3969/j.issn.1674-6457.2025.11.015
中图分类号: TP273+.5   


基金

国家重点研发计划(2022YFB3304800)
