TY - JOUR
T1 - Power Smoothing Control for Wind-Storage Integrated Systems With Hierarchical Safe Reinforcement Learning and Curriculum Learning
AU - Wang, Shuyi
AU - Zhao, Huan
AU - Shu, Ting
AU - Pan, Zibin
AU - Liang, Gaoqi
AU - Zhao, Junhua
N1 - Publisher Copyright:
© 2010-2012 IEEE.
PY - 2025/7/28
Y1 - 2025/7/28
N2 - As the penetration of wind energy increases, the Wind Storage Integrated System (WSIS) has become a critical solution to ensure stable wind power output and maximize economic benefits. However, existing data-based power smoothing control strategies struggle to satisfy the power fluctuation constraint in complex environments, resulting in inefficient coordination performance. To address this problem, this paper proposes a novel Hierarchical Safe Deep Reinforcement Learning (HSDRL) control framework for WSIS. The control problem is first reformulated as two interconnected Constrained Markov Decision Processes, and a hierarchical primal-dual-based safe Deep Deterministic Policy Gradient algorithm is proposed to learn the optimal policy that ensures the power output constraint. Furthermore, a curriculum learning scheme is designed and a Constraint Violation Prioritized Experience Replay method is proposed to address the unstable convergence issues caused by imbalanced constraint-violation and constraint-satisfaction experience data. Finally, a hierarchical shared-feature neural network structure is designed to share the parameters of the Q networks across hierarchies and increase learning efficiency. Simulation results in WindFarmSimulator validate the efficacy of the proposed control framework, demonstrating a 15.3% improvement in profit and a 46.0% reduction in fluctuation compared to existing methods.
AB - As the penetration of wind energy increases, the Wind Storage Integrated System (WSIS) has become a critical solution to ensure stable wind power output and maximize economic benefits. However, existing data-based power smoothing control strategies struggle to satisfy the power fluctuation constraint in complex environments, resulting in inefficient coordination performance. To address this problem, this paper proposes a novel Hierarchical Safe Deep Reinforcement Learning (HSDRL) control framework for WSIS. The control problem is first reformulated as two interconnected Constrained Markov Decision Processes, and a hierarchical primal-dual-based safe Deep Deterministic Policy Gradient algorithm is proposed to learn the optimal policy that ensures the power output constraint. Furthermore, a curriculum learning scheme is designed and a Constraint Violation Prioritized Experience Replay method is proposed to address the unstable convergence issues caused by imbalanced constraint-violation and constraint-satisfaction experience data. Finally, a hierarchical shared-feature neural network structure is designed to share the parameters of the Q networks across hierarchies and increase learning efficiency. Simulation results in WindFarmSimulator validate the efficacy of the proposed control framework, demonstrating a 15.3% improvement in profit and a 46.0% reduction in fluctuation compared to existing methods.
KW - curriculum learning
KW - deep reinforcement learning
KW - power smoothing control
KW - prioritized experience replay
KW - Wind storage integrated systems
UR - https://www.scopus.com/pages/publications/105012117574
U2 - 10.1109/TSG.2025.3593340
DO - 10.1109/TSG.2025.3593340
M3 - Journal article
AN - SCOPUS:105012117574
SN - 1949-3053
JO - IEEE Transactions on Smart Grid
JF - IEEE Transactions on Smart Grid
ER -