TY - JOUR
T1 - Guided probabilistic reinforcement learning for sampling-efficient maintenance scheduling of multi-component system
AU - Zhang, Yiming
AU - Zhang, Dingyang
AU - Zhang, Xiaoge
AU - Qiu, Lemiao
AU - Chan, Felix T.S.
AU - Wang, Zili
AU - Zhang, Shuyou
N1 - Funding Information:
The work was supported by the National Key Research and Development Program of China (2022YFE0196400), the National Natural Science Foundation of China (51805123), and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU 25206422).
Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2023/7
Y1 - 2023/7
AB - In recent years, multi-agent deep reinforcement learning has progressed rapidly, as reflected by its increasing adoption in industrial applications. This paper proposes a Guided Probabilistic Reinforcement Learning (Guided-PRL) model to tackle maintenance scheduling of multi-component systems in the presence of uncertainty, with the goal of minimizing the overall life-cycle cost. The proposed Guided-PRL is deeply rooted in the Actor-Critic (AC) scheme. Since traditional AC falls short in sampling efficiency and suffers from getting stuck in local minima in the context of multi-agent reinforcement learning, it is challenging for the actor network to converge to a solution of desirable quality even when the critic network is properly configured. To address these issues, we develop a generic framework to facilitate effective training of the actor network; the framework consists of environmental reward modeling, degradation formulation, state representation, and policy optimization. The convergence speed of the actor network is significantly improved with a guided sampling scheme for environment exploration that exploits rule-based domain expert policies. To handle data scarcity, environmental modeling and policy optimization are approximated with Bayesian models for effective uncertainty quantification. The Guided-PRL model is evaluated using simulations of a 12-component system as well as GE90 and CFM56 engines. Compared with four alternative deep reinforcement learning schemes, Guided-PRL lowers life-cycle cost by 34.92% to 88.07%. In comparison with rule-based expert policies, Guided-PRL decreases life-cycle cost by 23.26% to 51.36%.
KW - Deep Reinforcement Learning
KW - Maintenance Scheduling
KW - Multi-component System
KW - Probabilistic Machine Learning
KW - Sampling-Efficient Learning
UR - http://www.scopus.com/inward/record.url?scp=85150875168&partnerID=8YFLogxK
U2 - 10.1016/j.apm.2023.03.025
DO - 10.1016/j.apm.2023.03.025
M3 - Journal article
AN - SCOPUS:85150875168
SN - 0307-904X
VL - 119
SP - 677
EP - 697
JO - Applied Mathematical Modelling
JF - Applied Mathematical Modelling
ER -