TY - GEN
T1 - Learning Multi-Agent Communication with Policy Fingerprints for Adaptive Traffic Signal Control
AU - Zhao, Yifan
AU - Xu, Gangyan
AU - Du, Yali
AU - Fang, Meng
N1 - Funding Information:
This work was supported in part by the CCF-Tencent Open Research Fund, the National Natural Science Foundation of China under Grant 71804034, and the Research Foundation of STIC under Grant JCYJ20180306171958907.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
N2 - Adaptive traffic signal control is widely recognized as an effective solution for improving urban mobility and reducing congestion in metropolises. Recently, reinforcement learning has been adopted for this transportation problem. While centralized reinforcement learning inevitably faces action-space explosion, decentralized reinforcement learning allows agents to develop policies from local observations but suffers from unstable training. In this paper, we present CommNetPF, a decentralized multi-agent reinforcement learning model that incorporates communication and neighbourhood policy fingerprints for adaptive traffic signal control. With policy fingerprints included in communication, agents learn to produce cooperative policies and the model converges faster. Experiments on adaptive traffic signal control scenarios show that CommNetPF outperforms several strong baselines in both control performance and convergence speed.
AB - Adaptive traffic signal control is widely recognized as an effective solution for improving urban mobility and reducing congestion in metropolises. Recently, reinforcement learning has been adopted for this transportation problem. While centralized reinforcement learning inevitably faces action-space explosion, decentralized reinforcement learning allows agents to develop policies from local observations but suffers from unstable training. In this paper, we present CommNetPF, a decentralized multi-agent reinforcement learning model that incorporates communication and neighbourhood policy fingerprints for adaptive traffic signal control. With policy fingerprints included in communication, agents learn to produce cooperative policies and the model converges faster. Experiments on adaptive traffic signal control scenarios show that CommNetPF outperforms several strong baselines in both control performance and convergence speed.
KW - adaptive traffic signal control
KW - multi-agent reinforcement learning
KW - policy fingerprints
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85094143411&partnerID=8YFLogxK
U2 - 10.1109/CASE48305.2020.9216981
DO - 10.1109/CASE48305.2020.9216981
M3 - Conference article published in proceedings or book
AN - SCOPUS:85094143411
T3 - IEEE International Conference on Automation Science and Engineering
SP - 266
EP - 273
BT - 2020 IEEE 16th International Conference on Automation Science and Engineering, CASE 2020
PB - IEEE Computer Society
T2 - 16th IEEE International Conference on Automation Science and Engineering, CASE 2020
Y2 - 20 August 2020 through 21 August 2020
ER -