Rethinking Population-Assisted Off-policy Reinforcement Learning

Bowen Zheng, Ran Cheng

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

7 Citations (Scopus)

Abstract

While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they are prone to converging to local optima because of limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from the OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
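The abstract does not describe how the double replay buffer is built, so the following is only a minimal sketch of one way such a design could look, not the authors' implementation. It assumes two FIFO buffers, one for transitions collected by the RL agent and one for transitions collected by population members, and a fixed mixing ratio when sampling. The class name `DoubleReplayBuffer`, the `rl_fraction` parameter, and the sampling rule are all hypothetical.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical sketch: two FIFO buffers sampled with a fixed mixing ratio.

    This is an illustrative assumption, not the design from the paper.
    """

    def __init__(self, capacity, rl_fraction=0.5, seed=None):
        if not 0.0 <= rl_fraction <= 1.0:
            raise ValueError("rl_fraction must be in [0, 1]")
        # Transitions gathered by the RL agent itself (closer to on-policy).
        self.rl_buffer = deque(maxlen=capacity)
        # Transitions gathered by population members (more diverse, more off-policy).
        self.population_buffer = deque(maxlen=capacity)
        self.rl_fraction = rl_fraction
        self._rng = random.Random(seed)

    def add(self, transition, from_population):
        """Store a transition in the buffer that matches its source."""
        target = self.population_buffer if from_population else self.rl_buffer
        target.append(transition)

    def sample(self, batch_size):
        """Draw a mixed mini-batch without replacement inside each buffer.

        If the RL buffer holds too few transitions, the shortfall is taken
        from the population buffer; the batch may still be smaller than
        batch_size when both buffers are short.
        """
        n_rl = min(round(batch_size * self.rl_fraction), len(self.rl_buffer))
        n_pop = min(batch_size - n_rl, len(self.population_buffer))
        batch = self._rng.sample(list(self.rl_buffer), n_rl)
        batch += self._rng.sample(list(self.population_buffer), n_pop)
        self._rng.shuffle(batch)
        return batch


# Usage: fill both buffers with placeholder transitions and draw a batch.
buf = DoubleReplayBuffer(capacity=1000, rl_fraction=0.75, seed=0)
for i in range(100):
    buf.add(("rl", i), from_population=False)
    buf.add(("pop", i), from_population=True)
batch = buf.sample(8)
```

Raising `rl_fraction` in this sketch shifts each mini-batch toward the agent's own recent experience, which is one plausible reading of "provides more on-policy data"; the paper itself should be consulted for the actual mechanism.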

Original language: English
Title of host publication: GECCO 2023 - Proceedings of the 2023 Genetic and Evolutionary Computation Conference
Publisher: Association for Computing Machinery, Inc
Pages: 624-632
Number of pages: 9
ISBN (Electronic): 9798400701191
Publication status: Published - 15 Jul 2023
Externally published: Yes
Event: 2023 Genetic and Evolutionary Computation Conference, GECCO 2023 - Lisbon, Portugal
Duration: 15 Jul 2023 to 19 Jul 2023

Publication series

Name: GECCO 2023 - Proceedings of the 2023 Genetic and Evolutionary Computation Conference

Conference

Conference: 2023 Genetic and Evolutionary Computation Conference, GECCO 2023
Country/Territory: Portugal
City: Lisbon
Period: 15/07/23 to 19/07/23

Keywords

  • evolutionary reinforcement learning
  • neuroevolution
  • off-policy learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Theoretical Computer Science
