Enhancing Sim-to-Real Transfer Learning with PPO and Domain Randomization

Ruichen Zhao, Yuxuan Zhang

December 2024

Abstract

This study explores the challenge of sim-to-real transfer, focusing on how discrepancies between simulated environments and real-world conditions affect agent performance. Using the CartPole environment as a testbed, we examine the effects of several simulation modifications, including altered friction dynamics and observation noise, together with training strategies such as curriculum learning. We employ Proximal Policy Optimization (PPO) to train policies under different conditions, comparing the performance of agents trained in the standard environment, with domain randomization, and with progressive difficulty adjustments (curriculum learning). Our experimental results show that while domain randomization improves generalization to environments with unseen variations, curriculum learning provides a smoother training progression but does not always outperform direct training under harder conditions. We further evaluate the robustness of trained models by introducing unseen friction values and dynamic environmental perturbations. This exploratory work highlights the strengths and limitations of different sim-to-real strategies, offering insight into the adaptability of reinforcement learning agents across varying simulation complexities.

Keywords: sim-to-real transfer, reinforcement learning, PPO, domain randomization, curriculum learning, simulation dynamics.
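Illustrative example

To make the setup concrete, the sketch below shows one way the randomized training described above could be implemented. It assumes Gymnasium and Stable-Baselines3, neither of which is named in the paper, and all sampling ranges and the noise scale are placeholders rather than the authors' values. Because stock CartPole-v1 exposes no friction parameter, pole length and actuation force stand in for the paper's friction modifications; per-episode re-sampling and Gaussian observation noise follow the abstract.

# A minimal sketch, not the paper's code: domain randomization on CartPole,
# assuming Gymnasium and Stable-Baselines3 (neither is named in the paper).
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO


class RandomizedCartPole(gym.Wrapper):
    """Re-samples physics parameters each episode and corrupts observations."""

    def __init__(self, env, noise_std=0.05):
        super().__init__(env)
        self.noise_std = noise_std  # assumed noise scale, not from the paper

    def reset(self, **kwargs):
        # Stock CartPole-v1 has no friction parameter, so pole length and
        # actuation force stand in for the paper's friction modifications.
        u = self.env.unwrapped
        u.length = np.random.uniform(0.4, 0.6)      # pole half-length
        u.polemass_length = u.masspole * u.length   # keep derived term consistent
        u.force_mag = np.random.uniform(8.0, 12.0)  # push force magnitude
        obs, info = self.env.reset(**kwargs)
        return self._noisy(obs), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return self._noisy(obs), reward, terminated, truncated, info

    def _noisy(self, obs):
        # Gaussian observation noise, mirroring the noisy-sensing setting above.
        return (obs + np.random.normal(0.0, self.noise_std, obs.shape)).astype(obs.dtype)


if __name__ == "__main__":
    env = RandomizedCartPole(gym.make("CartPole-v1"))
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=100_000)  # training budget chosen for illustration

A curriculum-learning variant would start from narrow sampling ranges and widen them over the course of training, rather than drawing from the full ranges from the first episode.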

Bibtex

@inproceedings{article1,
  author    = {Zhao, Ruichen and Zhang, Yuxuan},
  title     = {Enhancing Sim-to-Real Transfer Learning with PPO and Domain Randomization},
  abstract  = {This study explores the challenge of sim-to-real
               transfer, focusing on how discrepancies between simulated
               environments and real-world conditions affect agent performance.
               Using the CartPole environment as a testbed, we examine the
               effects of several simulation modifications, including altered
               friction dynamics and observation noise, together with training
               strategies such as curriculum learning. We employ Proximal
               Policy Optimization (PPO) to train policies under different
               conditions, comparing the performance of agents trained in the
               standard environment, with domain randomization, and with
               progressive difficulty adjustments (curriculum learning).
               Our experimental results show that while domain randomization
               improves generalization to environments with unseen variations,
               curriculum learning provides a smoother training progression
               but does not always outperform direct training under harder
               conditions. We further evaluate the robustness of trained
               models by introducing unseen friction values and dynamic
               environmental perturbations. This exploratory work highlights
               the strengths and limitations of different sim-to-real
               strategies, offering insight into the adaptability of
               reinforcement learning agents across varying simulation
               complexities.},
  keywords  = {sim-to-real transfer, reinforcement learning, PPO,
               domain randomization, curriculum learning, simulation dynamics},
  year      = {2024}
}