Multi-stage Phasic Policy Gradient for HoK 1v1
Authors: Yixing Chen, Songlin Jiang, Qihao Luo
Abstract
We present a reinforcement learning framework for training robust agents in 1v1 MOBA games, combining a modified Phasic Policy Gradient (PPG) algorithm with a multi-stage reward curriculum. Our method decouples policy and value learning to improve stability and sample efficiency, while a hierarchical reward schedule guides the agent from basic mechanics to strategic gameplay. Experiments show that our approach outperforms strong PPO-based baselines, achieving win rates of up to 90% and demonstrating intelligent in-game behavior. Ablation studies confirm that both the algorithm design and the reward shaping contribute to these gains.