Unity PPO
Unity ML-Agents PPO Virtual Racing
A reinforcement-learning sandbox for training and evaluating PPO agents in reproducible Unity environments.
Problem
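Training an agent looks easy, but reward design, the observation space, and the evaluation metrics directly determine whether the learned behavior is stable.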
Solution
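Keep the Unity environment, the ML-Agents configuration, and the training records in one experiment workflow, and compare reward designs, curricula, and perturbations one step at a time.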
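One way to make a step-by-step comparison is a one-factor-at-a-time sweep. Start from a baseline and let each run change only the reward, curriculum, or perturbation setting, so any change in behavior can be attributed to that single setting. The following is a minimal Python sketch under that assumption. `ExperimentConfig`, its fields, and values such as `track_length` or `wind` are hypothetical placeholders, not ML-Agents settings.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class ExperimentConfig:
    # Hypothetical experiment knobs; these are not ML-Agents configuration keys.
    reward_variant: str = "progress"
    curriculum: str = "none"
    perturbation: str = "none"
    seed: int = 0


def one_factor_variants(
    base: ExperimentConfig, options: Dict[str, List[object]]
) -> Iterator[ExperimentConfig]:
    """Yield the baseline first, then every variant that differs from it in exactly one field."""
    yield base
    for field_name, values in options.items():
        for value in values:
            if getattr(base, field_name) != value:  # skip values identical to the baseline
                yield replace(base, **{field_name: value})


def run_id(cfg: ExperimentConfig) -> str:
    # Human-readable id so logged results can be matched back to their config.
    return f"{cfg.reward_variant}-{cfg.curriculum}-{cfg.perturbation}-s{cfg.seed}"


if __name__ == "__main__":
    base = ExperimentConfig()
    sweep = {
        "reward_variant": ["progress", "progress_plus_speed"],  # hypothetical reward variants
        "curriculum": ["track_length"],                          # hypothetical curriculum
        "perturbation": ["wind"],                                # hypothetical perturbation
    }
    for cfg in one_factor_variants(base, sweep):
        print(run_id(cfg))
```

Changing one factor per run costs more runs than a joint grid search. In exchange, each difference against the baseline has a single candidate cause, which suits the goal of explaining behavior rather than only maximizing score.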
Result
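Produced a set of PPO experiment records that can be reviewed after the fact. They are useful for explaining how, in a reinforcement-learning project, an observed behavior is traced back to the metrics that account for it.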
Experiment shape
Unity PPO is a compact training environment for studying observation design, reward shaping, curriculum schedules, and policy stability. The project keeps environment changes and training configuration versioned together so failed runs remain explainable.
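One way to version environment changes and training configuration together is to hash them as a single unit and store that hash with each run. This is a minimal Python sketch. It assumes the trainer configuration is available as a parsed dictionary and that the environment build is identified by a version string such as a commit hash. `run_fingerprint`, `run_manifest`, and the `env-commit-abc123` label are hypothetical. The example dictionary borrows ML-Agents-style keys (`trainer_type`, `hyperparameters`), but its values are illustrative only.

```python
import hashlib
import json
from typing import Any, Dict


def run_fingerprint(trainer_config: Dict[str, Any], env_version: str) -> str:
    """Hash the trainer config and the environment version as one unit.

    Changing either side produces a different fingerprint, so a run can be
    traced back to the exact config/environment pair that produced it.
    """
    payload = json.dumps(
        {"env_version": env_version, "trainer_config": trainer_config},
        sort_keys=True,         # key order (including nested dicts) must not affect the hash
        separators=(",", ":"),  # fixed separators keep the serialized form stable
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def run_manifest(run_id: str, trainer_config: Dict[str, Any], env_version: str) -> Dict[str, Any]:
    # Hypothetical record meant to be saved next to a run's logs and checkpoints.
    return {
        "run_id": run_id,
        "fingerprint": run_fingerprint(trainer_config, env_version),
        "env_version": env_version,
        "trainer_config": trainer_config,
    }


if __name__ == "__main__":
    # Keys mirror the ML-Agents trainer config layout; the values are illustrative.
    config = {"trainer_type": "ppo", "hyperparameters": {"learning_rate": 3.0e-4, "batch_size": 1024}}
    print(json.dumps(run_manifest("baseline", config, "env-commit-abc123"), indent=2))
```

If a failed run and a successful run share a fingerprint, the difference has to come from somewhere else, such as the seed or hardware. If the fingerprints differ, the stored manifests show exactly which setting changed.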
What is being measured
The current focus is not a single high score. It is a repeatable loop for comparing sample efficiency, policy collapse, recovery after perturbation, and the gap between visually plausible behavior and robust control.
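Three of these measurements can be written as functions over a reward curve, meaning a list of (environment step, mean episode reward) pairs exported from the training logs. This is a hedged sketch. The function names are hypothetical. Defining collapse as an absolute drop below the running peak, and recovery as a return to a tolerance band around the last pre-perturbation reward, are illustrative choices, not ML-Agents definitions.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[int, float]]  # (environment step, mean episode reward), sorted by step


def steps_to_threshold(curve: Curve, threshold: float) -> Optional[int]:
    """Sample efficiency: first step whose mean reward reaches the threshold, else None."""
    for step, reward in curve:
        if reward >= threshold:
            return step
    return None


def collapse_step(curve: Curve, max_drop: float) -> Optional[int]:
    """Policy collapse: first step where reward falls more than max_drop below the best seen so far."""
    peak = float("-inf")
    for step, reward in curve:
        peak = max(peak, reward)
        if peak - reward > max_drop:
            return step
    return None


def recovery_steps(curve: Curve, perturb_step: int, tolerance: float) -> Optional[int]:
    """Steps after a perturbation until reward re-enters the band around the pre-perturbation level.

    Returns 0 if reward never leaves the band and None if it leaves and never returns.
    """
    before = [reward for step, reward in curve if step < perturb_step]
    if not before:
        return None  # no pre-perturbation reference point
    floor = before[-1] - tolerance
    dip_seen = False
    for step, reward in curve:
        if step <= perturb_step:
            continue
        if reward < floor:
            dip_seen = True
        elif dip_seen:
            return step - perturb_step  # first point back inside the band after a dip
    return None if dip_seen else 0


if __name__ == "__main__":
    # Made-up curve purely to exercise the functions; not results from this project.
    curve = [(0, 0.0), (10_000, 0.4), (20_000, 0.8), (30_000, 0.9),
             (40_000, 0.3), (50_000, 0.6), (60_000, 0.88)]
    print(steps_to_threshold(curve, 0.8), collapse_step(curve, 0.5), recovery_steps(curve, 35_000, 0.05))
```

Looking at all three numbers together separates a policy that learns quickly but is brittle from one that learns slowly but holds up under perturbation. A single peak score would not show that difference.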