Unity ML-Agents PPO Virtual Racing

A reinforcement-learning sandbox for training and evaluating PPO agents in reproducible Unity environments.

Live

RL environment and training loop

Problem

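Training an agent looks easy, but reward design, the observation space, and the evaluation metrics directly determine whether its behavior is stable.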

Solution

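Keep the Unity environment, the ML-Agents configuration, and the training records in a single experiment workflow, and compare rewards, curricula, and perturbations step by step.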

Result

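The result is a set of reviewable PPO experiment records, well suited to explaining how a reinforcement-learning project traces an observed behavior back to a metric.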

Experiment shape

Unity PPO is a compact training environment for studying observation design, reward shaping, curriculum schedules, and policy stability. The project keeps environment changes and training configuration versioned together so failed runs remain explainable.
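
One way to keep environment changes and training configuration tied together is to write a small manifest for every run. The sketch below is an illustration, not the project's actual tooling: the function names, the manifest fields, the "abc1234" revision string, and the "Racer" behavior name are all hypothetical. It hashes the raw configuration text and derives a run id from the manifest contents, so two runs with identical inputs get the same id and any change to the environment revision or the configuration produces a different one.

```python
import hashlib
import json


def fingerprint(text: str) -> str:
    """Return a short, stable SHA-256 digest of a text blob."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def build_manifest(env_revision: str, trainer_config: str, seed: int) -> dict:
    """Bundle the inputs needed to explain a run into one record.

    env_revision   -- hypothetical identifier of the Unity environment state,
                      for example a version-control commit hash
    trainer_config -- raw text of the training configuration file
    seed           -- random seed used for the run
    """
    manifest = {
        "env_revision": env_revision,
        "config_hash": fingerprint(trainer_config),
        "seed": seed,
    }
    # Derive the run id from the manifest content, so identical inputs
    # always map to the same id and any change yields a new one.
    canonical = json.dumps(manifest, sort_keys=True)
    manifest["run_id"] = fingerprint(canonical)
    return manifest


if __name__ == "__main__":
    # Hypothetical config text; "Racer" is a made-up behavior name.
    config_text = "behaviors:\n  Racer:\n    trainer_type: ppo\n"
    manifest = build_manifest("abc1234", config_text, seed=0)
    print(json.dumps(manifest, indent=2))
```

Storing this record next to the training logs means a failed run can later be matched to the exact environment revision and configuration that produced it.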

What is being measured

The current focus is not a single high score. It is a repeatable loop for comparing sample efficiency, policy collapse, recovery after perturbation, and the gap between visually plausible behavior and robust control.
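
The quantities above can be given simple working definitions on a logged reward curve. The following is a minimal sketch under assumed definitions: the reward threshold, the 50% drop fraction used to flag collapse, and the 10% recovery tolerance are illustrative choices, not values taken from this project.

```python
from typing import List, Optional, Sequence


def steps_to_threshold(
    steps: Sequence[int], rewards: Sequence[float], threshold: float
) -> Optional[int]:
    """Sample-efficiency proxy: first logged step at which reward reaches threshold."""
    for step, reward in zip(steps, rewards):
        if reward >= threshold:
            return step
    return None


def collapse_points(rewards: Sequence[float], drop_fraction: float = 0.5) -> List[int]:
    """Indices where reward falls below (1 - drop_fraction) of the best reward so far."""
    best = float("-inf")
    flagged = []
    for i, reward in enumerate(rewards):
        best = max(best, reward)
        # Only flag drops once a positive peak exists to compare against.
        if best > 0 and reward < (1.0 - drop_fraction) * best:
            flagged.append(i)
    return flagged


def recovery_steps(
    steps: Sequence[int],
    rewards: Sequence[float],
    perturb_index: int,
    tolerance: float = 0.1,
) -> Optional[int]:
    """Steps after a perturbation until reward is back within tolerance of its prior level."""
    if perturb_index < 1:
        raise ValueError("perturb_index must leave at least one earlier sample")
    baseline = rewards[perturb_index - 1]
    target = baseline - tolerance * abs(baseline)
    for i in range(perturb_index, len(rewards)):
        if rewards[i] >= target:
            return steps[i] - steps[perturb_index]
    return None


if __name__ == "__main__":
    # Made-up curve: the perturbation is applied at index 3.
    steps = [0, 1000, 2000, 3000, 4000, 5000]
    rewards = [0.0, 2.0, 8.0, 3.0, 6.0, 7.5]
    print(steps_to_threshold(steps, rewards, 5.0))       # 2000
    print(collapse_points(rewards))                      # [3]
    print(recovery_steps(steps, rewards, perturb_index=3))  # 2000
```

None of these numbers captures the gap between visually plausible behavior and robust control on its own. That comparison still needs evaluation episodes under perturbation rather than the training curve alone.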