This article was written on February 12, 2025. Anaconda must be installed beforehand. The steps below were tested twice from scratch and ran correctly both times.
Part 1. Prepare the Python side
1. Download and extract ML-Agents Release 22 (using git clone will most likely fail).
Extract it to C:\Users\Administrator (Administrator is the Windows user name; yours may differ).
2. Open Anaconda Prompt and create a conda virtual environment:
conda create -n mlagents22 python=3.10.12
conda activate mlagents22
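To confirm the environment is active, you can check the interpreter version; it should print Python 3.10.12:
python --version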
3. Run the following commands:
python -m pip config set global.index-url https://mirrors.aliyun.com/pypi/simple // set a PyPI mirror
cd ml-agents-release_22 // enter the ml-agents-release_22 folder
cd ml-agents-envs // enter the ml-agents-envs folder
pip install -e . // note the trailing dot
cd .. // go back up one level
cd ml-agents
pip install -e . // note the trailing dot
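To verify that both packages installed correctly, you can run the trainer's help command; if it prints usage information, the installation worked:
mlagents-learn --help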
4. Install the GPU build of torch
cd ..
cd .. // back to the Administrator home directory
pip3 install torch~=2.2.1 --index-url https://download.pytorch.org/whl/cu121
The download will most likely fail. Press Ctrl+C to abort, copy the download URL from the output, and fetch the wheel with a download manager such as Thunder (Xunlei).
Once downloaded, install it with the command below:
pip install C:\迅雷下载\torch-2.2.2+cu121-cp310-cp310-win_amd64.whl
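To confirm the GPU build is active, a quick check (assuming the mlagents22 environment is still activated):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
It should print 2.2.2+cu121 and True; False means a CPU-only build was installed or the CUDA driver is missing.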
5. Create rollerball_config.yaml in the ml-agents-release_22\config directory:
behaviors:
  RollerBallBrain:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
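YAML is indentation-sensitive, and mlagents-learn will fail at startup if the file is mis-indented. For a quick sanity check that the file parses (PyYAML is installed as an ml-agents dependency), you can run this from the ml-agents-release_22 directory:
python -c "import yaml; yaml.safe_load(open('config/rollerball_config.yaml')); print('ok')"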
Part 2. Prepare the Unity side
1. The ML-Agents Release 22 download contains a Unity project, located in the Project folder.
Open ProjectSettings/ProjectVersion.txt to see which Unity version it requires.
2. Install Unity 2023.2.13f1 (the China release is 2023.2.13f1c1); the installation steps are omitted here.
3. Open the Project project. If Unity complains that com.unity.ml-agents or com.unity.ml-agents.extensions cannot be found, open Window -> Package Manager and first remove the existing ML Agents and ML Agents Extensions packages.
Then click the + button in the top-left corner, choose Install package from disk, and select the package.json files under ml-agents-release_22\com.unity.ml-agents and ml-agents-release_22\com.unity.ml-agents.extensions to complete the installation.
4. Create a new scene and add a Plane (named Floor), a Cube (named Target), and a Sphere (named RollerAgent). Rotation should be 0,0,0 and Scale 1,1,1 on all three.
Floor position: 0, 0, 0
Target position: 3, 0.5, 0
RollerAgent position: 0, 0.5, 0
5. Create a new script named RollerAgent:
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RollerAgent : Agent
{
    [SerializeField]
    private Transform Target; // the target cube
    public float speed = 10;  // ball movement speed
    private Rigidbody rBody;  // the ball's rigidbody

    private void Start()
    {
        // Grab the Rigidbody component
        rBody = GetComponent<Rigidbody>();
    }

    /// <summary>
    /// Agent reset: called at the start of each episode
    /// </summary>
    public override void OnEpisodeBegin()
    {
        // If the ball fell off the platform, reset its position and velocity
        if (this.transform.position.y < 0)
        {
            rBody.velocity = Vector3.zero;
            rBody.angularVelocity = Vector3.zero;
            transform.position = new Vector3(0, 0.5f, 0);
        }
        // Move the target cube to a random position
        Target.position = new Vector3(Random.value * 8 - 4, 0.5f, Random.value * 8 - 4);
    }

    /// <summary>
    /// Collect the agent's observations
    /// </summary>
    /// <param name="sensor"></param>
    public override void CollectObservations(VectorSensor sensor)
    {
        // Target position (3 values: x, y, z)
        sensor.AddObservation(Target.position);
        // Ball position (3 values: x, y, z)
        sensor.AddObservation(transform.position);
        // Ball velocity (2 values: x and z; y is not needed)
        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);
    }

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        // Read the continuous action array
        var continuousActions = actionBuffers.ContinuousActions;
        // The actions drive the ball's movement
        Vector3 controlSignal = Vector3.zero;
        controlSignal.x = continuousActions[0]; // force along the x axis
        controlSignal.z = continuousActions[1]; // force along the z axis
        rBody.AddForce(controlSignal * speed);

        // Distance between the ball and the target
        float distanceToTarget = Vector3.Distance(transform.position, Target.position);

        // Assign rewards by outcome
        if (distanceToTarget < 1.42f)
        {
            // Reached the target
            SetReward(1.0f);
            EndEpisode();
        }
        if (transform.position.y < 0)
        {
            // Ball fell off the platform
            EndEpisode();
        }
    }

    /// <summary>
    /// Action generation for manual testing (called when Heuristic Only is enabled)
    /// </summary>
    /// <param name="actionsOut"></param>
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var continuousActions = actionsOut.ContinuousActions;
        continuousActions[0] = Input.GetAxis("Horizontal"); // left / right
        continuousActions[1] = Input.GetAxis("Vertical");   // forward / back
        // Debug output
        Debug.Log($"Heuristic Actions: {continuousActions[0]}, {continuousActions[1]}");
    }
}
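Note how the observations add up: 3 (target position) + 3 (ball position) + 2 (velocity x and z) = 8, which must match the Space Size configured in Behavior Parameters in the next step. The 1.42 success threshold is a center-to-center distance, presumably chosen as roughly √2, comfortably larger than the 1.0 at which the unit sphere touches a face of the unit cube.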
6. Add four components to RollerAgent: Rigidbody, Behavior Parameters, Decision Requester, and RollerAgent.
Behavior Parameters settings:
- Behavior Name: RollerBallBrain
- Space Size: 8
- Continuous Actions: 2
- Model: None (Model Asset), the default
Decision Requester settings:
- Decision Period: 10
RollerAgent settings:
- Target: set to the Cube
7. Test with heuristic control: in the RollerAgent's Behavior Parameters, set Behavior Type to Heuristic Only. Press Play to run the scene and use the WASD keys to move the ball around the platform.
At this point you may find that the ball will not move, with warnings like the following in the Console panel:
Fewer observations (0) made than vector observation size (8). The observations will be padded.
Heuristic method called but not implemented. Returning placeholder actions.
This is caused by the component order on the object (a quick gripe: this bug is exasperating). The fix is to move the RollerAgent component above the Agent component on the RollerAgent object.
Press Play again, and you can now move the ball around the platform with the WASD keys.
Part 3. Start training
1. Switch to training mode: in the RollerAgent's Behavior Parameters, set Behavior Type to Default, but do not press Play yet.
In Anaconda Prompt, run the training command from the ml-agents-release_22 directory:
mlagents-learn config/rollerball_config.yaml --run-id=RollerBall-1
If instead you see the error
mlagents.trainers.exception.UnityTrainerException: Previous data from this run ID was found.
it means results from a previous run with the same ID already exist. Resume that run with:
mlagents-learn config/rollerball_config.yaml --run-id=RollerBall-1 --resume
or discard it and start fresh with:
mlagents-learn config/rollerball_config.yaml --run-id=RollerBall-1 --force
When you see the message below, switch to Unity and click Play; if the ball starts moving on its own, training has begun.
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
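While training runs, you can optionally watch the reward curve in TensorBoard (installed as an ml-agents dependency). From the ml-agents-release_22 directory, in a second Anaconda Prompt:
tensorboard --logdir results
Then open http://localhost:6006 in a browser.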
2. When Mean Reward approaches 1, press Ctrl+C to stop training.
Copy ml-agents-release_22\results\RollerBall-1\RollerBallBrain\RollerBallBrain-*.onnx into the Unity project and assign it to the Model field of Behavior Parameters.
Click Play in Unity again (Anaconda Prompt can be closed now), and the ball will move toward the cube on its own.