Little Robots Learn to Drive Fast in the Real World

Reinforcement learning plus pretraining gets robot racers up to speed

Without a lifetime of experience to build on like humans have (and totally take for granted), robots that want to learn a new skill often have to start from scratch. Reinforcement learning lets robots learn new skills through trial and error but, especially in the case of end-to-end vision-based control policies, it takes a lot of time: The real world is a weirdly lit, friction-filled, obstacle-y mess that robots can’t understand without a frequently impractical amount of effort.

Roboticists at the University of California at Berkeley have vastly sped up this process by doing the same kind of cheating that humans do—instead of starting from scratch, you start with some previous experience that helps get you going. By leveraging a “foundation model” that was pretrained on robots driving themselves around, the researchers were able to get a small-scale robotic rally car to teach itself to race around indoor and outdoor tracks, matching human performance after just 20 minutes of practice.

That first pretraining stage happens at your leisure, by manually driving a robot (that isn’t necessarily the one that will be doing the task you care about) around different environments. The goal isn’t to teach the robot to drive fast around a course but rather the basics of not running into stuff.

With that pretrained foundation model in place, when you then move over to the little robotic rally car, it no longer has to start from scratch. Instead, you can plop it onto the course you want it to learn, drive it around once slowly to show it where you want it to go, and then let it go fully autonomous, training itself to drive faster and faster. With a low-resolution, front-facing camera and some basic state estimation, the robot attempts to reach the next checkpoint on the course as quickly as possible, leading to some interesting emergent behaviors:

The system learns the concept of a “racing line,” finding a smooth path through the lap and maximizing its speed through tight corners and chicanes. The robot learns to carry its speed into the apex, then brakes sharply to turn and accelerates out of the corner, to minimize the driving duration. With a low-friction surface, the policy learns to oversteer slightly when turning, drifting into the corner to achieve fast rotation without braking during the turn. In outdoor environments, the learned policy is also able to distinguish ground characteristics, preferring smooth, high-traction areas on and around concrete paths over areas with tall grass that impedes the robot’s motion.
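
The checkpoint objective described above can be made concrete with a small sketch. The article only says the robot tries to reach the next checkpoint as quickly as possible, so the details here are assumptions for illustration: the reward is taken to be the robot's speed along the direction of the current checkpoint, and the names CheckpointCourse, reach_radius, and step are hypothetical, not taken from the researchers' code.

```python
import math


class CheckpointCourse:
    """Tracks which checkpoint the robot is currently driving toward.

    Hypothetical helper: the reward definition and the reach radius are
    assumptions for illustration, not details confirmed by the article.
    """

    def __init__(self, checkpoints, reach_radius=1.0):
        self.checkpoints = list(checkpoints)  # (x, y) positions, in lap order
        self.reach_radius = reach_radius      # meters; assumed value
        self.index = 0

    @property
    def target(self):
        return self.checkpoints[self.index]

    def step(self, x, y, vx, vy):
        """Return the speed toward the current checkpoint, then advance if reached."""
        gx, gy = self.target
        dx, dy = gx - x, gy - y
        dist = math.hypot(dx, dy)
        # Project the velocity onto the unit vector pointing at the checkpoint.
        reward = 0.0 if dist == 0 else (vx * dx + vy * dy) / dist
        if dist < self.reach_radius:
            # Loop back to the first checkpoint after the last one (a closed lap).
            self.index = (self.index + 1) % len(self.checkpoints)
        return reward


course = CheckpointCourse([(10.0, 0.0), (10.0, 10.0)])
print(course.step(0.0, 0.0, 2.0, 0.0))   # driving straight at the checkpoint
print(course.step(0.0, 0.0, -2.0, 0.0))  # driving directly away from it
```

Under this assumed reward, driving faster toward the checkpoint always pays more, which is one plausible way behaviors like carrying speed into a corner could emerge without being programmed in explicitly.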

The other clever bit here is the reset feature, which is necessary in real-world training. When training in simulation, it’s super easy to reset a robot that fails, but outside of simulation, a failure can (by definition) end the training if the robot gets itself stuck. That’s not a big deal if you want to spend all your time minding the robot while it learns, but if you have something better to do, the robot needs to be able to train autonomously from start to finish. In this case, if the robot hasn’t moved at least 0.5 meters in the previous 3 seconds, it knows that it’s stuck, and it will execute the simple behaviors of turning randomly, backing up, and then trying to drive forward again, which gets it unstuck eventually.
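
The stuck check described above is simple enough to sketch in a few lines. This is a minimal illustration rather than the researchers' implementation: the 0.5-meter and 3-second thresholds come from the article, while the class name StuckDetector, the use of net displacement between position samples, and the steering and throttle values in recovery_sequence are assumptions.

```python
import math
import random
from collections import deque


class StuckDetector:
    """Flags the robot as stuck if it has moved less than min_distance
    over the last `window` seconds (net displacement; an assumption)."""

    def __init__(self, min_distance=0.5, window=3.0):
        self.min_distance = min_distance  # meters, from the article
        self.window = window              # seconds, from the article
        self.history = deque()            # (timestamp, x, y) samples

    def update(self, t, x, y):
        """Record a position sample and return True if the robot looks stuck."""
        self.history.append((t, x, y))
        # Drop old samples, but keep one at or before the window start
        # so the comparison always spans the full window.
        while len(self.history) >= 2 and self.history[1][0] <= t - self.window:
            self.history.popleft()
        t0, x0, y0 = self.history[0]
        if t - t0 < self.window:
            return False  # not enough history yet to decide
        return math.hypot(x - x0, y - y0) < self.min_distance


def recovery_sequence(rng=random):
    """Return hypothetical (phase, steering, throttle) steps: back up with a
    random steering angle, then drive forward again."""
    steer = rng.uniform(-1.0, 1.0)  # normalized steering; assumed range
    return [
        ("reverse", steer, -0.5),
        ("forward", 0.0, 0.5),
    ]


detector = StuckDetector()
for t in range(4):
    if detector.update(float(t), 0.0, 0.0):  # robot never moves
        print("stuck at t =", t, "->", recovery_sequence())
```

Because the recovery steps are random, a single attempt may not free the robot, but repeating the check and the recovery on every failure is consistent with the article's note that the behavior gets it unstuck eventually.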

During indoor and outdoor experiments, the robot was able to learn aggressive driving comparable to that of a human expert after just 20 minutes of autonomous practice, which the researchers say “provides strong validation that deep reinforcement learning can indeed be a viable tool for learning real-world policies even from raw images, when combined with appropriate pretraining and implemented in the context of an autonomous training framework.” It’s going to take a lot more work to implement this sort of thing safely on a larger platform, but this little car is taking the first few laps in the right direction just as quickly as it possibly can.

“FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing,” by Kyle Stachowicz, Arjun Bhorkar, Dhruv Shah, Ilya Kostrikov, and Sergey Levine from UC Berkeley, is available on arXiv.
