OpenAI Gym中FrozenLake环境（场景）源码分析（1）

针对于OpenAI Gym中FrozenLake（冻湖）环境（场景）的示例代码网上有很多，如下代码就是其中比较经典的：

import numpy as np
import gym
import random
import time
from IPython.display import clear_outputenv = gym.make("FrozenLake-v1")observation_space = env.observation_space
print("The observation space: {}".format(observation_space))
observation_space_size = env.observation_space.n
print(observation_space_size)action_space = env.action_space
print("The action space: {}".format(action_space))
action_space_size = env.action_space.n
print(action_space_size)q_table = np.zeros((observation_space_size, action_space_size))
# q_table = np.zeros([observation_space_size, action_space_size])
print(q_table)"""
num_episodes = 10000
max_steps_per_episode = 100learning_rate = 0.1
discount_rate = 0.99exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01
"""total_episodes = 15000        # Total episodes 训练次数
learning_rate = 0.8           # Learning rate 学习率
max_steps = 99                # Max steps per episode 一次训练中最多决策次数
gamma = 0.95                  # Discounting rate 折扣率，对未来收益的折扣# Exploration parameters
epsilon = 1.0                 # Exploration rate 探索率，就是选择动作时，随机选择动作的概率
max_epsilon = 1.0             # Exploration probability at start 初始探索率
min_epsilon = 0.01            # Minimum exploration probability 最低探索率
decay_rate = 0.001            # Exponential decay rate for exploration prob 探索率消减的指数# List of rewards
rewards = []# For life or until learning is stopped
for episode in range(total_episodes):# Reset the environmentstate = env.reset()state = state[0] #本来没这条代码，但是我看这个是二元组，为了后面估计Q值可以跑，我就改成这个了，我看着是不影响的step = 0done = Falsetotal_rewards = 0for step in range(max_steps):# Choose an action a in the current world state (s)## First we randomize a numberexp_exp_tradeoff = random.uniform(0, 1)## If this number > greater than epsilon --> exploitation (taking the biggest Q value for this state)if exp_exp_tradeoff > epsilon:action = np.argmax(q_table[state,:])# Else doing a random choice --> explorationelse:action = env.action_space.sample()# Take the action (a) and observe the outcome state(s') and reward (r)new_state, reward, done, truncated, info = env.step(action) # 这个也是，刚开始报错，来后我查了新的库这个函数输出五个数，网上说最后那个加‘_’就行#new_state, reward, done, info, _ = env.step(action) # 这个也是，刚开始报错，来后我查了新的库这个函数输出五个数，网上说最后那个加‘_’就行# Update Q(s,a):= Q(s,a) + lr [R(s,a) + gamma * max Q(s',a') - Q(s,a)]# qtable[new_state,:] : all the actions we can take from new stateq_table[state, action] = q_table[state, action] + learning_rate * (reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action])total_rewards += reward# Our new state is statestate = new_state# If done (if we're dead) : finish episodeif done == True: break#if truncated == True:#break# Reduce epsilon (because we need less and less exploration) 随着智能体对环境熟悉程度增加，可以减少对环境的探索epsilon = min_epsilon + (max_epsilon - min_epsilon)*np.exp(-decay_rate*episode) rewards.append(total_rewards)print ("Score over time: " +  str(sum(rewards)/total_episodes))
print(q_table)

另外，也有很不错的讲解视频（不过是英语解说），链接如下：

【吴长星精选系列】用于 Q-learning 的 OpenAI Gym 和 Python - 强化学习代码项目OpenAI Gym and Python for_哔哩哔哩_bilibili

【吴长星精选系列】用 Python 训练 Q-learning Agent - 强化学习代码项目Train Q-learning Agent with Pyth_哔哩哔哩_bilibili

【吴长星精选系列】观看 Q-learning Agent Play Game with Python - Reinforcement Learning Code_哔哩哔哩_bilibili

这个系列视频中把如何基于OpenAI Gym中的FrozenLake框架编写应用代码交代得清清楚楚。

不论是上边的例程还是视频中的示例代码，都只是用FrozenLake库（模块）的代码，并没有深入到库的底层实现，即底层是如何实现该功能的。那么本文就来带领大家深入了解一下底层的代码实现。

要了解底层代码，先得知道它具体在什么位置。在笔者之前的文章OpenAI Gym入门与实操（1）_蓝天居士的博客-CSDN博客

中通过pip install gym命令下载安装了OpenAI Gym，并且又通过pip install gym[all]命令安装了全部环境。安装完成后OpenAI Gym的存放路径为用户目录下的“.local/python3.xx/site-packages/gym”，笔者电脑上的实际路径即及内容如下：

$ ls ~/.local/lib/python3.11/site-packages/gym
core.py  error.py     logger.py    py.typed  utils   version.py
envs     __init__.py  __pycache__  spaces    vector  wrappers

$ tree ~/.local/lib/python3.11/site-packages/gym
/home/penghao/.local/lib/python3.11/site-packages/gym
├── core.py
├── envs
│   ├── box2d
│   │   ├── bipedal_walker.py
│   │   ├── car_dynamics.py
│   │   ├── car_racing.py
│   │   ├── __init__.py
│   │   ├── lunar_lander.py
│   │   └── __pycache__
│   │       ├── bipedal_walker.cpython-311.pyc
│   │       ├── car_dynamics.cpython-311.pyc
│   │       ├── car_racing.cpython-311.pyc
│   │       ├── __init__.cpython-311.pyc
│   │       └── lunar_lander.cpython-311.pyc
│   ├── classic_control
│   │   ├── acrobot.py
│   │   ├── assets
│   │   │   └── clockwise.png
│   │   ├── cartpole.py
│   │   ├── continuous_mountain_car.py
│   │   ├── __init__.py
│   │   ├── mountain_car.py
│   │   ├── pendulum.py
│   │   ├── __pycache__
│   │   │   ├── acrobot.cpython-311.pyc
│   │   │   ├── cartpole.cpython-311.pyc
│   │   │   ├── continuous_mountain_car.cpython-311.pyc
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── mountain_car.cpython-311.pyc
│   │   │   ├── pendulum.cpython-311.pyc
│   │   │   └── utils.cpython-311.pyc
│   │   └── utils.py
│   ├── __init__.py
│   ├── mujoco
│   │   ├── ant.py
│   │   ├── ant_v3.py
│   │   ├── ant_v4.py
│   │   ├── assets
│   │   │   ├── ant.xml
│   │   │   ├── half_cheetah.xml
│   │   │   ├── hopper.xml
│   │   │   ├── humanoidstandup.xml
│   │   │   ├── humanoid.xml
│   │   │   ├── inverted_double_pendulum.xml
│   │   │   ├── inverted_pendulum.xml
│   │   │   ├── point.xml
│   │   │   ├── pusher.xml
│   │   │   ├── reacher.xml
│   │   │   ├── swimmer.xml
│   │   │   └── walker2d.xml
│   │   ├── half_cheetah.py
│   │   ├── half_cheetah_v3.py
│   │   ├── half_cheetah_v4.py
│   │   ├── hopper.py
│   │   ├── hopper_v3.py
│   │   ├── hopper_v4.py
│   │   ├── humanoid.py
│   │   ├── humanoidstandup.py
│   │   ├── humanoidstandup_v4.py
│   │   ├── humanoid_v3.py
│   │   ├── humanoid_v4.py
│   │   ├── __init__.py
│   │   ├── inverted_double_pendulum.py
│   │   ├── inverted_double_pendulum_v4.py
│   │   ├── inverted_pendulum.py
│   │   ├── inverted_pendulum_v4.py
│   │   ├── mujoco_env.py
│   │   ├── mujoco_rendering.py
│   │   ├── pusher.py
│   │   ├── pusher_v4.py
│   │   ├── __pycache__
│   │   │   ├── ant.cpython-311.pyc
│   │   │   ├── ant_v3.cpython-311.pyc
│   │   │   ├── ant_v4.cpython-311.pyc
│   │   │   ├── half_cheetah.cpython-311.pyc
│   │   │   ├── half_cheetah_v3.cpython-311.pyc
│   │   │   ├── half_cheetah_v4.cpython-311.pyc
│   │   │   ├── hopper.cpython-311.pyc
│   │   │   ├── hopper_v3.cpython-311.pyc
│   │   │   ├── hopper_v4.cpython-311.pyc
│   │   │   ├── humanoid.cpython-311.pyc
│   │   │   ├── humanoidstandup.cpython-311.pyc
│   │   │   ├── humanoidstandup_v4.cpython-311.pyc
│   │   │   ├── humanoid_v3.cpython-311.pyc
│   │   │   ├── humanoid_v4.cpython-311.pyc
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── inverted_double_pendulum.cpython-311.pyc
│   │   │   ├── inverted_double_pendulum_v4.cpython-311.pyc
│   │   │   ├── inverted_pendulum.cpython-311.pyc
│   │   │   ├── inverted_pendulum_v4.cpython-311.pyc
│   │   │   ├── mujoco_env.cpython-311.pyc
│   │   │   ├── mujoco_rendering.cpython-311.pyc
│   │   │   ├── pusher.cpython-311.pyc
│   │   │   ├── pusher_v4.cpython-311.pyc
│   │   │   ├── reacher.cpython-311.pyc
│   │   │   ├── reacher_v4.cpython-311.pyc
│   │   │   ├── swimmer.cpython-311.pyc
│   │   │   ├── swimmer_v3.cpython-311.pyc
│   │   │   ├── swimmer_v4.cpython-311.pyc
│   │   │   ├── walker2d.cpython-311.pyc
│   │   │   ├── walker2d_v3.cpython-311.pyc
│   │   │   └── walker2d_v4.cpython-311.pyc
│   │   ├── reacher.py
│   │   ├── reacher_v4.py
│   │   ├── swimmer.py
│   │   ├── swimmer_v3.py
│   │   ├── swimmer_v4.py
│   │   ├── walker2d.py
│   │   ├── walker2d_v3.py
│   │   └── walker2d_v4.py
│   ├── __pycache__
│   │   ├── __init__.cpython-311.pyc
│   │   └── registration.cpython-311.pyc
│   ├── registration.py
│   └── toy_text
│       ├── blackjack.py
│       ├── cliffwalking.py
│       ├── font
│       │   └── Minecraft.ttf
│       ├── frozen_lake.py
│       ├── img
│       │   ├── C2.png
│       │   ├── C3.png
│       │   ├── C4.png
│       │   ├── C5.png
│       │   ├── C6.png
│       │   ├── C7.png
│       │   ├── C8.png
│       │   ├── C9.png
│       │   ├── cab_front.png
│       │   ├── cab_left.png
│       │   ├── cab_rear.png
│       │   ├── cab_right.png
│       │   ├── CA.png
│       │   ├── Card.png
│       │   ├── CJ.png
│       │   ├── CK.png
│       │   ├── cookie.png
│       │   ├── CQ.png
│       │   ├── cracked_hole.png
│       │   ├── CT.png
│       │   ├── D2.png
│       │   ├── D3.png
│       │   ├── D4.png
│       │   ├── D5.png
│       │   ├── D6.png
│       │   ├── D7.png
│       │   ├── D8.png
│       │   ├── D9.png
│       │   ├── DA.png
│       │   ├── DJ.png
│       │   ├── DK.png
│       │   ├── DQ.png
│       │   ├── DT.png
│       │   ├── elf_down.png
│       │   ├── elf_left.png
│       │   ├── elf_right.png
│       │   ├── elf_up.png
│       │   ├── goal.png
│       │   ├── gridworld_median_bottom.png
│       │   ├── gridworld_median_horiz.png
│       │   ├── gridworld_median_left.png
│       │   ├── gridworld_median_right.png
│       │   ├── gridworld_median_top.png
│       │   ├── gridworld_median_vert.png
│       │   ├── H2.png
│       │   ├── H3.png
│       │   ├── H4.png
│       │   ├── H5.png
│       │   ├── H6.png
│       │   ├── H7.png
│       │   ├── H8.png
│       │   ├── H9.png
│       │   ├── HA.png
│       │   ├── HJ.png
│       │   ├── HK.png
│       │   ├── hole.png
│       │   ├── hotel.png
│       │   ├── HQ.png
│       │   ├── HT.png
│       │   ├── ice.png
│       │   ├── mountain_bg1.png
│       │   ├── mountain_bg2.png
│       │   ├── mountain_cliff.png
│       │   ├── mountain_near-cliff1.png
│       │   ├── mountain_near-cliff2.png
│       │   ├── passenger.png
│       │   ├── S2.png
│       │   ├── S3.png
│       │   ├── S4.png
│       │   ├── S5.png
│       │   ├── S6.png
│       │   ├── S7.png
│       │   ├── S8.png
│       │   ├── S9.png
│       │   ├── SA.png
│       │   ├── SJ.png
│       │   ├── SK.png
│       │   ├── SQ.png
│       │   ├── stool.png
│       │   ├── ST.png
│       │   └── taxi_background.png
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── blackjack.cpython-311.pyc
│       │   ├── cliffwalking.cpython-311.pyc
│       │   ├── frozen_lake.cpython-311.pyc
│       │   ├── __init__.cpython-311.pyc
│       │   ├── taxi.cpython-311.pyc
│       │   └── utils.cpython-311.pyc
│       ├── taxi.py
│       └── utils.py
├── error.py
├── __init__.py
├── logger.py
├── __pycache__
│   ├── core.cpython-311.pyc
│   ├── error.cpython-311.pyc
│   ├── __init__.cpython-311.pyc
│   ├── logger.cpython-311.pyc
│   └── version.cpython-311.pyc
├── py.typed
├── spaces
│   ├── box.py
│   ├── dict.py
│   ├── discrete.py
│   ├── graph.py
│   ├── __init__.py
│   ├── multi_binary.py
│   ├── multi_discrete.py
│   ├── __pycache__
│   │   ├── box.cpython-311.pyc
│   │   ├── dict.cpython-311.pyc
│   │   ├── discrete.cpython-311.pyc
│   │   ├── graph.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── multi_binary.cpython-311.pyc
│   │   ├── multi_discrete.cpython-311.pyc
│   │   ├── sequence.cpython-311.pyc
│   │   ├── space.cpython-311.pyc
│   │   ├── text.cpython-311.pyc
│   │   ├── tuple.cpython-311.pyc
│   │   └── utils.cpython-311.pyc
│   ├── sequence.py
│   ├── space.py
│   ├── text.py
│   ├── tuple.py
│   └── utils.py
├── utils
│   ├── colorize.py
│   ├── env_checker.py
│   ├── ezpickle.py
│   ├── __init__.py
│   ├── passive_env_checker.py
│   ├── play.py
│   ├── __pycache__
│   │   ├── colorize.cpython-311.pyc
│   │   ├── env_checker.cpython-311.pyc
│   │   ├── ezpickle.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── passive_env_checker.cpython-311.pyc
│   │   ├── play.cpython-311.pyc
│   │   ├── save_video.cpython-311.pyc
│   │   ├── seeding.cpython-311.pyc
│   │   └── step_api_compatibility.cpython-311.pyc
│   ├── save_video.py
│   ├── seeding.py
│   └── step_api_compatibility.py
├── vector
│   ├── async_vector_env.py
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── async_vector_env.cpython-311.pyc
│   │   ├── __init__.cpython-311.pyc
│   │   ├── sync_vector_env.cpython-311.pyc
│   │   └── vector_env.cpython-311.pyc
│   ├── sync_vector_env.py
│   ├── utils
│   │   ├── __init__.py
│   │   ├── misc.py
│   │   ├── numpy_utils.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-311.pyc
│   │   │   ├── misc.cpython-311.pyc
│   │   │   ├── numpy_utils.cpython-311.pyc
│   │   │   ├── shared_memory.cpython-311.pyc
│   │   │   └── spaces.cpython-311.pyc
│   │   ├── shared_memory.py
│   │   └── spaces.py
│   └── vector_env.py
├── version.py
└── wrappers├── atari_preprocessing.py├── autoreset.py├── clip_action.py├── compatibility.py├── env_checker.py├── filter_observation.py├── flatten_observation.py├── frame_stack.py├── gray_scale_observation.py├── human_rendering.py├── __init__.py├── monitoring│   ├── __init__.py│   ├── __pycache__│   │   ├── __init__.cpython-311.pyc│   │   └── video_recorder.cpython-311.pyc│   └── video_recorder.py├── normalize.py├── order_enforcing.py├── pixel_observation.py├── __pycache__│   ├── atari_preprocessing.cpython-311.pyc│   ├── autoreset.cpython-311.pyc│   ├── clip_action.cpython-311.pyc│   ├── compatibility.cpython-311.pyc│   ├── env_checker.cpython-311.pyc│   ├── filter_observation.cpython-311.pyc│   ├── flatten_observation.cpython-311.pyc│   ├── frame_stack.cpython-311.pyc│   ├── gray_scale_observation.cpython-311.pyc│   ├── human_rendering.cpython-311.pyc│   ├── __init__.cpython-311.pyc│   ├── normalize.cpython-311.pyc│   ├── order_enforcing.cpython-311.pyc│   ├── pixel_observation.cpython-311.pyc│   ├── record_episode_statistics.cpython-311.pyc│   ├── record_video.cpython-311.pyc│   ├── render_collection.cpython-311.pyc│   ├── rescale_action.cpython-311.pyc│   ├── resize_observation.cpython-311.pyc│   ├── step_api_compatibility.cpython-311.pyc│   ├── time_aware_observation.cpython-311.pyc│   ├── time_limit.cpython-311.pyc│   ├── transform_observation.cpython-311.pyc│   ├── transform_reward.cpython-311.pyc│   └── vector_list_info.cpython-311.pyc├── record_episode_statistics.py├── record_video.py├── render_collection.py├── rescale_action.py├── resize_observation.py├── step_api_compatibility.py├── time_aware_observation.py├── time_limit.py├── transform_observation.py├── transform_reward.py└── vector_list_info.py27 directories, 322 files

知道代码位置只是万里长征走完了第一步，后续文章会介绍如何对代码进行调试，并随调试随讲解整个代码。

OpenAI Gym中FrozenLake环境（场景）源码分析（1）

相关文章

Java正则校验：密码必须由字母和数字组成，且大于等于8个字符。

HTML5和CSS3新特性

小米手机设置这4个按钮，升级MIUI11后！电量就能多用几小时

小米bl未解锁变砖了如何刷机_小米9如何刷机小米9稳定版卡刷开发版实操详细教程...

小米刷新率教程

miui10 android系统耗电,升级MIUI10后一晚上待机掉20%电量？帮你分析下问题所在

MIUI11Android系统耗电,小米MIUI系统升级11，网友表示很费电，学习这个省电方法够你用三天！...

miui9耗电量android os,小米9更新MIUI10之后，耗电问题怎么解决？这3个原因必须了解...