There is plenty of sample code online for the FrozenLake environment in OpenAI Gym. The following listing is one of the more classic examples:
import numpy as np
import gym
import random
import time
from IPython.display import clear_output

env = gym.make("FrozenLake-v1")

observation_space = env.observation_space
print("The observation space: {}".format(observation_space))
observation_space_size = env.observation_space.n
print(observation_space_size)

action_space = env.action_space
print("The action space: {}".format(action_space))
action_space_size = env.action_space.n
print(action_space_size)

q_table = np.zeros((observation_space_size, action_space_size))
# q_table = np.zeros([observation_space_size, action_space_size])

num_episodes = 10000
max_steps_per_episode = 100

learning_rate = 0.1
discount_rate = 0.99

exploration_rate = 1
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.01

# Alternative hyperparameters; the training loop below uses these names
# (learning_rate here overrides the value assigned earlier).
total_episodes = 15000   # Total number of training episodes
learning_rate = 0.8      # Learning rate
max_steps = 99           # Maximum number of decisions (steps) per episode
gamma = 0.95             # Discount rate applied to future rewards

# Exploration parameters
epsilon = 1.0            # Exploration rate: probability of choosing a random action
max_epsilon = 1.0        # Exploration probability at the start
min_epsilon = 0.01       # Minimum exploration probability
decay_rate = 0.001       # Exponential decay rate for the exploration probability

# List of rewards, one entry per episode
rewards = []

# Train for total_episodes episodes
for episode in range(total_episodes):
    # Reset the environment. Newer Gym versions return an (observation, info)
    # tuple from reset(), so only the observation is kept as the state;
    # the Q-table indexing below needs a plain integer.
    state, info = env.reset()
    done = False
    total_rewards = 0

    for step in range(max_steps):
        # Choose an action a in the current world state (s).
        # First draw a random number in [0, 1].
        exp_exp_tradeoff = random.uniform(0, 1)

        if exp_exp_tradeoff > epsilon:
            # Exploitation: take the action with the largest Q value for this state
            action = np.argmax(q_table[state, :])
        else:
            # Exploration: take a random action
            action = env.action_space.sample()

        # Take the action (a) and observe the outcome state (s') and reward (r).
        # Newer Gym versions return five values from step():
        # (observation, reward, terminated, truncated, info).
        new_state, reward, done, truncated, info = env.step(action)

        # Update Q(s,a) := Q(s,a) + lr * [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
        # q_table[new_state, :] holds the values of all actions available from the new state
        q_table[state, action] = q_table[state, action] + learning_rate * (
            reward + gamma * np.max(q_table[new_state, :]) - q_table[state, action]
        )

        total_rewards += reward

        # The new state becomes the current state
        state = new_state

        # Finish the episode if it terminated (hole or goal) or was truncated
        if done or truncated:
            break

    # Reduce epsilon: as the agent becomes more familiar with the environment,
    # it needs less and less exploration
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    rewards.append(total_rewards)

print("Score over time: " + str(sum(rewards) / total_episodes))
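The training loop above relies on two formulas: the tabular Q-learning update and the exponential decay of the exploration rate. The following is a minimal sketch that isolates both so they can be checked by hand without creating a Gym environment. The helper names `epsilon_at` and `q_update`, the toy table `demo_q`, and the two hand-made transitions are assumptions introduced for illustration only; the default values are copied from the listing.

```python
import numpy as np


def epsilon_at(episode, min_epsilon=0.01, max_epsilon=1.0, decay_rate=0.001):
    """Exploration rate after `episode` episodes (same decay formula as the loop)."""
    return min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)


def q_update(q_table, state, action, reward, new_state, learning_rate=0.8, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    td_target = reward + gamma * np.max(q_table[new_state, :])
    q_table[state, action] += learning_rate * (td_target - q_table[state, action])
    return q_table[state, action]


# Toy table with the 4x4 FrozenLake shape: 16 states, 4 actions, all zeros.
demo_q = np.zeros((16, 4))

# Hypothetical transition 1: from state 14, action 2 reaches the goal (state 15), reward 1.
q_14 = q_update(demo_q, state=14, action=2, reward=1.0, new_state=15)

# Hypothetical transition 2: from state 13, action 2 reaches state 14, reward 0.
# The value learned for state 14 is propagated back, discounted by gamma.
q_13 = q_update(demo_q, state=13, action=2, reward=0.0, new_state=14)

print("Q[14, 2] =", q_14)
print("Q[13, 2] =", q_13)
for ep in (0, 1000, 5000):
    print("epsilon after", ep, "episodes:", epsilon_at(ep))
```

With an all-zero table, the first update gives 0.8 * (1 + 0.95 * 0 - 0) = 0.8, and the second gives 0.8 * (0 + 0.95 * 0.8 - 0) = 0.608. This shows how the reward at the goal spreads backwards through the table one step at a time, while epsilon starts at 1.0 and decays towards min_epsilon.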
[吴长星 Selected Series] OpenAI Gym and Python for Q-learning - Reinforcement Learning Code Project (bilibili)
[吴长星 Selected Series] Train Q-learning Agent with Python - Reinforcement Learning Code Project (bilibili)
[吴长星 Selected Series] Watch Q-learning Agent Play Game with Python - Reinforcement Learning Code Project (bilibili)
This video series explains step by step how to write application code on top of the FrozenLake environment in OpenAI Gym.
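To understand the underlying code, you first need to know where it is located. In my earlier article, "OpenAI Gym: Getting Started and Hands-On (1)" (蓝天居士's blog on CSDN), I installed OpenAI Gym with the command pip install gym and then installed all of the environments with pip install gym[all]. After installation, OpenAI Gym is stored under the user's home directory at ".local/python3.xx/site-packages/gym". The actual path and its contents on my machine are as follows: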
