Neural Networks from Scratch in Python (Part 6): Introduction to Optimization


Introduction to Optimization

  • Introduction

Introduction

Whether a model is randomly initialized or initialized with a more sophisticated method, our goal is to train, or teach, it over time. To train a model, we adjust the weights and biases to improve the model's accuracy and confidence. To do that, we need to calculate how much error the model has. The loss function, also referred to as the cost function, is the algorithm that quantifies how wrong a model is, and loss is the value this metric returns. Since loss measures the model's error, we ideally want it to be 0.

You might wonder why we don't base the model's error on argmax accuracy instead. Recall our earlier confidence example: [0.22, 0.6, 0.18] versus [0.32, 0.36, 0.32]. If the correct class really is the middle one (index 1), then the model accuracy is identical for both examples. But are these two examples really equally accurate? They are not, because accuracy simply applies an argmax to the output to find the index of the largest value. The outputs of a neural network are confidences, and more confidence in the correct answer is better. Therefore, we strive to increase the correct confidence and reduce misplaced confidence.
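To make the difference concrete, here is a small sketch (my own illustration, not from the original text) comparing argmax-based accuracy with the cross-entropy loss for the two confidence vectors above, assuming the correct class is index 1:

```python
import numpy as np

# Two softmax outputs for the same sample; assume the correct class is index 1
confident = np.array([0.22, 0.60, 0.18])
uncertain = np.array([0.32, 0.36, 0.32])

# argmax-based accuracy treats both predictions as equally correct
print(np.argmax(confident) == 1, np.argmax(uncertain) == 1)  # True True

# Categorical cross-entropy (the negative log of the correct-class confidence)
# rewards the more confident prediction with a lower loss
print(-np.log(confident[1]))  # ~0.511
print(-np.log(uncertain[1]))  # ~1.022
```

With that distinction in mind, we need a way of systematically lowering the loss. To experiment with this, we first generate and plot a simpler dataset than the spiral data, the vertical dataset from the nnfs package: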

```python
import matplotlib.pyplot as plt
import nnfs
from nnfs.datasets import vertical_data

nnfs.init()

X, y = vertical_data(samples=100, classes=3)

plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap='brg')
plt.show()
```

[Figure: scatter plot of the vertical dataset, 300 points in three classes, colored by class]

Using the code we created earlier, we can combine this new dataset with a simple neural network:

```python
# Create dataset
X, y = vertical_data(samples=100, classes=3)

# Create model
dense1 = Layer_Dense(2, 3)  # first dense layer, 2 inputs
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)  # second dense layer, 3 inputs, 3 outputs
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()
```

Then we create some variables to track the lowest loss seen so far, along with the associated weights and biases:

```python
# Helper variables
lowest_loss = 9999999  # some initial value
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()
```

We initialize the loss to a large value and lower it whenever a new, smaller loss is found. We also copy the weights and biases (copy() ensures a full copy rather than a reference to the object). Now we iterate as many times as we like, choose random values for the weights and biases, and save them whenever they produce the lowest loss seen so far:

```python
for iteration in range(10000):

    # Generate a new set of weights for iteration
    dense1.weights = 0.05 * np.random.randn(2, 3)
    dense1.biases = 0.05 * np.random.randn(1, 3)
    dense2.weights = 0.05 * np.random.randn(3, 3)
    dense2.biases = 0.05 * np.random.randn(1, 3)

    # Perform a forward pass of the training data through this layer
    dense1.forward(X)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    activation2.forward(dense2.output)

    # Perform a forward pass through activation function
    # it takes the output of second dense layer here and returns loss
    loss = loss_function.calculate(activation2.output, y)

    # Calculate accuracy from output of activation2 and targets
    # calculate values along first axis
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    # If loss is smaller - print and save weights and biases aside
    if loss < lowest_loss:
        print('New set of weights found, iteration:', iteration,
              'loss:', loss, 'acc:', accuracy)
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        best_dense2_weights = dense2.weights.copy()
        best_dense2_biases = dense2.biases.copy()
        lowest_loss = loss
```

```
>>>
New set of weights found, iteration: 0 loss: 1.0986564 acc: 0.3333333333333333
New set of weights found, iteration: 3 loss: 1.098138 acc: 0.3333333333333333
New set of weights found, iteration: 117 loss: 1.0980115 acc: 0.3333333333333333
New set of weights found, iteration: 124 loss: 1.0977516 acc: 0.6
New set of weights found, iteration: 165 loss: 1.097571 acc: 0.3333333333333333
New set of weights found, iteration: 552 loss: 1.0974693 acc: 0.34
New set of weights found, iteration: 778 loss: 1.0968257 acc: 0.3333333333333333
New set of weights found, iteration: 4307 loss: 1.0965533 acc: 0.3333333333333333
New set of weights found, iteration: 4615 loss: 1.0964499 acc: 0.3333333333333333
New set of weights found, iteration: 9450 loss: 1.0964295 acc: 0.3333333333333333
```

Full code:

```python
import numpy as np
import matplotlib.pyplot as plt
import nnfs
from nnfs.datasets import vertical_data

nnfs.init()


# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from input
        self.output = np.maximum(0, inputs)


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities


# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods


nnfs.init()

# Create dataset
X, y = vertical_data(samples=100, classes=3)

# Create model
dense1 = Layer_Dense(2, 3)  # first dense layer, 2 inputs
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)  # second dense layer, 3 inputs, 3 outputs
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Helper variables
lowest_loss = 9999999  # some initial value
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()

for iteration in range(100000):

    # Generate a new set of weights for iteration
    dense1.weights = 0.05 * np.random.randn(2, 3)
    dense1.biases = 0.05 * np.random.randn(1, 3)
    dense2.weights = 0.05 * np.random.randn(3, 3)
    dense2.biases = 0.05 * np.random.randn(1, 3)

    # Perform a forward pass of the training data through this layer
    dense1.forward(X)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    activation2.forward(dense2.output)

    # Perform a forward pass through activation function
    # it takes the output of second dense layer here and returns loss
    loss = loss_function.calculate(activation2.output, y)

    # Calculate accuracy from output of activation2 and targets
    # calculate values along first axis
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    # If loss is smaller - print and save weights and biases aside
    if loss < lowest_loss:
        print('New set of weights found, iteration:', iteration,
              'loss:', loss, 'acc:', accuracy)
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        best_dense2_weights = dense2.weights.copy()
        best_dense2_biases = dense2.biases.copy()
        lowest_loss = loss
```

The loss certainly fell, though not by much. Accuracy didn't improve either, except for a single case where the model randomly found a set of weights that happened to score better, yet with a loss that large, that state isn't stable. Running an additional 90,000 iterations, for 100,000 in total:

```
>>>
New set of weights found, iteration: 13361 loss: 1.0963014 acc: 0.3333333333333333
New set of weights found, iteration: 14001 loss: 1.0959858 acc: 0.3333333333333333
New set of weights found, iteration: 24598 loss: 1.0947443 acc: 0.3333333333333333
```

The loss keeps decreasing, but accuracy doesn't change. This does not appear to be a reliable way to minimize loss. After running for 1 billion iterations, the best (lowest-loss) result was:

```
>>>
New set of weights found, iteration: 229865000 loss: 1.0911305 acc: 0.3333333333333333
```

Even with this basic dataset, we can see that randomly searching for combinations of weights and biases takes far too long to be an acceptable method. Another idea might be, instead of setting the parameters to freshly chosen random values each iteration, to apply only a fraction of those random values as an adjustment to the current parameters. With this approach, weights are updated from whatever currently gives us the lowest loss, rather than aimlessly and randomly. If an adjustment decreases the loss, we make it the new starting point for further adjustments. If an adjustment increases the loss instead, we revert to the previous point. Using code similar to before, we first change from randomly selecting the weights and biases to randomly adjusting them:

```python
# Update weights with some small random values
dense1.weights += 0.05 * np.random.randn(2, 3)
dense1.biases += 0.05 * np.random.randn(1, 3)
dense2.weights += 0.05 * np.random.randn(3, 3)
dense2.biases += 0.05 * np.random.randn(1, 3)
```

Then we change the ending if statement to:

```python
# If loss is smaller - print and save weights and biases aside
if loss < lowest_loss:
    print('New set of weights found, iteration:', iteration,
          'loss:', loss, 'acc:', accuracy)
    best_dense1_weights = dense1.weights.copy()
    best_dense1_biases = dense1.biases.copy()
    best_dense2_weights = dense2.weights.copy()
    best_dense2_biases = dense2.biases.copy()
    lowest_loss = loss

# Revert weights and biases
else:
    dense1.weights = best_dense1_weights.copy()
    dense1.biases = best_dense1_biases.copy()
    dense2.weights = best_dense2_weights.copy()
    dense2.biases = best_dense2_biases.copy()
```

Full modified code:

```python
import numpy as np
import matplotlib.pyplot as plt
import nnfs
from nnfs.datasets import vertical_data


# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights and biases
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs):
        # Calculate output values from input
        self.output = np.maximum(0, inputs)


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs):
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities


# Common loss class
class Loss:

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return loss
        return data_loss


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[range(samples), y_true]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods


nnfs.init()

# Create dataset
X, y = vertical_data(samples=100, classes=3)

# Create model
dense1 = Layer_Dense(2, 3)  # first dense layer, 2 inputs
activation1 = Activation_ReLU()
dense2 = Layer_Dense(3, 3)  # second dense layer, 3 inputs, 3 outputs
activation2 = Activation_Softmax()

# Create loss function
loss_function = Loss_CategoricalCrossentropy()

# Helper variables
lowest_loss = 9999999  # some initial value
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()

for iteration in range(10000):

    # Update weights with some small random values
    dense1.weights += 0.05 * np.random.randn(2, 3)
    dense1.biases += 0.05 * np.random.randn(1, 3)
    dense2.weights += 0.05 * np.random.randn(3, 3)
    dense2.biases += 0.05 * np.random.randn(1, 3)

    # Perform a forward pass of the training data through this layer
    dense1.forward(X)
    activation1.forward(dense1.output)
    dense2.forward(activation1.output)
    activation2.forward(dense2.output)

    # Perform a forward pass through activation function
    # it takes the output of second dense layer here and returns loss
    loss = loss_function.calculate(activation2.output, y)

    # Calculate accuracy from output of activation2 and targets
    # calculate values along first axis
    predictions = np.argmax(activation2.output, axis=1)
    accuracy = np.mean(predictions == y)

    # If loss is smaller - print and save weights and biases aside
    if loss < lowest_loss:
        print('New set of weights found, iteration:', iteration,
              'loss:', loss, 'acc:', accuracy)
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        best_dense2_weights = dense2.weights.copy()
        best_dense2_biases = dense2.biases.copy()
        lowest_loss = loss

    # Revert weights and biases
    else:
        dense1.weights = best_dense1_weights.copy()
        dense1.biases = best_dense1_biases.copy()
        dense2.weights = best_dense2_weights.copy()
        dense2.biases = best_dense2_biases.copy()
```

```
>>>
New set of weights found, iteration: 0 loss: 1.0987684 acc: 0.3333333333333333
...
New set of weights found, iteration: 29 loss: 1.0725244 acc: 0.5266666666666666
New set of weights found, iteration: 30 loss: 1.0724432 acc: 0.3466666666666667
...
New set of weights found, iteration: 48 loss: 1.0303522 acc: 0.6666666666666666
New set of weights found, iteration: 49 loss: 1.0292586 acc: 0.6666666666666666
...
New set of weights found, iteration: 97 loss: 0.9277446 acc: 0.7333333333333333
...
New set of weights found, iteration: 152 loss: 0.73390484 acc: 0.8433333333333334
New set of weights found, iteration: 156 loss: 0.7235515 acc: 0.87
New set of weights found, iteration: 160 loss: 0.7049076 acc: 0.9066666666666666
...
New set of weights found, iteration: 7446 loss: 0.17280102 acc: 0.9333333333333333
New set of weights found, iteration: 9397 loss: 0.17279711 acc: 0.93
```

This time the loss dropped substantially and accuracy rose significantly. Applying a fraction of random values gave us a result we could almost call a solution. Trying 100,000 iterations doesn't get us much further:

```
>>>
...
New set of weights found, iteration: 14206 loss: 0.1727932 acc: 0.9333333333333333
New set of weights found, iteration: 63704 loss: 0.17278232 acc: 0.9333333333333333
```
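One step the chapter's loop leaves implicit is loading the saved best parameters back into the layers once the search has finished. A minimal sketch of how that could look (my own addition, reusing the objects defined above, not part of the original code):

```python
# Restore the best parameters found during the search
dense1.weights = best_dense1_weights.copy()
dense1.biases = best_dense1_biases.copy()
dense2.weights = best_dense2_weights.copy()
dense2.biases = best_dense2_biases.copy()

# Forward pass with the restored parameters to confirm the recorded lowest loss
dense1.forward(X)
activation1.forward(dense1.output)
dense2.forward(activation1.output)
activation2.forward(dense2.output)
print('loss with best weights:', loss_function.calculate(activation2.output, y))
```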

Now, let's try this with the spiral dataset we saw earlier:

```python
from nnfs.datasets import spiral_data

X, y = spiral_data(samples=100, classes=3)
```

```
>>>
New set of weights found, iteration: 0 loss: 1.1008677 acc: 0.3333333333333333
...
New set of weights found, iteration: 31 loss: 1.0982264 acc: 0.37333333333333335
...
New set of weights found, iteration: 65 loss: 1.0954362 acc: 0.38333333333333336
New set of weights found, iteration: 67 loss: 1.093989 acc: 0.4166666666666667
...
New set of weights found, iteration: 129 loss: 1.0874122 acc: 0.42333333333333334
...
New set of weights found, iteration: 5415 loss: 1.0790575 acc: 0.39
```

This training session made almost no progress. The loss decreased slightly, and accuracy barely rose above the initial value. As we'll learn later, the most likely cause of this is what's called a local minimum of the loss. The complexity of the data is also not irrelevant here. As it turns out, hard problems are hard for a reason, and we need to approach them more intelligently.
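To get some intuition for why a search that only keeps an adjustment when it lowers the loss can stall, here is a small standalone sketch (my own illustration, not from the chapter) using a toy one-dimensional loss with a shallow local minimum and a deeper global minimum. Small random nudges accepted only on improvement settle into whichever basin they start in:

```python
import numpy as np

np.random.seed(0)

# Toy 1D "loss" with a shallow local minimum near x = 0.9
# and a deeper global minimum near x = -1.05
def toy_loss(x):
    return x**4 - 2 * x**2 + 0.5 * x

best_x = 1.0                  # start inside the basin of the shallow minimum
best_loss = toy_loss(best_x)

for _ in range(10000):
    candidate = best_x + 0.05 * np.random.randn()  # small random tweak
    candidate_loss = toy_loss(candidate)
    if candidate_loss < best_loss:                 # keep it only if it helps
        best_x, best_loss = candidate, candidate_loss

# The search settles near the local minimum (loss about -0.52) and never
# reaches the deeper minimum near x = -1.05 (loss about -1.5)
print(best_x, best_loss)
```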

Chapter code, additional resources, and errata for this chapter: https://nnfs.io/ch6

