[Deep Learning] Key Techniques: Optimization Algorithms Explained with Code Examples


Optimization Algorithms Explained with Code Examples

Optimization algorithms are a key component of deep learning: they adjust a neural network's weights and biases to minimize the value of the loss function. The most common optimizers are introduced below, each with a detailed explanation and a code example.


1. Gradient Descent

Principle:

Compute the gradient of the loss function with respect to the parameters, then update the parameters in the direction of steepest descent.

Update rule:

\theta = \theta - \eta \cdot \nabla_\theta J(\theta)

  • \eta: learning rate, controlling the step size.
  • \nabla_\theta J(\theta): gradient of the loss function with respect to the parameters.
Variants:
  1. Batch Gradient Descent
    • Uses the entire training set to compute each gradient.
    • Pros: stable convergence.
    • Cons: computationally expensive, especially on large datasets.
  2. Stochastic Gradient Descent (SGD)
    • Uses a single sample to compute each gradient.
    • Pros: fast updates; well suited to large-scale data.
    • Cons: noisy updates that tend to oscillate.
  3. Mini-Batch Gradient Descent
    • Uses a small batch of samples to compute each gradient.
    • Pros: balances computational efficiency against convergence stability.
Code example:
import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent
def gradient_descent(initial_theta, learning_rate, epochs):
    theta = initial_theta
    for epoch in range(epochs):
        grad = gradient(theta)
        theta = theta - learning_rate * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent(initial_theta=10, learning_rate=0.1, epochs=20)
Output:
Epoch 1, Theta: 8.0, Loss: 64.0
Epoch 2, Theta: 6.4, Loss: 40.96000000000001
Epoch 3, Theta: 5.12, Loss: 26.2144
Epoch 4, Theta: 4.096, Loss: 16.777216
Epoch 5, Theta: 3.2768, Loss: 10.73741824
Epoch 6, Theta: 2.62144, Loss: 6.871947673600001
Epoch 7, Theta: 2.0971520000000003, Loss: 4.398046511104002
Epoch 8, Theta: 1.6777216000000004, Loss: 2.8147497671065613
Epoch 9, Theta: 1.3421772800000003, Loss: 1.801439850948199
Epoch 10, Theta: 1.0737418240000003, Loss: 1.1529215046068475
Epoch 11, Theta: 0.8589934592000003, Loss: 0.7378697629483825
Epoch 12, Theta: 0.6871947673600002, Loss: 0.47223664828696477
Epoch 13, Theta: 0.5497558138880001, Loss: 0.3022314549036574
Epoch 14, Theta: 0.43980465111040007, Loss: 0.19342813113834073
Epoch 15, Theta: 0.35184372088832006, Loss: 0.12379400392853807
Epoch 16, Theta: 0.281474976710656, Loss: 0.07922816251426434
Epoch 17, Theta: 0.22517998136852482, Loss: 0.050706024009129186
Epoch 18, Theta: 0.18014398509481985, Loss: 0.03245185536584268
Epoch 19, Theta: 0.14411518807585588, Loss: 0.020769187434139313
Epoch 20, Theta: 0.11529215046068471, Loss: 0.013292279957849162
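The three variants above differ only in how many samples feed each gradient estimate. As a minimal sketch (the toy linear-regression data and helper names here are illustrative, not from the original), mini-batch gradient descent can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise
X = rng.uniform(-1, 1, size=100)
y = 3 * X + 0.1 * rng.normal(size=100)

def mini_batch_sgd(X, y, learning_rate=0.1, batch_size=16, epochs=50):
    w = 0.0  # single weight, no bias, for simplicity
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = w * X[batch]
            # Gradient of the mean squared error w.r.t. w over the mini-batch
            grad = 2 * np.mean((pred - y[batch]) * X[batch])
            w -= learning_rate * grad
    return w

w = mini_batch_sgd(X, y)
print(f"Learned weight: {w:.3f}")  # approaches the true slope of 3
```

Setting batch_size=len(X) recovers batch gradient descent, while batch_size=1 recovers SGD; the mini-batch size trades gradient noise against per-step cost.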


2. Momentum

Principle:

Extends gradient descent with a velocity term that mimics physical inertia, damping oscillations and helping the parameters roll past shallow local minima.

Update rule:

v_t = \gamma v_{t-1} + \eta \cdot \nabla_\theta J(\theta)

\theta = \theta - v_t

  • \gamma: momentum factor, typically 0.9.
Code example:
import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent with momentum
def gradient_descent_with_momentum(initial_theta, learning_rate, gamma, epochs):
    theta = initial_theta
    velocity = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        velocity = gamma * velocity + learning_rate * grad
        theta = theta - velocity
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent_with_momentum(initial_theta=10, learning_rate=0.1, gamma=0.9, epochs=20)
Output:
Epoch 1, Theta: 8.0, Loss: 64.0
Epoch 2, Theta: 4.6, Loss: 21.159999999999997
Epoch 3, Theta: 0.6199999999999992, Loss: 0.384399999999999
Epoch 4, Theta: -3.0860000000000007, Loss: 9.523396000000005
Epoch 5, Theta: -5.8042, Loss: 33.68873764
Epoch 6, Theta: -7.089739999999999, Loss: 50.264413267599984
Epoch 7, Theta: -6.828777999999999, Loss: 46.63220897328399
Epoch 8, Theta: -5.228156599999998, Loss: 27.333621434123543
Epoch 9, Theta: -2.7419660199999982, Loss: 7.518377654834631
Epoch 10, Theta: 0.04399870600000133, Loss: 0.0019358861296745532
Epoch 11, Theta: 2.5425672182000008, Loss: 6.46464805906529
Epoch 12, Theta: 4.28276543554, Loss: 18.342079775856124
Epoch 13, Theta: 4.9923907440379995, Loss: 24.92396534115629
Epoch 14, Theta: 4.632575372878599, Loss: 21.460754585401293
Epoch 15, Theta: 3.382226464259419, Loss: 11.439455855536771
Epoch 16, Theta: 1.580467153650273, Loss: 2.4978764237673956
Epoch 17, Theta: -0.3572096566280132, Loss: 0.12759873878830308
Epoch 18, Theta: -2.029676854552868, Loss: 4.119588133907623
Epoch 19, Theta: -3.128961961774664, Loss: 9.790402958232752
Epoch 20, Theta: -3.4925261659193474, Loss: 12.197739019631296


3. Adagrad

Principle:

Adapts the learning rate using the history of gradients: the accumulated squared gradients scale the step down, so the update size tracks how large past gradients have been.

Update rule:

\theta = \theta - \frac{\eta}{\sqrt{G_t + \epsilon}} \cdot \nabla_\theta J(\theta)

  • G_t: accumulated sum of squared gradients.
  • \epsilon: small constant to avoid division by zero.
Pros and cons:
  • Pros: well suited to sparse-data problems.
  • Cons: the effective learning rate keeps shrinking, so late-stage convergence is slow.
Code example:
import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Adagrad
def adagrad(initial_theta, learning_rate, epsilon, epochs):
    theta = initial_theta
    g_square_sum = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_sum += grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_sum) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adagrad(initial_theta=10, learning_rate=0.1, epsilon=1e-8, epochs=20)
Output:
Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999
Epoch 2, Theta: 9.829645540282808, Loss: 96.6219314476017
Epoch 3, Theta: 9.77237939498734, Loss: 95.49939903957312
Epoch 4, Theta: 9.722903358081876, Loss: 94.53484971059981
Epoch 5, Theta: 9.678738726594363, Loss: 93.67798333767746
Epoch 6, Theta: 9.638492461105155, Loss: 92.90053692278092
Epoch 7, Theta: 9.60129025649987, Loss: 92.18477458955935
Epoch 8, Theta: 9.566541030371654, Loss: 91.51870728578436
Epoch 9, Theta: 9.533823158916471, Loss: 90.89378402549204
Epoch 10, Theta: 9.50282343669911, Loss: 90.30365326907788
Epoch 11, Theta: 9.473301675536542, Loss: 89.74344463572345
Epoch 12, Theta: 9.44506890053656, Loss: 89.20932653588291
Epoch 13, Theta: 9.417973260913987, Loss: 88.69822034329084
Epoch 14, Theta: 9.391890561942256, Loss: 88.20760832750003
Epoch 15, Theta: 9.366717691104768, Loss: 87.73540030485503
Epoch 16, Theta: 9.342367925786823, Loss: 87.27983846077038
Epoch 17, Theta: 9.318767503595812, Loss: 86.83942778607332
Epoch 18, Theta: 9.295853063444168, Loss: 86.41288417714433
Epoch 19, Theta: 9.273569701595868, Loss: 85.99909501035687
Epoch 20, Theta: 9.251869471188973, Loss: 85.59708871191853
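The sluggish progress above is Adagrad's shrinking step size in action: because G_t only grows, the effective learning rate decays roughly like 1/\sqrt{t} when gradients stay about the same size. A quick illustration, under the simplifying assumption of a constant gradient of 1:

```python
import numpy as np

learning_rate, epsilon = 0.1, 1e-8
g_square_sum = 0.0
for t in range(1, 101):
    g_square_sum += 1.0 ** 2  # pretend the gradient stays at 1
    adjusted_lr = learning_rate / (np.sqrt(g_square_sum) + epsilon)
    if t in (1, 10, 100):
        # prints effective lr of 0.1000, 0.0316, and 0.0100
        print(f"step {t:3d}: effective lr = {adjusted_lr:.4f}")
```

After 100 steps the step size has already dropped by a factor of 10, which is exactly the late-stage slowdown listed among the cons.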


4. RMSprop

Principle:

RMSprop improves on Adagrad by replacing the raw sum of squared gradients with an exponentially weighted moving average, which stops the learning rate from decaying toward zero.

Update rule:

E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2

\theta = \theta - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \cdot \nabla_\theta J(\theta)

Code example:
import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# RMSprop
def rmsprop(initial_theta, learning_rate, gamma, epsilon, epochs):
    theta = initial_theta
    g_square_ema = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_ema = gamma * g_square_ema + (1 - gamma) * grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_ema) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

rmsprop(initial_theta=10, learning_rate=0.1, gamma=0.9, epsilon=1e-8, epochs=20)
Output:
Epoch 1, Theta: 9.683772234483161, Loss: 93.775444689347
Epoch 2, Theta: 9.457880248061212, Loss: 89.4514987866664
Epoch 3, Theta: 9.270530978786274, Loss: 85.94274462863599
Epoch 4, Theta: 9.105434556281987, Loss: 82.90893845873414
Epoch 5, Theta: 8.955067099353235, Loss: 80.19322675391875
Epoch 6, Theta: 8.81524826858932, Loss: 77.708602036867
Epoch 7, Theta: 8.68338298015491, Loss: 75.40113998004396
Epoch 8, Theta: 8.557735821002467, Loss: 73.23484238206876
Epoch 9, Theta: 8.437082563261683, Loss: 71.18436217929433
Epoch 10, Theta: 8.3205241519636, Loss: 69.23112216340958
Epoch 11, Theta: 8.207379341266703, Loss: 67.36107565145147
Epoch 12, Theta: 8.09711886476205, Loss: 65.56333391008548
Epoch 13, Theta: 7.989323078410318, Loss: 63.82928325121972
Epoch 14, Theta: 7.883653610798953, Loss: 62.15199425506338
Epoch 15, Theta: 7.779833754629418, Loss: 60.52581324967126
Epoch 16, Theta: 7.677634521316577, Loss: 58.94607184291202
Epoch 17, Theta: 7.5768644832322165, Loss: 57.4088753972658
Epoch 18, Theta: 7.4773622196930445, Loss: 55.910945764492894
Epoch 19, Theta: 7.378990596134174, Loss: 54.449502217836574
Epoch 20, Theta: 7.281632361364819, Loss: 53.02216984607539


5. Adam (Adaptive Moment Estimation)

Principle:

Combines the ideas of Momentum and RMSprop, maintaining estimates of both the first moment (mean) and the second moment (uncentered variance) of the gradients.

Update rule:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2

\hat{m_t} = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v_t} = \frac{v_t}{1 - \beta_2^t}

\theta = \theta - \frac{\eta}{\sqrt{\hat{v_t}} + \epsilon} \cdot \hat{m_t}

Code example:
import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Adam
def adam(initial_theta, learning_rate, beta1, beta2, epsilon, epochs):
    theta = initial_theta
    m, v = 0, 0
    for epoch in range(1, epochs + 1):
        grad = gradient(theta)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** epoch)
        v_hat = v / (1 - beta2 ** epoch)
        theta = theta - (learning_rate / (np.sqrt(v_hat) + epsilon)) * m_hat
        print(f"Epoch {epoch}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adam(initial_theta=10, learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, epochs=20)

Output:

Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999
Epoch 2, Theta: 9.800027459059471, Loss: 96.04053819831964
Epoch 3, Theta: 9.70010099242815, Loss: 94.09195926330557
Epoch 4, Theta: 9.600239395419266, Loss: 92.16459644936006
Epoch 5, Theta: 9.500461600614251, Loss: 90.2587706247459
Epoch 6, Theta: 9.40078663510384, Loss: 88.37478935874698
Epoch 7, Theta: 9.30123357774574, Loss: 86.51294606778484
Epoch 8, Theta: 9.201821516812585, Loss: 84.67351922727505
Epoch 9, Theta: 9.102569508342574, Loss: 82.85677165420798
Epoch 10, Theta: 9.003496535489624, Loss: 81.06294986457367
Epoch 11, Theta: 8.904621469150118, Loss: 79.29228350884921
Epoch 12, Theta: 8.80596303012035, Loss: 77.5449848878464
Epoch 13, Theta: 8.70753975301269, Loss: 75.82124855029629
Epoch 14, Theta: 8.60936995213032, Loss: 74.12125097264443
Epoch 15, Theta: 8.511471689470543, Loss: 72.44515032065854
Epoch 16, Theta: 8.41386274499579, Loss: 70.79308629162809
Epoch 17, Theta: 8.31656058928038, Loss: 69.16518003517162
Epoch 18, Theta: 8.219582358610113, Loss: 67.56153414997459
Epoch 19, Theta: 8.122944832581695, Loss: 65.98223275316566
Epoch 20, Theta: 8.026664414220157, Loss: 64.42734161850822
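One detail worth noting in the run above is the bias correction. Because m and v start at zero, the raw moment estimates are biased toward zero early on: at t = 1, m_1 = (1 - \beta_1) g_1 is only a tenth of the gradient, and dividing by 1 - \beta_1^1 = 0.1 restores it. A quick numeric check using the first gradient from the example (g_1 = 20 at \theta = 10):

```python
beta1, beta2 = 0.9, 0.999
g1 = 20.0                       # first gradient: 2 * theta at theta = 10

m1 = (1 - beta1) * g1           # biased first moment, about 2.0
v1 = (1 - beta2) * g1 ** 2      # biased second moment, about 0.4

m1_hat = m1 / (1 - beta1 ** 1)  # bias-corrected, recovers about 20.0
v1_hat = v1 / (1 - beta2 ** 1)  # bias-corrected, recovers about 400.0

# First step is lr * m1_hat / (sqrt(v1_hat) + eps), roughly 0.1 * 20 / 20 = 0.1,
# which matches "Epoch 1, Theta: 9.90000000005" in the output above.
step = 0.1 * m1_hat / (v1_hat ** 0.5 + 1e-8)
print(m1_hat, v1_hat, step)
```

Without the correction, the first update would be about ten times smaller and early training would crawl.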


Optimizer Comparison Summary

| Optimizer | Adaptive learning rate | Uses momentum | Suited to sparse data | Convergence speed | Typical use cases |
| --- | --- | --- | --- | --- | --- |
| Gradient Descent | No | No | No | Slower | Baseline optimization algorithm |
| Momentum | No | Yes | No | Faster | Escaping local minima |
| Adagrad | Yes | No | Yes | Faster | Sparse-feature data |
| RMSprop | Yes | No | Yes | Faster | Deep learning, especially RNNs |
| Adam | Yes | Yes | Yes | Faster | Default optimizer in deep learning |
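As a sanity check on the comparison, the five update rules can be rerun side by side on the same toy loss J(\theta) = \theta^2 used throughout. This harness is a sketch, not from the original; hyperparameters match the earlier examples:

```python
import numpy as np

def run(update, theta=10.0, steps=50):
    """Apply one update rule for a fixed number of steps; return the final loss."""
    state = {}
    for t in range(1, steps + 1):
        grad = 2 * theta  # gradient of J(theta) = theta^2
        theta = update(theta, grad, t, state)
    return theta ** 2

def sgd(theta, g, t, s):
    return theta - 0.1 * g

def momentum(theta, g, t, s):
    s["v"] = 0.9 * s.get("v", 0.0) + 0.1 * g
    return theta - s["v"]

def adagrad(theta, g, t, s):
    s["G"] = s.get("G", 0.0) + g * g
    return theta - 0.1 * g / (np.sqrt(s["G"]) + 1e-8)

def rmsprop(theta, g, t, s):
    s["E"] = 0.9 * s.get("E", 0.0) + 0.1 * g * g
    return theta - 0.1 * g / (np.sqrt(s["E"]) + 1e-8)

def adam(theta, g, t, s):
    s["m"] = 0.9 * s.get("m", 0.0) + 0.1 * g
    s["v"] = 0.999 * s.get("v", 0.0) + 0.001 * g * g
    m_hat = s["m"] / (1 - 0.9 ** t)
    v_hat = s["v"] / (1 - 0.999 ** t)
    return theta - 0.1 * m_hat / (np.sqrt(v_hat) + 1e-8)

results = {name: run(fn) for name, fn in
           [("SGD", sgd), ("Momentum", momentum), ("Adagrad", adagrad),
            ("RMSprop", rmsprop), ("Adam", adam)]}
for name, loss in results.items():
    print(f"{name:10s} final loss: {loss:.6f}")
```

On this one-dimensional convex loss, plain gradient descent actually reaches the minimum fastest, so the speed column above should be read as typical behavior on high-dimensional, noisy losses rather than a guarantee on every problem.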

Choosing among these optimizers according to the characteristics of the task and the needs of the model can significantly improve training efficiency and final performance.

