[Deep Learning] Key Techniques: Optimization Algorithms Explained with Code Examples


Optimization algorithms are a key component of deep learning: they adjust a neural network's weights and biases so as to minimize the value of the loss function. The most common optimizers are introduced below, each with an explanation and a code example.


1. Gradient Descent

Principle:

Compute the gradient of the loss function with respect to the parameters, then update the parameters in the direction of steepest descent (the negative gradient).

Update rule:

\theta = \theta - \eta \cdot \nabla_\theta J(\theta)

  • \eta: the learning rate, which controls the step size.
  • \nabla_\theta J(\theta): the gradient of the loss function with respect to the parameters.
Variants:
  1. Batch Gradient Descent
    • Computes the gradient over the entire training set.
    • Pros: stable convergence.
    • Cons: computationally expensive, especially on large datasets.
  2. Stochastic Gradient Descent (SGD)
    • Computes the gradient from a single sample at a time.
    • Pros: fast updates; suited to large-scale data.
    • Cons: noisy updates that tend to oscillate.
  3. Mini-Batch Gradient Descent
    • Computes the gradient over a small batch of samples.
    • Pros: balances computational efficiency against convergence stability.
Code example:

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent loop
def gradient_descent(initial_theta, learning_rate, epochs):
    theta = initial_theta
    for epoch in range(epochs):
        grad = gradient(theta)
        theta = theta - learning_rate * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent(initial_theta=10, learning_rate=0.1, epochs=20)

Output:
Epoch 1, Theta: 8.0, Loss: 64.0
Epoch 2, Theta: 6.4, Loss: 40.96000000000001
Epoch 3, Theta: 5.12, Loss: 26.2144
Epoch 4, Theta: 4.096, Loss: 16.777216
Epoch 5, Theta: 3.2768, Loss: 10.73741824
Epoch 6, Theta: 2.62144, Loss: 6.871947673600001
Epoch 7, Theta: 2.0971520000000003, Loss: 4.398046511104002
Epoch 8, Theta: 1.6777216000000004, Loss: 2.8147497671065613
Epoch 9, Theta: 1.3421772800000003, Loss: 1.801439850948199
Epoch 10, Theta: 1.0737418240000003, Loss: 1.1529215046068475
Epoch 11, Theta: 0.8589934592000003, Loss: 0.7378697629483825
Epoch 12, Theta: 0.6871947673600002, Loss: 0.47223664828696477
Epoch 13, Theta: 0.5497558138880001, Loss: 0.3022314549036574
Epoch 14, Theta: 0.43980465111040007, Loss: 0.19342813113834073
Epoch 15, Theta: 0.35184372088832006, Loss: 0.12379400392853807
Epoch 16, Theta: 0.281474976710656, Loss: 0.07922816251426434
Epoch 17, Theta: 0.22517998136852482, Loss: 0.050706024009129186
Epoch 18, Theta: 0.18014398509481985, Loss: 0.03245185536584268
Epoch 19, Theta: 0.14411518807585588, Loss: 0.020769187434139313
Epoch 20, Theta: 0.11529215046068471, Loss: 0.013292279957849162
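
The example above applies the full-batch gradient of a one-dimensional loss. To illustrate the mini-batch variant described earlier, here is a minimal sketch on a hypothetical linear-regression problem (the synthetic data and the `minibatch_gd` helper are illustrative, not from the original article):

```python
import numpy as np

# Hypothetical data: y = 3x + noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=200)

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=50):
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = w * X[batch, 0]
            # gradient of the mean squared error w.r.t. w on this mini-batch
            grad = 2 * np.mean((pred - y[batch]) * X[batch, 0])
            w -= lr * grad
    return w

w = minibatch_gd(X, y)
print(w)  # should land close to the true slope of 3.0
```

Each gradient estimate here uses 32 samples: noisier than full-batch but far cheaper per step, and much more stable than single-sample SGD.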


2. Momentum

Principle:

Extends gradient descent with a momentum term that mimics physical inertia, smoothing the updates and helping the optimizer avoid getting stuck prematurely in local minima.

Update rules:

v_t = \gamma v_{t-1} + \eta \cdot \nabla_\theta J(\theta)

\theta = \theta - v_t

  • \gamma: the momentum factor, typically 0.9.
Code example:

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Gradient descent with momentum
def gradient_descent_with_momentum(initial_theta, learning_rate, gamma, epochs):
    theta = initial_theta
    velocity = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        velocity = gamma * velocity + learning_rate * grad
        theta = theta - velocity
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

gradient_descent_with_momentum(initial_theta=10, learning_rate=0.1, gamma=0.9, epochs=20)

Output:
Epoch 1, Theta: 8.0, Loss: 64.0
Epoch 2, Theta: 4.6, Loss: 21.159999999999997
Epoch 3, Theta: 0.6199999999999992, Loss: 0.384399999999999
Epoch 4, Theta: -3.0860000000000007, Loss: 9.523396000000005
Epoch 5, Theta: -5.8042, Loss: 33.68873764
Epoch 6, Theta: -7.089739999999999, Loss: 50.264413267599984
Epoch 7, Theta: -6.828777999999999, Loss: 46.63220897328399
Epoch 8, Theta: -5.228156599999998, Loss: 27.333621434123543
Epoch 9, Theta: -2.7419660199999982, Loss: 7.518377654834631
Epoch 10, Theta: 0.04399870600000133, Loss: 0.0019358861296745532
Epoch 11, Theta: 2.5425672182000008, Loss: 6.46464805906529
Epoch 12, Theta: 4.28276543554, Loss: 18.342079775856124
Epoch 13, Theta: 4.9923907440379995, Loss: 24.92396534115629
Epoch 14, Theta: 4.632575372878599, Loss: 21.460754585401293
Epoch 15, Theta: 3.382226464259419, Loss: 11.439455855536771
Epoch 16, Theta: 1.580467153650273, Loss: 2.4978764237673956
Epoch 17, Theta: -0.3572096566280132, Loss: 0.12759873878830308
Epoch 18, Theta: -2.029676854552868, Loss: 4.119588133907623
Epoch 19, Theta: -3.128961961774664, Loss: 9.790402958232752
Epoch 20, Theta: -3.4925261659193474, Loss: 12.197739019631296


3. Adagrad

Principle:

Adapts the learning rate using the history of gradients, scaling each parameter's update by its accumulated gradient magnitude.

Update rule:

\theta = \theta - \frac{\eta}{\sqrt{G_t + \epsilon}} \cdot \nabla_\theta J(\theta)

  • G_t: the accumulated sum of squared gradients.
  • \epsilon: a small constant that prevents division by zero.
Pros and cons:
  • Pros: well suited to problems with sparse data.
  • Cons: the effective learning rate only ever shrinks, so convergence slows down in later stages.
Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Adagrad: accumulate squared gradients and scale the learning rate
def adagrad(initial_theta, learning_rate, epsilon, epochs):
    theta = initial_theta
    g_square_sum = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_sum += grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_sum) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adagrad(initial_theta=10, learning_rate=0.1, epsilon=1e-8, epochs=20)

Output:
Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999
Epoch 2, Theta: 9.829645540282808, Loss: 96.6219314476017
Epoch 3, Theta: 9.77237939498734, Loss: 95.49939903957312
Epoch 4, Theta: 9.722903358081876, Loss: 94.53484971059981
Epoch 5, Theta: 9.678738726594363, Loss: 93.67798333767746
Epoch 6, Theta: 9.638492461105155, Loss: 92.90053692278092
Epoch 7, Theta: 9.60129025649987, Loss: 92.18477458955935
Epoch 8, Theta: 9.566541030371654, Loss: 91.51870728578436
Epoch 9, Theta: 9.533823158916471, Loss: 90.89378402549204
Epoch 10, Theta: 9.50282343669911, Loss: 90.30365326907788
Epoch 11, Theta: 9.473301675536542, Loss: 89.74344463572345
Epoch 12, Theta: 9.44506890053656, Loss: 89.20932653588291
Epoch 13, Theta: 9.417973260913987, Loss: 88.69822034329084
Epoch 14, Theta: 9.391890561942256, Loss: 88.20760832750003
Epoch 15, Theta: 9.366717691104768, Loss: 87.73540030485503
Epoch 16, Theta: 9.342367925786823, Loss: 87.27983846077038
Epoch 17, Theta: 9.318767503595812, Loss: 86.83942778607332
Epoch 18, Theta: 9.295853063444168, Loss: 86.41288417714433
Epoch 19, Theta: 9.273569701595868, Loss: 85.99909501035687
Epoch 20, Theta: 9.251869471188973, Loss: 85.59708871191853
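
The slow convergence visible above follows directly from the growing G_t: even with a constant gradient, the effective step size \eta / (\sqrt{G_t} + \epsilon) can only shrink. A minimal check with made-up constant gradient values (illustrative only):

```python
import numpy as np

eta, eps = 0.1, 1e-8
g_square_sum = 0.0
effective_lrs = []
for g in [2.0, 2.0, 2.0, 2.0, 2.0]:  # constant gradient, for illustration
    g_square_sum += g ** 2
    effective_lrs.append(eta / (np.sqrt(g_square_sum) + eps))

print(effective_lrs)  # strictly decreasing even though the gradient never changes
```

This ever-shrinking step size is exactly the weakness that RMSprop addresses.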


4. RMSprop

Principle:

RMSprop improves on Adagrad by replacing the accumulated sum of squared gradients with an exponentially weighted moving average, which keeps the learning rate from decaying toward zero.

Update rules:

E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2

\theta = \theta - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \cdot \nabla_\theta J(\theta)

Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# RMSprop: exponentially weighted moving average of squared gradients
def rmsprop(initial_theta, learning_rate, gamma, epsilon, epochs):
    theta = initial_theta
    g_square_ema = 0
    for epoch in range(epochs):
        grad = gradient(theta)
        g_square_ema = gamma * g_square_ema + (1 - gamma) * grad ** 2
        adjusted_lr = learning_rate / (np.sqrt(g_square_ema) + epsilon)
        theta = theta - adjusted_lr * grad
        print(f"Epoch {epoch + 1}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

rmsprop(initial_theta=10, learning_rate=0.1, gamma=0.9, epsilon=1e-8, epochs=20)

Output:
Epoch 1, Theta: 9.683772234483161, Loss: 93.775444689347
Epoch 2, Theta: 9.457880248061212, Loss: 89.4514987866664
Epoch 3, Theta: 9.270530978786274, Loss: 85.94274462863599
Epoch 4, Theta: 9.105434556281987, Loss: 82.90893845873414
Epoch 5, Theta: 8.955067099353235, Loss: 80.19322675391875
Epoch 6, Theta: 8.81524826858932, Loss: 77.708602036867
Epoch 7, Theta: 8.68338298015491, Loss: 75.40113998004396
Epoch 8, Theta: 8.557735821002467, Loss: 73.23484238206876
Epoch 9, Theta: 8.437082563261683, Loss: 71.18436217929433
Epoch 10, Theta: 8.3205241519636, Loss: 69.23112216340958
Epoch 11, Theta: 8.207379341266703, Loss: 67.36107565145147
Epoch 12, Theta: 8.09711886476205, Loss: 65.56333391008548
Epoch 13, Theta: 7.989323078410318, Loss: 63.82928325121972
Epoch 14, Theta: 7.883653610798953, Loss: 62.15199425506338
Epoch 15, Theta: 7.779833754629418, Loss: 60.52581324967126
Epoch 16, Theta: 7.677634521316577, Loss: 58.94607184291202
Epoch 17, Theta: 7.5768644832322165, Loss: 57.4088753972658
Epoch 18, Theta: 7.4773622196930445, Loss: 55.910945764492894
Epoch 19, Theta: 7.378990596134174, Loss: 54.449502217836574
Epoch 20, Theta: 7.281632361364819, Loss: 53.02216984607539


5. Adam (Adaptive Moment Estimation)

Principle:

Combines the ideas of Momentum and RMSprop, maintaining estimates of both the first moment (the mean) and the second moment (the uncentered variance) of the gradients, with a bias correction for the early steps.

Update rules:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t

v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2

\hat{m_t} = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v_t} = \frac{v_t}{1 - \beta_2^t}

\theta = \theta - \frac{\eta}{\sqrt{\hat{v_t}} + \epsilon} \cdot \hat{m_t}
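
The effect of the bias correction is easiest to see at the first step: with \beta_1 = 0.9, the raw first moment m_1 = (1 - \beta_1) g_1 is only a tenth of the actual gradient, and dividing by 1 - \beta_1^t restores it (a quick numeric check using an arbitrary gradient value of 2.0):

```python
beta1, g1 = 0.9, 2.0

m1 = (1 - beta1) * g1           # raw first moment, biased toward 0
m1_hat = m1 / (1 - beta1 ** 1)  # bias-corrected estimate at t = 1

print(m1)      # ~0.2, far smaller than the true gradient
print(m1_hat)  # 2.0, matches g1 exactly
```

As t grows, 1 - \beta_1^t approaches 1 and the correction fades away.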

Code example:

import numpy as np

# Loss function J(theta) = theta^2
def loss_function(theta):
    return theta ** 2

# Gradient of the loss function
def gradient(theta):
    return 2 * theta

# Adam: bias-corrected first and second moment estimates
def adam(initial_theta, learning_rate, beta1, beta2, epsilon, epochs):
    theta = initial_theta
    m, v = 0, 0
    for epoch in range(1, epochs + 1):
        grad = gradient(theta)
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** epoch)
        v_hat = v / (1 - beta2 ** epoch)
        theta = theta - (learning_rate / (np.sqrt(v_hat) + epsilon)) * m_hat
        print(f"Epoch {epoch}, Theta: {theta}, Loss: {loss_function(theta)}")
    return theta

adam(initial_theta=10, learning_rate=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, epochs=20)

Output:

Epoch 1, Theta: 9.90000000005, Loss: 98.01000000098999
Epoch 2, Theta: 9.800027459059471, Loss: 96.04053819831964
Epoch 3, Theta: 9.70010099242815, Loss: 94.09195926330557
Epoch 4, Theta: 9.600239395419266, Loss: 92.16459644936006
Epoch 5, Theta: 9.500461600614251, Loss: 90.2587706247459
Epoch 6, Theta: 9.40078663510384, Loss: 88.37478935874698
Epoch 7, Theta: 9.30123357774574, Loss: 86.51294606778484
Epoch 8, Theta: 9.201821516812585, Loss: 84.67351922727505
Epoch 9, Theta: 9.102569508342574, Loss: 82.85677165420798
Epoch 10, Theta: 9.003496535489624, Loss: 81.06294986457367
Epoch 11, Theta: 8.904621469150118, Loss: 79.29228350884921
Epoch 12, Theta: 8.80596303012035, Loss: 77.5449848878464
Epoch 13, Theta: 8.70753975301269, Loss: 75.82124855029629
Epoch 14, Theta: 8.60936995213032, Loss: 74.12125097264443
Epoch 15, Theta: 8.511471689470543, Loss: 72.44515032065854
Epoch 16, Theta: 8.41386274499579, Loss: 70.79308629162809
Epoch 17, Theta: 8.31656058928038, Loss: 69.16518003517162
Epoch 18, Theta: 8.219582358610113, Loss: 67.56153414997459
Epoch 19, Theta: 8.122944832581695, Loss: 65.98223275316566
Epoch 20, Theta: 8.026664414220157, Loss: 64.42734161850822


Optimizer Comparison

| Optimizer | Adaptive learning rate | Uses momentum | Suited to sparse data | Convergence speed | Typical use |
| --- | --- | --- | --- | --- | --- |
| Gradient descent | No | No | No | Slower | Baseline optimizer |
| Momentum | No | Yes | No | Faster | Avoiding local minima |
| Adagrad | Yes | No | Yes | Faster | Sparse-feature data |
| RMSprop | Yes | No | Yes | Faster | Deep learning, especially RNNs |
| Adam | Yes | Yes | Yes | Faster | Default optimizer in deep learning |

Choosing among these algorithms according to the task and the model can significantly improve training efficiency and model performance.
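
To see the comparison in action, the five update rules can be run side by side on the same toy loss J(\theta) = \theta^2 used throughout this article. The `run` harness and the hard-coded hyperparameters below are an illustrative sketch (they mirror the per-section examples above, not a general-purpose implementation):

```python
import numpy as np

def grad(theta):
    return 2 * theta  # gradient of J(theta) = theta^2

def run(rule, theta=10.0, epochs=50):
    state = {}  # per-optimizer scratch space (velocity, moments, ...)
    for t in range(1, epochs + 1):
        theta = rule(theta, grad(theta), t, state)
    return theta ** 2  # final loss

def sgd(theta, g, t, s):
    return theta - 0.1 * g

def momentum(theta, g, t, s):
    s["v"] = 0.9 * s.get("v", 0.0) + 0.1 * g
    return theta - s["v"]

def adagrad(theta, g, t, s):
    s["G"] = s.get("G", 0.0) + g ** 2
    return theta - 0.1 / (np.sqrt(s["G"]) + 1e-8) * g

def rmsprop(theta, g, t, s):
    s["E"] = 0.9 * s.get("E", 0.0) + 0.1 * g ** 2
    return theta - 0.1 / (np.sqrt(s["E"]) + 1e-8) * g

def adam(theta, g, t, s):
    s["m"] = 0.9 * s.get("m", 0.0) + 0.1 * g
    s["v2"] = 0.999 * s.get("v2", 0.0) + 0.001 * g ** 2
    m_hat = s["m"] / (1 - 0.9 ** t)
    v_hat = s["v2"] / (1 - 0.999 ** t)
    return theta - 0.1 / (np.sqrt(v_hat) + 1e-8) * m_hat

for name, rule in [("SGD", sgd), ("Momentum", momentum), ("Adagrad", adagrad),
                   ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name:10s} final loss after 50 epochs: {run(rule):.6f}")
```

On this smooth convex toy problem plain SGD wins easily; the adaptive methods pay off on the noisy, ill-conditioned losses of real networks, so the ranking here should not be over-interpreted.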


