AlexNet，LeNet-5，ResNet，VGG-19，VGG-16模型

模型

- AlexNet
- - - 导入必要的库：
    - 加载类别名称：
    - 创建标签映射字典：
    - 加载图像数据和对应的标签：
    - 构建AlexNet模型：
    - 编译模型：
    - 训练模型：
- LeNet-5
- - - 导入必要的库：
    - 加载类别名称：
    - 创建标签映射字典：
    - 加载图像数据和对应的标签：
    - 构建LeNet模型：
    - 编译模型：
- ResNet
- - - 导入必要的库：
    - 加载类别名称：
    - 创建标签映射字典：
    - 加载图像数据和对应的标签：
    - 使用ResNet50模型进行迁移学习
    - 冻结预训练模型的权重：
    - 编译模型：
    - 训练模型：
- VGG-19
- - - 导入必要的库：
    - 加载类别名称
    - 创建标签映射字典：
    - 加载图像数据和对应的标签：
    - 使用VGG-19模型进行迁移学习
    - 冻结预训练模型的权重：
    - 编译模型：
    - 训练模型：
- VGG-16
- - - 导入必要的库：
    - 加载类别名称
    - 创建标签映射字典：
    - 加载图像数据和对应的标签：
    - 使用VGG-16模型进行迁移学习
    - 冻结预训练模型的权重：
    - 编译模型：
    - 训练模型：

本篇博客的图像和标签数据集就是之前自己训练的
博客地址：
https://blog.csdn.net/2301_76794217/article/details/139215356?spm=1001.2014.3001.5502
数据集地址：
https://download.csdn.net/download/2301_76794217/89359353?spm=1001.2101.3001.9500

AlexNet

提出时间：2012年
主要贡献：AlexNet是第一个在ImageNet竞赛中取得显著成绩的深度卷积神经网络，它引入了许多后来被广泛采用的技术，如局部响应归一化（LRN）、ReLU激活函数、使用GPU进行并行计算等。
结构特点：AlexNet包含5个卷积层、3个全连接层和两个用于防止过拟合的dropout层。它使用了大量的数据增强技术，包括随机裁剪、随机缩放、随机水平翻转等。
大小：大约2500万个参数层数：5层卷积层，3层全连接层特点：使用ReLU激活函数、LRN层、重叠池化、多尺度特征提取等

导入必要的库：

os：用于操作文件和目录。
cv2：OpenCV库，用于图像处理。
numpy：用于数值计算。
tensorflow：用于构建和训练深度学习模型。
tensorflow.keras.layers：Keras层，用于构建神经网络。
tensorflow.keras.models：Keras模型，用于创建模型架构。

python">import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, ZeroPadding2D
from tensorflow.keras.models import Sequential

加载类别名称：

打开classes.txt文件，读取其中的类别名称，并存储在classes列表中。

python">with open('99/classes.txt', 'r') as f:classes = f.read().splitlines()

创建标签映射字典：

label_mapping字典将数字标签映射到对应的情绪名称。

python">label_mapping = {'0': 'sad','1': 'happy','2': 'amazed','3': 'anger'
}

加载图像数据和对应的标签：

遍历image_folder中的所有图像文件。
读取图像，将其调整为AlexNet所需的输入大小（227x227像素），并将其添加到X_train列表。
读取与图像对应的标签文件（.txt格式），提取标签索引，将其映射到情绪名称，然后转换为数字标签，并添加到y_train列表。
将X_train和y_train转换为NumPy数组，并对图像数据进行归一化处理。

python">image_folder = '561'
label_folder = '99'X_train = []
y_train = []for image_file in os.listdir(image_folder):image_path = os.path.join(image_folder, image_file)image = cv2.imread(image_path)if image is not None:image = cv2.resize(image, (227, 227))  # AlexNet输入图像大小为227x227X_train.append(image)label_file = os.path.join(label_folder, image_file.replace('.jpg', '.txt'))with open(label_file, 'r') as f:label_index = f.readline().strip().split()[0]  # 只取第一个数字作为标签索引label_name = label_mapping[label_index]label = classes.index(label_name)y_train.append(label)X_train = np.array(X_train) / 255.0  # 归一化图像数据
y_train = np.array(y_train)

构建AlexNet模型：

使用Keras的Sequential模型，按照AlexNet的结构添加层。
模型包括卷积层（Conv2D）、最大池化层（MaxPooling2D）、全连接层（Dense）、dropout层（Dropout）等。
输出层使用softmax激活函数，因为这是一个多分类问题。

python">model = Sequential()
model.add(ZeroPadding2D((1, 1), input_shape=(227, 227, 3)))
model.add(Conv2D(96, (11, 11), strides=(4, 4), activation='relu'))
model.add(MaxPooling2D((3, 3), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (5, 5), activation='relu'))
model.add(MaxPooling2D((3, 3), strides=(2, 2)))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(384, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(384, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((3, 3), strides=(2, 2)))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(classes), activation='softmax'))

编译模型：

使用Adam优化器、稀疏分类交叉熵损失函数（因为标签是整数），并监控准确率。

python">model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

训练模型：

使用fit方法训练模型，指定训练数据、迭代次数（epochs）、批量大小（batch_size）。

python">model.fit(X_train, y_train, epochs=20, batch_size=32)

在这里插入图片描述

LeNet-5

提出时间：1998年主要贡献：LeNet-5是由Yann
LeCun提出的，它是第一个在实际中使用的卷积神经网络，主要用于手写数字识别。LeNet-5的设计非常简单，但它在当时的技术条件下取得了很好的效果。
结构特点：LeNet-5包含两个卷积层、两个池化层、三个全连接层和两个输出层。它的设计非常紧凑，只包含6万个参数，这使得它可以在当时的硬件条件下运行。
大小：大约6万个参数层数：2层卷积层，3层全连接层特点：非常早期的卷积神经网络，用于手写数字识别，结构简单，参数少。

导入必要的库：

os：用于操作文件和目录。
cv2：OpenCV库，用于图像处理。
numpy：用于数值计算。
tensorflow：用于构建和训练深度学习模型。

python">import os
import cv2
import numpy as np
import tensorflow as tf

加载类别名称：

打开classes.txt文件，读取其中的类别名称，并存储在classes列表中。

python">with open('99/classes.txt', 'r') as f:classes = f.read().splitlines()

创建标签映射字典：

label_mapping字典将数字标签映射到对应的情绪名称。

python">label_mapping = {'0': 'sad','1': 'happy','2': 'amazed','3': 'anger'
}

加载图像数据和对应的标签：

构建LeNet模型：遍历image_folder中的所有图像文件。
读取图像，并将其添加到X_train列表。
读取与图像对应的标签文件（.txt格式），提取标签索引，将其映射到情绪名称，然后转换为数字标签，并添加到y_train列表。
将X_train和y_train转换为NumPy数组。

python">#加载图像数据和对应的标签
image_folder = '561'
label_folder = '99'X_train = []
y_train = []for image_file in os.listdir(image_folder):image_path = os.path.join(image_folder, image_file)image = cv2.imread(image_path)if image is not None:X_train.append(image)label_file = os.path.join(label_folder, image_file.replace('.jpg', '.txt'))with open(label_file, 'r') as f:label_index = f.readline().strip().split()[0]  # 只取第一个数字作为标签索引label_name = label_mapping[label_index]label = classes.index(label_name)y_train.append(label)X_train = np.array(X_train)
y_train = np.array(y_train)

构建LeNet模型：

使用Keras的Sequential模型，按照LeNet的结构添加层。
模型包括卷积层（Conv2D）、最大池化层（MaxPooling2D）、全连接层（Dense）等。
输出层使用softmax激活函数，因为这是一个多分类问题。

python">model = tf.keras.Sequential([tf.keras.layers.Conv2D(6, (5, 5), activation='relu', input_shape=(X_train.shape[1:])),tf.keras.layers.MaxPooling2D((2, 2)),tf.keras.layers.Conv2D(16, (5, 5), activation='relu'),tf.keras.layers.MaxPooling2D((2, 2)),tf.keras.layers.Flatten(),tf.keras.layers.Dense(120, activation='relu'),tf.keras.layers.Dense(84, activation='relu'),tf.keras.layers.Dense(len(classes), activation='softmax')
])

编译模型：

使用Adam优化器、稀疏分类交叉熵损失函数（因为标签是整数），并监控准确率。

python">model.fit(X_train, y_train, epochs=20, batch_size=32)

在这里插入图片描述

ResNet

提出时间：2015年主要贡献：ResNet是第一个引入残差学习的深度网络，它通过引入残差块（Residual
Block）来解决深层网络训练困难的问题，这些残差块包含一个恒等连接，使得网络可以更稳定地训练更深层次的网络。
结构特点：ResNet包含多个残差块，每个残差块包含一个或多个卷积层，这些卷积层可以有不同的大小和数量。ResNet的层数通常比其他网络更深，如ResNet-50、ResNet-101、ResNet-152等。

大小：大约2500万个参数层数：50层，包括17个残差块特点：引入了残差学习，可以构建更深层次的网络，解决梯度消失和梯度爆炸问题。

导入必要的库：

os：用于操作文件和目录。
cv2：OpenCV库，用于图像处理。
numpy：用于数值计算。
tensorflow：用于构建和训练深度学习模型。
tensorflow.keras.applications：包含预训练的模型，如ResNet50。
tensorflow.keras.layers：Keras层，用于构建神经网络。
tensorflow.keras.models：Keras模型，用于创建模型架构。

python">import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

加载类别名称：

打开classes.txt文件，读取其中的类别名称，并存储在classes列表中。

python">with open('99/classes.txt', 'r') as f:classes = f.read().splitlines()

创建标签映射字典：

label_mapping字典将数字标签映射到对应的情绪名称。

python">label_mapping = {'0': 'sad','1': 'happy','2': 'amazed','3': 'anger'
}

加载图像数据和对应的标签：

遍历image_folder中的所有图像文件。
读取图像，并将其调整为ResNet50所需的输入大小（224x224像素），并将其添加到X_train列表。
读取与图像对应的标签文件（.txt格式），提取标签索引，将其映射到情绪名称，然后转换为数字标签，并添加到y_train列表。

python">image_folder = '561'
label_folder = '99'X_train = []
y_train = []for image_file in os.listdir(image_folder):image_path = os.path.join(image_folder, image_file)image = cv2.imread(image_path)if image is not None:X_train.append(cv2.resize(image, (224, 224)))  # Resize images to (224, 224)label_file = os.path.join(label_folder, image_file.replace('.jpg', '.txt'))with open(label_file, 'r') as f:label_index = f.readline().strip().split()[0]  # 只取第一个数字作为标签索引label_name = label_mapping[label_index]label = classes.index(label_name)y_train.append(label)X_train = np.array(X_train)
y_train = np.array(y_train)

使用ResNet50模型进行迁移学习

加载预训练的ResNet50模型，但不包括顶部的全连接层。
添加一个全局平均池化层（GlobalAveragePooling2D）和一个全连接层（Dense），用于输出预测。
创建一个新的模型，将ResNet50的输出连接到新的全连接层。

python">base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
output = Dense(len(classes), activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output)

冻结预训练模型的权重：

通过设置base_model.layers中的每个层的trainable属性为False，可以冻结预训练模型的权重，只训练新添加的全连接层。

python">for layer in base_model.layers:
layer.trainable = False

编译模型：

使用Adam优化器、稀疏分类交叉熵损失函数（因为标签是整数），并监控准确率。

python">model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

训练模型：

使用fit方法训练模型，指定训练数据、迭代次数（epochs）、批量大小（batch_size）。

python">model.fit(X_train, y_train, epochs=20, batch_size=32)

在这里插入图片描述

VGG-19

提出时间：2014年主要贡献：VGG-16和VGG-19是由牛津大学的Karen Simonyan和Andrew
Zisserman提出的，它们通过使用非常小的3x3卷积核和大量重复的卷积层来构建深度网络，这使得它们在ImageNet竞赛中取得了很好的成绩。
结构特点：VGG-16包含16个卷积层和3个全连接层，而VGG-19包含19个卷积层和3个全连接层。VGG-16和VGG-19的卷积层都是使用3x3的卷积核，层与层之间使用2x2的最大池化层。

大小：大约1.38亿/1.45亿个参数层数：16/19层卷积层，3层全连接层
特点：使用小尺寸的3x3卷积核和重复的卷积层，可以提取更丰富的特征。

导入必要的库：

os：用于操作文件和目录。
cv2：OpenCV库，用于图像处理。
numpy：用于数值计算。
tensorflow：用于构建和训练深度学习模型。
tensorflow.keras.applications：包含预训练的模型，如VGG19。
tensorflow.keras.layers：Keras层，用于构建神经网络。
tensorflow.keras.models：Keras模型，用于创建模型架构。

python">import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

加载类别名称

打开classes.txt文件，读取其中的类别名称，并存储在classes列表中。

python">with open('99/classes.txt', 'r') as f:classes = f.read().splitlines()

创建标签映射字典：

label_mapping字典将数字标签映射到对应的情绪名称。

python">abel_mapping = {'0': 'sad','1': 'happy','2': 'amazed','3': 'anger'
}

加载图像数据和对应的标签：

遍历image_folder中的所有图像文件。
读取图像，并将其调整为VGG-19所需的输入大小（224x224像素），并将其添加到X_train列表。
读取与图像对应的标签文件（.txt格式），提取标签索引，将其映射到情绪名称，然后转换为数字标签，并添加到y_train列表。
将X_train和y_train转换为NumPy数组。

python">image_folder = '561'
label_folder = '99'X_train = []
y_train = []for image_file in os.listdir(image_folder):image_path = os.path.join(image_folder, image_file)image = cv2.imread(image_path)if image is not None:X_train.append(cv2.resize(image, (224, 224)))  # Resize images to (224, 224)label_file = os.path.join(label_folder, image_file.replace('.jpg', '.txt'))with open(label_file, 'r') as f:label_index = f.readline().strip().split()[0]  # 只取第一个数字作为标签索引label_name = label_mapping[label_index]label = classes.index(label_name)y_train.append(label)X_train = np.array(X_train)
y_train = np.array(y_train)

使用VGG-19模型进行迁移学习

加载预训练的VGG-19模型，但不包括顶部的全连接层。
添加一个全局平均池化层（GlobalAveragePooling2D）和一个全连接层（Dense），用于输出预测。
创建一个新的模型，将VGG-19的输出连接到新的全连接层。

python">base_model = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
output = Dense(len(classes), activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output)

冻结预训练模型的权重：

通过设置base_model.layers中的每个层的trainable属性为False，可以冻结预训练模型的权重，只训练新添加的全连接层。

python">for layer in base_model.layers:layer.trainable = False

编译模型：

使用Adam优化器、稀疏分类交叉熵损失函数（因为标签是整数），并监控准确率。

python">model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

训练模型：

python">model.fit(X_train, y_train, epochs=20, batch_size=32)

在这里插入图片描述

VGG-16

导入必要的库：

os：用于操作文件和目录。
cv2：OpenCV库，用于图像处理。
numpy：用于数值计算。
tensorflow：用于构建和训练深度学习模型。
tensorflow.keras.applications：包含预训练的模型。
tensorflow.keras.layers：Keras层，用于构建神经网络。
tensorflow.keras.models：Keras模型，用于创建模型架构。

python">import os
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

加载类别名称

打开classes.txt文件，读取其中的类别名称，并存储在classes列表中。

python">with open('99/classes.txt', 'r') as f:classes = f.read().splitlines()

创建标签映射字典：

label_mapping字典将数字标签映射到对应的情绪名称。

python">abel_mapping = {'0': 'sad','1': 'happy','2': 'amazed','3': 'anger'
}

加载图像数据和对应的标签：

遍历image_folder中的所有图像文件。
读取图像，并将其调整为VGG-16所需的输入大小（224x224像素），并将其添加到X_train列表。
读取与图像对应的标签文件（.txt格式），提取标签索引，将其映射到情绪名称，然后转换为数字标签，并添加到y_train列表。
将X_train和y_train转换为NumPy数组。

python">image_folder = '561'
label_folder = '99'X_train = []
y_train = []for image_file in os.listdir(image_folder):image_path = os.path.join(image_folder, image_file)image = cv2.imread(image_path)if image is not None:X_train.append(cv2.resize(image, (224, 224)))  # Resize images to (224, 224)label_file = os.path.join(label_folder, image_file.replace('.jpg', '.txt'))with open(label_file, 'r') as f:label_index = f.readline().strip().split()[0]  # 只取第一个数字作为标签索引label_name = label_mapping[label_index]label = classes.index(label_name)y_train.append(label)X_train = np.array(X_train)
y_train = np.array(y_train)

使用VGG-16模型进行迁移学习

加载预训练的VGG-16模型，但不包括顶部的全连接层。
添加一个全局平均池化层（GlobalAveragePooling2D）和一个全连接层（Dense），用于输出预测。
创建一个新的模型，将VGG-16的输出连接到新的全连接层。

python">base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = base_model.output
x = GlobalAveragePooling2D()(x)
output = Dense(len(classes), activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output)

冻结预训练模型的权重：

通过设置base_model.layers中的每个层的trainable属性为False，可以冻结预训练模型的权重，只训练新添加的全连接层。

python">for layer in base_model.layers:
layer.trainable = False

编译模型：

使用Adam优化器、稀疏分类交叉熵损失函数（因为标签是整数），并监控准确率。

训练模型：

python">model.fit(X_train, y_train, epochs=20, batch_size=32)

在这里插入图片描述