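[Machine Learning] Logistic Regression

Logistic Regression: A Detailed Implementation

Introduction

Logistic regression is a statistical model widely used for classification, and it is especially well suited to binary classification. This article walks through a simple example that implements logistic regression with Python and PyTorch, and uses plots to show the data and the final decision boundary.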

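Environment Setup

Before starting, make sure the following libraries are installed:

numpy
matplotlib
torch
scikit-learn (imported as sklearn)

You can install them with the following command: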

pip install numpy matplotlib torch scikit-learn

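Code Implementation

Data Generation

We use the sklearn.datasets.make_blobs function to generate a two-class dataset. It produces Gaussian-distributed points around a specified number of centers with a given standard deviation.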

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
import torch
import numpy as np
# Randomly generate the samples with make_blobs (200 points, 2 clusters)
x, y = make_blobs(n_samples=200, centers=2, random_state=0, cluster_std=0.5)
x1 = x[:, 0]
x2 = x[:, 1]
# Visualize the data
plt.scatter(x1[y == 1], x2[y == 1], color='blue', marker='o')
plt.scatter(x1[y == 0], x2[y == 0], color='red', marker='x')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Generated Data')
plt.show()
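
As a quick sanity check on the generated data, the sketch below reports the array shapes and the number of samples per class. It assumes the same make_blobs arguments as above; the helper name summarize_blobs is made up for this illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs


def summarize_blobs(n_samples=200, random_state=0):
    """Generate the two-cluster dataset and return its shapes and class counts."""
    x, y = make_blobs(n_samples=n_samples, centers=2,
                      random_state=random_state, cluster_std=0.5)
    # np.bincount counts how many samples carry label 0 and label 1
    counts = np.bincount(y)
    return x.shape, y.shape, counts.tolist()


x_shape, y_shape, class_counts = summarize_blobs()
print(x_shape, y_shape, class_counts)
```

Each row of x is one sample with two features, and y holds the 0/1 cluster label. When n_samples is an integer, make_blobs should split the samples evenly across the centers, so the two classes are expected to be balanced.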

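Model Definition

The basic form of the logistic regression model is:
$h_\theta(x) = \sigma(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$
where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function, which maps any real number to a value between 0 and 1.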

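def hypothesis(theta0, theta1, theta2, x1, x2):
    # Linear combination of the two features plus the bias term
    z = theta0 + theta1 * x1 + theta2 * x2
    h = torch.sigmoid(z)
    # Return a column vector of shape (m, 1)
    return h.view(-1, 1)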
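
To make the behavior of the sigmoid concrete, here is a minimal sketch that checks three properties numerically: the value at 0, the symmetry $\sigma(-z) = 1 - \sigma(z)$, and agreement with the closed form. The input values are arbitrary and chosen only for illustration.

```python
import torch

# Arbitrary inputs for illustration
z = torch.tensor([-2.0, 0.0, 2.0])
h = torch.sigmoid(z)

# sigma(0) is 0.5, which is the usual classification threshold
mid = h[1].item()

# Symmetry: sigma(-z) = 1 - sigma(z)
symmetric = torch.allclose(torch.sigmoid(-z), 1 - h, atol=1e-6)

# The closed form 1 / (1 + exp(-z)) agrees with torch.sigmoid
matches_formula = torch.allclose(h, 1 / (1 + torch.exp(-z)), atol=1e-6)

print(h, mid, symmetric, matches_formula)
```

Because the output crosses 0.5 exactly at z = 0, the sign of the linear term decides which class the model predicts.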

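Loss Function

Logistic regression is usually trained with the log loss (binary cross-entropy):
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h^{(i)}) + (1 - y^{(i)}) \log(1 - h^{(i)}) \right]$
where $m$ is the number of samples, $y^{(i)}$ is the label of the $i$-th sample, and $h^{(i)}$ is the model's predicted probability for it.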

def J(h, y):
    # Mean log loss over all samples; h and y both have shape (m, 1)
    return -torch.mean(y * torch.log(h) + (1 - y) * torch.log(1 - h))
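
The hand-written loss can be cross-checked against PyTorch's built-in binary cross-entropy functions. This is only a sketch: the logits and labels below are made up for the comparison, and the original article does not use the built-in functions.

```python
import torch
import torch.nn.functional as F


def J(h, y):
    # Same hand-written log loss as in the text
    return -torch.mean(y * torch.log(h) + (1 - y) * torch.log(1 - h))


# Made-up logits and labels, only for this comparison
z = torch.tensor([[-1.5], [0.3], [2.0], [-0.2]])
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
h = torch.sigmoid(z)

manual = J(h, y)
builtin = F.binary_cross_entropy(h, y)
# Takes the pre-sigmoid values z directly, avoiding log(0) when h saturates
from_logits = F.binary_cross_entropy_with_logits(z, y)

print(manual.item(), builtin.item(), from_logits.item())
```

All three values should agree up to floating-point error. The logits-based variant is generally the safer choice when predictions can get very close to 0 or 1, since torch.log(h) returns -inf once h rounds to exactly 0.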

Model Training

We train the model with PyTorch's Adam optimizer. At every iteration the parameters are updated to reduce the loss.

if __name__ == '__main__':
    # Prepare the data as float32 tensors
    x1 = torch.tensor(x1, dtype=torch.float32)
    x2 = torch.tensor(x2, dtype=torch.float32)
    y = torch.tensor(y, dtype=torch.float32).view(-1, 1)

    # Initialize the parameters
    theta0 = torch.tensor(0.0, requires_grad=True)
    theta1 = torch.tensor(0.0, requires_grad=True)
    theta2 = torch.tensor(0.0, requires_grad=True)

    # Optimizer (Adam with its default learning rate)
    optimizer = torch.optim.Adam([theta0, theta1, theta2])

    # Train the model
    for epoch in range(10000):
        h = hypothesis(theta0, theta1, theta2, x1, x2)
        loss = J(h, y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        if epoch % 1000 == 0:
            print(f'After {epoch} epochs, the loss is {loss.item():.3f}')

    # Read out the trained parameters
    w1 = theta1.item()
    w2 = theta2.item()
    b = theta0.item()

    # Plot the decision boundary: w1 * x1 + w2 * x2 + b = 0
    x = np.linspace(-1, 6, 100)
    d = -(w1 * x + b) * 1.0 / w2
    # y has shape (m, 1); flatten it so the boolean mask matches the 1-D x1 and x2
    labels = y.view(-1)
    plt.scatter(x1[labels == 1], x2[labels == 1], color='blue', marker='o')
    plt.scatter(x1[labels == 0], x2[labels == 0], color='red', marker='x')
    plt.plot(x, d, color='green')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('Decision Boundary')
    plt.show()
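
The green line in the plot comes from setting the predicted probability to 0.5. Since $\sigma(z) = 0.5$ exactly when $z = 0$, the boundary satisfies $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$, which rearranges to $x_2 = -(\theta_1 x_1 + \theta_0) / \theta_2$. The sketch below checks this numerically. The parameter values are hypothetical and chosen only for illustration; they are not the trained values.

```python
import torch

# Hypothetical parameter values, chosen only for illustration
theta0, theta1, theta2 = 4.0, 1.0, -2.0

# Points on the line theta0 + theta1*x1 + theta2*x2 = 0
x1 = torch.linspace(-1, 6, 8)
x2 = -(theta1 * x1 + theta0) / theta2

# On the boundary the model outputs probability 0.5
h_on_line = torch.sigmoid(theta0 + theta1 * x1 + theta2 * x2)

# Shifting x2 by +1 changes z by theta2 (negative here),
# so the probability falls below 0.5
h_above = torch.sigmoid(theta0 + theta1 * x1 + theta2 * (x2 + 1.0))

print(h_on_line)
print(h_above)
```

Points on one side of the line get a probability above 0.5 and points on the other side get one below it; which side is which depends on the sign of the coefficient on x2.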

Results

Output:

After 0 epochs, the loss is 0.693
After 1000 epochs, the loss is 0.188
After 2000 epochs, the loss is 0.086
After 3000 epochs, the loss is 0.049
After 4000 epochs, the loss is 0.031
After 5000 epochs, the loss is 0.020
After 6000 epochs, the loss is 0.014
After 7000 epochs, the loss is 0.009
After 8000 epochs, the loss is 0.007
After 9000 epochs, the loss is 0.005
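
The first logged value can be checked by hand. All three parameters start at 0, so z = 0 and the prediction is 0.5 for every sample regardless of its label, and each sample contributes $-\log(0.5) = \ln 2$ to the mean. A short sketch of that arithmetic, using only the standard library:

```python
import math

# With theta0 = theta1 = theta2 = 0, every prediction is sigmoid(0) = 0.5
h = 1.0 / (1.0 + math.exp(-0.0))

# Log loss for a single sample; identical for y = 1 and y = 0 when h = 0.5
loss_if_positive = -math.log(h)
loss_if_negative = -math.log(1.0 - h)

# The mean over any mix of labels is therefore ln 2
initial_loss = 0.5 * (loss_if_positive + loss_if_negative)
print(f'{initial_loss:.3f}')  # prints 0.693
```

This matches the loss reported at epoch 0, which is computed before the first parameter update.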
