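So the notes below focus on the simplest way to use this tool rather than on how it is used in YOLOv5; for the latter, see the official material: Weights & Biases with YOLOv5.
- 1. A brief introduction to W&B
- 2. W&B quick start
- 3. W&B usage example
- 4. More help with W&B
1. A brief introduction to W&B
Wandb is short for Weights & Biases, a tool that helps you track your machine learning projects. It can automatically record the hyperparameters and output metrics of model training, then visualize and compare the results and quickly share them with colleagues. (You can sense how fond the YOLOv5 author is of it.)
- Dashboard: track experiments and visualize results;
- Reports: save and share reproducible findings;
- Sweeps: optimize models with hyperparameter tuning;
- Artifacts: dataset and model versioning, pipeline tracking;
- Core advantages
2. W&B quick start
All of the tests below were run in a Jupyter notebook on a server, accessed remotely from a local machine.
- Install the library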
pip install wandb
- Create an account and log in
wandb login
(yolo) [@localhost ~]$ wandb login
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /home/xxx/.netrc
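In a notebook on a remote server the interactive prompt can be awkward. Below is a minimal sketch of a non-interactive alternative, assuming the key has already been exported in the `WANDB_API_KEY` environment variable. The helper names `resolve_api_key` and `login_from_env` are made up for this illustration; only `wandb.login(key=...)` is the library's own call.

```python
import os


def resolve_api_key(env=None):
    """Return the key stored in WANDB_API_KEY, or None if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


def login_from_env():
    """Log in without a prompt (hypothetical helper); raises if no key is configured."""
    import wandb  # imported lazily so resolve_api_key works without wandb installed

    key = resolve_api_key()
    if key is None:
        raise RuntimeError("set WANDB_API_KEY or run `wandb login` first")
    return wandb.login(key=key)
```

Calling `login_from_env()` in the first notebook cell would then replace the `wandb login` prompt shown above.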
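- Initialize
# Inside my model training code
import wandb
wandb.init(project="test-project")  # start a new run (same project name as the example in section 3)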
- Declare hyperparameters
# config is a variable that holds and saves hyper parameters and inputs
config = wandb.config # Initialize config
config.batch_size = 4 # input batch size for training (default:64)
config.test_batch_size = 10 # input batch size for testing(default:1000)
config.epochs = 10 # number of epochs to train(default:10)
config.lr = 0.1 # learning rate(default:0.01)
config.momentum = 0.1 # SGD momentum(default:0.5)
config.no_cuda = False # disables CUDA training
config.seed = 42 # random seed(default:42)
config.log_interval = 10 # how many batches to wait before logging training status
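The same hyperparameters can also be collected in a plain dictionary and handed to `wandb.init` through its `config` argument. The following is a minimal sketch under that assumption: `DEFAULTS`, `make_config` and `start_run` are illustrative names, not part of wandb, and `mode="offline"` is used only so that the sketch does not need a network connection.

```python
# Values copied from the config block above.
DEFAULTS = {
    "batch_size": 4,
    "test_batch_size": 10,
    "epochs": 10,
    "lr": 0.1,
    "momentum": 0.1,
    "no_cuda": False,
    "seed": 42,
    "log_interval": 10,
}


def make_config(**overrides):
    """Return a copy of DEFAULTS with overrides applied; unknown names are rejected."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown hyperparameters: {unknown}")
    return {**DEFAULTS, **overrides}


def start_run(config, project="test-project"):
    """Start a run whose config is filled from a dict (hypothetical helper)."""
    import wandb  # imported lazily so make_config works without wandb installed

    return wandb.init(project=project, config=config, mode="offline")
```

After `run = start_run(make_config(lr=0.01))` the values should be readable as `run.config.lr`, just like the attributes set one by one above. Rejecting unknown names is a design choice of this sketch: it catches typos such as `learning_rate` before they end up as a new, silently ignored hyperparameter.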
- Log metrics
# wandb.log records metrics (accuracy, loss and epoch) so you can check the network's performance at any time
def test(args, model, device, test_loader, classes):
    # Switch the model to evaluation mode. This is necessary for layers like dropout,
    # batchNorm etc., which behave differently in training and evaluation mode.
    model.eval()
    test_loss = 0
    correct = 0
    example_images = []
    with torch.no_grad():
        for data, target in test_loader:
            # Load the input features and labels from the test dataset
            data, target = data.to(device), target.to(device)
            # Make predictions: pass image data from the test dataset and
            # predict which class each image belongs to (0-9 in this case)
            output = model(data)
            # Compute the loss and sum up the batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # Get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
            # Log images from your test dataset automatically, along with predicted and
            # true labels, by passing pytorch tensors with image data into wandb.Image
            example_images.append(wandb.Image(
                data[0], caption="Pred:{} Truth:{}".format(classes[pred[0].item()], classes[target[0].item()])))
    # wandb.log(a_dict) logs the keys and values of the dictionary passed in and associates the values with a step.
    # You can log anything by passing it to wandb.log(), including histograms, custom matplotlib objects,
    # images, video, text, tables, html, point clouds and other 3D objects.
    # Here we use it to log test accuracy, loss and some test images (along with their true and predicted labels).
    wandb.log({
        "Examples": example_images,
        "Test Accuracy": 100. * correct / len(test_loader.dataset),
        "Test Loss": test_loss})
# Log scalar data
wandb.log({"Examples": example_images, "Test Accuracy": 100. * correct / len(test_loader.dataset), "Test Loss": test_loss})
# Log images
wandb.log({"examples": [wandb.Image(i) for i in images]})
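To make the logging step easier to reuse, the metric computation can be separated from the `log` call. The sketch below assumes a run object with a `log(dict)` method (for example the value returned by `wandb.init`). `summarize` and `log_epoch` are hypothetical names, and unlike the function above this version divides the summed loss by the dataset size, so it reports an average loss per sample.

```python
def summarize(correct, total, loss_sum):
    """Turn raw counters into the metrics dict that gets logged."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "Test Accuracy": 100.0 * correct / total,  # percentage of correct predictions
        "Test Loss": loss_sum / total,             # average loss per sample
    }


def log_epoch(run, epoch, correct, total, loss_sum):
    """Log one epoch's metrics through run.log and return what was logged."""
    metrics = summarize(correct, total, loss_sum)
    metrics["epoch"] = epoch  # logged as an ordinary value, not as the wandb step
    run.log(metrics)
    return metrics
```

Inside the test function this would be called once per epoch, e.g. `log_epoch(wandb.run, epoch, correct, len(test_loader.dataset), test_loss)`. Keeping `summarize` free of wandb makes the numbers easy to check without starting a run.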
- Save files
import os
# by default, this will save to a new subfolder for files associated
# with your run, created in wandb.run.dir (which is ./wandb by default)
wandb.save("mymodel.h5")
# you can pass the full path to the Keras model API
model.save(os.path.join(wandb.run.dir, "mymodel.h5"))
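The second pattern above (writing directly into `wandb.run.dir`) works for any file, not only Keras models. Here is a minimal sketch that stores a small JSON file of metrics next to the run's other files. `write_summary` and `save_with_run` are hypothetical helper names, and the assumption that files placed in the run directory are uploaded together with the run reflects my understanding of wandb rather than something verified here.

```python
import json
import os


def write_summary(run_dir, metrics, filename="summary_metrics.json"):
    """Write metrics as JSON into run_dir and return the file path."""
    os.makedirs(run_dir, exist_ok=True)
    path = os.path.join(run_dir, filename)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(metrics, fh, indent=2, sort_keys=True)
    return path


def save_with_run(metrics, project="test-project"):
    """Start a run, drop the file into its directory, then close the run."""
    import wandb  # imported lazily so write_summary works without wandb installed

    run = wandb.init(project=project)
    path = write_summary(run.dir, metrics)
    run.finish()
    return path
```

For a file that lives outside the run directory, `wandb.save("mymodel.h5")` as shown above remains the simpler route.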
3. W&B usage example
- Reference code
from __future__ import print_function
import argparse
import random # to set the python random seed
import numpy # to set the numpy random seed
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Ignore excessive warnings
import logging
logging.propagate = False
logging.getLogger().setLevel(logging.ERROR)
# WandB – Import the wandb library
import wandb
# WandB – Login to your wandb account so you can log all your metrics
# (run `wandb login` in a terminal first, as shown in section 2)

# Define the convolutional neural network:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # In our constructor, we define our neural network architecture that we'll use in the forward pass.
        # Conv2d() adds a convolution layer that generates 2 dimensional feature maps
        # to learn different aspects of our image.
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        # Linear(x,y) creates dense, fully connected layers with x inputs and y outputs.
        # Linear layers simply output the dot product of our inputs and weights.
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Here we feed the feature maps from the convolutional layers into a max_pool2d layer.
        # The max_pool2d layer reduces the size of the image representation our convolutional layers learnt,
        # and in doing so it reduces the number of parameters and computations the network needs to perform.
        # Finally we apply the relu activation function which gives us max(0, max_pool2d_output)
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        # Reshapes x into size (-1, 16 * 5 * 5)
        # so we can feed the convolution layer outputs into our fully connected layer.
        x = x.view(-1, 16 * 5 * 5)
        # We apply the relu activation function to the output of our fully connected layers.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # Finally we apply log_softmax, which returns the log of the class probabilities (0-9);
        # the probabilities themselves add up to 1.
        return F.log_softmax(x, dim=1)

def train(config, model, device, train_loader, optimizer, epoch):
    # Switch the model to training mode. This is necessary for layers like dropout, batchNorm etc.,
    # which behave differently in training and evaluation mode.
    model.train()
    # We loop over the data iterator, feed the inputs to the network and adjust the weights.
    for batch_id, (data, target) in enumerate(train_loader):
        # Only the first 21 batches of each epoch are used, which keeps the demo short.
        if batch_id > 20:
            break
        # Load the input features and labels from the training dataset.
        data, target = data.to(device), target.to(device)
        # Reset the gradients to 0 for all learnable weight parameters
        optimizer.zero_grad()
        # Forward pass: pass image data from the training dataset and predict
        # which class each image belongs to (0-9 in this case).
        output = model(data)
        # Define our loss function, and compute the loss
        loss = F.nll_loss(output, target)
        # Backward pass: compute the gradients of the loss with respect to the model's parameters
        loss.backward()
        # Update the neural network weights
        optimizer.step()

# wandb.log records metrics (accuracy, loss and epoch) so you can check the network's performance at any time
def test(args, model, device, test_loader, classes):
    # Switch the model to evaluation mode. This is necessary for layers like dropout,
    # batchNorm etc., which behave differently in training and evaluation mode.
    model.eval()
    test_loss = 0
    correct = 0
    example_images = []
    with torch.no_grad():
        for data, target in test_loader:
            # Load the input features and labels from the test dataset
            data, target = data.to(device), target.to(device)
            # Make predictions: pass image data from the test dataset and
            # predict which class each image belongs to (0-9 in this case)
            output = model(data)
            # Compute the loss and sum up the batch loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            # Get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()
            # Log images from your test dataset automatically, along with predicted and
            # true labels, by passing pytorch tensors with image data into wandb.Image
            example_images.append(wandb.Image(
                data[0], caption="Pred:{} Truth:{}".format(classes[pred[0].item()], classes[target[0].item()])))
    # wandb.log(a_dict) logs the keys and values of the dictionary passed in and associates the values with a step.
    # You can log anything by passing it to wandb.log(), including histograms, custom matplotlib objects,
    # images, video, text, tables, html, point clouds and other 3D objects.
    # Here we use it to log test accuracy, loss and some test images (along with their true and predicted labels).
    wandb.log({
        "Examples": example_images,
        "Test Accuracy": 100. * correct / len(test_loader.dataset),
        "Test Loss": test_loss})

# Initialize a wandb run and set the hyperparameters
# Initialize a new run
# wandb.init(project="pytorch-intro")
wandb.init(project='test-project', entity='clichong')
wandb.watch_called = False # Re-run the model without restarting the runtime, unnecessary after our next release
# config is a variable that holds and saves hyper parameters and inputs
config = wandb.config # Initialize config
config.batch_size = 4 # input batch size for training (default:64)
config.test_batch_size = 10 # input batch size for testing(default:1000)
config.epochs = 10 # number of epochs to train(default:10)
config.lr = 0.1 # learning rate(default:0.01)
config.momentum = 0.1 # SGD momentum(default:0.5)
config.no_cuda = False # disables CUDA training
config.seed = 42 # random seed(default:42)
config.log_interval = 10 # how many batches to wait before logging training status

def main():
    use_cuda = not config.no_cuda and torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    # Set random seeds and deterministic pytorch for reproducibility
    # random.seed(config.seed) # python random seed
    torch.manual_seed(config.seed) # pytorch random seed
    # numpy.random.seed(config.seed) # numpy random seed
    torch.backends.cudnn.deterministic = True
    # Load the dataset: We're training our CNN on CIFAR10.
    # First we define the transformations to apply to our images.
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    # Now we load our training and test datasets and apply the transformations defined above
    train_loader = DataLoader(datasets.CIFAR10(
        root='../../Classification/StageCNN/dataset/cifar10/', # change the path as needed
        train=True, download=False, transform=transform),
        batch_size=config.batch_size, shuffle=True, **kwargs)
    test_loader = DataLoader(datasets.CIFAR10(
        root='../../Classification/StageCNN/dataset/cifar10/', # change the path as needed
        train=False, download=False, transform=transform),
        batch_size=config.test_batch_size, shuffle=False, **kwargs)
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    # Initialize our model, recursively go over all modules and convert their parameters
    # and buffers to CUDA tensors (if device is set to cuda)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=config.lr, momentum=config.momentum)
    # wandb.watch() automatically fetches all layer dimensions, gradients, model parameters
    # and logs them automatically to your dashboard.
    # Using log="all" logs histograms of parameter values in addition to gradients.
    wandb.watch(model, log="all")
    for epoch in range(1, config.epochs + 1):
        train(config, model, device, train_loader, optimizer, epoch)
        test(config, model, device, test_loader, classes)
    # Save the model checkpoint. This automatically saves a file to the cloud
    torch.save(model.state_dict(), 'model.h5')
    wandb.save('model.h5')

if __name__ == '__main__':
    main()
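The `16 * 5 * 5` input size of `fc1` in the network above is not arbitrary: it follows from the 32×32 CIFAR10 images passing through two 5×5 convolutions (no padding, stride 1), each followed by a 2×2 max pool. A small sketch of that arithmetic, with helper names chosen only for this illustration:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel):
    """Spatial size after max pooling whose stride equals the kernel size, no padding."""
    return (size - kernel) // kernel + 1


def flattened_features(image_size=32, channels=16):
    """Number of values fed into fc1 for a square input image."""
    s = pool_out(conv_out(image_size, 5), 2)  # conv1 + pool: 32 -> 28 -> 14
    s = pool_out(conv_out(s, 5), 2)           # conv2 + pool: 14 -> 10 -> 5
    return channels * s * s


print(flattened_features())  # 400, i.e. 16 * 5 * 5
```

If the input resolution or kernel sizes change, the `x.view(-1, 16 * 5 * 5)` reshape and the first `nn.Linear` have to be updated to match the new value.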
- Parameters
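While training is running, you can follow the link that wandb prints to watch the training process and intermediate results live. The parameter and gradient histograms are recorded by wandb.watch(model, log="all").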
- Chart & Media
# example_images.append(wandb.Image(
#     data[0], caption="Pred:{} Truth:{}".format(classes[pred[0].item()], classes[target[0]])))
wandb.log({"Examples": example_images, "Test Accuracy": 100. * correct / len(test_loader.dataset), "Test Loss": test_loss})
- Save
Once training has finished and the model has been saved locally, you can also call wandb.save('model.h5') to upload the file to the cloud.
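4. More help with W&B
1. wandb: an introductory tutorial to a lightweight visualization tool for deep learning
2. PyTorch 62. A complete introduction to the lightweight visualization tool wandb in just 10 minutes
3. Using wandb
4. W&B official website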