Dataset for Stable Diffusion

devtools/2024/9/20 1:17:27/ 标签: stable diffusion, flickr8k, 数据集的处理

1.Dataset for Stable Diffusion

笔记来源:
1.Flickr8k数据集处理
2.处理Flickr8k数据集
3.Github:pytorch-stable-diffusion
4.Flickr 8k Dataset
5.dataset_flickr8k.json
6.About Train, Validation and Test Sets in Machine Learning Tarang Shah Towards Data Science
7.What are hyperparameters?

1.1 Dataset

采用Flicker8k数据集,该数据集有两个文件,第一个文件为Flicker8k_Dataset (全部为图片),第二个文件为Flickr8k.token.txt (含两列image_id和caption),其中一个image_id对应5个caption (sentence)

1.2 Dataset description file

数据集文本描述文件:dataset_flickr8k.json
文件格式如下:
{“images”: [ {“sentids”: [ ],“imgid”: 0,“sentences”:[{“tokens”:[ ]}, {“tokens”:[ ], “raw”: “…”, “imgid”:0, “sentid”:0}, …, “split”: “train”, “filename”: …jpg}, {“sentids”…} ], “dataset”: “flickr8k”}

参数解释
“sentids”:[0,1,2,3,4]caption 的 id 范围(一个image对应5个caption,所以sentids从0到4)
“imgid”:0image 的 id(从0到7999共8000张image)
“sentences”:[ ]包含一张照片的5个caption
“tokens”:[ ]每个caption分割为单个word
“raw”: " "每个token连接起来的caption
“imgid”: 0与caption相匹配的image的id
“sentid”: 0imag0对应的具体的caption的id
“split”:" "将该image和对应caption划分到训练集or验证集or测试集
“filename”:“…jpg”image具体名称

dataset_flickr8k.json

1.3 Process Datasets

下面代码摘自:Flickr8k数据集处理(仅作学习使用)

import json
import os
import random
from collections import Counter, defaultdict
from matplotlib import pyplot as plt
from PIL import Image
from argparse import Namespace
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence
from torch.utils.data import Dataset
import torchvision
import torchvision.transforms as transformsdef create_dataset(dataset='flickr8k', captions_per_image=5, min_word_count=5, max_len=30):"""Parameters:dataset: Name of the datasetcaptions_per_image: Number of captions per imagemin_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)max_len: Maximum number of words in a caption. Captions longer than this will be truncated.Output:A vocabulary file: vocab.jsonThree dataset files: train_data.json, val_data.json, test_data.json"""# Paths for reading data and saving processed data# Path to the dataset JSON fileflickr_json_path = ".../sd/data/dataset_flickr8k.json"# Folder containing imagesimage_folder = ".../sd/data/Flicker8k_Dataset"# Folder to save processed results# The % operator is used to format the string by replacing %s with the value of the dataset variable.# For example, if dataset is "flickr8k", the resulting output_folder will be# /home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion/sd/data/flickr8k.output_folder = ".../sd/data/%s" % dataset# Ensure output directory existsos.makedirs(output_folder, exist_ok=True)print(f"Output folder: {output_folder}")# Read the dataset JSON filewith open(file=flickr_json_path, mode="r") as j:data = json.load(fp=j)# Initialize containers for image paths, captions, and vocabulary# Dictionary to store image pathsimage_paths = defaultdict(list)# Dictionary to store image captionsimage_captions = defaultdict(list)# Count the number of elements, then count and return a dictionary# key:element value:the number of elements.vocab = Counter()# read from file dataset_flickr8k.jsonfor img in data["images"]:  # Iterate over each image in the datasetsplit = img["split"]  # Determine the split (train, val, or test) for the imagecaptions = []for c in img["sentences"]:  # Iterate over each caption for the image# Update word frequency count, excluding test set dataif split != "test":  # Only update vocabulary for train/val splits# c['tokens'] is a list, The number of occurrences of each word in the list is increased by onevocab.update(c['tokens'])  # Update vocabulary with words in the caption# Only consider captions that are within the maximum lengthif len(c["tokens"]) <= max_len:captions.append(c["tokens"])  # Add the caption to the list if it meets the length requirementif len(captions) == 0:  # Skip images with no valid captionscontinue# Construct the full image path/home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion# image_folder + image_name# ./Flicker8k_Dataset/img['filename']path = os.path.join(image_folder, img['filename'])# Save the full image path and its captions in the respective dictionariesimage_paths[split].append(path)image_captions[split].append(captions)'''After the above steps, we have:- vocab(a dict) keys:words、values: counts of all words- image_paths: (a dict) keys "train", "val", and "test"; values: lists of absolute image paths- image_captions: (a dict) keys: "train", "val", and "test"; values: lists of captions'''/home/wxy/Documents/PycharmProjects/pytorch-stable-diffusion........

我们通过dataset_flickr8k.json文件把数据集转化为三个词典

dictkeyvalue
vacabwordfrequency of words in all captions
image_path“train”、“val”、“test”lists of absolute image path
image_captions“train”、“val”、“test”lists of captions

我们通过Debug打印其中的内容

print(vocab)
print(image_paths["train"][1])
print(image_captions["train"][1])

def create_dataset(dataset='flickr8k', captions_per_image=5, min_word_count=5, max_len=30):"""Parameters:dataset: Name of the datasetcaptions_per_image: Number of captions per imagemin_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)max_len: Maximum number of words in a caption. Captions longer than this will be truncated.Output:A vocabulary file: vocab.jsonThree dataset files: train_data.json, val_data.json, test_data.json"""........# Create the vocabulary, adding placeholders for special tokens# Add placeholders<pad>, unregistered word identifiers<unk>, sentence beginning and end identifiers<start><end>words = [w for w in vocab.keys() if vocab[w] > min_word_count]  # Filter words by minimum countvocab = {k: v + 1 for v, k in enumerate(words)}  # Create the vocabulary with indices# Add special tokens to the vocabularyvocab['<pad>'] = 0vocab['<unk>'] = len(vocab)vocab['<start>'] = len(vocab)vocab['<end>'] = len(vocab)# Save the vocabulary to a filewith open(os.path.join(output_folder, 'vocab.json'), "w") as fw:json.dump(vocab, fw)# Process each dataset split (train, val, test)# Iterate over each split: split = "train" 、 split = "val" 和 split = "test"for split in image_paths:# List of image paths for the splitimgpaths = image_paths[split]  # type(imgpaths)=list# List of captions for the splitimcaps = image_captions[split]  # type(imcaps)=list# store result that converting words of caption to their respective indices in the vocabularyenc_captions = []for i, path in enumerate(imgpaths):# Check if the image can be openedimg = Image.open(path)# Ensure each image has the required number of captionsif len(imcaps[i]) < captions_per_image:filled_num = captions_per_image - len(imcaps[i])# Repeat captions if neededcaptions = imcaps[i] + [random.choice(imcaps[i]) for _ in range(0, filled_num)]else:# Randomly sample captions if there are more than neededcaptions = random.sample(imcaps[i], k=captions_per_image)assert len(captions) == captions_per_imagefor j, c in enumerate(captions):# Encode each caption by converting words to their respective indices in the vocabularyenc_c = [vocab['<start>']] + [vocab.get(word, vocab['<unk>']) for word in c] + [vocab["<end>"]]enc_captions.append(enc_c)assert len(imgpaths) * captions_per_image == len(enc_captions)data = {"IMAGES": imgpaths,"CAPTIONS": enc_captions}# Save the processed dataset for the current split (train,val,test)with open(os.path.join(output_folder, split + "_data.json"), 'w') as fw:json.dump(data, fw)create_dataset()

经过create_dataset函数,我们得到如下图的文件

四个文件的详细内容见下表

train_data.json中的第一个key:IMAGES
train_data.json中的第二个key:CAPTIONS
test_data.json中的第一个key:IMAGES
test_data.json中的第二个key:CAPTIONS
val_data.json中的第一个key:IMAGES
val_data.json中的第二个key:CAPTIONS
vocab.json开始部分
vocab.json结尾部分

生成vocab.json的关键代码
首先统计所有caption中word出现至少大于5次的word,而后给这些word依次赋予一个下标

# Create the vocabulary, adding placeholders for special tokens# Add placeholders<pad>, unregistered word identifiers<unk>, sentence beginning and end identifiers<start><end># Create a list of words from the vocabulary that have a frequency higher than 'min_word_count'# min_word_count: Only consider words that appear at least this many times in the dataset (excluding the test set)words = [w for w in vocab.keys() if vocab[w] > min_word_count]  # Filter words by minimum count# assign an index to each word, starting from 1 (indices start from 0, so add 1)vocab = {k: v + 1 for v, k in enumerate(words)}  # Create the vocabulary with indices

最终生成vocab.json

生成 [“split”]_data.json 的关键
读入文件dataset_flickr8k.json,并创建两个字典,第一个字典放置每张image的绝对路径,第二个字典放置描述image的caption,根据vocab将token换为下标保存,根据文件dataset_flickr8k.json中不同的split,这image的绝对路径和相应caption保存在不同文件中(train_data.json、test_data.json、val_data.json)

dataset_flickr8k.json

train_data.json


从vocab中获取token的下标得到CAPTION的编码

for j, c in enumerate(captions):# Encode each caption by converting words to their respective indices in the vocabularyenc_c = [vocab['<start>']] + [vocab.get(word, vocab['<unk>']) for word in c] + [vocab["<end>"]]enc_captions.append(enc_c)


尝试使用上面生成的测试集文件test_data.json和vocab.json输出某张image以及对应的caption
下面代码摘自:Flickr8k数据集处理(仅作学习使用)

'''
test
1.Iterates over the 5 captions for 下面代码引用自:[Flickr8k数据集处理](https://blog.csdn.net/weixin_48981284/article/details/134676813)(仅作学习使用)the 250th image.
2.Retrieves the word indices for each caption.
3.Converts the word indices to words using vocab_idx2word.
4.Joins the words to form complete sentences.
5.Prints each caption.
'''
import json
from PIL import Image
from matplotlib import pyplot as plt
# Load the vocabulary from the JSON file
with open('.../sd/data/flickr8k/vocab.json', 'r') as f:vocab = json.load(f)  # Load the vocabulary from the JSON file into a dictionary
# Create a dictionary to map indices to words
vocab_idx2word = {idx: word for word, idx in vocab.items()}
# Load the test data from the JSON file
with open('.../sd/data/flickr8k/test_data.json', 'r') as f:data = json.load(f)  # Load the test data from the JSON file into a dictionary
# Open and display the 250th image in the test set
# Open the image at index 250 in the 'IMAGES' list
content_img = Image.open(data['IMAGES'][250])
plt.figure(figsize=(6, 6))
plt.subplot(1,1,1)
plt.imshow(content_img)
plt.title('Image')
plt.axis('off')
plt.show()
# Print the lengths of the data, image list, and caption list
# Print the number of keys in the dataset dictionary (should be 2: 'IMAGES' and 'CAPTIONS')
print(len(data))
print(len(data['IMAGES']))  # Print the number of images in the 'IMAGES' list
print(len(data["CAPTIONS"]))  # Print the number of captions in the 'CAPTIONS' list
# Display the captions for the 300th image
# Iterate over the 5 captions associated with the 300th image
for i in range(5):# Get the word indices for the i-th caption of the 300th imageword_indices = data['CAPTIONS'][250 * 5 + i]# Convert indices to words and join them to form a captionprint(''.join([vocab_idx2word[idx] for idx in word_indices]))

data 的 key 有两个 IMAGES 和 CAPTIONS
测试集image有1000张,每张对应5个caption,共5000个caption
第250张图片的5个caption如下图

1.4 Dataloader

下面代码摘自:Flickr8k数据集处理(仅作学习使用)

import json
import os
import random
from collections import Counter, defaultdict
from PIL import Image
import torch
from torch.utils.data import Dataset
from torch.utils import data
import torchvision.transforms as transformsclass ImageTextDataset(Dataset):"""Pytorch Dataset class to generate data batches using torch DataLoader"""def __init__(self, dataset_path, vocab_path, split, captions_per_image=5, max_len=30, transform=None):"""Parameters:dataset_path: Path to the JSON file containing the datasetvocab_path: Path to the JSON file containing the vocabularysplit: The dataset split, which can be "train", "val", or "test"captions_per_image: Number of captions per imagemax_len: Maximum number of words per captiontransform: Image transformation methods"""self.split = split# Validate that the split is one of the allowed valuesassert self.split in {"train", "val", "test"}# Store captions per imageself.cpi = captions_per_image# Store maximum caption lengthself.max_len = max_len# Load the datasetwith open(dataset_path, "r") as f:self.data = json.load(f)# Load the vocabularywith open(vocab_path, "r") as f:self.vocab = json.load(f)# Store the image transformation methodsself.transform = transform# Number of captions in the dataset# Calculate the size of the datasetself.dataset_size = len(self.data["CAPTIONS"])def __getitem__(self, i):"""Retrieve the i-th sample from the dataset"""# Get [i // self.cpi]-th image corresponding to the i-th sample (each image has multiple captions)img = Image.open(self.data['IMAGES'][i // self.cpi]).convert("RGB")# Apply image transformation if providedif self.transform is not None:# Apply the transformation to the imageimg = self.transform(img)# Get the length of the captioncaplen = len(self.data["CAPTIONS"][i])# Pad the caption if its length is less than max_lenpad_caps = [self.vocab['<pad>']] * (self.max_len + 2 - caplen)# Convert the caption to a tensor and pad itcaption = torch.LongTensor(self.data["CAPTIONS"][i] + pad_caps)return img, caption, caplen  # Return the image, caption, and caption lengthdef __len__(self):return self.dataset_size  # Number of samples in the datasetdef make_train_val(data_dir, vocab_path, batch_size, workers=4):"""Create DataLoader objects for training, validation, and testing sets.Parameters:data_dir: Directory where the dataset JSON files are locatedvocab_path: Path to the vocabulary JSON filebatch_size: Number of samples per batchworkers: Number of subprocesses to use for data loading (default is 4)Returns:train_loader: DataLoader for the training setval_loader: DataLoader for the validation settest_loader: DataLoader for the test set"""# Define transformation for training settrain_tx = transforms.Compose([transforms.Resize(256),  # Resize images to 256x256transforms.ToTensor(),  # Convert image to PyTorch tensortransforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # Normalize using ImageNet mean and std])val_tx = transforms.Compose([transforms.Resize(256),transforms.ToTensor(),transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])# Create dataset objects for training, validation, and test setstrain_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "train_data.json"), vocab_path=vocab_path,split="train", transform=train_tx)vaild_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "val_data.json"), vocab_path=vocab_path,split="val", transform=val_tx)test_set = ImageTextDataset(dataset_path=os.path.join(data_dir, "test_data.json"), vocab_path=vocab_path,split="test", transform=val_tx)# Create DataLoader for training set with data shufflingtrain_loder = data.DataLoader(dataset=train_set, batch_size=batch_size, shuffer=True,num_workers=workers, pin_memory=True)# Create DataLoader for validation set without data shufflingval_loder = data.DataLoader(dataset=vaild_set, batch_size=batch_size, shuffer=False,num_workers=workers, pin_memory=True, drop_last=False)# Create DataLoader for test set without data shufflingtest_loder = data.DataLoader(dataset=test_set, batch_size=batch_size, shuffer=False,num_workers=workers, pin_memory=True, drop_last=False)return train_loder, val_loder, test_loder

创建好train_loader后,接下来我们就可以着手开始训练SD了!

1.5 Training、Validation、Test Set

了解训练集、测试集、验证集的作用

训练集
用于模型训练阶段

Training Dataset: The sample of data used to fit the model.

The actual dataset that we use to train the model (weights and biases in the case of a Neural Network). The model sees and learns from this data.

验证集
用于模型调参阶段

Validation Dataset: The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.

The validation set is used to evaluate a given model, but this is for frequent evaluation. We, as machine learning engineers, use this data to fine-tune the model hyperparameters. Hence the model occasionally sees this data, but never does it “Learn” from this. We use the validation set results, and update higher level hyperparameters. So the validation set affects a model, but only indirectly. The validation set is also known as the Dev set or the Development set. This makes sense since this dataset helps during the “development” stage of the model.

测试集
在模型训练且调参阶段完成后测试模型性能

Test Dataset: The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.

The Test dataset provides the gold standard used to evaluate the model. It is only used once a model is completely trained(using the train and validation sets). The test set is generally what is used to evaluate competing models (For example on many Kaggle competitions, the validation set is released initially along with the training set and the actual test set is only released when the competition is about to close, and it is the result of the the model on the Test set that decides the winner). Many a times the validation set is used as the test set, but it is not good practice. The test set is generally well curated. It contains carefully sampled data that spans the various classes that the model would face, when used in the real world.

数据集划分比例
Now that you know what these datasets do, you might be looking for recommendations on how to split your dataset into Train, Validation and Test sets.

This mainly depends on 2 things. First, the total number of samples in your data and second, on the actual model you are training.

Some models need substantial data to train upon, so in this case you would optimize for the larger training sets. Models with very few hyperparameters will be easy to validate and tune, so you can probably reduce the size of your validation set, but if your model has many hyperparameters, you would want to have a large validation set as well(although you should also consider cross validation). Also, if you happen to have a model with no hyperparameters or ones that cannot be easily tuned, you probably don’t need a validation set too!

All in all, like many other things in machine learning, the train-test-validation split ratio is also quite specific to your use case and it gets easier to make judge ment as you train and build more and more models.

What are hyperparameters?
Hyperparameters are external configuration variables that data scientists use to manage machine learning model training. Sometimes called model hyperparameters, the hyperparameters are manually set before training a model. They’re different from parameters, which are internal parameters automatically derived during the learning process and not set by data scientists.
Examples of hyperparameters include the number of nodes and layers in a neural network and the number of branches in a decision tree. Hyperparameters determine key features such as model architecture, learning rate, and model complexity.


http://www.ppmy.cn/devtools/59446.html

相关文章

简谈设计模式之桥接模式

桥接模式是一种结构型设计模式, 它将抽象部分和它的实现部分分离, 使它们可以独立变化. 这意味着可以改变它的抽象和它的实现, 而不会相互影响 桥接模式结构 抽象 (Abstraction): 定义抽象类, 并包含一个对实现类对象的引用拓展抽象 (Refined Abstraction): 拓展抽象类, 通过…

堆、栈和队列(数据结构)

堆、栈和队列&#xff08;数据结构&#xff09; 这里写目录标题 堆、栈和队列&#xff08;数据结构&#xff09;**栈****队列**堆&#xff08;Heap&#xff09;&#xff08;&#xff09;队列&#xff08;Queue&#xff09;&#xff08;FIFO&#xff09;栈&#xff08;Stack&…

搜维尔科技:通过 Xsens MVN Link 套装测试动作捕捉动画,由虚幻引擎5渲染

通过Xsens MVN Link套装测试动作捕捉动画&#xff0c;由虚幻引擎5渲染 搜维尔科技&#xff1a;通过 Xsens MVN Link 套装测试动作捕捉动画&#xff0c;由虚幻引擎5渲染

FPGA实训报告DAY 1(Verilog HDL)

实习日志与总结 日期&#xff1a;2024 年 7 月 10 日 星期三 姓名&#xff1a;XXX 一、实习日志 上午 9:00 - 9:30 按时到达工位&#xff0c;参加部门早会&#xff0c;了解了今天的实习任务和目标&#xff0c;即初步学习 FPGA 简介和 Verilog 基础语法知识。 9:30 - 10:30…

Flask 静态文件处理

1. 静态文件目录 Flask 默认会在应用的根目录下寻找一个名为 static 的文件夹&#xff0c;并将其作为静态文件的存储目录。你可以通过 static_folder 参数来指定不同的静态文件目录路径。 from flask import Flask app Flask(__name__, static_foldermy_static) 2. 静态文件 …

图扑低代码数字孪生 Web SCADA 智慧钢厂

2024 年 4 月&#xff0c;中国钢铁工业协会发布了《钢铁行业数字化转型评估报告&#xff08;2023年&#xff09;》&#xff08;以下简称《报告》&#xff09;。《报告》指出&#xff0c;绝大部分钢铁企业建立了数字化转型相关管理组织和团队&#xff0c;并加强其规划落实&#…

LDAPWordlistHarvester:基于LDAP数据的字典生成工具

关于LDAPWordlistHarvester LDAPWordlistHarvester是一款功能强大的字典列表生成工具&#xff0c;该工具可以根据LDAP中的详细信息生成字典列表文件&#xff0c;广大研究人员随后可以利用生成的字典文件测试目标域账号的非随机密码安全性。 工具特征 1、支持根据LDAP中的详细信…

CentOS 7 网络配置

如想了解请查看 虚拟机安装CentOS7 第一步&#xff1a;查看虚拟机网络编辑器、查看NAT设置 &#xff08;子网ID&#xff0c;网关IP&#xff09; 第二步&#xff1a;配置VMnet8 IP与DNS 注意事项&#xff1a;子网掩码与默认网关与 第一步 保持一致 第三步&#xff1a;网络配置…

初学者指南:如何搭建和配置 Nginx 服务器

初学者指南&#xff1a;如何搭建和配置 Nginx 服务器 Nginx 是一个高性能的 HTTP 和反向代理服务器&#xff0c;也是一个 IMAP/POP3/SMTP 代理服务器。本文将详细介绍如何在 Linux 上安装、配置和管理 Nginx 服务器。 一、安装 Nginx Nginx 可以安装在多种操作系统上&#x…

Window -- redis 服务注册、Mysql 服务注册

Windows 服务注册 cd 进入 Redis 主目录 cd /d F:\Redis-x64-5.0.14.1注册 Redis 为系统服务&#xff0c;并指定配置文件 redis-server --service-install redis.windows.conf --loglevel verbose开启服务 redis-server --service-start停止服务 redis-server --service-s…

关于HDFS 和HBase

Apache HBase 被设计为在 Hadoop 分布式文件系统 (HDFS) 上运行的一个特殊类型的数据库。大白话&#xff1a; 想象一下&#xff0c;你有一个巨大的图书馆&#xff0c;这个图书馆就像 HDFS&#xff0c;它的架子上堆满了各种各样的书籍&#xff0c;每本书都非常厚&#xff0c;而…

安防监控视频平台LntonCVS视频融合共享平台智慧消防实现远程集中视频监控方案

近年来&#xff0c;电力系统内变电站着火事件频发&#xff0c;这对消防安全管理提出了严峻挑战。我国消防安全基础设施不完善、管理机制不健全、应急处置能力不足及公众消防安全意识淡薄等问题&#xff0c;严重制约了消防安全的提升。因此&#xff0c;加强变电站的消防安全管理…

HashMap源码解析

目录 一:put方法流程 二&#xff1a;get方法 三&#xff1a;扩容机制 一&#xff1a;put方法流程 public V put(K key, V value) {return putVal(hash(key), key, value, false, true); }final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {No…

Qcom平台通过Hexagon IDE 测试程序性能指导

Qcom平台通过Hexagon IDE 测试程序性能指导 1 安装Hexagon IDE工具2 测试工程2.1 打开Hexagon IDE2.2 新建工程2.3 添加测试案例2.3.1 方法一&#xff1a;新建2.3.2 方法二&#xff1a;拷贝 2.4 配置测试环境2.4.1 包含头文件2.4.2 添加程序优化功能(需先bulid一下)2.4.3 添加g…

79. UE5 RPG 创建技能冷却和消耗

在这一篇里面&#xff0c;我们接着优化技能&#xff0c;现在角色添加的主动技能能够同步到ui上面。我们在这一篇文章里面&#xff0c;完善技能的消耗&#xff08;释放技能减少蓝量&#xff09;和冷却机制。 我们可以看到&#xff0c;在技能类默认值这里&#xff0c;可以设置它的…

分类题解清单

目录 简介MySQL题一、聚合函数二、排序和分组三、高级查询和连接四、子查询五、高级字符串函数 / 正则表达式 / 子句 算法题一、双指针二、滑动窗口三、模拟四、贪心五、矩阵六、排序七、链表八、设计九、前缀和十、哈希表十一、字符串十二、二叉树十三、二分查找十四、回溯十五…

LabVIEW红外热波图像缺陷检

开发使用LabVIEW开发的红外热波图像缺陷检测系统。该系统结合红外热像仪、工业相机和高效的数据采集硬件&#xff0c;实现对工件表面缺陷的自动检测和分析。通过LabVIEW的强大功能&#xff0c;系统能够实时采集、处理和显示红外热波图像&#xff0c;有效提高了检测的精度和效率…

Windows 11预览补丁KB5040527影响火绒驱动加载的解决办法

7 月 11 日&#xff0c;微软更新Windows 11 预览版本补丁 KB5040527&#xff0c;补丁安装后会影响火绒驱动加载导致火绒安全软件服务异常&#xff0c;补丁相关信息如下&#xff1a; https://blogs.windows.com/windows-insider/2024/07/11/releasing-windows-11-builds-22621-…

如何在Java中处理空集合和空指针

文章目录 概要示例代码详细解释如下正确处理集合的技巧总结 概要 在Java开发中&#xff0c;我们经常需要处理集合&#xff08;如ArrayList&#xff09;的情况。为了确保程序的稳定性和正确性&#xff0c;我们必须妥善处理空集合和空指针。 示例代码 public class Main {publ…

大语言模型(Large Language Model, LLM)——初步详细了解!!!

LLM 1.1 **基本概念**1.2. **主要特点**1.3. **主要应用**1.4. **著名大语言模型**1.5. **挑战和局限**1.6. **未来发展**2.1. 文献综述与资料收集2.2. 数据分析与预处理2.3. 实验设计与优化2.4. 结果分析与解释2.5. 科研写作与报告6. 知识扩展与创新2.7. 具体工具与平台2.8 示…