```text
INFO:__main__:Uploading audio file 梦阳...
INFO:__main__:Loaded audio file 梦阳, sample rate: 48000, signal shape: torch.Size([2, 719872])
INFO:speechbrain.utils.parameter_transfer:Loading pretrained files for: embedding_model, mean_var_norm_emb, classifier, label_encoder
2024-11-11 15:33:09.045 Removing orphaned files...
2024-11-11 15:33:09.129 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.ENQUEUE_FORWARD_MSG
2024-11-11 15:33:09.129 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.SCRIPT_STOPPED_WITH_SUCCESS
2024-11-11 15:33:09.132 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.SHUTDOWN
INFO:__main__:Generated vector: [[ 2.87754226e+00 1.48245068e+01 -2.13020115e+01 -4.97987080e+00 2.76635838e+01 7.64226532e+00 ..
ERROR:pymilvus.decorators:RPC error: [batch_insert], <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, {embedding} field should be a float_vector, but got a {<class 'list'>} instead.)>, <Time:{'RPC start': '2024-11-11 15:33:10.161173', 'RPC error': '2024-11-11 15:33:10.161653'}>
ERROR:__main__:Failed to insert data: <DataNotMatchException: (code=1, message=The Input data type is inconsistent with defined schema, {embedding} field should be a float_vector, but got a {<class 'list'>} instead.)>
INFO:__main__:Index created successfully
INFO:__main__:Index was created successfully
INFO:__main__:Connecting to the Milvus database...
INFO:__main__:Collection speaker_vectors already exists, using it directly
INFO:__main__:Loaded collection speaker_vectors successfully
INFO:__main__:Initializing the speaker recognition model...
INFO:speechbrain.utils.fetching:Fetch hyperparams.yaml: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch custom.py: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch custom.py: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch embedding_model.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch embedding_model.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch mean_var_norm_emb.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch mean_var_norm_emb.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch classifier.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch classifier.ckpt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch label_encoder.txt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.fetching:Fetch label_encoder.txt: Fetching from HuggingFace Hub 'speechbrain/spkrec-ecapa-voxceleb' if not cached
INFO:speechbrain.utils.parameter_transfer:Loading pretrained files for: embedding_model, mean_var_norm_emb, classifier, label_encoder
INFO:speechbrain.utils.parameter_transfer:Loading pretrained files for: embedding_model, mean_var_norm_emb, classifier, label_encoder
2024-11-11 15:35:29.369 Removing orphaned files...
2024-11-11 15:35:29.369 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.ENQUEUE_FORWARD_MSG
2024-11-11 15:35:29.458 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.SCRIPT_STOPPED_WITH_SUCCESS
2024-11-11 15:35:29.460 Ignoring event from non-current ScriptRunner: ScriptRunnerEvent.SHUTDOWN
INFO:__main__:Uploading audio file 韩啸...
INFO:__main__:Loaded audio file 韩啸, sample rate: 48000, signal shape: torch.Size([1, 1311744])
INFO:__main__:Generated vector: [30.520031 8.26371 -0.85929084 0.40325302 4.9857345 ]...
INFO:__main__:Inserted data for 韩啸 into Milvus, current entity count: 5
INFO:__main__:Index created successfully
INFO:__main__:Index was created successfully
2024-11-11 15:35:34.335 Removing orphaned files...
2024-11-11 15:35:34.414 Script run finished successfully; removing expired entries from MessageCache (max_age=2)
```
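This error is most likely caused by the audio files ending up in different formats after FFmpeg converted them from mp4 to m4a. When the channel layout (and possibly the sample rate or encoding) differs between files, the embedding produced for some of them no longer has the shape the Milvus schema expects, and the insert is rejected.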
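The likely causes are:
- Different channel counts: the logs show that the file that was inserted successfully is mono (`torch.Size([1, 1311744])`), while the one that failed is stereo (`torch.Size([2, 719872])`). `classifier.encode_batch` treats the first dimension as the batch, so a stereo signal is encoded as two separate waveforms and yields two embeddings. After `squeeze()` the vector has shape `(2, 192)` instead of `(192,)`, and `vector.tolist()` becomes a nested list, which is why Milvus reports that the `embedding` field should be a `float_vector` but got a `list`. The double brackets in the vector logged for 梦阳 (`[[ 2.87754226e+00 ...`) are consistent with this.
- Differences in encoding: FFmpeg may also change the sample format or sample rate during conversion. Both files here were loaded at 48000 Hz, so the sample rate did not trigger this particular error, but inconsistent inputs still make embeddings from different files harder to compare.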
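To see why the channel count matters, the sketch below reproduces only the tensor shapes with NumPy. `fake_encode_batch` is a hypothetical stand-in for `classifier.encode_batch`; it assumes, as SpeechBrain does, that dimension 0 of the input is the batch, and that the ECAPA model returns embeddings of shape `[batch, 1, 192]`. The random values are placeholders, not real embeddings.

```python
import numpy as np

EMBEDDING_DIM = 192  # assumed output size of spkrec-ecapa-voxceleb


def fake_encode_batch(signal: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for classifier.encode_batch: one embedding per row."""
    batch = signal.shape[0]  # dim 0 is read as the batch, not as channels
    rng = np.random.default_rng(0)
    return rng.standard_normal((batch, 1, EMBEDDING_DIM)).astype(np.float32)


stereo = np.zeros((2, 719872), dtype=np.float32)  # shape of the file that failed
mono = np.zeros((1, 1311744), dtype=np.float32)   # shape of the file that succeeded

for label, signal in [("stereo", stereo), ("mono", mono)]:
    vector = fake_encode_batch(signal).squeeze()  # same squeeze as in upload_audio
    value = vector.tolist()
    nested = isinstance(value[0], list)
    print(label, vector.shape, "nested list" if nested else "flat list of floats")

# prints:
# stereo (2, 192) nested list
# mono (192,) flat list of floats
```

Only the mono case produces the flat list of floats that a `float_vector` column can store.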
Solution
To make all audio data consistent, normalize each file when loading and processing it:
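- Convert to mono: when loading an audio file, downmix it to a single channel.
- Use a fixed sample rate: resample every file to the same rate. For `speechbrain/spkrec-ecapa-voxceleb` that rate should be 16000 Hz, because the model card states the model was trained on 16 kHz mono recordings and that input passed to `encode_batch` must already be at that rate.
- Make sure the embedding is a one-dimensional array: multi-channel input produces one embedding per channel, i.e. a two-dimensional array, which does not match the `float_vector` format Milvus expects.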
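A defensive check before inserting turns this failure mode into a clear local error instead of a Milvus `DataNotMatchException`. The sketch below is a hypothetical helper, `to_float_vector`, not part of SpeechBrain or pymilvus; it accepts a NumPy array or nested list and assumes the collection's `embedding` field was declared with `dim=192`, so adjust `expected_dim` to your schema.

```python
import numpy as np


def to_float_vector(embedding, expected_dim=192):
    """Return the flat list of floats a FLOAT_VECTOR field accepts.

    Hypothetical helper: raises ValueError instead of letting Milvus reject the row.
    """
    array = np.asarray(embedding, dtype=np.float32)
    array = np.squeeze(array)  # drop size-1 axes, e.g. [1, 1, 192] -> [192]
    if array.ndim != 1:
        raise ValueError(
            f"expected one embedding, got shape {array.shape}; "
            "was the audio downmixed to mono before encoding?"
        )
    if array.shape[0] != expected_dim:
        raise ValueError(f"expected dim {expected_dim}, got {array.shape[0]}")
    return array.tolist()


# Usage: a mono embedding passes, a stereo one is rejected before reaching Milvus
print(len(to_float_vector(np.zeros((1, 1, 192)))))  # 192
try:
    to_float_vector(np.zeros((2, 1, 192)))
except ValueError as err:
    print("rejected:", err)
```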
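Modified code example
In the `upload_audio` function, normalize the channel count and sample rate so that every audio file is converted to mono at the same sample rate: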
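```python
import time

import torchaudio

# `logger`, `classifier` (the SpeechBrain EncoderClassifier) and `collection`
# (the pymilvus Collection) are assumed to be created elsewhere in the app.
TARGET_SR = 16000  # spkrec-ecapa-voxceleb was trained on 16 kHz mono audio


def upload_audio(file, name):
    logger.info(f"Uploading audio file {name}...")
    signal, fs = torchaudio.load(file, backend="ffmpeg")  # signal: [channels, time]
    logger.info(f"Loaded audio file {name}, sample rate: {fs}, signal shape: {signal.shape}")

    # Normalize the format: mono at TARGET_SR
    if signal.shape[0] > 1:
        signal = signal.mean(dim=0, keepdim=True)  # average all channels into one: [1, time]
    if fs != TARGET_SR:
        signal = torchaudio.transforms.Resample(orig_freq=fs, new_freq=TARGET_SR)(signal)
        fs = TARGET_SR

    # encode_batch reads dim 0 as the batch, so [1, time] gives a single embedding
    embeddings = classifier.encode_batch(signal)
    vector = embeddings.squeeze().detach().cpu().numpy()  # shape: (embedding_dim,)
    logger.info(f"Generated vector: {vector[:5]}...")  # log the first 5 values as a sample

    # Millisecond timestamp as the primary key; unique only as long as
    # no two uploads happen within the same millisecond
    entity_id = int(time.time() * 1000)

    # Column-based insert: one list per field, in schema order
    data = [
        [entity_id],        # primary key
        [name],             # name field
        [vector.tolist()],  # embedding field: a flat list of floats
    ]
    try:
        collection.insert(data)
        collection.flush()
        logger.info(f"Inserted data for {name} into Milvus, current entity count: {collection.num_entities}")
    except Exception as e:
        logger.error(f"Failed to insert data: {e}")
```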
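Notes
- Convert the audio to mono: `signal.mean(dim=0, keepdim=True)` averages all channels into one and keeps the `[1, time]` layout, so `encode_batch` produces a single embedding.
- Resample to a fixed rate: `torchaudio.transforms.Resample` brings every file to the same fixed sample rate, so all inputs reach the model in a consistent format.
- Ensure the vector format: for a one-dimensional array, `vector.tolist()` returns a flat list of Python floats, which is what a Milvus `float_vector` field accepts. For a two-dimensional array (the stereo case) it returns a nested list, which is exactly what triggered the error.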
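The downmix step is easy to check in isolation. This NumPy sketch mirrors torch's `signal.mean(dim=0, keepdim=True)` with `mean(axis=0, keepdims=True)` on a made-up two-channel signal: each output sample is the average of the left and right samples, and the channel axis is kept so the result still has the `[1, time]` layout passed to `encode_batch`.

```python
import numpy as np

# Made-up stereo signal: 2 channels x 4 samples, laid out [channels, time] like torchaudio.load
stereo = np.array([[0.2, 0.4, -0.6, 1.0],
                   [0.0, 0.4, 0.2, 0.0]], dtype=np.float32)

mono = stereo.mean(axis=0, keepdims=True)  # NumPy equivalent of signal.mean(dim=0, keepdim=True)
flat = stereo.mean(axis=0)                 # without keepdims the channel axis disappears

print(mono.shape)  # (1, 4): one row, so it is treated as a batch of one waveform
print(flat.shape)  # (4,)
print(mono)        # each value is (left + right) / 2
```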
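Check the converted files
You can also re-convert the problematic audio files with FFmpeg so that the output is already mono at the required sample rate. The following command converts a file to mono at 16000 Hz, the rate `spkrec-ecapa-voxceleb` was trained on (`-ac 1` sets one output channel, `-ar 16000` sets the sample rate):

```shell
ffmpeg -i input.mp4 -ac 1 -ar 16000 output.m4a
```

This makes the audio format match what the application expects and reduces the chance of data mismatches.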
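If many files need re-converting, the command can be scripted. The sketch below uses hypothetical helpers, `ffmpeg_mono_cmd` and `convert_folder`, that build the same FFmpeg invocation for every `.mp4` in a folder and run it with `subprocess`. It assumes `ffmpeg` is on `PATH`, that source and output folders are separate, and a 16000 Hz default to match the model; `-y` (overwrite existing outputs) is added so reruns do not stop at FFmpeg's overwrite prompt.

```python
import subprocess
from pathlib import Path


def ffmpeg_mono_cmd(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build: ffmpeg -y -i <src> -ac 1 -ar <sample_rate> <dst>"""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst without prompting
        "-i", str(src),
        "-ac", "1",               # one output channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        str(dst),
    ]


def convert_folder(src_dir: str, dst_dir: str, sample_rate: int = 16000) -> None:
    """Convert every .mp4 in src_dir to a mono .m4a in dst_dir (hypothetical layout)."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.mp4")):
        dst = out / (src.stem + ".m4a")
        subprocess.run(ffmpeg_mono_cmd(src, dst, sample_rate), check=True)
```

For example, `convert_folder("raw_videos", "audio")` would write `audio/<name>.m4a` for each video; `check=True` stops at the first file FFmpeg fails on instead of silently skipping it.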