The concrete code implementation is as follows:
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

path_audio = "emo/happy.mp3"  # audio file

# Load the model
model_dir = "iic/SenseVoiceSmall"
model = AutoModel(
    model=model_dir,
    trust_remote_code=True,
    remote_code="./model.py",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
    cache_dir="./ckpt",
)

# Run recognition on the audio file
res = model.generate(
    input=path_audio,
    cache={},
    language="auto",  # or "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge_length_s=15,
)

# text = rich_transcription_postprocess(res[0]["text"])
print("Audio file: {}".format(path_audio))
print("Recognition result: {}".format(res[0]["text"]))
The script's run log is as follows:
Audio file: emo/happy.mp3
Recognition result: <|zh|><|HAPPY|><|Speech|><|withitn|>你好,见到你很高兴。
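The raw output carries SenseVoice's special tokens: language (`<|zh|>`), emotion (`<|HAPPY|>`), audio event (`<|Speech|>`), and ITN markers (`<|withitn|>`). The commented-out `rich_transcription_postprocess` call strips these (and maps emotion/event tags to emoji). A minimal sketch of the tag-stripping idea is below; the helper name is illustrative, not FunASR's actual implementation:

```python
import re

def strip_sensevoice_tags(text: str) -> str:
    """Remove <|...|> special tokens, keeping only the transcription text.

    Illustrative stand-in for funasr's rich_transcription_postprocess,
    which additionally maps emotion/event tags to emoji.
    """
    return re.sub(r"<\|[^|]*\|>", "", text).strip()

raw = "<|zh|><|HAPPY|><|Speech|><|withitn|>你好,见到你很高兴。"
print(strip_sensevoice_tags(raw))  # -> 你好,见到你很高兴。
```

The same tags can also be parsed out (rather than discarded) if the downstream application needs the detected language or emotion label.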