1. Install NeMo
pip install -U "nemo_toolkit[all]" ASR-metrics
2. Download the pretrained ASR model to a local directory (Hugging Face is recommended; it is much faster than NVIDIA's official site)
3. Create the ASR model from the local file
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.EncDecCTCModel.restore_from("stt_zh_quartznet15x5.nemo")
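Before restoring, it can help to confirm that the download is intact. A .nemo checkpoint is a tar archive that bundles the model configuration and weights, so a minimal sketch using only the standard library can list its members. The helper name `list_nemo_archive` is hypothetical and not part of NeMo, and the exact member names inside the archive may differ between NeMo versions.

```python
import tarfile


def list_nemo_archive(nemo_path):
    """Return the member names stored in a .nemo checkpoint (a tar archive).

    Hypothetical helper for illustration; not part of the NeMo API.
    """
    # "r:*" lets tarfile detect a plain or compressed tar transparently.
    with tarfile.open(nemo_path, "r:*") as archive:
        return archive.getnames()
```

For example, `print(list_nemo_archive("stt_zh_quartznet15x5.nemo"))` prints the archive contents; a file that is not a valid tar archive raises `tarfile.ReadError`.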
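4. Define train_manifest, a JSON file with one entry per utterance containing the audio file path, the duration in seconds, and the transcript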
{"audio_filepath": "test.wav", "duration": 8.69, "text": "诶前天跟我说昨天跟我说十二期利率是多少工号幺九零八二六十二期的话零点八一万的话分十二期利息八十嘛"}
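A NeMo manifest stores one JSON object per line, like the entry above. As a minimal sketch, the entries can be generated with the standard library: `wave` to measure the duration of a PCM WAV file and `json` to serialize each line. The helper names `wav_duration_seconds` and `write_manifest` and the output filename are assumptions for illustration, not part of NeMo.

```python
import json
import wave


def wav_duration_seconds(wav_path):
    """Duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_manifest(samples, manifest_path):
    """Write (audio_path, transcript) pairs as one JSON object per line."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, text in samples:
            entry = {
                "audio_filepath": audio_path,
                "duration": round(wav_duration_seconds(audio_path), 2),
                "text": text,
            }
            # ensure_ascii=False keeps Chinese transcripts human-readable.
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

A call such as `write_manifest([("test.wav", "...")], "train_manifest.json")` then produces a file in the format shown above, with the duration measured from the audio instead of typed by hand.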
5. Read the model's YAML configuration
# Load the QuartzNet model configuration file with YAML
try:
    from ruamel.yaml import YAML
except ModuleNotFoundError:
    from ruamel_yaml import YAML
config_path ="/NeMo/examples/asr/conf/quartznet/quartznet_15x5_zh.yaml"
yaml = YAML(typ='safe')
with open(config_path) as f:
    params = yaml.load(f)
print(params['model']['train_ds']['manifest_filepath'])
print(params['model']['validati