How to Convert Speech to Text with a Large Model
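OpenAI's API supports speech-to-text: a single call converts an audio file into text.
This example uses the OpenAI client to call a privately deployed Belle-whisper-large-v2-zh model.
The test code is as follows: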
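from openai import OpenAI

# Point the OpenAI client at the privately deployed, OpenAI-compatible endpoint.
# 'EMPTY' is a placeholder API key.
client = OpenAI(base_url='http://127.0.0.1:9922/v1', api_key='EMPTY')

# List the models served by this endpoint.
models = client.models.list()
print(models)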
The output below confirms the name of the model.
SyncPage[Model](data=[
Model(id='Belle-whisper-large-v2-zh', created=0, object='model', owned_by='xinference', model_type='audio', address='0.0.0.0:36445', accelerators=['0'], model_name='Belle-whisper-large-v2-zh', model_family='whisper', model_revision='ec5bd5d78598545b7585814edde86dac2002b5b9', replica=1),
Model(id='bge-reranker-large', created=0, object='model', owned_by='xinference', model_type='rerank', address='0.0.0.0:46201', accelerators=['0'], type='normal', model_name='bge-reranker-large', language=['en', 'zh'], model_revision='v0.0.1', replica=1),
Model(id='bge-base-zh-v1.5', created=0, object='model', owned_by='xinference', model_type='embedding', address='0.0.0.0:40537', accelerators=['0'], model_name='bge-base-zh-v1.5', dimensions=768, max_tokens=512, language=['zh'], model_revision='v0.0.1', replica=1)
], object='list')
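When the endpoint serves several models, as in the listing above, the audio model can also be picked out in code instead of by eye. The following is a minimal sketch, assuming each entry exposes the `model_type` field seen in the output; that field is reported by this particular deployment and is not part of the standard OpenAI model schema. The helper name `find_audio_model_ids` and the stand-in data are hypothetical, used here only so the example runs without a server.

```python
from types import SimpleNamespace


def find_audio_model_ids(models):
    """Return the ids of all entries whose model_type is 'audio'.

    getattr with a default is used because model_type is an extra,
    server-specific field that may be absent on other endpoints.
    """
    return [m.id for m in models if getattr(m, "model_type", None) == "audio"]


# Stand-in objects mirroring the id and model_type values in the listing above.
sample_models = [
    SimpleNamespace(id="Belle-whisper-large-v2-zh", model_type="audio"),
    SimpleNamespace(id="bge-reranker-large", model_type="rerank"),
    SimpleNamespace(id="bge-base-zh-v1.5", model_type="embedding"),
]

print(find_audio_model_ids(sample_models))  # ['Belle-whisper-large-v2-zh']
```

With a live client, the same helper could presumably be applied to the `data` list of the page returned by `client.models.list()`.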
Pick an audio file and submit its contents to the model.
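file_name = r'C:\Temp\四年级英语听力.mp3'

# Open the audio file in binary mode; the with-block closes it after the request.
with open(file_name, 'rb') as audio_file:
    transcription = client.audio.transcriptions.create(
        model="Belle-whisper-large-v2-zh",
        file=audio_file,
    )

# The recognized text is available on the .text attribute.
print(transcription.text)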
The output is:
四年级英语听力部分ALookListenandChoose听音选图 writing he is a famous writer to Galway's brother is
a policeman Galway's brother is a policeman three this is my classmate Li Yan she's good at reading books
this is my classmate Li Yan she is good at reading books My uncle is a taxi driver. He drives well
听录音填写 I'm eleven She is twelve We are in the same class Her father is a teacher Her mother is
a TV reporter of class two grade five.听录音用钩叉判断 I'm a new student I'm in class 2 five.
Here is a picture of my family. This is my father. He's a writer. This is my mother. She's a singer.
The girl is my sister. The boy is me. We love our father and mother and they love us.
We are a happy family听力结束请同学们继续答题
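The result looks quite good: the model handles the mix of Chinese instructions and English sentences, although punctuation is inconsistent.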
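The transcription step above can be wrapped in a small reusable helper. This is only a sketch under stated assumptions: the function name `transcribe_file` is hypothetical, the default model name is the one used in this example, and `client` is expected to be an OpenAI client (or any object offering the same `audio.transcriptions.create` call). It checks that the file exists before sending anything and makes sure the file handle is closed afterwards.

```python
from pathlib import Path


def transcribe_file(client, path, model="Belle-whisper-large-v2-zh"):
    """Send one audio file to the transcription endpoint and return its text.

    client: an OpenAI-style client exposing audio.transcriptions.create
    path:   location of the audio file on disk
    model:  model name as reported by the endpoint's model listing
    """
    audio_path = Path(path)
    if not audio_path.is_file():
        # Fail early with a clear message instead of a low-level open() error.
        raise FileNotFoundError(f"Audio file not found: {audio_path}")

    # Binary mode is required; the with-block closes the handle after the call.
    with audio_path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

With the client created earlier, a call such as `transcribe_file(client, file_name)` would return the same text that `transcription.text` held in the example.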