CosyVoice: A Powerful Open-Source AI Speech Synthesis Tool

2025/1/15 15:13:31

Technology is advancing rapidly, and AI speech synthesis is steadily changing the way we live. This article introduces an excellent speech synthesis tool: CosyVoice.

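I. Installation

  1. Clone and install
    • Clone the repository with git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git. If cloning the submodules fails (for example because of network issues), run the following until it succeeds: cd CosyVoice; git submodule update --init --recursive
  2. Install Conda: see https://docs.conda.io/en/latest/miniconda.html.
  3. Create a Conda environment (run these from the CosyVoice repository root, where requirements.txt lives):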
    • conda create -n cosyvoice python=3.8
    • conda activate cosyvoice
    • conda install -y -c conda-forge pynini==2.1.5
    • pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
  4. Resolve sox compatibility issues:
    • Ubuntu: sudo apt-get install sox libsox-dev
    • CentOS: sudo yum install sox sox-devel
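Before moving on, it can help to confirm that the steps above actually took effect. The stdlib-only sketch below is our own illustration, not part of CosyVoice: the helper name check_prerequisites is hypothetical, and the git-lfs check anticipates the git-based model download in the next section.

```python
# Pre-flight check for the installation steps above (illustrative sketch).
# Assumption: `check_prerequisites` is a hypothetical helper, not a CosyVoice API;
# it only inspects the local environment and changes nothing.
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the pieces the installation steps rely on are present."""
    return {
        # The steps create a Python 3.8 Conda environment.
        "python_3_8": sys.version_info[:2] == (3, 8),
        # `sox` is the command-line tool installed by apt-get / yum.
        "sox_on_path": shutil.which("sox") is not None,
        # git is needed for cloning; `git lfs` runs the git-lfs binary.
        "git_on_path": shutil.which("git") is not None,
        "git_lfs_on_path": shutil.which("git-lfs") is not None,
        # pynini is installed from conda-forge in step 3.
        "pynini_importable": importlib.util.find_spec("pynini") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```

Each entry is a plain boolean, so the report can also be used in a CI script to fail early.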

II. Model Download

We strongly recommend downloading the pretrained CosyVoice-300M, CosyVoice-300M-SFT and CosyVoice-300M-Instruct models, as well as the CosyVoice-ttsfrd resource.

  1. Download via the ModelScope SDK:
    from modelscope import snapshot_download
    snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
    snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
    snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
    snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
    
  2. Download via git (make sure git lfs is installed):
    mkdir -p pretrained_models
    git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
    git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
    
  3. Optional: unzip the ttsfrd resource and install the ttsfrd package for better text normalization. This step is not required; without it, WeTextProcessing is used by default.
    cd pretrained_models/CosyVoice-ttsfrd/
    unzip resource.zip -d .
    pip install ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl
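The four SDK calls in step 1 differ only in the model name, so they can be folded into a loop. This is a minimal sketch under our own assumptions: download_all is a hypothetical helper, the downloader is passed in as a parameter (in practice modelscope's snapshot_download, called exactly as in step 1) so the logic can be exercised without network access, and a non-empty target directory is treated as already downloaded, for example by the git route in step 2.

```python
# Download all recommended models in one loop (illustrative sketch).
# Assumption: `download_all` is hypothetical; `download` is any callable with
# the call shape download(model_id, local_dir=...), such as snapshot_download.
from pathlib import Path
from typing import Callable, List, Tuple

MODEL_IDS = [
    "iic/CosyVoice-300M",
    "iic/CosyVoice-300M-SFT",
    "iic/CosyVoice-300M-Instruct",
    "iic/CosyVoice-ttsfrd",
]


def download_all(download: Callable[..., object],
                 root: str = "pretrained_models") -> List[Tuple[str, str]]:
    """Download each model into root/<name>, skipping non-empty directories.

    Returns the (model_id, local_dir) pairs that were actually downloaded.
    """
    done = []
    for model_id in MODEL_IDS:
        # 'iic/CosyVoice-300M' -> '<root>/CosyVoice-300M'
        target = Path(root) / model_id.split("/", 1)[1]
        if target.is_dir() and any(target.iterdir()):
            continue  # already present, e.g. cloned with git
        download(model_id, local_dir=str(target))
        done.append((model_id, str(target)))
    return done
```

In practice: from modelscope import snapshot_download, then download_all(snapshot_download).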
    

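III. Basic Usage

  1. Choose the model that matches your inference task:
    • For zero-shot or cross-lingual inference, use the CosyVoice-300M model.
    • For SFT inference, use the CosyVoice-300M-SFT model.
    • For instruct inference, use the CosyVoice-300M-Instruct model.
  2. First, add third_party/Matcha-TTS to PYTHONPATH:
    export PYTHONPATH=third_party/Matcha-TTS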
    
  3. Example code:
    from cosyvoice.cli.cosyvoice import CosyVoice
    from cosyvoice.utils.file_utils import load_wav
    import torchaudio
    # sft usage ('中文女' is a built-in speaker ID: Chinese female)
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
    print(cosyvoice.list_avaliable_spks())
    # change stream=True for chunk stream inference
    for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
        torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)
    # zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
    prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
    for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
        torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
    # cross_lingual usage (same CosyVoice-300M model)
    prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
    for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
        torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)
    # instruct usage, supports <laughter></laughter><strong></strong>[laughter][breath] ('中文男': Chinese male speaker)
    cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
    for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
        torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
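With stream=True, each inference_* call yields several chunks, and the loops above save each one to its own file. To get a single file instead, the chunks can be concatenated before writing. The sketch below uses only Python's wave module rather than torchaudio; the helper name save_chunks_as_wav is hypothetical, and it assumes each chunk has been converted to a flat list of floats in [-1.0, 1.0] (for example j['tts_speech'].squeeze(0).tolist()) and that 22050 Hz, the rate used in the examples, matches your model.

```python
# Merge streamed chunks into one 16-bit mono WAV file (illustrative sketch).
# Assumption: chunks are flat sequences of float samples in [-1.0, 1.0];
# `save_chunks_as_wav` is a hypothetical helper, not a CosyVoice API.
import struct
import wave
from typing import Iterable, Sequence


def save_chunks_as_wav(chunks: Iterable[Sequence[float]], path: str,
                       sample_rate: int = 22050) -> int:
    """Write chunks in order as 16-bit PCM; return the total frame count."""
    frames = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to [-1, 1], then scale into the int16 range.
            ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
            wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
            frames += len(ints)
    return frames
```

For example: save_chunks_as_wav((j['tts_speech'].squeeze(0).tolist() for j in cosyvoice.inference_sft(text, '中文女', stream=True)), 'sft_full.wav'). Converting to Python lists is slow for long audio; when torch is available, concatenating the chunk tensors with torch.cat(..., dim=1) and calling torchaudio.save once is the more natural choice.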
    

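IV. Launch the Web Demo

The web demo is a quick way to get familiar with CosyVoice and supports SFT, zero-shot, cross-lingual and instruct inference. See the demo website for details.
Example command: python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M (swap in another model directory as needed).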
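webui.py cannot start if the chosen port is already taken, for instance by a Docker container started with -p 50000:50000 as in the deployment section. A quick stdlib check is sketched below; port_is_free is a hypothetical helper, and it assumes that a local TCP bind test on 127.0.0.1 is representative of what the demo server will attempt.

```python
# Check whether a TCP port can be bound before launching webui.py (sketch).
# Assumption: `port_is_free` is hypothetical; it binds 127.0.0.1 by default.
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # address already in use (or not permitted)
    return True


if __name__ == "__main__":
    print("port 50000 free:", port_is_free(50000))
```

The result is only a snapshot: another process can still grab the port between the check and the launch.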

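V. Advanced Usage

For advanced users, training and inference scripts are provided in examples/libritts/cosyvoice/run.sh; following this recipe is a good way to get familiar with CosyVoice.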

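VI. Build for Deployment

To deploy CosyVoice as a service (gRPC or FastAPI), follow the steps below; otherwise you can skip this section.

  1. Build the Docker image:
    cd runtime/python
    docker build -t cosyvoice:v1.0 .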
    
  2. Run the Docker container (pick the inference mode you need; the client commands run on the host from runtime/python):
    • gRPC usage
      docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/grpc && python3 server.py --port 50000 --max_conc 4 --model_dir iic/CosyVoice-300M && sleep infinity"
      cd grpc && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>

    • FastAPI usage
      docker run -d --runtime=nvidia -p 50000:50000 cosyvoice:v1.0 /bin/bash -c "cd /opt/CosyVoice/CosyVoice/runtime/python/fastapi && python3 server.py --port 50000 --model_dir iic/CosyVoice-300M && sleep infinity"
      cd fastapi && python3 client.py --port 50000 --mode <sft|zero_shot|cross_lingual|instruct>
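The client.py scripts above talk to the server for you. For calling the FastAPI server from your own code, here is a stdlib-only sketch for SFT mode. It rests on assumptions that this article does not state and that should be checked against runtime/python/fastapi/server.py and client.py in your checkout: that the endpoint is /inference_sft, that it accepts form fields tts_text and spk_id, and that the response body is raw 16-bit mono PCM at 22050 Hz. build_sft_request and pcm16_to_wav are hypothetical helper names.

```python
# Stdlib-only FastAPI client sketch, SFT mode only.
# ASSUMPTIONS (verify against runtime/python/fastapi/server.py and client.py):
#   - endpoint path is /inference_sft
#   - form fields are `tts_text` and `spk_id`
#   - the response is raw little-endian 16-bit mono PCM at 22050 Hz
import urllib.parse
import urllib.request
import wave


def build_sft_request(text: str, spk_id: str, host: str = "127.0.0.1",
                      port: int = 50000) -> urllib.request.Request:
    """Build a form-encoded POST request for the assumed /inference_sft endpoint."""
    body = urllib.parse.urlencode({"tts_text": text, "spk_id": spk_id}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/inference_sft",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def pcm16_to_wav(pcm: bytes, path: str, sample_rate: int = 22050) -> None:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
```

With the FastAPI container running, usage would look like: with urllib.request.urlopen(build_sft_request('你好', '中文女')) as resp: pcm16_to_wav(resp.read(), 'sft_fastapi.wav').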
      

With its powerful features and flexible usage, CosyVoice offers a fresh speech synthesis experience. Give it a try!


http://www.ppmy.cn/news/1521931.html
