语音合成工具

语音合成工具_bark

news/2024/11/17 15:54:46/

1 介绍

多语言的文字转语音模型。
地址: https://github.com/suno-ai/bark

2 模型原理

Bark通过三个Transformer模型，将文本转换为音频。

2.1 文本到语义Token

输入：由Hugging Face的BERT标记器分词的文本
输出：编码生成音频的语义Token

2.2 语义到粗略Token

输入：语义Token
输出：来自Facebook的EnCodec编解码器的前两个codebooks的Token

2.3 粗略到细节Token

输入：EnCodec的前两个codebooks
输出：EnCodec的8个codebooks

3 使用方法

3.1 环境配置

docker pull pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime

运行docker

nvidia-docker run -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all -p 8893:8888 -v /raid/:/opt/raid --gpus all --rm -it pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime bash

3.2 安装 bark

进入docker后：

# 安装 bark
git clone https://github.com/suno-ai/bark
cp /xxx/pip.conf /root/.pip/
export http_proxy=http://192.168.1.22:xxxx
export https_proxy=http://192.168.1.22:xxxx
cd bark
python setup.py install# 安装 jupyter
pip install jupyter_nbextensions_configurator jupyter_contrib_nbextensions
jupyter notebook --allow-root -y --no-browser --ip=0.0.0.0

3.3 测试

设置环境变量：

import os
os.environ['SUNO_USE_SMALL_MODELS'] = 'True'
os.environ['XDG_CACHE_HOME'] = 'set local path to save models' 
# default path: /USER_DIR/.cache/suno/bark_v0

合成语音：

from bark import SAMPLE_RATE, generate_audio, preload_models
from IPython.display import Audio# download and load all models
preload_models()# generate audio from text
text_prompt = """我要试试能不能合成中文
"""
audio_array = generate_audio(text_prompt)# play text in notebook
Audio(audio_array, rate=SAMPLE_RATE)