Text-to-Speech (TTS)

PaddleSpeech

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/README_cn.md

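From Baidu. Supports conversion between speech and text in both directions across multiple scenarios, including streaming.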

XTTS (recommended)

https://github.com/coqui-ai/TTS

Decent quality. An integrated toolkit that implements multiple models, including Tacotron2, Bark, and FastSpeech2.

Demo: https://huggingface.co/spaces/coqui/CoquiTTS
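
As a rough illustration of how the toolkit is typically driven from Python, here is a minimal sketch that assumes the `TTS` package is installed and that the XTTS v2 model name below is available for download. The helper name `build_xtts_request`, the file names, and the default language are hypothetical choices for this example, not something prescribed by the project.

```python
from pathlib import Path

# Assumed model identifier for XTTS v2 in the Coqui model catalogue.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_xtts_request(text: str, speaker_wav: str, language: str = "en",
                       out_path: str = "output.wav") -> dict:
    """Collect the keyword arguments passed to TTS.tts_to_file.

    Hypothetical helper: it only validates and packages the inputs,
    so it runs without the TTS package or any model weights.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # short reference clip of the target voice
        "language": language,
        "file_path": str(Path(out_path)),
    }


def synthesize(request: dict, model_name: str = XTTS_MODEL) -> str:
    """Load the model and write the synthesized audio to request['file_path']."""
    # Imported lazily so the helper above works without TTS installed.
    from TTS.api import TTS

    tts = TTS(model_name)
    tts.tts_to_file(**request)
    return request["file_path"]
```

Calling `synthesize(build_xtts_request("Hello", "reference.wav"))` would download the model on first use, so expect the first run to be slow.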

ChatTTS

https://github.com/2noise/ChatTTS

EdgeTTS

https://github.com/rany2/edge-tts

CosyVoice

https://github.com/FunAudioLLM/CosyVoice

Vits

https://github.com/jaywalnut310/vits

SpeechT5

https://github.com/microsoft/SpeechT5

English output is decent.

Demo: https://huggingface.co/spaces/Matthijs/speecht5-tts-demo

Bark

https://github.com/suno-ai/bark

English output is decent; Chinese output has a noticeable foreign accent. It can also be used for music and voice cloning.

Demo: https://huggingface.co/spaces/suno/bark

Real-Time-Voice-Cloning

https://github.com/CorentinJ/Real-Time-Voice-Cloning

TTS-Vue

https://github.com/LokerL/tts-vue

MetaVoiceIO

https://github.com/metavoiceio/metavoice-src

EmotiVoice

https://github.com/netease-youdao/EmotiVoice

Multiple voices with tone and emotion control. From NetEase.

Documentation: https://github.com/netease-youdao/EmotiVoice/blob/main/README.zh.md

Beginner installation guide: https://github.com/netease-youdao/EmotiVoice/blob/main/README_%E5%B0%8F%E7%99%BD%E5%AE%89%E8%A3%85%E6%95%99%E7%A8%8B.md

Voice list: https://github.com/netease-youdao/EmotiVoice/tree/main/data/youdao/text

EmotiVoice-Plus

https://aiyy.info/emotivoice-plus/

A multi-speaker text-to-speech tool built on EmotiVoice.

MockingBird

https://github.com/babysor/MockingBird

Sambert (recommended)

https://modelscope.cn/models/speech_tts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/summary

Supports Chinese and English with decent quality. Demo: https://modelscope.cn/studios/damo/personal_tts/summary

GPT-SoVITS

https://github.com/RVC-Boss/GPT-SoVITS

Speech-to-Text (STT)

PaddleSpeech

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/README_cn.md

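From Baidu. Supports conversion between speech and text in both directions across multiple scenarios, including streaming.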

Whisper

https://github.com/openai/whisper

Supports multiple languages.

Demo

https://huggingface.co/spaces/innev/whisper-Base

https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11
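
For a sense of how Whisper is used locally, the sketch below transcribes a file with the `openai-whisper` package and turns the returned segments into SRT subtitles. It assumes the package and ffmpeg are installed; the function names, the `base` model size, and the SRT output format are choices made for this illustration only.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert Whisper-style segment dicts (start, end, text) to SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file and return the result as SRT subtitles."""
    # Imported lazily so the pure helpers above work without whisper installed.
    import whisper

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)  # language is auto-detected by default
    return segments_to_srt(result["segments"])
```

Larger model sizes generally trade speed for accuracy, so `base` is only a starting point for quick experiments.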

SenseVoice

https://github.com/FunAudioLLM/SenseVoice

Produces better results than Whisper.

Faster-Whisper (recommended)

https://github.com/guillaumekln/faster-whisper
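
The following is a minimal usage sketch, assuming the `faster-whisper` package is installed. The helper names, the `base` model size, and the CPU `int8` setting are illustrative assumptions; a GPU setup would use different `device` and `compute_type` values.

```python
def collect_text(segments) -> str:
    """Join the text of segment objects (anything with a .text attribute)."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file on CPU and return the full text."""
    # Imported lazily so collect_text works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # segments is a generator: decoding happens while it is iterated.
    segments, info = model.transcribe(audio_path, beam_size=5)
    print(f"Detected language: {info.language} ({info.language_probability:.2f})")
    return collect_text(segments)
```

Because the segments are produced lazily, the transcription work is done inside `collect_text`, not in the `model.transcribe` call itself.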

Whisper.cpp

https://github.com/ggerganov/whisper.cpp

A C/C++ port of Whisper for faster inference.

DeepSpeech

https://github.com/mozilla/DeepSpeech

https://github.com/SeanNaren/deepspeech.pytorch

Espnet

https://github.com/espnet/espnet

Voice Cloning

The last few entries in this section (MockingBird, Sambert, and GPT-SoVITS) also appear in the text-to-speech list above.

Bert-VITS2 (recommended)

https://github.com/fishaudio/Bert-VITS2

Fish-speech

https://github.com/fishaudio/fish-speech

Vits

https://github.com/Plachtaa/VITS-fast-fine-tuning

Retrieval-based-Voice-Conversion-WebUI

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

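Speech-to-speech conversion. Supports voice cloning and fine-tuning, and can change voices in real time, so it can serve as a voice changer.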

MockingBird

https://github.com/babysor/MockingBird

Sambert (recommended)

https://modelscope.cn/models/speech_tts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/summary

GPT-SoVITS

https://github.com/RVC-Boss/GPT-SoVITS

Tutorial reference: "GPT-SoVITS audio processing and voice cloning"

Voice cloning tutorial based on GPT-SoVITS: https://www.bilibili.com/video/BV1P541117yn/?spm_id_from=333.337.search-card.all.click&vd_source=b97a538c390a5dab96d947934fc1119a

BarkVoiceCloning

A voice cloning tool adapted from Bark.

https://github.com/KevinWang676/Bark-Voice-Cloning

AI Singing

So-vits-svc

https://github.com/svc-develop-team/so-vits-svc

Archived and no longer updated; the latest version is 4.1.

So-vits-svc-5.0 (recommended)

https://github.com/PlayVoice/so-vits-svc-5.0

Music Generation

Muzic

https://github.com/microsoft/muzic

AudioCraft

https://github.com/facebookresearch/audiocraft

Audio Tuning

SoundTouch

https://github.com/imtaotao/sound-touch