AI Technology

Recommended Open-Source Audio Technologies

Text-to-Speech (TTS)

PaddleSpeech

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/README_cn.md

From Baidu. Supports conversion between speech and text across multiple scenarios, including streaming.

XTTS (recommended)

https://github.com/coqui-ai/TTS

Decent quality. An integrated toolkit that implements multiple models, including Tacotron2, Bark, and FastSpeech2.

Demo: https://huggingface.co/spaces/coqui/CoquiTTS

ChatTTS

https://github.com/2noise/ChatTTS

EdgeTTS

https://github.com/rany2/edge-tts

CosyVoice

https://github.com/FunAudioLLM/CosyVoice

Vits

https://github.com/jaywalnut310/vits

SpeechT5

https://github.com/microsoft/SpeechT5

English output is decent.

Demo: https://huggingface.co/spaces/Matthijs/speecht5-tts-demo

Bark

https://github.com/suno-ai/bark

English output is decent; Chinese output has a foreign accent. It can also be used for music and voice cloning.

Demo: https://huggingface.co/spaces/suno/bark

Real-Time-Voice-Cloning

https://github.com/CorentinJ/Real-Time-Voice-Cloning

TTS-Vue

https://github.com/LokerL/tts-vue

MetaVoiceIO

https://github.com/metavoiceio/metavoice-src

EmotiVoice

https://github.com/netease-youdao/EmotiVoice

Multiple voices with tone and emotion control, from NetEase.

Documentation: https://github.com/netease-youdao/EmotiVoice/blob/main/README.zh.md

Beginner installation guide: https://github.com/netease-youdao/EmotiVoice/blob/main/README_%E5%B0%8F%E7%99%BD%E5%AE%89%E8%A3%85%E6%95%99%E7%A8%8B.md

Voice list: https://github.com/netease-youdao/EmotiVoice/tree/main/data/youdao/text

EmotiVoice-Plus

https://aiyy.info/emotivoice-plus/

A multi-speaker text-to-speech tool built on EmotiVoice.

MockingBird

https://github.com/babysor/MockingBird

Sambert (recommended)

https://modelscope.cn/models/speech_tts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/summary

Supports Chinese and English with decent quality. Demo: https://modelscope.cn/studios/damo/personal_tts/summary

GPT-SoVITS

https://github.com/RVC-Boss/GPT-SoVITS

Speech-to-Text (STT)

PaddleSpeech

https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/README_cn.md

From Baidu. Supports conversion between speech and text across multiple scenarios, including streaming.

Whisper

https://github.com/openai/whisper

Supports multiple languages.

Demo

https://huggingface.co/spaces/innev/whisper-Base

https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11
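As a rough illustration of how the Whisper Python package is commonly used, the sketch below transcribes a file and formats the result as SRT subtitles. It assumes `openai-whisper` and `ffmpeg` are installed. The `base` model size and the helper names `format_timestamp`, `to_srt`, and `transcribe_to_srt` are choices made for this example, not part of Whisper itself.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start/end/text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper and return SRT subtitles."""
    # Imported lazily so the formatting helpers work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    # With no language argument, Whisper detects the spoken language itself.
    result = model.transcribe(audio_path)
    return to_srt(result["segments"])
```

Calling `transcribe_to_srt("audio.mp3")` (a placeholder file name) downloads the model on first use and returns the subtitle text. Larger model sizes generally trade speed for accuracy.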

SenseVoice

https://github.com/FunAudioLLM/SenseVoice

Reported to recognize speech more accurately than Whisper.

Faster-Whisper (recommended)

https://github.com/guillaumekln/faster-whisper
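A minimal sketch of typical usage, assuming the `faster-whisper` package is installed. The `small` model size, the CPU device with `int8` compute type, and the helper names `join_segments` and `transcribe` are assumptions made for this example.

```python
def join_segments(segments) -> str:
    """Concatenate the text of each segment into one string.

    Segments are produced lazily, so decoding only runs while
    this loop consumes them.
    """
    return " ".join(seg.text.strip() for seg in segments)


def transcribe(audio_path: str, model_size: str = "small") -> str:
    """Transcribe an audio file and return the full text."""
    # Imported lazily so join_segments can be used without the package.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use low; a GPU setup would use other settings.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, beam_size=5)
    print(f"Detected language: {info.language}")
    return join_segments(segments)
```

Calling `transcribe("audio.mp3")` (a placeholder file name) returns the transcript as a single string. Each segment also carries `start` and `end` times if timestamps are needed.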

Whisper.cpp

https://github.com/ggerganov/whisper.cpp

A C/C++ port that speeds up Whisper inference.

DeepSpeech

https://github.com/mozilla/DeepSpeech

https://github.com/SeanNaren/deepspeech.pytorch

Espnet

https://github.com/espnet/espnet

Voice Cloning

MockingBird, Sambert, and GPT-SoVITS, listed toward the end of this section, also appear in the text-to-speech list above.

Bert-vits2 (recommended)

https://github.com/fishaudio/Bert-VITS2

Fish-speech

https://github.com/fishaudio/fish-speech

Vits

https://github.com/Plachtaa/VITS-fast-fine-tuning

Retrieval-based-Voice-Conversion-WebUI

https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

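Speech-to-speech conversion. Supports voice cloning and fine-tuning, and can change voices in real time, so it also works as a voice changer.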

MockingBird

https://github.com/babysor/MockingBird

Sambert (recommended)

https://modelscope.cn/models/speech_tts/speech_sambert-hifigan_tts_zh-cn_multisp_pretrain_16k/summary

GPT-SoVITS

https://github.com/RVC-Boss/GPT-SoVITS

Tutorial reference: GPT-SoVITS Audio Processing and Voice Cloning

Voice cloning tutorial based on GPT-SoVITS: https://www.bilibili.com/video/BV1P541117yn/?spm_id_from=333.337.search-card.all.click&vd_source=b97a538c390a5dab96d947934fc1119a

BarkVoiceCloning

Voice cloning adapted from Bark.

https://github.com/KevinWang676/Bark-Voice-Cloning

AI Singing

So-vits-svc

https://github.com/svc-develop-team/so-vits-svc

The repository is archived and no longer updated; the latest version is 4.1.

So-vits-svc-5.0 (recommended)

https://github.com/PlayVoice/so-vits-svc-5.0

Music Generation

Muzic

https://github.com/microsoft/muzic

AudioCraft

https://github.com/facebookresearch/audiocraft

Audio Tuning

SoundTouch

https://github.com/imtaotao/sound-touch
