Image that reads free open-source ai voice generators.

ElevenLabs is a great AI voice generator, but it comes with a hefty price tag and a fairly barebones user interface.

Here you will find free, unlimited ElevenLabs alternatives:

  • Free Text-to-Speech tools plus open-source options
  • Fast and unlimited voice generation
  • Easy to set up (no-code or low-code)

Here are my top 3 free AI voice generators. Keep reading for a detailed overview.

Top 3 ElevenLabs Alternatives

Tool: Best for
  • Coqui TTS: General-purpose voice generation and fantasy screenplays. Their xtts-v2 model (huggingface link) reaches ElevenLabs quality in voice cloning. Each model comes with its own usage terms (XTTS is non-commercial).
  • Mycroft Mimic 3: Personal voice assistant; works offline.
  • Tortoise: Best quality but slow (alternative: Playht turbo, a faster freemium TTS).

For those who prioritize user interfaces and additional features, and don’t mind exploring plans that might come with costs, check out our roundup of the Best AI Voice Generators with free plans available.

If you are okay with writing Python code, OpenAI TTS is 6x cheaper than ElevenLabs and just as good.
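Here is a minimal sketch of what the Python route could look like, assuming the official `openai` package (v1 or later) and an `OPENAI_API_KEY` environment variable. The `tts-1` model and `alloy` voice are example choices, and `chunk_text` and `synthesize` are hypothetical helper names, not part of the library. The 4096-character cap reflects the per-request input limit documented for the speech endpoint at the time of writing.

```python
from pathlib import Path

MAX_CHARS = 4096  # documented per-request input limit (assumption: still current)


def chunk_text(text, limit=MAX_CHARS):
    """Split text on whitespace into chunks of at most `limit` characters.

    A single word longer than `limit` is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def synthesize(text, out_dir="speech", voice="alloy", model="tts-1"):
    """Write one MP3 per chunk and return the list of file paths."""
    # Imported lazily so chunk_text works without the package installed.
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunk_text(text)):
        path = Path(out_dir) / f"part_{index:03d}.mp3"
        with client.audio.speech.with_streaming_response.create(
            model=model, voice=voice, input=chunk
        ) as response:
            response.stream_to_file(path)
        paths.append(path)
    return paths
```

Calling `synthesize("Hello from a cheaper voice.")` would write `speech/part_000.mp3`. Longer articles are split into several files that you can join afterwards with any audio tool.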

1. Coqui TTS

Meet Coqui TTS. It’s a simple tool that helps you turn text into speech. You can start for free with its Python library, which supports hundreds of TTS models.
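To give a feel for the Python library, here is a short sketch that assumes the `TTS` package from PyPI and PyTorch are installed. Model ids follow a `type/language/dataset/model` pattern. `parse_model_name` and `clone_voice` are hypothetical helpers written for this example, and `speaker_wav` is a path to a short reference recording you supply. Keep in mind that the XTTS model used here is licensed for non-commercial use.

```python
XTTS_V2 = "tts_models/multilingual/multi-dataset/xtts_v2"


def parse_model_name(name):
    """Split a Coqui model id of the form type/language/dataset/model."""
    parts = name.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected type/language/dataset/model, got {name!r}")
    keys = ("type", "language", "dataset", "model")
    return dict(zip(keys, parts))


def clone_voice(text, speaker_wav, out_path="output.wav",
                language="en", model_name=XTTS_V2):
    """Speak `text` in the voice of the reference recording `speaker_wav`."""
    # Imported lazily so parse_model_name works without the heavy dependencies.
    import torch
    from TTS.api import TTS  # pip install TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(model_name).to(device)  # downloads the model on first use
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,
        language=language,
        file_path=out_path,
    )
    return out_path
```

For example, `clone_voice("Once upon a time...", "my_voice.wav")` would write `output.wav` next to your script.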

Image shows coquitts platform

Key Features

  • Easy to use: Available as a free Python library, plus a paid API and web app.
  • Multilingual: Supports 13 languages.
  • Multi-speaker TTS: Add multiple characters to a voiceover.
  • Advanced timeline editor: Adjust pitch, loudness, and emotion for each sentence, word, or character.
  • Voice cloning: Clone any voice from 3 seconds of audio and add it to your collection.
  • Prompt 2 Voice: Generate voices from a text prompt.
  • Support for a large number of TTS models, including:
    • Tortoise
    • Bark
    • Tacotron
    • xtts-v2
    • Fastspeech and more.

Note: The Coqui code is released under the MPL 2.0 license. What does this mean? The code and the models are licensed separately: the TTS code base is under MPL 2.0 (which allows commercial use), while each model has its own license (which may not allow commercial use). The model creator chooses the license.

Example: models from Meta are under a Creative Commons non-commercial license. Likewise, the XTTS model cannot be used commercially without paying for a license. 😢 You can buy a commercial-use license from Coqui.

Pros

  • Easy-to-use Colab notebook.
  • Multiple emotional tones and styles.
  • You can generate your own voices from text prompts and fuse two voices using Voice Fusion.
  • Voice cloning is fast and high quality.
  • Best voices for fantasy and storytelling use cases.

Cons

  • The commercial license for the XTTS model is paid.

2. Bark

Bark is like your personal studio for creating voices and music. You don’t need to pay anything to get started.

Key Features

  • Lots of Choices: Over 100 voice presets to pick from, plus new ones from other users on Discord.
  • Smart Language Handling: Bark can handle text in many languages, even when they are mixed together.
  • Sings Too: Bark does more than talk; it can also create singing voices.
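The presets and singing support can be tried with a few lines of Python. This is a sketch that assumes the `bark` package from the project's repository and `scipy` are installed. The preset name `v2/en_speaker_6` is one example from the preset library, and `sing` and `generate` are hypothetical helpers. Wrapping lyrics in ♪ is the cue the project's README suggests for singing.

```python
def sing(lyrics):
    """Wrap lyrics in music notes, the cue suggested for singing voices."""
    return f"♪ {lyrics.strip()} ♪"


def generate(prompt, preset="v2/en_speaker_6", out_path="bark.wav"):
    """Generate audio for `prompt` with a voice preset and save it as WAV."""
    # Imported lazily so sing() works without the heavy dependencies.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the models on first use
    audio = generate_audio(prompt, history_prompt=preset)
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

For example, `generate(sing("In the town where I was born"))` would attempt a sung line, while `generate("Hello! [laughs] Nice to meet you.")` would add a laugh.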

Pros

  • Sounds Real: Whether it’s speaking in different languages or making music, Bark sounds like the real thing.
  • Expressive: It can laugh, sigh, and cry, just like a person.
  • Commercial Use: You can use Bark for your projects, even to make money.
  • Community Support: Join the Discord to meet others and find new voice presets.
  • Big Library: There’s a huge collection of voice prompts to explore.

Cons

  • No Web App Yet: You’ll need to use Colab or Discord to try it out for now, but it’s still free and easy.

3. Tortoise TTS

Tortoise TTS is all about making text sound as natural as it can get. It’s a text-to-speech model that James Betker designed to make voices that sound really true-to-life.

Key Features

  • High Fidelity Voice Cloning: Create voices that sound just like the input audio sample.
  • Realistic AI Voiceovers: Make your text come to life with voices that are hard to tell apart from real humans.
  • ElevenLabs reportedly uses a fine-tuned clone of Tortoise TTS.

Pros

  • Top-Notch Voices: The voices you can create are clear and of very high quality.
  • Master at Cloning: It’s really good at making new voices from just a small bit of someone’s audio. This is perfect for making lots of different voices, even famous ones.
  • Control How It Speaks: You can adjust how the voice talks (its tone, feeling, speed, and more) by changing the text prompt you give it. For example, typing “I am sad” in the text makes the AI voice sound sadder.

Cons

  • Just English: Right now, it can only make voices in English and can’t make sound effects.
  • It can be tough to set up, and it’s pretty slow.

James stopped working on Tortoise (at least in public) because of ethical concerns: the model is really good, and he fears it may be used for fraud if it is optimized for faster output.

But it is still a good model to try out, and its code is worth reading from an engineering standpoint.
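If you want to try it, the sketch below assumes the `tortoise-tts` package, `torch`, and `torchaudio` are installed. `tom` is one of the sample voices bundled with the repository and `fast` is one of its quality presets. `with_emotion` and `speak` are hypothetical helpers. The bracket trick relies on the redaction behaviour described in the project's README, where bracketed text steers the tone but is not spoken.

```python
def with_emotion(text, cue):
    """Prefix `text` with a bracketed cue that sets the tone but is not read aloud."""
    return f"[{cue},] {text}"


def speak(text, voice="tom", preset="fast", out_path="tortoise.wav"):
    """Generate speech in a bundled voice and save it as a 24 kHz WAV file."""
    # Imported lazily so with_emotion() works without the heavy dependencies.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()  # downloads the model weights on first use
    voice_samples, conditioning_latents = load_voice(voice)
    audio = tts.tts_with_preset(
        text,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset=preset,
    )
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)
    return out_path
```

For example, `speak(with_emotion("Please feed me.", "I am really sad"))` should say only "Please feed me." in a sadder tone. Expect generation to take a while, especially on CPU.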

4. Playground

It gives you a world of voices: 907 AI voices in 142 languages and accents. It’s great for reaching a wide audience, from local dialects to global languages.

I’m including this because, at the time of writing, its free plan is pretty generous. Note that it is not open source.

Key Features

  • Lots of Voices: It has a huge library of 907 AI voices that cover 142 languages and accents. That means you can find the perfect voice for any audience, including local languages like Malayalam and Telugu.
  • Just Like Real: The voices are made to sound just like a person’s voice. This is great when you want someone listening to your audiobook or learning something new to feel like a real person is talking to them.
  • Pick Your Voice Style: No matter what you’re making (a news report, a chat with customers, or anything else), there’s a voice style ready for you. You can choose from styles like Newscaster, Conversational, or Customer Support, among others.
  • Clone Voices Well: If you need a voice that sounds like a specific person, you can clone it. This is an extra feature you can add on, and it does a really good job of copying voices.
  • SEO-Optimized Audio Articles: Enhance your website’s accessibility and search engine presence by converting text articles into audio with the convenient audio widget.
  • Custom Pronunciation Library: Address the common issue of mispronunciation by voice generators by building a custom pronunciation guide, so your audio content sounds just right.
  • Direct Podcast Distribution: Streamline your workflow by distributing your audio directly to popular platforms such as iTunes, Spotify, and Google Podcasts from the dashboard, eliminating the need for multiple upload/download steps.

Pros

  • Precision in Pronunciation: It excels at accurately pronouncing technical words and acronyms, which makes it useful for educational content.
  • Generous Free Tier: Dip your toes in with a free plan that includes 2,500 words.
  • Word Limit Flexibility: With basic plans offering 3 million characters per year, you won’t easily run out of capacity.
  • Authenticity in Voices: The ultra-realistic voices are fine-tuned to closely mimic human intonation and emotion.
  • Multilingual Voice Cloning: Not only does it clone voices, but it does so across multiple languages, a feature not commonly found elsewhere.
  • Diverse Language Support: An extensive collection of non-English language options, like Hindi.

Cons

  • The starting plan is $30, which might be steep for users with minimal voiceover needs.

5. Mycroft Mimic 3

Mimic 3 is a tool that respects your privacy and is completely open-source, which means anyone can use or modify it.

It’s a neural Text to Speech (TTS) engine, designed to deliver high-quality voice output that you can use right from your own devices, without needing an internet connection.

They’re also working on a cloud version for those who prefer simplicity or have devices with limited processing power.

Pros

  • Quality Voices: The voice output is clear and natural-sounding.
  • It can run completely offline.
  • Suitable for low-end hardware.

Cons

  • Voices are not very expressive.
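Because Mimic 3 runs locally, you can script it without any network calls. The sketch below assumes the `mimic3` command-line tool is installed and on your PATH, and that it writes WAV data to standard output when given text, as its documentation describes. The voice key `en_US/vctk_low` is an example, and `mimic3_command` and `speak_to_file` are hypothetical helpers.

```python
import subprocess


def mimic3_command(text, voice="en_US/vctk_low"):
    """Build the argument list for one mimic3 invocation."""
    return ["mimic3", "--voice", voice, text]


def speak_to_file(text, out_path="mimic3.wav", voice="en_US/vctk_low"):
    """Run mimic3 offline and capture its WAV output in `out_path`."""
    with open(out_path, "wb") as wav_file:
        subprocess.run(mimic3_command(text, voice), stdout=wav_file, check=True)
    return out_path
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or punctuation.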

6. silero-models

Silero Models offers pre-trained models that make Speech-to-Text (STT) and Text-to-Speech (TTS) tasks straightforward for businesses.

They pride themselves on STT quality that is on par with, and sometimes even surpasses, Google’s offerings, all without the complexity typically associated with such technology.

Key Features

  • High-Quality STT: Their Speech to Text is refreshingly easy to use; check their benchmarks (on GitHub) to see how they stack up against the competition.
  • Hassle-Free TTS: Silero provides Text to Speech models that are ready to use with just one line of code, with a broad selection of voices and a simple, dependency-free setup.
  • Efficient and Fast: These models are optimized for speed, running faster than real time on a single CPU thread, with support for both 16kHz and 8kHz audio.
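As a rough illustration of the TTS side, the sketch below loads a model through `torch.hub` and saves the result with the standard library. It assumes PyTorch is installed and uses the `v3_en` model with the `en_0` speaker, which are example choices. The 48000 Hz default is also an assumption about what that model accepts, so adjust it to a rate your model supports. `write_wav` and `silero_tts` are hypothetical helpers.

```python
import struct
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def silero_tts(text, out_path="silero.wav", speaker="en_0", sample_rate=48000):
    """Synthesize `text` on the CPU and save it to `out_path`."""
    # Imported lazily so write_wav() works without PyTorch installed.
    import torch

    model, _example_text = torch.hub.load(
        repo_or_dir="snakers4/silero-models",
        model="silero_tts",
        language="en",
        speaker="v3_en",
    )
    model.to(torch.device("cpu"))
    audio = model.apply_tts(text=text, speaker=speaker, sample_rate=sample_rate)
    write_wav(out_path, audio.tolist(), sample_rate)
    return out_path
```

Everything runs on the CPU here, in line with the claim that no GPU is needed to get started.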

Pros

  • No Complex Setup: You won’t need to deal with Kaldi, compilations, or lengthy instructions to get started.
  • High-Performance Speech: The end-to-end pipeline ensures the speech sounds natural, and you don’t need a GPU or any training to begin.
  • Language Support: It supports Russian, English, German, and Spanish, and has the potential to be extended further.
  • Text Readability: Their model can insert punctuation and capitalization effectively, making texts more readable.

7. MockingBird

MockingBird is a Python-based project that specializes in cloning a voice quickly (in just 5 seconds) and generating speech in real time. It’s built for working with Chinese, providing seamless real-time voice cloning.

Key Features

  • Chinese Language Support: Works with Mandarin and has been tested on several datasets (aidatatang_200zh, magicdata, aishell3, data_aishell).
  • PyTorch Compatibility: Works with PyTorch 1.9.0 and performs well on NVIDIA GPUs like the Tesla T4 and GTX 2060.
  • Cross-Platform Functionality: Runs on Windows, Linux, and even M1 macOS.
  • User-Friendly and High-Quality: To get started you only need to train a new synthesizer, reusing the pre-trained encoder and vocoder for quality voice cloning.
  • Webserver Integration: Ready for online use, letting you serve voice clones and manage them remotely.

MockingBird stands out for its rapid voice cloning capability, particularly for Chinese Mandarin, and its ease of use across different platforms and technologies.

8. Microsoft VALL-E-X

VALL-E-X is a Python project that implements Microsoft’s VALL-E X zero-shot TTS model, which can generate speech in a new speaker’s voice without any training on that speaker.

Key Features

  • Zero-shot TTS: Can generate speech in an unseen speaker’s voice without prior training on that speaker.
  • In-context Learning: Adapts to new voices and languages swiftly using just a 3-second speech sample.
  • High Performance: Surpasses other systems in naturalness and speaker similarity.
  • Emotion and Environment Preservation: Retains the original speaker’s emotion and the recording’s acoustic quality.
  • Multilingual Capabilities: Enables cross-lingual synthesis and speech-to-speech translation while maintaining voice characteristics.

9. Pyttsx3

Pyttsx3 is a versatile text-to-speech library that works seamlessly with both Python 2 and 3. It’s a reliable tool for offline speech generation, supporting various TTS engines and allowing users to create speech without the need for an internet connection.

Key Features

  • Offline Capability: Functions without the need for an internet connection.
  • Supports Multiple TTS Engines: Compatible with SAPI5, NSSS, and eSpeak.
  • Cross-Version Support: Works with older and newer versions of Python.
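A short sketch of typical usage, assuming `pyttsx3` is installed along with a speech engine for your platform. `default_driver` and `speak` are hypothetical helpers. The driver mapping mirrors the three engines listed above, with SAPI5 on Windows, NSSS on macOS, and eSpeak elsewhere.

```python
import sys


def default_driver(platform=sys.platform):
    """Pick the engine name that matches the operating system."""
    if platform.startswith("win"):
        return "sapi5"
    if platform == "darwin":
        return "nsss"
    return "espeak"


def speak(text, rate=150, out_path=None):
    """Say `text` aloud, or save it to `out_path` when one is given."""
    # Imported lazily so default_driver() works without the package installed.
    import pyttsx3

    engine = pyttsx3.init(driverName=default_driver())
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    if out_path:
        engine.save_to_file(text, out_path)
    else:
        engine.say(text)
    engine.runAndWait()  # blocks until the queued command has finished
```

Calling `speak("Hello, offline world")` plays the audio directly, while passing `out_path` queues a file export instead.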

10. Nvidia NeMo

Nvidia NeMo is a powerful toolkit for conversational AI. This Python-based project offers resources for speech recognition, speech synthesis, natural language processing, and more, making it an essential tool for both researchers and developers.

Key Features

  • Conversational AI Focus: Designed specifically for speech and language models.
  • Comprehensive Toolkit: Includes support for ASR, TTS, LLMs, and NLP.
  • Research-Friendly: Facilitates the reuse and development of conversational AI models.
  • Pretrained Models: Offers a range of pretrained models to accelerate development.

11. DiffSinger

DiffSinger is a pioneering Python project that implements a neural model dedicated to creating synthetic singing voices. It’s designed to generate a singing voice that can be customized and controlled, offering new possibilities for digital music production.

Key Features

  • Neural Singing Voice Synthesis: Specializes in generating digital singing voices.
  • Model Control: Allows for fine-tuning and personalization of the synthetic voice.