Snapshot of https://a16z.com/ai-voice-agents-2025-update/, saved 2025-2-4 19:53.

View this report on Gamma.

Voice is one of the most powerful unlocks for AI application companies. It is the most frequent (and most information-dense) form of human communication, made “programmable” for the first time due to AI.

For enterprises, AI directly replaces human labor with technology. It’s cheaper, faster, more reliable — and often outperforms humans. Voice agents also allow businesses to be available to their customers 24/7 to answer questions, schedule appointments, or complete purchases. Customer availability and business availability no longer have to match 1:1 (ever tried to call an East Coast bank after 3 p.m. PT?); with voice agents, every business can always be online.

For consumers, we believe voice will be the first (and perhaps the primary) way people interact with AI. This interaction could take the form of an always-available companion or coach, or it could democratize services, such as language learning, that were previously inaccessible.

We are just now transitioning from the infrastructure layer to the application layer of AI voice. As models improve, voice will become the wedge, not the product. We are excited about startups using a voice wedge to unlock a broader platform.

What’s new in AI voice?

2024 was a massive year for AI voice. Since we published our last AI voice update

Advancements in model development have streamlined the infrastructure “stack,” resulting in voice agents with lower latency and improved performance. This improvement has largely materialized in the last six months with new conversational models.

These conversational models are also becoming more affordable over time. In December 2024, OpenAI cut GPT-4o realtime API audio prices by 60% (to $40/1M input tokens and $80/1M output tokens) and cached audio input prices by 87.5% (to $2.50/1M tokens). GPT-4o mini is also now available via realtime.
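
As a quick sanity check on the pricing figures above, the sketch below backs out the price that each quoted cut implies before the reduction. It only uses the two percentage/price pairs from the text (a 60% cut to $40 and an 87.5% cut to $2.50 per 1M tokens); the function name `price_before_cut` is a hypothetical helper for illustration, not part of any vendor API.

```python
def price_before_cut(new_price: float, pct_cut: float) -> float:
    """Return the price before a percentage cut, given the price after it."""
    if not 0 <= pct_cut < 100:
        raise ValueError("pct_cut must be in [0, 100)")
    # new = old * (100 - pct_cut) / 100  =>  old = new * 100 / (100 - pct_cut)
    return new_price * 100 / (100 - pct_cut)


# The two cuts quoted in the text: (price after cut in USD per 1M tokens, % cut).
quoted_cuts = [(40.00, 60.0), (2.50, 87.5)]

for new_price, pct_cut in quoted_cuts:
    old_price = price_before_cut(new_price, pct_cut)
    print(f"{pct_cut}% cut to ${new_price:.2f} implies a prior price of ${old_price:.2f}")
```

Under these figures, the implied earlier prices are $100 and $20 per 1M tokens respectively.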

Where are AI agents now?

The voice agent market exploded in H2 2024. One data point: companies building with voice represented 22% of the most recent YC class, per Cartesia.

Voice agents are also being added as a capability to more horizontal or multi-modal products.

In 2024, we saw companies at several layers of the conversational voice stack attract both funding and traction, including:

  • Model companies like ElevenLabs and Cartesia
  • Horizontal platforms like Vapi and Bland
  • Verticalized platforms like HappyRobot and Wayfaster

Especially for larger enterprises, we’ve rarely seen an immediate shift from full human call-taking to full AI call-taking. Founders instead find a “wedge” to start capturing what is often a small percentage of a customer’s calls, which can (hopefully) expand over time into handling more call types and workflows. Wedges we’ve seen include:

Market evolution: Fundraises

Verticals of focus: core markets

The most natural early categories for voice agents typically have high existing call center/BPO spend. If calls are taken by onshore employees as part of their standard jobs: (1) the pain point/revenue is typically not strong enough — unless a significant number of employees solely take/make calls; and (2) it’s difficult to quantify results/savings and “make the case.”

Each of these primary verticals (financial services, B2C, B2B, government, and healthcare) is likely to have its own core providers, similar to how each has its own systems of record.

We expect to see significant founder activity in the following categories (reach out if you’re building here!):

  • Financial services – debt collection, for example
  • Insurance – both customer-facing and back office
  • Government
  • Support services – including more complex customer service calls (like IT help) that require expertise

Outside “call center categories,” we have seen willingness to pay for AI voice agents for coaching or training use cases, largely targeted at high salary jobs. In these industries, realistic voice agents can essentially act as “simulators” to significantly improve on-the-job performance. This can replace labor spend (such as sales coaches) or less effective software.

As one indicator of where early stage founders are building, we look at YC companies.

Since 2020, there have been 90 voice agent companies in YC. This is accelerating with each new cohort: 10 of these are in the W25 class, which has yet to be fully announced. In pre-2023 cohorts, the voice agent companies are largely ones that have pivoted into the space in the past year.

YC founders building voice agents are largely concentrated in B2B- (~69%) and healthcare-focused (~18%) use cases, followed by consumer (~13%).

Within B2B, the most common sub-industries are: fintech (16.9%) and ops — largely customer support (12.4%). Within healthcare, voice agents either target front office (patient-facing) or back office (pharmacy, insurance, etc.-facing), focusing on: general human medicine (11.2%), dental (3.4%), veterinary (2.2%), or physical therapy (1.1%).
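
The shares quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the healthcare sub-segment percentages are expressed as a share of all YC voice agent companies (an assumption, suggested by the fact that they add up to roughly the ~18% healthcare figure), and it converts the rounded top-level shares into rough company counts out of the 90 mentioned in the text. The counts are illustrative only, since the published shares are rounded.

```python
import math

TOTAL_COMPANIES = 90  # YC voice agent companies since 2020, per the text

# Approximate top-level shares quoted in the text (percent).
top_level = {"B2B": 69, "healthcare": 18, "consumer": 13}

# Healthcare sub-segment shares quoted in the text.
# Assumption: these are percentages of ALL companies, not of healthcare only.
healthcare_segments = {
    "general human medicine": 11.2,
    "dental": 3.4,
    "veterinary": 2.2,
    "physical therapy": 1.1,
}

top_level_sum = sum(top_level.values())
healthcare_sum = sum(healthcare_segments.values())

# Rough company counts implied by the rounded shares (illustrative only).
approx_counts = {
    name: round(TOTAL_COMPANIES * pct / 100) for name, pct in top_level.items()
}

print(f"Top-level shares sum to {top_level_sum}%")
print(f"Healthcare sub-segments sum to {healthcare_sum:.1f}%")
for name, count in approx_counts.items():
    print(f"{name}: about {count} companies")
```

The top-level shares sum to 100%, the healthcare sub-segments sum to about 17.9% (consistent with the ~18% headline figure), and the rounded counts come out to roughly 62 B2B, 16 healthcare, and 12 consumer companies.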

a16z voice agent investments

These are portfolio companies of a16z. A list of investments made by a16z is available here.

Voice agent market maps

What are we looking for in AI voice?

Case studies: AI voice interviewers

Job interviews feel like a non-obvious early use case for voice agents, given the complexity (conducting a full interview with a human) and sensitivity (maintaining a strong candidate experience). However, we’ve seen significant early traction from several startups here — some insights below from customers:

The pain point is especially strong in staffing (43 public company agencies, $650B annual revenue), where roles tend to be higher volume and lower to medium skill (likely not a 10x engineer at an early stage startup). AI interviews can easily replace screening calls, or even more of the process. This is because:

  • Candidates are more willing to “jump through hoops,” which might include interviewing with an AI
  • Customers get paid by # of candidates they refer or # hired by end employer — more volume is better, as it allows them to send either more candidates or better candidates

"Something like 90% of the candidates we send now make it to first round [with the employer], 75-80% make it to final round. Our numbers were half that before [AI voice interviewing start-up]." Staffing agency for Fortune 100
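
To make the quoted improvement concrete, the sketch below reads "half that" literally (an assumption about what the speaker meant) and derives the implied pass-through rates before the agency adopted AI voice interviewing. The stage names and the use of (low, high) ranges are illustrative choices, not figures beyond those in the quote.

```python
# Current pass-through rates from the quote, as (low, high) percent ranges.
current = {
    "first round": (90.0, 90.0),
    "final round": (75.0, 80.0),
}

# Assumption: "half that" means each rate was exactly half its current value.
previous = {stage: (low / 2, high / 2) for stage, (low, high) in current.items()}

for stage, (low, high) in previous.items():
    now_low, now_high = current[stage]
    print(f"{stage}: previously {low}-{high}%, now {now_low}-{now_high}%")
```

On that reading, roughly 45% of candidates previously reached the first round and 37.5-40% reached the final round.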

Many AI interview products are already performing at or above the level of a human recruiter, for a few reasons:

  • Candidates can interview ASAP or at any time
  • Evaluation is consistent, and the customer can re-run it on past interviews if the criteria change
  • No issues with language or accents on either side
  • AI is often better able to assess technical or position-specific answers than a general recruiter

"The interviewee often starts gaining trust with AI in a way that they might not with the human interviewer. A recruiter may not have the experience to understand what interviewee is saying. AI can read from systems and give responses that are smarter and more engaging." $200M annual revenue staffing agency

Questions around AI voice for 2025

Pricing: What will be the preferred pricing model?
Modality expansion: How quickly should companies expand beyond calls?
End vision: Is it possible to replace the xMS?
Industry vs. technical teams: Whose advantage?
Horizontal vs. vertical approach: What makes sense, where?
Emotionality: Will voice agents enhance customer relationships?

In many cases, AI voice agents can already outperform humans on emotional vectors. They pay better attention, are more empathetic and patient, and have (theoretically) unlimited time to spend. There are categories where this will be particularly valuable, and voice agents can help businesses build deeper relationships with their customers, but this has been relatively untapped so far. We are excited to see how founders build around this theme in the most relevant verticals.

Get in touch!

If you're building in voice AI, I'd love to hear from you. Email me at omoore@a16z.com, or reach out on X.