Bilingual snapshot of https://a16z.com/ai-voice-agents-2025-update/, saved by the user on 2025-2-4 19:53; bilingual support provided by Immersive Translate.

View this report on Gamma.

Voice is one of the most powerful unlocks for AI application companies. It is the most frequent (and most information-dense) form of human communication, and AI has made it “programmable” for the first time.

For enterprises, AI directly replaces human labor with technology. It’s cheaper, faster, more reliable — and often outperforms humans. Voice agents also allow businesses to be available to their customers 24/7 to answer questions, schedule appointments, or complete purchases. Customer availability and business availability no longer have to match 1:1 (ever tried to call an East Coast bank after 3 p.m. PT?); with voice agents, every business can always be online.

For consumers, we believe voice will be the first, and perhaps the primary, way people interact with AI. This interaction could take the form of an always-available companion or coach, or of democratized access to services, such as language learning, that were previously inaccessible.

We are just now transitioning from the infrastructure layer to the application layer of AI voice. As models improve, voice will become the wedge, not the product. We are excited about startups using a voice wedge to unlock a broader platform.

What’s new in AI voice?

2024 was a massive year for AI voice. Since we published our last AI voice update

Advancements in model development have streamlined the infrastructure “stack,” resulting in voice agents with lower latency and improved performance. This improvement has largely materialized in the last six months with new conversational models.

These conversational models are also becoming more affordable over time. In December 2024, OpenAI cut the price of GPT-4o realtime API audio input by 60% (to $40/1M tokens) and of cached audio input by 87.5% (to $2.50/1M tokens). GPT-4o mini is now also available in the Realtime API.
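The two percentage cuts quoted above can be inverted to recover the implied pre-cut list prices. This minimal sketch uses only the post-cut prices and percentages from the text; the "before" figures it prints are derived by arithmetic, not quoted from any price sheet, and the helper name `price_before_cut` is hypothetical.

```python
# Invert a percentage price cut: new = old * (1 - pct/100)  =>  old = new / (1 - pct/100).
# Inputs are the post-cut prices (USD per 1M tokens) and cuts quoted in the text.

def price_before_cut(price_after: float, pct_cut: float) -> float:
    """Return the implied original price given the new price and the cut in percent (0-100)."""
    return price_after / (1 - pct_cut / 100)

# (post-cut price in USD per 1M tokens, percentage cut)
quoted_cuts = [(40.00, 60.0), (2.50, 87.5)]

for after, pct in quoted_cuts:
    before = price_before_cut(after, pct)
    print(f"${before:.2f} -> ${after:.2f} per 1M tokens ({pct}% cut)")
```

A steeper cut implies a larger ratio between old and new price: a 60% cut means the new price is 0.4x the old one, while an 87.5% cut means it is 0.125x.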

Where are AI agents now?

The voice agent market exploded in H2 2024. One data point: companies building with voice represented 22% of the most recent YC class, per Cartesia.

Voice agents are also being added as a capability to more horizontal or multi-modal products.

In 2024, we saw companies at several layers of the conversational voice stack attract both funding and traction, including:

  • Model companies like ElevenLabs and Cartesia
  • Horizontal platforms like Vapi and Bland
  • Verticalized platforms like HappyRobot and Wayfaster

Especially at larger enterprises, we’ve rarely seen an immediate shift from fully human call-taking to fully AI call-taking. Instead, founders find a “wedge”: they start by capturing what is often a small percentage of a customer’s calls, which can (hopefully) expand over time into handling more call types and workflows. Wedges we’ve seen include:

Market evolution: Fundraises

Verticals of focus: core markets

The most natural early categories for voice agents are those with high existing call center/BPO spend. If calls are instead taken by onshore employees as part of their broader jobs, then (1) the pain point (and revenue opportunity) is typically not strong enough, unless a significant number of employees solely take or make calls; and (2) it’s difficult to quantify results or savings and “make the case” for buying.

Each of these primary verticals (financial services, B2C, B2B, government, and healthcare) is likely to have its own core providers, much as each has its own systems of record.

We expect to see significant founder activity in the following categories (reach out if you’re building here!):

  • Financial services – debt collection, for example
  • Insurance – both customer-facing and back office
  • Government
  • Support services – including more complex customer service calls (like IT help) that require expertise

Outside “call center categories,” we have seen willingness to pay for AI voice agents in coaching or training use cases, largely targeting high-salary jobs. In these industries, realistic voice agents can act as “simulators” that significantly improve on-the-job performance. This can replace labor spend (such as sales coaches) or less effective software.

As one indicator of where early-stage founders are building, we look at YC companies.

Since 2020, YC has backed 90 voice agent companies, and the pace is accelerating with each new cohort: 10 of them are in the W25 class, which has yet to be fully announced. The voice agent companies in pre-2023 cohorts are largely ones that pivoted into the space in the past year.

YC founders building voice agents are largely concentrated in B2B (~69%) and healthcare (~18%) use cases, followed by consumer (~13%).

Within B2B, the most common sub-industries are fintech (16.9%) and ops, largely customer support (12.4%). Within healthcare, voice agents target either the front office (patient-facing) or the back office (facing pharmacies, insurers, etc.), focusing on general human medicine (11.2%), dental (3.4%), veterinary (2.2%), or physical therapy (1.1%). These percentages are shares of all YC voice agent companies, not of each sub-category.
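A quick arithmetic check shows how the YC breakdown fits together: the top-level shares sum to 100%, and the four healthcare segments add up to roughly the ~18% healthcare total, which is why they read as shares of the whole YC voice set. Every number below is copied from the text; the variable names are illustrative only, and the "within B2B" figure is a derived ratio, not a number a16z reported.

```python
# Consistency check on the YC voice agent breakdown (all figures in % of YC voice companies).

top_level = {"B2B": 69.0, "healthcare": 18.0, "consumer": 13.0}  # approximate shares
healthcare_segments = {
    "general human medicine": 11.2,
    "dental": 3.4,
    "veterinary": 2.2,
    "physical therapy": 1.1,
}
fintech_share = 16.9  # fintech, as a share of all YC voice companies

top_total = sum(top_level.values())
healthcare_total = sum(healthcare_segments.values())
# Derived: fintech as a fraction of the B2B bucket rather than of the whole set.
fintech_within_b2b = fintech_share / top_level["B2B"] * 100

print(f"top-level shares sum to {top_total:.1f}%")
print(f"healthcare segments sum to {healthcare_total:.1f}% (healthcare overall: ~{top_level['healthcare']:.0f}%)")
print(f"fintech is about {fintech_within_b2b:.1f}% of B2B voice companies")
```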

a16z voice agent investments

These are portfolio companies of a16z. A list of investments made by a16z is available here.

Voice agent market maps

What are we looking for in AI voice?

Case studies: AI voice interviewers

Job interviews feel like a non-obvious early use case for voice agents, given the complexity (conducting a full interview with a human) and sensitivity (maintaining a strong candidate experience). However, we’ve seen significant early traction from several startups here. Some insights from customers:

The pain point is especially strong in staffing (43 public agencies, $650B in annual revenue), which fills higher-volume, lower- to medium-skill roles (likely not a 10x engineer at an early-stage startup). AI interviews can easily replace screening calls, or even more of the process, because:

  • Candidates are more willing to “jump through hoops,” which might include interviewing with an AI
  • Customers are paid by the number of candidates they refer or the number hired by the end employer, so more volume is better: it lets them send either more candidates or better candidates

"Something like 90% of the candidates we send now make it to first round [with the employer], 75-80% make it to final round. Our numbers were half that before [AI voice interviewing start-up]." Staffing agency for Fortune 100

Many AI interview products are already performing at or above the level of a human recruiter, for a few reasons:

  • Candidates can interview ASAP or at any time
  • Evaluation is consistent, and the customer can re-run it on past interviews if the criteria change
  • No issues with language or accents on either side
  • AI is often better able to assess technical or position-specific answers than a general recruiter
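The "consistent evaluation, re-run on past interviews" point can be illustrated with a toy sketch: store transcripts once, then apply the same rubric to all of them, and simply re-apply a revised rubric when hiring criteria change. Everything here is hypothetical (the `Interview` type, `score`, `rescore_all`, the warehouse-style criteria and candidates), and a plain keyword match stands in for whatever model-based grader a real AI interview product uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interview:
    candidate: str
    transcript: str

# Hypothetical rubric: criterion name -> phrases that count as evidence for it.
Rubric = dict[str, list[str]]

def score(interview: Interview, rubric: Rubric) -> dict[str, bool]:
    """Mark a criterion as met if any of its evidence phrases appears in the transcript."""
    text = interview.transcript.lower()
    return {name: any(p.lower() in text for p in phrases) for name, phrases in rubric.items()}

def rescore_all(interviews: list[Interview], rubric: Rubric) -> dict[str, dict[str, bool]]:
    """Apply one rubric uniformly to every stored interview (e.g. after criteria change)."""
    return {iv.candidate: score(iv, rubric) for iv in interviews}

# Hypothetical archive of past interview transcripts.
archive = [
    Interview("A", "I have driven forklifts for five years and hold a forklift certification."),
    Interview("B", "I am available for night shifts and weekends."),
]

rubric_v1: Rubric = {"forklift experience": ["forklift"]}
# Criteria changed: the employer now also needs night-shift availability.
rubric_v2: Rubric = {"forklift experience": ["forklift"], "night shifts": ["night shift"]}

print(rescore_all(archive, rubric_v1))
print(rescore_all(archive, rubric_v2))
```

Because scoring is a pure function of (transcript, rubric), every candidate is judged by the same rules, and changing criteria means re-running over the archive rather than re-interviewing anyone.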

"The interviewee often starts gaining trust with AI in a way that they might not with the human interviewer. A recruiter may not have the experience to understand what interviewee is saying. AI can read from systems and give responses that are smarter and more engaging." $200M annual revenue staffing agency

Questions around AI voice for 2025

Pricing: What will be the preferred pricing model?
Modality expansion: How quickly should companies expand beyond calls?
End vision: Is it possible to replace the xMS?
Industry vs. technical teams: Whose advantage?
Horizontal vs. vertical approach: What makes sense, where?
Emotionality: Will voice agents enhance customer relationships?

In many cases, AI voice agents can already outperform humans on emotional dimensions: they pay better attention, are more empathetic and patient, and have (theoretically) unlimited time to spend. In some categories this will be particularly valuable, helping businesses build deeper relationships with their customers, but it has been relatively untapped so far. We are excited to see how founders build around this theme in the most relevant verticals.

Get in touch!

If you're building in voice AI, I'd love to hear from you. Email me at omoore@a16z.com, or reach out on X.