
AI Agents in 2025

  Jack Vanlightly

This blog post is based on the AI Agents section of Humans of the Data Sphere Issue #6 with an extra interview with Sean Falconer at the end.

Two interesting blog posts about AI agents have caught my attention over the last few weeks.

Ethan Mollick has also written some excellent blog posts recently:

In this post, I’ll explore what some of the leading experts in this area are saying about AI agents and the challenges ahead.

First, what is an AI agent?

At the most abstract level, Chip Huyen defines an agent in her Agents blog post:

An agent is anything that can perceive its environment and act upon that environment. Artificial Intelligence: A Modern Approach (1995) defines an agent as anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators. This means that an agent is characterized by the environment it operates in and the set of actions it can perform.

Taking actions is a defining characteristic of an AI agent, compared to an LLM that only provides textual or graphical responses to prompts. Ethan Mollick notes that much of modern work is digital, and therefore something that an AI could plausibly do.

The digital world in which most knowledge work is done involves using a computer—navigating websites, filling forms, and completing transactions. Modern AI systems can now perform these same tasks, effectively automating what was previously human-only work. This capability extends beyond simple automation to include qualitative assessment and problem identification.

Anthropic offer another definition of an agent in their Building Effective Agents blog post:

"Agent" can be defined in several ways. Some customers define agents as fully autonomous systems that operate independently over extended periods, using various tools to accomplish complex tasks. Others use the term to describe more prescriptive implementations that follow predefined workflows. At Anthropic, we categorize all these variations as agentic systems, but draw an important architectural distinction between workflows and agents:

* Workflows are systems where LLMs and tools are orchestrated through predefined code paths.

* Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.

This seems like an important distinction to make. A workflow is a kind of static flow chart of branches and actions that constrain what the AI can do. It’s prescriptive, more predictable, but less flexible. A true AI agent, on the other hand, determines its control flow, giving it the freedom to plan and execute flexibly, but it comes with additional risk. Anthropic note that you should choose the simplest option possible, but when more complexity is needed, then a workflow or agent may be required:
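The workflow/agent distinction can be made concrete in a few lines of Python. This is a minimal sketch, not Anthropic's implementation; `call_llm` is a hypothetical stand-in for a real model call, stubbed here with canned answers so the two shapes are runnable side by side.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (canned demo answers)."""
    if "Classify" in prompt:
        return "refund"
    if "Next action?" in prompt:
        return "done" if "searched" in prompt else "search"
    return "drafted reply"


def workflow(ticket: str) -> str:
    # Workflow: the code path is predefined; the LLM only fills in each step.
    category = call_llm(f"Classify this ticket: {ticket}")
    if category == "refund":
        return call_llm(f"Draft a refund reply for: {ticket}")
    return call_llm(f"Draft a general reply for: {ticket}")


def agent(goal: str, tools: dict) -> str:
    # Agent: the LLM decides its own next action on every iteration.
    history = []
    while True:
        action = call_llm(f"Goal: {goal}\nHistory: {history}\nNext action?")
        if action == "done":
            return call_llm(f"Summarize the outcome of {goal!r}: {history}")
        history.append((action, tools[action]()))
```

In the workflow, every possible path is visible in the code; in the agent, the loop's shape is fixed but which tools run, in what order, and when to stop are all decided by the model at run time.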

…workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale.

…agents can be used for open-ended problems where it’s difficult or impossible to predict the required number of steps, and where you can’t hardcode a fixed path.

In a practical sense, an AI agent is software like any other service that interacts with the world via APIs. The big difference between an agent and regular code is that in order to satisfy a goal, an agent relies on an LLM to decide on control flow, interpret the results of external calls, and decide on next steps. LLMs by themselves have several limitations, such as not directly being able to take actions, or limits to skillsets such as mathematics. AI agents supplement LLMs with tools to augment their capabilities. Chip Huyen classifies the tools into three categories:

Depending on the agent’s environment, there are many possible tools. Here are three categories of tools that you might want to consider: knowledge augmentation (i.e., context construction), capability extension, and tools that let your agent act upon its environment.

Chip uses web browsing as the canonical example of knowledge augmentation, and math, calendar, timezone converters, and unit converters as canonical capability extensions.

To satisfy a goal, an agent must use a combination of:

  • Effective planning and reasoning.

    • The agent makes a plan of steps it needs to perform in order to satisfy the goal.

  • Accurate tool selection and execution.

    • The LLM may need to make API calls for information retrieval, and/or make changes or take actions in the real world.

  • Self-reflection and evaluation.

    • At every step, the agent should reflect on what it has planned and the results it has received to ensure it is still doing the right thing.
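Put together, these three ingredients form a loop. The following is a minimal sketch of that plan/execute/reflect cycle, assuming a hypothetical `call_llm` helper (stubbed with deterministic answers here) and illustrative tool names:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (canned demo answers)."""
    if prompt.startswith("Plan"):
        return "look up docs;summarize findings"
    if prompt.startswith("Pick tool"):
        return "search:agent frameworks" if "look up" in prompt else "write:summary"
    return "on track"  # the reflection verdict


def run_agent(goal: str, tools: dict, max_steps: int = 10) -> list:
    # 1. Planning: ask the model for an ordered list of steps.
    plan = call_llm(f"Plan steps for: {goal}").split(";")
    results = []
    while plan and len(results) < max_steps:
        step = plan.pop(0)
        # 2. Tool selection and execution.
        tool_name, arg = call_llm(f"Pick tool and input for step: {step}").split(":", 1)
        output = tools[tool_name](arg)
        results.append(output)
        # 3. Self-reflection: revise the remaining plan if the output demands it.
        verdict = call_llm(f"Given {output!r}, is the plan for {goal!r} still on track?")
        if verdict == "replan":
            plan = call_llm(f"Replan for: {goal} given {results}").split(";")
    return results
```

Real agents decompose these responsibilities into separate collaborating components, but the planning, tool-use, and reflection steps appear in roughly this order.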

Reviewing the challenges of AI agents

AI agents are stochastic systems that add a new flavor of risk compared to deterministic systems, which can more readily be modeled and comprehensively tested. Many agent systems require multiple intermediate steps to satisfy a goal, and errors in each step can compound, producing side effects that may be challenging to anticipate. This is one of the defining characteristics of AI agents that the agent designer must account for. In fact, accounting for all the failure modes of an agent is where the steepest learning curve lies, as well as most of the development cost.

As a distributed systems engineer, I’m probably more on the paranoid side of the risk-awareness spectrum. Through that lens, I see all manner of challenges to overcome when building AI agents:

  • Effective Planning:

    • AI agents must create plans that align with their goals while adapting to dynamic environments and incomplete information. Self-reflection and being able to change course may be necessary.

    • Plans may need to be evaluated to ensure that they are feasible, efficient, and contextually appropriate. Typically, an agent will be decomposed into multiple parts where planning, execution, and evaluation are carried out by separate components that collaborate.

    • There are a number of things that can go wrong in the planning phase:

      • The agent does not revise plans when new information contradicts initial assumptions.

      • The agent gets stuck in loops, revisiting the same steps repeatedly without making progress.

      • The agent sets inappropriate or harmful goals due to poorly defined prompts or objectives.

      • The agent achieves appropriate goals but creates harmful side effects in doing so.

  • Accurate Tool Selection and Usage:

    • Agents need to identify the correct tools (e.g., APIs, models) for a task and invoke them properly.

    • Common issues include:

      • Invoking the wrong tool. Or failing to consider multiple tools or approaches, leading to suboptimal performance.

      • Providing incorrect or incomplete inputs can create wrong or suboptimal results.

      • Hallucinating non-existent tools.

      • Failing to recognize when tool outputs indicate anomalies, errors, or limitations.

  • Reasoning and Decision-Making:

    • Agents may struggle to interpret the results of their actions or external tool outputs.

    • Errors in reasoning can lead to invalid conclusions, impacting subsequent actions (the compounding of errors). Some reasoning errors may result from forgetting critical context or information needed to make accurate decisions.

  • Failure Modes in Execution:

    • Agents can fail to execute actions correctly, leading to unintended consequences. The first challenge is detecting when actions are executed incorrectly and the second challenge is remediating such actions.

    • Difficulty handling edge cases or unexpected outcomes, or handling rare cases as general cases.

    • Monitoring and auditing agents may also be challenging. Not only detecting when things go wrong, but also detecting bias and justifying why certain actions were taken.
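Some of these failure modes admit cheap guard rails. For example, a hallucinated tool name or a failing tool call can be caught by validating the model's choice against a registry and surfacing the error back to the model as an observation, instead of crashing the agent. A minimal sketch, with purely illustrative tool names:

```python
# Illustrative registry; a real agent would register actual API wrappers here.
TOOLS = {
    "search": lambda query: f"results for {query}",
    "unit_convert": lambda spec: f"converted {spec}",
}


def safe_invoke(tool_name: str, arg: str) -> str:
    """Guard against hallucinated tools and failing tool calls."""
    if tool_name not in TOOLS:
        # Return the error as an observation so the model can self-correct.
        return f"error: unknown tool {tool_name!r}; available tools: {sorted(TOOLS)}"
    try:
        return TOOLS[tool_name](arg)
    except Exception as exc:
        return f"error: {tool_name} failed: {exc!r}"
```

Guard rails like this reduce the blast radius of individual mistakes, but they do not address the harder problems of flawed plans or misinterpreted results.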

I could go on but you get the idea. A lot of experimentation and iteration will go into AI agent development and getting that last 20% of completeness and polish could be time-consuming. Chip Huyen covers much of this in her framing of AI agent development. The Anthropic post also steers you on the route of choosing the simplest agent, or no agent at all:

When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all.


Start with simple prompts, optimize them with comprehensive evaluation, and add multi-step agentic systems only when simpler solutions fall short. When implementing agents, we try to follow three core principles:

* Maintain simplicity in your agent's design.

* Prioritize transparency by explicitly showing the agent’s planning steps.

* Carefully craft your agent-computer interface (ACI) through thorough tool documentation and testing.

AI agents and agentic systems are an emerging practice, and I agree that 2025 will be the year of the AI agent, given the promise that AI agents hold and the rapid improvements in model capabilities.

However, with that said, I do have some serious concerns, and I believe there will be two constraining aspects of AI agents that present a challenge to widespread adoption:

  • Reliability. There are so many failure modes, and even the mitigations are usually run by LLMs and therefore have their own failure modes. Errors compound, and the detection and mitigation mechanisms themselves may not be highly reliable.

  • Cost. Agents may require multiple reasoning steps using the more powerful models. All this pushes up the cost. With higher costs come higher demands for the value proposition. Of course with the arrival of DeepSeek v3, maybe 2025 will also be the year of the more efficient LLM.

Chip Huyen noted in her post:

Compared to non-agent use cases, agents typically require more powerful models for two reasons:

Compound mistakes: an agent often needs to perform multiple steps to accomplish a task, and the overall accuracy decreases as the number of steps increases. If the model’s accuracy is 95% per step, over 10 steps, the accuracy will drop to 60%, and over 100 steps, the accuracy will be only 0.6%.
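The arithmetic behind Chip's numbers is simple compounding: if each step succeeds independently with probability p, the end-to-end accuracy over n steps is p to the power n.

```python
def end_to_end_accuracy(p: float, n: int) -> float:
    """End-to-end accuracy when each of n steps succeeds with probability p."""
    return p ** n


print(round(end_to_end_accuracy(0.95, 10), 2))   # 0.6, the ~60% in the quote
print(round(end_to_end_accuracy(0.95, 100), 4))  # 0.0059, i.e. roughly 0.6%
```

The curve is unforgiving: even 99% per-step accuracy drops to about 37% end-to-end over 100 steps, which is why longer-horizon agents demand more capable models.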

Higher stakes: with access to tools, agents are capable of performing more impactful tasks, but any failure could have more severe consequences.

As AI becomes more and more capable but with non-trivial associated costs, we may enter an age where cost efficiency is the primary decision maker about when to use AI vs when to use a human or not do the thing at all. If both AI and human workers can execute a digital task at similar levels of competence, then cost efficiency becomes the defining question. François Chollet made this point over the holiday period:

One very important thing to understand about the future: the economics of AI are about to change completely. We'll soon be in a world where you can turn test-time compute into competence -- for the first time in the history of software, marginal cost will become critical. Cost-efficiency will be the overarching measure guiding deployment decisions. How much are you willing to pay to solve X?

In this early phase, agents are likely best suited to narrow tasks that do not involve important actions such as bank transfers, costly purchases, and actions that cannot be undone without cost or negative impact. Ethan Mollick noted that:

Narrow agents are now a real product, rather than a future possibility. There are already many coding agents, and you can use experimental open-source agents that do scientific and financial research.

Narrow agents are specialized for a particular task, which means they are somewhat limited. That raises the question of whether we will soon see generalist agents, where you can ask the AI anything and it will use a computer and the internet to do it. Simon Willison thinks not, despite what Sam Altman has argued. We will learn more as the year progresses, but if general agentic systems work reliably and safely, that really will change things, as it allows smart AIs to take action in the world.

Interview with Sean Falconer: AI agent use cases and adoption challenges

What are the use cases for AI agents at this early stage? It seems both wide open for creativity but also somewhat limited due to the still immature practice of building agents and the current limitations of the models. 

I’m not an AI expert, more of an observer of the space, so I asked someone immersed in the AI field, my colleague at Confluent, Sean Falconer (also a host of Software Engineering Daily and Software Huddle). 

Q: What are some good use cases for AI agents at this early stage?

Sean’s Response

At this stage, the use cases for agents are both exciting and evolving, but it's still early days. Think of it as the early days of cars: useful, but we're far from self-driving.

The abstraction frameworks might not be the right abstractions, and the dev tooling to support the full lifecycle of deployment, testing, and monitoring is not mature. Getting something useful still requires a lot of finger-in-the-air, which-way-is-the-wind-blowing experimentation and iteration.

Despite this, I’m bullish on agents; the promise is huge, and even achieving a fraction of that promise is compelling.

I think the most effective use cases right now focus on augmenting human effort rather than replacing it. For example, although not necessarily fully agentic, with coding, companies are seeing that AI can boost task completion speed by 55% and improve code quality by 82%. And now the co-pilots of the world are supporting more complex task completion by leveraging agents.

Agents tend to shine in complex, multi-step workflows that are repetitive, resource-intensive, or too intricate for traditional automation. Think processes that are frustrating for humans but not disastrous if they go wrong. In industry, we’re already seeing agents in sales and marketing that research prospects, identify decision-makers, and draft personalized outreach, or in drug discovery, where they semi-automate filling out regulatory forms while humans verify responses. 

There’s a ton of potential, but like self-driving cars, perfecting the last 20% to reach a fully automated state will take years. For now, agents excel at reducing grunt work and enhancing human productivity. The real magic happens when smart people work with smart AI.

Q: How should companies evaluate the need for an agent vs a workflow (as per the Anthropic post) or just simple prompts? 

Sean’s Response

When it comes to deciding between a simple prompt, a workflow, or a full-blown agent, the key is to ask yourself, "Do I really need a bazooka to swat this fly?" As Knuth said, “premature optimization is the root of all evil,” and honestly, over-engineering is how you end up with solutions that are impressive but wildly unnecessary.

The best place to start is with the business value: ask yourself what you are actually trying to accomplish, and how you will measure whether it's working. Like most engineering, start with the simplest thing that gets the job done.

For example, I built a tool to help me draft LinkedIn posts about content I’ve worked on. With some clever prompt engineering, I can get a pretty decent draft. Sure, it’s not perfect; sometimes it sounds like an overly enthusiastic social media manager. But it’s a decent enough start that I can refine, it took very little effort, and it doesn’t cost a lot of tokens.

Now, could I make it fancier? Sure. I could build a complex workflow to pull in context from all my past posts, past articles, and personal anecdotes. I could go further and use an agentic pattern like reflection that iterates and refines the post to perfection. 

But, is it worth the effort? Probably not. I’m just trying to get a post out there, not win a Pulitzer.

Zooming out beyond the question of what my GenAI need is, I think it’s important to ask yourself whether GenAI is even the right answer for what you want to achieve. Predictive ML and other forms of automation have served humans for decades. Over the last two years, we’ve lost sight of that. Sometimes, a simpler, faster, and cheaper approach does the trick just fine. It’s like what I see all the time in data engineering: people love their fancy pipelines, but let’s not forget, sometimes a simple script and a spreadsheet are all you really need.

Q: What are the main adoption challenges of AI agents in 2025?

Sean’s Response

I see the main adoption challenges for agents in 2025 as a mix of over-ambition, engineering headaches, and good old-fashioned data dysfunction.

First, there’s the temptation to do too much. Companies dive in headfirst, trying to build agents that can plan, reason, and bring them coffee, only to end up with bloated, overcomplicated systems that deliver meh results. It’s like trying to build a rocket to deliver pizzas: a cool idea, but probably not worth the cost. It’s perhaps not as sexy, but starting small and focusing on specific, measurable goals is the way to go.

Then there’s the engineering challenge. 

Programming and evaluating non-deterministic workflows requires a real shift in mindset for engineering teams. It’s not like traditional coding, where you write instructions and the machine does exactly what you say. With agents, you need patience, flexibility, and a willingness to deal with unexpected outcomes. You have to be ready to program around these limitations or only focus on non-customer facing workflow automation where some level of variance in quality and accuracy is acceptable.

And then there’s of course data, the Achilles’ heel of most AI projects.

Many companies still don’t have a clear picture of what data they have, who has access, or how to get value from it. If you’re struggling with data modeling and analytics, operationalizing data for AI is going to be a real uphill battle. Data is often locked away in silos, ridiculously expensive to move, and engineers spend their days wrestling with pipelines instead of solving actual problems.

GenAI and agents have enormous potential, but they also expose the cracks in your company’s foundation. If your organization doesn’t tackle these underlying issues, starting with data and engineering workflows, you’ll end up with flashy demos that don’t translate into real value. 

[end of interview with Sean]

Wrap up

It will be fascinating to watch how agentic systems evolve as a category. I myself will be closely watching the space. As AI agents continue to evolve, the real test will be how they perform in production environments and whether they can deliver consistent business value. I have some concerns over the operating costs of agents but the continued frenetic pace of model development, including the surprisingly low-cost DeepSeek v3 model, means that it’s hard to predict what the cost profile will look like even 6 months from now. The coming year will bring valuable lessons from teams experimenting with different approaches, refining reliability, and balancing running costs against tangible returns. Observing these case studies will help separate hype from reality, revealing what works, what doesn’t, and where the biggest challenges lie.
