
What Are AI Agents? A Short Intro And A Step-by-Step Guide to Build Your Own.
The next big thing? Gartner believes AI agents are the future. OpenAI, Nvidia and Microsoft are betting on it — as are companies such as Salesforce, which have so far been rather inconspicuous in the field of AI.
And there’s no doubt that the thing is really taking off right now.

"AI agents" on Google Trends (trends.google.com)
Wow.
So, what is really behind the trend? The key to understanding agents is agency.
Unlike traditional generative AI systems, agents don’t just respond to user input. Instead, they can process a complex problem such as an insurance claim from start to finish. This includes understanding the text, images and PDFs of the claim, retrieving information from the customer database, comparing the case with the insurance terms and conditions, asking the customer questions and waiting for their response — even if it takes days — without losing context.
The agents do this autonomously — without humans having to check whether the AI is processing everything correctly.
The Espresso Machine and the Barista
In contrast to existing AI systems and all the copilots out there that help employees to do their job, AI agents are, in fact, fully-fledged employees themselves, offering immense potential for process automation.
Imagine — an AI that can take on complex, multi-step tasks that are currently performed by a human employee or an entire department:
- Planning, designing, executing, measuring, and optimizing a marketing campaign
- Locating a lost shipment in logistics by communicating with carriers, customers, and warehouses or, if it remains lost, claiming its value from the responsible partner
- Searching the trademark database each day, determining whether a newly registered trademark conflicts with your own, and immediately filing an opposition
- Gathering the relevant data or asking employees for it, checking the data, and compiling an ESG report
Currently, AI models can assist with tasks like generating campaign content or evaluating emails, but they lack the ability to execute an entire process. An AI agent can do that.

Traditional generative AI can help a human team with a process (yellow), while an AI agent can execute the entire process end to end (orange). Image credit: Maximilian Vogel
While traditional models are like great espresso machines, agent-based AI is the barista. Not only can they make coffee, but they can welcome the guests, take the order, serve the coffee, collect the money, put the cups in the dishwasher, and even close up shop at night. Even the best espresso machine in the world can’t run a café by itself, but the barista can.
Why can the AI agent and the barista do this? They excel at mastering various subprocesses of a complex job and can independently decide which task to tackle next. They can communicate with people, like the clients, if they need more information (milk or oat milk?). They can decide who they should ask in case of problems (beans are out => boss, coffee machine is on strike => customer service of the machine vendor).

AI agents vs. traditional generative AI. Image credit: Maximilian Vogel
Anatomy of an AI Worker
But enough chatting, let’s build an AI agent. Let us have a look at the relevant processes and workflows.
Let us build an agent for the insurance process shown in the diagram above. The agent should handle an insurance claim from start to reimbursement.
What we are developing here is the business architecture and the process flow. Unfortunately, I can’t dive into the coding because it can quickly become very extensive.
1. Classification & sending a job into processing lanes
Our workflow starts when a customer sends the insurer a message with a claim on their home insurance.
What does our agent do? It determines what the customer wants by analyzing the message’s content.
Based on this classification, the system initiates a processing lane. Often, this goes beyond function calling; it involves making a fundamental decision about the process, followed by executing many discrete steps.
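The routing step described above can be sketched in plain Python. The lane names, the keyword rules in `classify_message`, and the handler functions are all hypothetical; in a real agent the classifier would be a language model call, which is stubbed here so the example runs on its own.

```python
from typing import Callable


def classify_message(text: str) -> str:
    """Stand-in for an LLM classifier: assigns a message to one lane.

    Keyword matching is used only so that this sketch is runnable.
    """
    lowered = text.lower()
    if any(word in lowered for word in ("damage", "claim", "compensate")):
        return "claim"
    if any(word in lowered for word in ("cancel", "change my contract")):
        return "contract_change"
    return "other"


def handle_claim(text: str) -> list[str]:
    # A lane is more than one function call: it is a sequence of steps.
    return ["extract_data", "check_contract", "assess_claim", "reply_to_customer"]


def handle_contract_change(text: str) -> list[str]:
    return ["extract_data", "update_contract", "confirm_to_customer"]


def handle_other(text: str) -> list[str]:
    return ["forward_to_human"]


# Hypothetical mapping from classification result to processing lane.
HANDLERS: dict[str, Callable[[str], list[str]]] = {
    "claim": handle_claim,
    "contract_change": handle_contract_change,
    "other": handle_other,
}


def route(text: str) -> tuple[str, list[str]]:
    """Classify the message, then return its lane and the planned steps."""
    lane = classify_message(text)
    return lane, HANDLERS[lane](text)
```

Calling `route("I would like to report a damage and ask you to compensate me.")` selects the `claim` lane and returns its planned steps. The fundamental decision is made once, and the discrete steps follow from it.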

AI agent: 1. Classifying messages and routing them to different processing lanes. Image credit: Maximilian Vogel
2. Extracting data
In the next step, data is extracted. One of the main tasks of an agent is to turn unstructured data into structured data, which makes processing systematic, safe, and secure.
Classification assigns a text to a predefined category, whereas extraction involves reading and interpreting data from the text. However, a language model doesn’t directly copy data from the input prompt; instead, it generates a response. This allows for data formatting, such as converting a phone number from ‘(718) 123–45678’ to ‘+1 718 123 45678’.
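Because the model generates the phone number rather than copying it, one optional sanity check is to confirm that the digits survived the reformatting. This is a minimal sketch and an addition to the process described here; the function names and the default country code are assumptions.

```python
import re


def digits(text: str) -> str:
    """Keep only the digits of a string."""
    return re.sub(r"\D", "", text)


def phone_digits_preserved(source: str, extracted: str, country_code: str = "1") -> bool:
    """True if the extracted number has the same digits as the source,
    optionally prefixed with the (assumed) country code."""
    src = digits(source)
    out = digits(extracted)
    return out == src or out == country_code + src
```

For the example above, `phone_digits_preserved("(718) 123–45678", "+1 718 123 45678")` returns `True`, while a number with a single changed digit is rejected.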

AI agent: 2. Extracting data from the email and its attachments. Image credit: Maximilian Vogel
The extraction of data is not limited to text content (from the email body), but can also cover data from images, PDFs, or other documents. We use more than one model for that: LLMs, image recognition models, OCR, and others. The above process is simplified, really massively simplified. In reality, we often send images to OCR systems that extract text from scanned invoices or forms. And we often classify attachments as well before analyzing them.
We enforce JSON as the model’s output format to ensure structured data.
This is the email input — unstructured data:
Hi,
I would like to report a damage and ask you to compensate me.
Yesterday, while playing with a friend, my 9-year-old son Rajad kicked a soccer ball against the chandelier in the living room, which then broke from its holder and fell onto the floor and shattered (it was made of glass).
Luckily no one is injured, but the chandelier is damaged beyond repair.
Attached is an invoice and some images of the destroyed chandelier.
Deepak Jamal
contract no: HC12-223873923
123 Main Street
10008 New York City
(718) 123 45678
This is the model output — a JSON, structured data:
{
  "name": "Deepak",
  "surname": "Jamal",
  "address": "123 Main Street, 10008 New York City, NY",
  "phone": "+1 718 123 45678",
  "contract_no": "HC12-223873923",
  "claim_description": "Yesterday [Dec-8, 2024], while playing with a friend, my 9-year-old son Rajad kicked a soccer ball against the chandelier in the living room, which then broke from its holder and fell onto the floor and shattered (it was made of glass).\nLuckily no one is injured, but the chandelier is damaged beyond repair.\n"
}
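Enforcing JSON only helps if the output is checked before it is passed on. Here is a minimal sketch of such a check, assuming the field names from the example above. The `Claim` dataclass, the `parse_claim` function, and the phone pattern are illustrative and not part of any particular framework.

```python
import json
import re
from dataclasses import dataclass, fields


@dataclass
class Claim:
    name: str
    surname: str
    address: str
    phone: str
    contract_no: str
    claim_description: str


# Assumed target format: "+<country code>" followed by groups of digits.
PHONE_PATTERN = re.compile(r"^\+\d{1,3}( \d+)+$")


def parse_claim(raw: str) -> Claim:
    """Parse the model's JSON output and reject incomplete or malformed data."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(Claim)}
    missing = expected - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Extra keys are ignored; every expected value is stored as a string.
    claim = Claim(**{key: str(data[key]) for key in expected})
    if not PHONE_PATTERN.match(claim.phone):
        raise ValueError(f"unexpected phone format: {claim.phone!r}")
    return claim
```

A failed check is a natural point to retry the model call or to hand the case to a human, rather than letting half-structured data flow into later steps.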
3. Calling external services, making the context persistent
Many generative AI systems can answer queries directly — sometimes using pre-trained data, fine-tuning, or Retrieval Augmented Generation (RAG) on some documents. This is not enough for agents. Almost every reasonably powerful AI agent needs to access corporate or external data from databases.
To keep the context of a process persistent beyond the current session, it must also write data to systems and databases. In our case, the agent checks the contract number against a customer database and writes the status of the claim to an issue tracking system. It can also — remember: agency! — request missing data from external parties, such as the customer.

AI agent: 3. Calling external services and making the context persistent. Image credit: Maximilian Vogel
4. Assessment, RAG, reasoning and confidence
The heart of every administration job consists of interpreting incoming cases in relation to various rules. AI is particularly good at this. Because we can’t provide all contextual information (e.g., policy content or terms and conditions) when calling a model, we use a vector database to retrieve relevant snippets — a technique known as RAG.
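The retrieval idea can be sketched without any external service. This toy version swaps the vector database for an in-memory list and the learned embeddings for simple word counts, so it only illustrates the principle; the policy snippets are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector.

    A real system would use a learned embedding model instead.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Return the k snippets most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(query_vec, embed(s)), reverse=True)
    return ranked[:k]


# Hypothetical policy snippets, invented for this example.
POLICY_SNIPPETS = [
    "Glass breakage of lamps and chandeliers in the home is covered.",
    "Damage caused by flooding is covered up to the insured sum.",
    "Bicycles are covered against theft when locked.",
]


def build_prompt(claim: str, snippets: list[str]) -> str:
    """Put only the best-matching snippet into the model prompt."""
    context = "\n".join(f"- {s}" for s in retrieve(claim, snippets, k=1))
    return f"Policy excerpts:\n{context}\n\nClaim:\n{claim}\n\nIs this claim covered?"
```

Only the retrieved excerpt ends up in the prompt, which is the point of the technique: the model sees the relevant rule without receiving the whole policy.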
And we prompt the AI to ‘think aloud’ before making an assessment. Thinking before blurting out the result improves answer quality — something we’ve all learned since 3rd grade math. We can also use the output of the model reasoning in many obvious and less obvious ways:
- To substantiate an answer to the customer
- To help the prompt engineer and data scientist figure out why the model made a mistake
- For checks: Did the model arrive at the correct answer by chance, or can we see through its reasoning that the solution was inevitable?
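One way to make the reasoning usable for these purposes is to ask for it as a separate field that comes before the verdict. A minimal sketch, assuming a JSON response with the hypothetical keys `reasoning`, `covered`, and `confidence`; the prompt wording is illustrative and the model call itself is left out.

```python
import json

# Hypothetical prompt: the key order asks for the reasoning before the verdict.
ASSESSMENT_PROMPT = """Assess the claim against the policy excerpts.
Respond with JSON using exactly these keys, in this order:
"reasoning": your step-by-step thinking,
"covered": true or false,
"confidence": a number between 0 and 1.
"""


def parse_assessment(raw: str) -> dict:
    """Split the model output into reasoning and verdict, and validate both."""
    data = json.loads(raw)
    reasoning = str(data["reasoning"]).strip()
    if not reasoning:
        raise ValueError("model gave a verdict without reasoning")
    covered = data["covered"]
    if not isinstance(covered, bool):
        raise ValueError("'covered' must be true or false")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")
    return {"reasoning": reasoning, "covered": covered, "confidence": confidence}
```

The `reasoning` string can then be logged for debugging, quoted in the reply to the customer, or reviewed when a verdict looks suspicious.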
Confidence is the key to maximizing accuracy. If the model estimates its confidence (and, dear prompt engineers, this also requires very good few-shot examples for various confidence values), then we can configure the system to operate with extreme safety or high automation: we set a confidence threshold below which all cases go to human support. A high threshold ensures minimal errors but requires more manual processing, while a lower threshold allows more cases to be processed automatically, albeit with an increased risk of errors.
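The threshold logic itself is tiny. In this minimal sketch the function names and the sample confidence values are invented for illustration.

```python
def route_by_confidence(confidence: float, threshold: float) -> str:
    """Cases below the threshold go to a human; the rest are processed automatically."""
    return "automatic" if confidence >= threshold else "human_review"


def automation_rate(confidences: list[float], threshold: float) -> float:
    """Share of cases that would be processed without a human."""
    automatic = sum(1 for c in confidences if c >= threshold)
    return automatic / len(confidences)


# Invented confidence values for ten cases, for illustration only.
SAMPLE = [0.99, 0.97, 0.95, 0.92, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4]
```

With these sample values, a threshold of 0.95 automates 3 of 10 cases, while a threshold of 0.8 automates 7 of 10. Choosing the threshold is therefore a business decision about acceptable risk as much as a technical one.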

AI agent: 4. Using RAG, reasoning, and confidence to arrive at reliable assessments. Image credit: Maximilian Vogel
Et voilà! If you have implemented just two or three of the above steps, you have developed an agent. I’ve outlined only the key components of these AI agents; you can certainly imagine the others. You can implement them with the help of frameworks such as CrewAI, LangGraph, Langflow, and their siblings, or just do it in pure Python.
Remarkably, such a system can automate 70%–90% of a claims management department’s workload. And that’s not possible with simple pre-agent generative AI systems. Two years ago, I could never have imagined this becoming reality so quickly.
tl;dr? Here are AI agents in a nutshell:

The three laws of AI agents. Image credit: Maximilian Vogel, obviously borrowing from Isaac Asimov
These agents will certainly keep me busy over the coming months: my team and I have just launched a large logistics system.
I wish you every success with your AI and agentic AI systems!
And if you feel like it:
Follow me on Medium (⇈) or LinkedIn for updates and new stories on generative AI, AI workers, and prompt engineering.