12-Factor Agents - Principles for building reliable LLM applications
In the spirit of 12 Factor Apps. The source for this project is public at https://github.com/humanlayer/12-factor-agents, and I welcome your feedback and contributions. Let's figure this out together!
Hi, I'm Dex. I've been hacking on AI agents for a while.
I've tried every agent framework out there, from the plug-and-play crew/langchains to the "minimalist" smolagents of the world to the "production grade" langgraph, griptape, etc.
I've talked to a lot of really strong founders, in and out of YC, who are all building really impressive things with AI. Most of them are rolling the stack themselves. I don't see a lot of frameworks in production customer-facing agents.
I've been surprised to find that most of the products out there billing themselves as "AI Agents" are not all that agentic. A lot of them are mostly deterministic code, with LLM steps sprinkled in at just the right points to make the experience truly magical.
Agents, at least the good ones, don't follow the "here's your prompt, here's a bag of tools, loop until you hit the goal" pattern. Rather, they are comprised of mostly just software.
So, I set out to answer:
Welcome to 12-factor agents. As every Chicago mayor since Daley has consistently plastered all over the city's major airports, we're glad you're here.
Special thanks to @iantbutler01, @tnm, @hellovai, @stantonk, @balanceiskey, @AdjectiveAllison, @pfbyjy, @a-churchill, and the SF MLOps community for early feedback on this guide.
Even if LLMs continue to get exponentially more powerful, there will be core engineering techniques that make LLM-powered software more reliable, more scalable, and easier to maintain.
- How We Got Here: A Brief History of Software
- Factor 1: Natural Language to Tool Calls
- Factor 2: Own your prompts
- Factor 3: Own your context window
- Factor 4: Tools are just structured outputs
- Factor 5: Unify execution state and business state
- Factor 6: Launch/Pause/Resume with simple APIs
- Factor 7: Contact humans with tool calls
- Factor 8: Own your control flow
- Factor 9: Compact Errors into Context Window
- Factor 10: Small, Focused Agents
- Factor 11: Trigger from anywhere, meet users where they are
- Factor 12: Make your agent a stateless reducer
For a deeper dive on my agent journey and what led us here, check out A Brief History of Software - a quick summary here:
We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like Airflow, Prefect, some predecessors, and some newer ones like dagster, inngest, and windmill. These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
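To make that concrete, here's a rough, Airflow-flavored sketch of the pattern (illustrative only: the task bodies are placeholders, and the exact parameter names vary between Airflow versions):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task bodies; a real pipeline would do actual work here
def extract():   print("pull raw data")
def transform(): print("clean it up")
def load():      print("write it to the warehouse")

with DAG("nightly_etl", start_date=datetime(2024, 1, 1), schedule="@daily") as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract, retries=3)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # every node and every edge is spelled out by an engineer
```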
I'm not the first person to say this, but my biggest takeaway when I started learning about agents, was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions:
And let the LLM make decisions in real time to figure out the path
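Concretely, the pitch is that you stop enumerating nodes and edges yourself. A minimal sketch of what you hand the agent instead (the goal, tool names, and descriptions below are all made up for illustration):

```python
# Illustrative only: describe the goal and the allowed transitions (tools);
# an LLM-driven loop (like the one sketched further down) then picks one
# transition at a time until it decides it's "done".
goal = "deploy tag v1.2.3 to production, rolling back if anything looks wrong"

transitions = {
    "deploy_to_staging":    "Deploy the given tag to the staging environment",
    "run_smoke_tests":      "Run the smoke-test suite against staging",
    "deploy_to_production": "Promote the staging build to production",
    "rollback":             "Roll back to the previous known-good release",
    "done":                 "Stop: the goal is met or has been safely abandoned",
}
```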
The promise here is that you write less software, you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
As we'll see later, it turns out this doesn't quite work.
Let's dive one step deeper - with agents you've got this loop consisting of 3 steps:
- LLM determines the next step in the workflow, outputting structured JSON ("tool calling")
- Deterministic code executes the tool call
- The result is appended to the context window
- Repeat until the next step is determined to be "done"
# (llm.determine_next_step and execute_step are placeholders for your own code)
initial_event = {"message": "..."}
context = [initial_event]
while True:
    # 1. the LLM picks the next step as structured output ("tool calling")
    next_step = await llm.determine_next_step(context)
    context.append(next_step)

    # terminal case: the model has decided we're done
    if next_step.intent == "done":
        return next_step.final_answer

    # 2. deterministic code executes the tool call
    result = await execute_step(next_step)
    # 3. the result is appended to the context window, then we loop
    context.append(result)
Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc), and we ask the LLM to choose the next step (tool) or to determine that we're done.
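For illustration, the structured output the LLM returns at each turn might look something like this (the field names mirror the pseudocode above, but the tool name and parameters are invented):

```python
# A hypothetical tool-calling step (shown as a plain dict; the pseudocode
# above reads the same fields as attributes)
next_step = {
    "intent": "create_payment_link",
    "parameters": {"amount_cents": 7500, "customer_id": "cust_123"},
}

# The terminal step that ends the loop
next_step = {
    "intent": "done",
    "final_answer": "Created a $75.00 payment link and sent it to the customer.",
}
```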
Here's a multi-step example:
027-agent-loop-animation.mp4
At the end of the day, this approach just doesn't work as well as we want it to.
In building HumanLayer, I've talked to at least 100 SaaS builders (mostly technical founders) looking to make their existing product more agentic. The journey usually goes something like:
- Decide you want to build an agent
- Product design, UX mapping, what problems to solve
- Want to move fast, so grab $FRAMEWORK and get to building
- Get to 70-80% quality bar
- Realize that 80% isn't good enough for most customer-facing features
- Realize that getting past 80% requires reverse-engineering the framework, prompts, flow, etc.
- Start over from scratch
Random Disclaimers
DISCLAIMER: I'm not sure the exact right place to say this, but here seems as good as any: this is BY NO MEANS meant to be a dig on either the many frameworks out there, or the pretty dang smart people who work on them. They enable incredible things and have accelerated the AI ecosystem.
I hope that one outcome of this post is that agent framework builders can learn from the journeys of myself and others, and make frameworks even better.
Especially for builders who want to move fast but need deep control.
DISCLAIMER 2: I'm not going to talk about MCP. I'm sure you can see where it fits in.
DISCLAIMER 3: I'm using mostly typescript, for reasons, but all this stuff works in python or any other language you prefer.
Anyways back to the thing...
After digging through hundreds of AI libraries and working with dozens of founders, my instinct is this:
- There are some core things that make agents great
- Going all in on a framework and building what is essentially a greenfield rewrite may be counter-productive
- There are some core principles that make agents great, and you will get most/all of them if you pull in a framework
- BUT, the fastest way I've seen for builders to get high-quality AI software in the hands of customers is to take small, modular concepts from agent building, and incorporate them into their existing product
- These modular concepts from agents can be defined and applied by most skilled software engineers, even if they don't have an AI background
- How We Got Here: A Brief History of Software
- Factor 1: Natural Language to Tool Calls
- Factor 2: Own your prompts
- Factor 3: Own your context window
- Factor 4: Tools are just structured outputs
- Factor 5: Unify execution state and business state
- Factor 6: Launch/Pause/Resume with simple APIs
- Factor 7: Contact humans with tool calls
- Factor 8: Own your control flow
- Factor 9: Compact Errors into Context Window
- Factor 10: Small, Focused Agents
- Factor 11: Trigger from anywhere, meet users where they are
- Factor 12: Make your agent a stateless reducer
- Contribute to this guide here
- I talked about a lot of this on an episode of the Tool Use podcast in March 2025
- I write about some of this stuff at The Outer Loop
- I do webinars about Maximizing LLM Performance with @hellovai
- We build OSS agents with this methodology under got-agents/agents
- We ignored all our own advice and built a framework for running distributed agents in kubernetes
- Other links from this guide:
- 12 Factor Apps
- Building Effective Agents (Anthropic)
- Prompts are Functions
- Library patterns: Why frameworks are evil
- The Wrong Abstraction
- Mailcrew Agent
- Mailcrew Demo Video
- Chainlit Demo
- TypeScript for LLMs
- Schema Aligned Parsing
- Function Calling vs Structured Outputs vs JSON Mode
- BAML on GitHub
- OpenAI JSON vs Function Calling
- Outer Loop Agents
- Airflow
- Prefect
- Dagster
- Inngest
- Windmill
- The AI Agent Index (MIT)
- NotebookLM on Finding Model Capability Boundaries
Thanks to everyone who has contributed to 12-factor agents!
This is the current version of 12-factor agents, version 1.0. There is a draft of version 1.1 on the v1.1 branch. There are a few Issues to track work on v1.1.