
Agents

When building AI applications, you often need systems that can understand context and take meaningful actions. The key consideration when building these systems is finding the right balance between flexibility and control. Let's explore different approaches and patterns for building these systems, with a focus on helping you match capabilities to your needs.

Building Blocks

When building AI systems, you can combine these fundamental components:

Single-Step LLM Generation

The basic building block - one call to an LLM to get a response. Useful for straightforward tasks like classification or text generation.

Tool Usage

Enhanced capabilities through tools (like calculators, APIs, or databases) that the LLM can use to accomplish tasks. Tools provide a controlled way to extend what the LLM can do.

When solving complex problems, an LLM can make multiple tool calls across multiple steps without you explicitly specifying the order - for example, looking up information in a database, using that to make calculations, and then storing results. The AI SDK makes this multi-step tool usage straightforward through the maxSteps parameter.
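As a rough sketch of the mechanics involved (this is not the AI SDK's actual API), a tool can be modeled as a named function, and a model-proposed call can be dispatched by name. The tool names and argument shapes below are hypothetical stand-ins for the database lookup and calculation mentioned above:

```typescript
// Hypothetical sketch of tool dispatch: the model proposes a call by name,
// and the application executes the matching tool with the given arguments.
type Tool = (args: Record<string, unknown>) => unknown;

const tools: Record<string, Tool> = {
  // look up a value in a (stubbed) database
  lookupPrice: ({ item }) => ({ item, price: 134 }),
  // perform a calculation on previously retrieved data
  multiply: ({ a, b }) => (a as number) * (b as number),
};

// A tool call as the model might emit it
interface ToolCall {
  toolName: string;
  args: Record<string, unknown>;
}

function dispatch(call: ToolCall): unknown {
  const tool = tools[call.toolName];
  if (!tool) throw new Error(`Unknown tool: ${call.toolName}`);
  return tool(call.args);
}

// One lookup step feeding one calculation step
const price = dispatch({ toolName: 'lookupPrice', args: { item: 'petrol' } });
const total = dispatch({ toolName: 'multiply', args: { a: 12, b: 134 } });
```

With maxSteps, the SDK performs this kind of dispatch for you after each model turn, so you only declare the tools.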

Multi-Agent Systems

Multiple LLMs working together, each specialized for different aspects of a complex task. This enables sophisticated behaviors while keeping individual components focused.

Patterns

These building blocks can be combined with workflow patterns that help manage complexity:

  • Sequential Processing - Steps executed in order
  • Parallel Processing - Independent tasks run simultaneously
  • Evaluation/Feedback Loops - Results checked and improved iteratively
  • Orchestration - Coordinating multiple components
  • Routing - Directing work based on context
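Stripped of the model calls, these patterns reduce to familiar async control flow. The sketch below stubs each model step as a plain function purely to show the shapes: sequential chaining, Promise.all for parallelism, and a bounded evaluation loop.

```typescript
// Stubbed "model step": uppercases its input to stand in for an LLM call.
const step = async (input: string): Promise<string> => input.toUpperCase();

// Sequential processing: each step's output feeds the next.
async function chain(input: string): Promise<string> {
  const draft = await step(input);
  const refined = await step(draft + '!');
  return refined;
}

// Parallel processing: independent steps run simultaneously.
async function parallel(inputs: string[]): Promise<string[]> {
  return Promise.all(inputs.map(step));
}

// Evaluation/feedback loop: retry until a check passes or attempts run out.
async function withFeedback(
  input: string,
  isGood: (s: string) => boolean,
  maxAttempts = 3,
): Promise<string> {
  let result = await step(input);
  for (let i = 1; i < maxAttempts && !isGood(result); i++) {
    result = await step(result + ' (revised)');
  }
  return result;
}
```

Orchestration and routing are the same idea one level up: a coordinating step decides which of these functions to invoke, as the examples later in this page show with real model calls.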

Choosing Your Approach

The key factors to consider:

  • Flexibility vs Control - How much freedom does the LLM need vs how tightly must you constrain its actions?
  • Error Tolerance - What are the consequences of mistakes in your use case?
  • Cost Considerations - More complex systems typically mean more LLM calls and higher costs
  • Maintenance - Simpler architectures are easier to debug and modify

Start with the simplest approach that meets your needs. Add complexity only when needed, progressing through:

  1. Breaking down tasks into clear steps
  2. Adding tools for specific capabilities
  3. Implementing feedback loops for quality control
  4. Introducing multiple agents for complex workflows

Let's look at examples of these patterns in action.

Patterns with Examples

The following patterns, adapted from Anthropic's guide on building effective agents, serve as building blocks that can be combined to create comprehensive workflows. Each pattern addresses specific aspects of task execution, and by combining them thoughtfully, you can build reliable solutions for complex problems.

Sequential Processing (Chains)

The simplest workflow pattern executes steps in a predefined order. Each step's output becomes input for the next step, creating a clear chain of operations. This pattern is ideal for tasks with well-defined sequences, like content generation pipelines or data transformation processes.

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function generateMarketingCopy(input: string) {
  const model = openai('gpt-4o');

  // First step: Generate marketing copy
  const { text: copy } = await generateText({
    model,
    prompt: `Write persuasive marketing copy for: ${input}. Focus on benefits and emotional appeal.`,
  });

  // Perform quality check on copy
  const { object: qualityMetrics } = await generateObject({
    model,
    schema: z.object({
      hasCallToAction: z.boolean(),
      emotionalAppeal: z.number().min(1).max(10),
      clarity: z.number().min(1).max(10),
    }),
    prompt: `Evaluate this marketing copy for:
    1. Presence of call to action (true/false)
    2. Emotional appeal (1-10)
    3. Clarity (1-10)

    Copy to evaluate: ${copy}`,
  });

  // If quality check fails, regenerate with more specific instructions
  if (
    !qualityMetrics.hasCallToAction ||
    qualityMetrics.emotionalAppeal < 7 ||
    qualityMetrics.clarity < 7
  ) {
    const { text: improvedCopy } = await generateText({
      model,
      prompt: `Rewrite this marketing copy with:
      ${!qualityMetrics.hasCallToAction ? '- A clear call to action' : ''}
      ${qualityMetrics.emotionalAppeal < 7 ? '- Stronger emotional appeal' : ''}
      ${qualityMetrics.clarity < 7 ? '- Improved clarity and directness' : ''}

      Original copy: ${copy}`,
    });
    return { copy: improvedCopy, qualityMetrics };
  }

  return { copy, qualityMetrics };
}

Routing

This pattern allows the model to make decisions about which path to take through a workflow based on context and intermediate results. The model acts as an intelligent router, directing the flow of execution between different branches of your workflow. This is particularly useful when handling varied inputs that require different processing approaches. In the example below, the results of the first LLM call change the properties of the second LLM call like model size and system prompt.

import { openai } from '@ai-sdk/openai';
import { generateObject, generateText } from 'ai';
import { z } from 'zod';

async function handleCustomerQuery(query: string) {
  const model = openai('gpt-4o');

  // First step: Classify the query type
  const { object: classification } = await generateObject({
    model,
    schema: z.object({
      reasoning: z.string(),
      type: z.enum(['general', 'refund', 'technical']),
      complexity: z.enum(['simple', 'complex']),
    }),
    prompt: `Classify this customer query:
    ${query}

    Determine:
    1. Query type (general, refund, or technical)
    2. Complexity (simple or complex)
    3. Brief reasoning for classification`,
  });

  // Route based on classification
  // Set model and system prompt based on query type and complexity
  const { text: response } = await generateText({
    model:
      classification.complexity === 'simple'
        ? openai('gpt-4o-mini')
        : openai('o3-mini'),
    system: {
      general:
        'You are an expert customer service agent handling general inquiries.',
      refund:
        'You are a customer service agent specializing in refund requests. Follow company policy and collect necessary information.',
      technical:
        'You are a technical support specialist with deep product knowledge. Focus on clear step-by-step troubleshooting.',
    }[classification.type],
    prompt: query,
  });

  return { response, classification };
}

Parallel Processing

Some tasks can be broken down into independent subtasks that can be executed simultaneously. This pattern takes advantage of parallel execution to improve efficiency while maintaining the benefits of structured workflows. For example, analyzing multiple documents or processing different aspects of a single input concurrently (like code review).

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

// Example: Parallel code review with multiple specialized reviewers
async function parallelCodeReview(code: string) {
  const model = openai('gpt-4o');

  // Run parallel reviews
  const [securityReview, performanceReview, maintainabilityReview] =
    await Promise.all([
      generateObject({
        model,
        system:
          'You are an expert in code security. Focus on identifying security vulnerabilities, injection risks, and authentication issues.',
        schema: z.object({
          vulnerabilities: z.array(z.string()),
          riskLevel: z.enum(['low', 'medium', 'high']),
          suggestions: z.array(z.string()),
        }),
        prompt: `Review this code:
        ${code}`,
      }),
      generateObject({
        model,
        system:
          'You are an expert in code performance. Focus on identifying performance bottlenecks, memory leaks, and optimization opportunities.',
        schema: z.object({
          issues: z.array(z.string()),
          impact: z.enum(['low', 'medium', 'high']),
          optimizations: z.array(z.string()),
        }),
        prompt: `Review this code:
        ${code}`,
      }),
      generateObject({
        model,
        system:
          'You are an expert in code quality. Focus on code structure, readability, and adherence to best practices.',
        schema: z.object({
          concerns: z.array(z.string()),
          qualityScore: z.number().min(1).max(10),
          recommendations: z.array(z.string()),
        }),
        prompt: `Review this code:
        ${code}`,
      }),
    ]);

  const reviews = [
    { ...securityReview.object, type: 'security' },
    { ...performanceReview.object, type: 'performance' },
    { ...maintainabilityReview.object, type: 'maintainability' },
  ];

  // Aggregate results using another model instance
  const { text: summary } = await generateText({
    model,
    system: 'You are a technical lead summarizing multiple code reviews.',
    prompt: `Synthesize these code review results into a concise summary with key actions:
    ${JSON.stringify(reviews, null, 2)}`,
  });

  return { reviews, summary };
}

Orchestrator-Worker

In this pattern, a primary model (orchestrator) coordinates the execution of specialized workers. Each worker is optimized for a specific subtask, while the orchestrator maintains overall context and ensures coherent results. This pattern excels at complex tasks requiring different types of expertise or processing.

import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';

async function implementFeature(featureRequest: string) {
  // Orchestrator: Plan the implementation
  const { object: implementationPlan } = await generateObject({
    model: openai('o3-mini'),
    schema: z.object({
      files: z.array(
        z.object({
          purpose: z.string(),
          filePath: z.string(),
          changeType: z.enum(['create', 'modify', 'delete']),
        }),
      ),
      estimatedComplexity: z.enum(['low', 'medium', 'high']),
    }),
    system:
      'You are a senior software architect planning feature implementations.',
    prompt: `Analyze this feature request and create an implementation plan:
    ${featureRequest}`,
  });

  // Workers: Execute the planned changes
  const fileChanges = await Promise.all(
    implementationPlan.files.map(async file => {
      // Each worker is specialized for the type of change
      const workerSystemPrompt = {
        create:
          'You are an expert at implementing new files following best practices and project patterns.',
        modify:
          'You are an expert at modifying existing code while maintaining consistency and avoiding regressions.',
        delete:
          'You are an expert at safely removing code while ensuring no breaking changes.',
      }[file.changeType];

      const { object: change } = await generateObject({
        model: openai('gpt-4o'),
        schema: z.object({
          explanation: z.string(),
          code: z.string(),
        }),
        system: workerSystemPrompt,
        prompt: `Implement the changes for ${file.filePath} to support:
        ${file.purpose}

        Consider the overall feature context:
        ${featureRequest}`,
      });

      return {
        file,
        implementation: change,
      };
    }),
  );

  return {
    plan: implementationPlan,
    changes: fileChanges,
  };
}

Evaluator-Optimizer

This pattern introduces quality control into workflows by having dedicated evaluation steps that assess intermediate results. Based on the evaluation, the workflow can either proceed, retry with adjusted parameters, or take corrective action. This creates more robust workflows capable of self-improvement and error recovery.

import { openai } from '@ai-sdk/openai';
import { generateText, generateObject } from 'ai';
import { z } from 'zod';

async function translateWithFeedback(text: string, targetLanguage: string) {
  let currentTranslation = '';
  let iterations = 0;
  const MAX_ITERATIONS = 3;

  // Initial translation
  const { text: translation } = await generateText({
    model: openai('gpt-4o-mini'), // use small model for first attempt
    system: 'You are an expert literary translator.',
    prompt: `Translate this text to ${targetLanguage}, preserving tone and cultural nuances:
    ${text}`,
  });

  currentTranslation = translation;

  // Evaluation-optimization loop
  while (iterations < MAX_ITERATIONS) {
    // Evaluate current translation
    const { object: evaluation } = await generateObject({
      model: openai('gpt-4o'), // use a larger model to evaluate
      schema: z.object({
        qualityScore: z.number().min(1).max(10),
        preservesTone: z.boolean(),
        preservesNuance: z.boolean(),
        culturallyAccurate: z.boolean(),
        specificIssues: z.array(z.string()),
        improvementSuggestions: z.array(z.string()),
      }),
      system: 'You are an expert in evaluating literary translations.',
      prompt: `Evaluate this translation:

      Original: ${text}
      Translation: ${currentTranslation}

      Consider:
      1. Overall quality
      2. Preservation of tone
      3. Preservation of nuance
      4. Cultural accuracy`,
    });

    // Check if quality meets threshold
    if (
      evaluation.qualityScore >= 8 &&
      evaluation.preservesTone &&
      evaluation.preservesNuance &&
      evaluation.culturallyAccurate
    ) {
      break;
    }

    // Generate improved translation based on feedback
    const { text: improvedTranslation } = await generateText({
      model: openai('gpt-4o'), // use a larger model
      system: 'You are an expert literary translator.',
      prompt: `Improve this translation based on the following feedback:
      ${evaluation.specificIssues.join('\n')}
      ${evaluation.improvementSuggestions.join('\n')}

      Original: ${text}
      Current Translation: ${currentTranslation}`,
    });

    currentTranslation = improvedTranslation;
    iterations++;
  }

  return {
    finalTranslation: currentTranslation,
    iterationsRequired: iterations,
  };
}

Multi-Step Tool Usage

If your use case involves problems where the solution path is poorly defined or too complex to map out as a workflow in advance, you may want to provide the LLM with a set of lower-level tools and allow it to break down the task into small pieces that it can solve on its own iteratively, without discrete instructions. To implement this kind of agentic pattern, you need to call an LLM in a loop until the task is complete. The AI SDK makes this simple with the maxSteps parameter.

With maxSteps, the AI SDK automatically triggers an additional request to the model after every tool result (each request is considered a "step"). If the model does not generate a tool call or the maxSteps threshold has been met, the generation is complete.
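Conceptually, the loop that maxSteps drives looks roughly like the following sketch. The model and tool here are stubs, and the real SDK additionally manages message history, streaming, and usage accounting.

```typescript
// A stubbed model turn: either proposes a tool call or returns final text.
interface ModelTurn {
  toolCall?: { toolName: string; args: { expression: string } };
  text?: string;
}

// Stub model: first asks for a calculation, then answers with the result.
function stubModel(history: string[]): ModelTurn {
  if (history.length === 0) {
    return { toolCall: { toolName: 'calculate', args: { expression: '12 * 134' } } };
  }
  return { text: `The result is ${history[history.length - 1]}` };
}

const tools: Record<string, (args: { expression: string }) => string> = {
  // toy evaluator that only handles "a * b" expressions
  calculate: ({ expression }) => {
    const [a, b] = expression.split('*').map(s => Number(s.trim()));
    return String(a * b);
  },
};

function runAgent(maxSteps: number): { text: string; steps: number } {
  const history: string[] = [];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = stubModel(history);
    if (turn.toolCall) {
      // the tool result is appended and the model is called again (a new "step")
      history.push(tools[turn.toolCall.toolName](turn.toolCall.args));
      continue;
    }
    // no tool call: generation is complete
    return { text: turn.text ?? '', steps: step };
  }
  return { text: '', steps: maxSteps }; // maxSteps threshold reached
}
```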

maxSteps can be used with both generateText and streamText.

Using maxSteps

This example demonstrates how to create an agent that solves math problems. It has a calculator tool (using math.js) that it can call to evaluate mathematical expressions.

import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import * as mathjs from 'mathjs';
import { z } from 'zod';

const { text: answer } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. ' +
        'Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
  },
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'When you give the final answer, ' +
    'provide an explanation for how you arrived at it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour of work. ' +
    'If he works 12 hours a day and in 1 hour ' +
    'he uses 12 liters of petrol with a price of $134 for 1 liter. ' +
    'How much money does he earn in one day?',
});

console.log(`ANSWER: ${answer}`);

Structured Answers

When building an agent for tasks like mathematical analysis or report generation, it's often useful to have the agent's final output structured in a consistent format that your application can process. You can use an answer tool and the toolChoice: 'required' setting to force the LLM to answer with a structured output that matches the schema of the answer tool. The answer tool has no execute function, so invoking it will terminate the agent.

import { openai } from '@ai-sdk/openai';
import { generateText, tool } from 'ai';
import 'dotenv/config';
import * as mathjs from 'mathjs';
import { z } from 'zod';

const { toolCalls } = await generateText({
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  tools: {
    calculate: tool({
      description:
        'A tool for evaluating mathematical expressions. Example expressions: ' +
        "'1.2 * (2 + 4.5)', '12.7 cm to inch', 'sin(45 deg) ^ 2'.",
      parameters: z.object({ expression: z.string() }),
      execute: async ({ expression }) => mathjs.evaluate(expression),
    }),
    // answer tool: the LLM will provide a structured answer
    answer: tool({
      description: 'A tool for providing the final answer.',
      parameters: z.object({
        steps: z.array(
          z.object({
            calculation: z.string(),
            reasoning: z.string(),
          }),
        ),
        answer: z.string(),
      }),
      // no execute function - invoking it will terminate the agent
    }),
  },
  toolChoice: 'required',
  maxSteps: 10,
  system:
    'You are solving math problems. ' +
    'Reason step by step. ' +
    'Use the calculator when necessary. ' +
    'The calculator can only do simple additions, subtractions, multiplications, and divisions. ' +
    'When you give the final answer, provide an explanation for how you got it.',
  prompt:
    'A taxi driver earns $9461 per 1-hour work. ' +
    'If he works 12 hours a day and in 1 hour he uses 14-liters petrol with price $134 for 1-liter. ' +
    'How much money does he earn in one day?',
});

console.log(`FINAL TOOL CALLS: ${JSON.stringify(toolCalls, null, 2)}`);

You can also use the experimental_output setting for generateText to generate structured outputs.

Accessing all steps

Calling generateText with maxSteps can result in several calls to the LLM (steps). You can access information from all steps by using the steps property of the response.

import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const { steps } = await generateText({
  model: openai('gpt-4o'),
  maxSteps: 10,
  // ...
});

// extract all tool calls from the steps:
const allToolCalls = steps.flatMap(step => step.toolCalls);

Getting notified on each completed step

You can use the onStepFinish callback to get notified on each completed step. It is triggered when a step is finished, i.e. all text deltas, tool calls, and tool results for the step are available.

import { generateText } from 'ai';

const result = await generateText({
  model: yourModel,
  maxSteps: 10,
  onStepFinish({ text, toolCalls, toolResults, finishReason, usage }) {
    // your own logic, e.g. for saving the chat history or recording usage
  },
  // ...
});