OpenAI Agents SDK explained: how it works

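Contents

OpenAI's new agent development kit: the Responses API and the Agents SDK
- Responses API
Agents SDK
- Guardrails: safety rails for agents
  - Input guardrails
  - Output guardrails
- Tracing: observing and tracing agent behavior
  - trace
  - span
  - processors
- Usage example: building several tutor agents that help a child with homework
REF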

OpenAI's new agent development kit: the Responses API and the Agents SDK

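Last week Manus went viral, and this week OpenAI made its own move. In the early hours of March 12, OpenAI released a new set of tools for building agents: built-in web search, file search, and computer use, the Responses API, and the Agents SDK.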

Responses API

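The Responses API is a new API primitive from OpenAI. It combines the simplicity of the Chat Completions API with the tool-use capabilities of the Assistants API, with the goal of making it easier for developers to build agents on top of OpenAI's built-in tools.
The built-in tools introduced with the Responses API are web search, file search, and computer use.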

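Web Search: lets the model search the web for up-to-date information; available with the gpt-4o and gpt-4o-mini models.
File Search: retrieves relevant information from large document collections, with support for multiple file types, query optimization, metadata filtering, and custom reranking.
Computer Use: built on the Computer-Using Agent (CUA) model; lets developers automate computer tasks such as browser automation and data entry by simulating mouse and keyboard actions.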

web search

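// Run as an ES module (top-level await); the client reads OPENAI_API_KEY from the environment
import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
    model: "gpt-4o",
    tools: [{ type: "web_search_preview" }],
    input: "What was a positive news story that happened today?",
});

console.log(response.output_text);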

Web search is powered by the same model used for ChatGPT search. On the SimpleQA benchmark, GPT-4o search preview and GPT-4o mini search preview reach 90% and 88% accuracy, respectively.

File search
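With the improved file search tool, developers can retrieve relevant information from large document collections. The tool supports multiple file types, query optimization, metadata filtering, and custom reranking, and returns results quickly and accurately. The JS call looks like this: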

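// Run as an ES module (top-level await); the client reads OPENAI_API_KEY from the environment
import OpenAI from "openai";

const openai = new OpenAI();

// file1, file2 and file3 are files that were uploaded earlier
const productDocs = await openai.vectorStores.create({
    name: "Product Documentation",
    file_ids: [file1.id, file2.id, file3.id],
});

const response = await openai.responses.create({
    model: "gpt-4o-mini",
    tools: [{
        type: "file_search",
        vector_store_ids: [productDocs.id],
    }],
    input: "What is deep research by OpenAI?",
});

console.log(response.output_text);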

computer use
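The Computer Use tool is the Responses API's built-in tool for operating a computer. It captures the mouse and keyboard actions generated by the model and translates them into commands that can be executed directly in the developer's environment, which makes it possible to automate computer tasks.
The JS call looks like this: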

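// Run as an ES module (top-level await); the client reads OPENAI_API_KEY from the environment
import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.responses.create({
    model: "computer-use-preview",
    tools: [{
        type: "computer_use_preview", // tool type
        display_width: 1024,
        display_height: 768,
        environment: "browser",
    }],
    truncation: "auto",
    input: "I'm looking for a new camera. Help me find the best one.",
});

console.log(response.output);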
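The three JavaScript calls above share one request shape: a model, an input, and a tools array, where only the tool entry (plus truncation for computer use) changes. As a sketch of that in Python, the helpers below build the same request bodies as plain dicts, with values copied from the JS examples. The helper names are hypothetical, not part of any SDK, and no network call is made.

```python
from typing import Any


def web_search_request(question: str) -> dict[str, Any]:
    """Mirror of the JS web search call: one built-in tool, no extra config."""
    return {
        "model": "gpt-4o",
        "tools": [{"type": "web_search_preview"}],
        "input": question,
    }


def file_search_request(question: str, vector_store_ids: list[str]) -> dict[str, Any]:
    """Mirror of the JS file search call: the tool entry carries the vector store IDs."""
    return {
        "model": "gpt-4o-mini",
        "tools": [{"type": "file_search", "vector_store_ids": vector_store_ids}],
        "input": question,
    }


def computer_use_request(task: str, width: int = 1024, height: int = 768) -> dict[str, Any]:
    """Mirror of the JS computer use call: screen size and environment are tool config."""
    return {
        "model": "computer-use-preview",
        "tools": [{
            "type": "computer_use_preview",
            "display_width": width,
            "display_height": height,
            "environment": "browser",
        }],
        "truncation": "auto",  # the JS example sets this for computer use
        "input": task,
    }
```

Assuming the official openai Python package is installed and OPENAI_API_KEY is set, one of these dicts could be sent as `OpenAI().responses.create(**web_search_request("..."))`, reading `response.output_text` just as in the JS.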

The tool is powered by the same Computer-Using Agent (CUA) model that powers Operator. This research preview model set new state-of-the-art results: a 38.1% success rate on OSWorld, 58.1% on WebArena, and 87% on WebVoyager.

Agents SDK

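The Agents SDK is an open-source framework for multi-agent workflows and can be seen as an upgraded version of Swarm, the framework OpenAI released last year.
For a walkthrough of the Swarm source code, see the article "OpenAI Swarm Framework: Source Code Walkthrough and Hands-on Examples".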

The design of the Agents SDK was inspired by other excellent community projects such as Pydantic, Griffe, and MkDocs. The framework works with both OpenAI's Responses API and the Chat Completions API.

Documentation: https://openai.github.io/openai-agents-python/
Source code: https://github.com/openai/openai-agents-python

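Compared with Swarm, the Agents SDK adds two main features:

Guardrails:
Configurable safety checks that validate inputs and outputs, so that agent behavior meets safety and compliance requirements.
Tracing & Observability:
Visualizes the execution traces of agents, helping developers debug and tune performance and keep workflows running efficiently.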

Guardrails: safety rails for agents

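Guardrails are implemented in https://github.com/openai/openai-agents-python/blob/main/src/agents/guardrail.py, which provides two kinds of protection: input guardrails and output guardrails.
The result of a guardrail is wrapped in the GuardrailFunctionOutput class, which has two fields: output_info, optional information about the guardrail's output, and tripwire_triggered, which records whether the guardrail fired. If the tripwire is triggered, the agent's execution is halted.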

from dataclasses import dataclass
from typing import Any


@dataclass
class GuardrailFunctionOutput:
    """The output of a guardrail function."""

    output_info: Any
    """
    Optional information about the guardrail's output. For example, the guardrail could include
    information about the checks it performed and granular results.
    """

    tripwire_triggered: bool
    """
    Whether the tripwire was triggered. If triggered, the agent's execution will be halted.
    """

Input guardrails

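Input guardrails run on the first agent of a run, in three steps. First, the guardrail receives the same input that is passed to the agent.
Next, the guardrail function runs and produces a GuardrailFunctionOutput, which is then wrapped in an InputGuardrailResult.
Finally, the SDK checks whether .tripwire_triggered is true. If it is, an InputGuardrailTripwireTriggered exception is raised, so you can respond to the user or handle the exception appropriately.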
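The three steps above can be sketched in plain Python. Everything here is a local stand-in written for illustration: the classes only borrow the SDK's names, `no_empty_input` is a hypothetical guardrail, and the loop runs guardrails one after another for simplicity rather than reproducing how the SDK actually schedules them.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GuardrailFunctionOutput:
    output_info: Any          # optional details about what the check found
    tripwire_triggered: bool  # True means: stop the agent


@dataclass
class InputGuardrailResult:
    guardrail_name: str
    output: GuardrailFunctionOutput


class InputGuardrailTripwireTriggered(Exception):
    """Raised when a guardrail's tripwire fires; carries the result for the caller."""

    def __init__(self, result: InputGuardrailResult) -> None:
        super().__init__(f"input guardrail {result.guardrail_name!r} tripped")
        self.guardrail_result = result


GuardrailFn = Callable[[str], GuardrailFunctionOutput]


def run_input_guardrails(user_input: str, guardrails: list[GuardrailFn]) -> list[InputGuardrailResult]:
    results = []
    for fn in guardrails:
        # Step 1: the guardrail receives the same input the agent will receive.
        output = fn(user_input)
        # Step 2: wrap the function output in a result object.
        result = InputGuardrailResult(guardrail_name=fn.__name__, output=output)
        # Step 3: check the tripwire and raise so the caller can respond.
        if result.output.tripwire_triggered:
            raise InputGuardrailTripwireTriggered(result)
        results.append(result)
    return results


def no_empty_input(text: str) -> GuardrailFunctionOutput:
    """Hypothetical guardrail: trips on blank input."""
    return GuardrailFunctionOutput(
        output_info={"chars": len(text)},
        tripwire_triggered=not text.strip(),
    )
```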
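The input guardrail functionality is implemented by three pieces.
(1) InputGuardrail: the class that defines an input guardrail. It holds the guardrail function and a name, and performs the input check.
InputGuardrail uses the @dataclass decorator, which simplifies the class definition and automatically generates common methods such as __init__, __repr__, and __eq__. It wraps the guardrail function (guardrail_function) and the guardrail's name (name), and provides a run method that executes the guardrail function.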

@dataclass
class InputGuardrail(Generic[TContext]):
    guardrail_function: Callable[
        [RunContextWrapper[TContext], Agent[Any], str | list[TResponseInputItem]],
        MaybeAwaitable[GuardrailFunctionOutput],
    ]
    """A function that receives the agent input and the context, and returns a
    `GuardrailResult`. The result marks whether the tripwire was triggered, and can optionally
    include information about the guardrail's output.
    """

    name: str | None = None
    """The name of the guardrail, used for tracing. If not provided, we'll use the guardrail
    function's name.
    """

    def get_name(self) -> str:
        if self.name:
            return self.name
        return self.guardrail_function.__name__

    async def run(
        self,
        agent: Agent[Any],
        input: str | list[TResponseInputItem],
        context: RunContextWrapper[TContext],
    ) -> InputGuardrailResult:
        if not callable(self.guardrail_function):
            raise UserError(f"Guardrail function must be callable, got {self.guardrail_function}")

        output = self.guardrail_function(context, agent, input)
        if inspect.isawaitable(output):
            return InputGuardrailResult(
                guardrail=self,
                output=await output,
            )

        return InputGuardrailResult(
            guardrail=self,
            output=output,
        )
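The key detail in run is that guardrail_function may be either a regular function or a coroutine function: calling it returns either the result itself or an awaitable, and inspect.isawaitable decides whether to await. A minimal stand-alone sketch of that pattern follows; `run_guardrail`, `sync_check`, and `async_check` are hypothetical names, and the guardrail output is simplified to a dict.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

Output = dict[str, Any]
MaybeAwaitable = Union[Output, Awaitable[Output]]


async def run_guardrail(fn: Callable[[str], MaybeAwaitable], text: str) -> Output:
    if not callable(fn):
        raise TypeError(f"Guardrail function must be callable, got {fn!r}")
    # A sync function returns the value; an async function returns a coroutine.
    output = fn(text)
    if inspect.isawaitable(output):
        output = await output
    return output


def sync_check(text: str) -> Output:
    return {"tripwire": "password" in text.lower()}


async def async_check(text: str) -> Output:
    await asyncio.sleep(0)  # stands in for an awaited model call
    return {"tripwire": len(text) > 100}
```

Because the branch happens at call time, callers never need to know which kind of function they were given.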

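(2) InputGuardrailResult: a dataclass that wraps the result of running an input guardrail so it can be processed later. Its attributes are guardrail (the input guardrail that was run) and output (the output of the guardrail function).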

@dataclass
class InputGuardrailResult:
    """The result of a guardrail run."""

    guardrail: InputGuardrail[Any]
    """The guardrail that was run."""

    output: GuardrailFunctionOutput
    """The output of the guardrail function."""

(3) input_guardrail: a decorator that turns an ordinary function into an InputGuardrail object, which simplifies defining and configuring guardrails.

def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co]
    | _InputGuardrailFuncAsync[TContext_co]
    | None = None,
    *,
    name: str | None = None,
) -> (
    InputGuardrail[TContext_co]
    | Callable[
        [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
        InputGuardrail[TContext_co],
    ]
):
    """
    Decorator that transforms a sync or async function into an `InputGuardrail`.
    It can be used directly (no parentheses) or with keyword args, e.g.:

        @input_guardrail
        def my_sync_guardrail(...): ...

        @input_guardrail(name="guardrail_name")
        async def my_async_guardrail(...): ...
    """

    def decorator(
        f: _InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co],
    ) -> InputGuardrail[TContext_co]:
        return InputGuardrail(guardrail_function=f, name=name)

    if func is not None:
        # Decorator was used without parentheses
        return decorator(func)

    # Decorator used with keyword arguments
    return decorator
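The `if func is not None` branch is what lets the same decorator work both bare and with keyword arguments. Written as `@input_guardrail`, Python passes the function in immediately; written as `@input_guardrail(name=...)`, the call returns `decorator`, which Python then applies to the function. The sketch below reproduces that pattern with a hypothetical `guardrail` decorator and `Guardrail` class rather than the SDK's own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Guardrail:
    """Stand-in for InputGuardrail: a function plus an optional display name."""
    guardrail_function: Callable
    name: Optional[str] = None

    def get_name(self) -> str:
        # Same fallback as the SDK code above: explicit name first, else the function name.
        return self.name or self.guardrail_function.__name__


def guardrail(func: Optional[Callable] = None, *, name: Optional[str] = None):
    def decorator(f: Callable) -> Guardrail:
        return Guardrail(guardrail_function=f, name=name)

    if func is not None:
        # Bare form: @guardrail, the function is passed in directly.
        return decorator(func)
    # Keyword form: @guardrail(name=...), return the decorator to be applied next.
    return decorator


@guardrail
def block_empty(text: str) -> bool:
    return not text.strip()


@guardrail(name="length_limit")
async def block_long(text: str) -> bool:
    return len(text) > 1000
```

The decorator only stores the function; whether it is sync or async is resolved later, when the guardrail runs.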

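The input_guardrail decorator supports three usages (decorating a sync function, decorating an async function, and calling it with keyword arguments), declared through three @overload definitions: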

Decorating a sync function directly


# Type aliases for input guardrail functions
_InputGuardrailFuncSync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    GuardrailFunctionOutput,
]
_InputGuardrailFuncAsync = Callable[
    [RunContextWrapper[TContext_co], "Agent[Any]", Union[str, list[TResponseInputItem]]],
    Awaitable[GuardrailFunctionOutput],
]


@overload
def input_guardrail(
    func: _InputGuardrailFuncSync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

Decorating an async function directly

@overload
def input_guardrail(func: _InputGuardrailFuncAsync[TContext_co],
) -> InputGuardrail[TContext_co]: ...

Using keyword arguments (e.g. @input_guardrail(name="guardrail_name"))


@overload
def input_guardrail(
    *,
    name: str | None = None,
) -> Callable[
    [_InputGuardrailFuncSync[TContext_co] | _InputGuardrailFuncAsync[TContext_co]],
    InputGuardrail[TContext_co],
]: ...

Output guardrails

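Output guardrails run on the final agent of a run.
Output guardrails also work in three steps. First, the guardrail receives the output produced by the agent.
Next, the guardrail function runs and produces a GuardrailFunctionOutput, which is then wrapped in an OutputGuardrailResult.
Finally, the SDK checks whether .tripwire_triggered is true. If it is, an OutputGuardrailTripwireTriggered exception is raised, so you can respond to the user or handle the exception appropriately.
The attributes and methods of the three main output guardrail pieces, OutputGuardrail, OutputGuardrailResult, and output_guardrail, mirror their input guardrail counterparts.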
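The main difference from an input guardrail is what the function inspects: the final agent's output instead of the user's input. In the sketch below, the email-address policy, the regex, and the `finish_run` helper are hypothetical illustrations, and the classes are local stand-ins that only borrow the SDK's names.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable

# Rough email pattern, good enough for the illustration (not a full RFC 5322 parser).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class GuardrailFunctionOutput:
    output_info: Any
    tripwire_triggered: bool


class OutputGuardrailTripwireTriggered(Exception):
    pass


def no_email_in_answer(agent_output: str) -> GuardrailFunctionOutput:
    """Hypothetical policy: the final answer must not leak email addresses."""
    found = EMAIL.findall(agent_output)
    return GuardrailFunctionOutput(output_info={"emails": found}, tripwire_triggered=bool(found))


def finish_run(agent_output: str, guardrails: list[Callable[[str], GuardrailFunctionOutput]]) -> str:
    # Each guardrail sees the agent's output, not the original user input.
    for fn in guardrails:
        result = fn(agent_output)
        if result.tripwire_triggered:
            raise OutputGuardrailTripwireTriggered(f"{fn.__name__}: {result.output_info}")
    return agent_output
```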

Tracing: observing and tracing agent behavior

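The Tracing module monitors and traces agent behavior. When key events occur (such as a span starting or ending), it calls processors to record and handle the data.
The core components of the Tracing module are trace, span, TracingProcessor, and a set of utility functions.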

The figure below shows an example of the Tracing UI tracking an agent's behavior.
[Figure: example of an agent run in the Tracing UI]

trace

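A Trace is the root object of tracing and represents one complete logical workflow. It records the whole flow from start to finish and can contain multiple spans. It is implemented in src/agents/tracing/traces.py, which declares abstract methods for entering and exiting the trace, starting and finishing it, and exporting it as a dictionary, along with the trace_id and name properties.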

from __future__ import annotations

import abc
from typing import Any


class Trace:
    """
    A trace is the root level object that tracing creates. It represents a logical "workflow".
    """

    @abc.abstractmethod
    def __enter__(self) -> Trace:
        pass

    @abc.abstractmethod
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

    @abc.abstractmethod
    def start(self, mark_as_current: bool = False):
        """
        Start the trace.

        Args:
            mark_as_current: If true, the trace will be marked as the current trace.
        """
        pass

    @abc.abstractmethod
    def finish(self, reset_current: bool = False):
        """
        Finish the trace.

        Args:
            reset_current: If true, the trace will be reset as the current trace.
        """
        pass

    @property
    @abc.abstractmethod
    def trace_id(self) -> str:
        """
        The trace ID.
        """
        pass

    @property
    @abc.abstractmethod
    def name(self) -> str:
        """
        The name of the workflow being traced.
        """
        pass

    @abc.abstractmethod
    def export(self) -> dict[str, Any] | None:
        """
        Export the trace as a dictionary.
        """
        pass
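To make the abstract interface concrete, here is a minimal in-memory implementation, assuming the "current trace" is tracked with a `contextvars.ContextVar`. `SimpleTrace`, `get_current_trace`, and the keys returned by `export()` are illustrative choices for this sketch, not the SDK's implementation. Entering the `with` block calls `start(mark_as_current=True)` and leaving it calls `finish(reset_current=True)`.

```python
import contextvars
import uuid
from typing import Any, Optional

# Holds whichever trace is "current" in this execution context (None if none).
_current_trace = contextvars.ContextVar("current_trace", default=None)


class SimpleTrace:
    def __init__(self, name: str) -> None:
        self._name = name
        self._trace_id = f"trace_{uuid.uuid4().hex}"
        self._token = None  # lets finish() restore the previous current trace
        self.started = False
        self.finished = False

    def start(self, mark_as_current: bool = False) -> None:
        self.started = True
        if mark_as_current:
            self._token = _current_trace.set(self)

    def finish(self, reset_current: bool = False) -> None:
        self.finished = True
        if reset_current and self._token is not None:
            _current_trace.reset(self._token)
            self._token = None

    def __enter__(self) -> "SimpleTrace":
        self.start(mark_as_current=True)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        self.finish(reset_current=True)

    @property
    def trace_id(self) -> str:
        return self._trace_id

    @property
    def name(self) -> str:
        return self._name

    def export(self) -> dict[str, Any]:
        # Illustrative shape only.
        return {"object": "trace", "id": self.trace_id, "workflow_name": self.name}


def get_current_trace() -> Optional[SimpleTrace]:
    return _current_trace.get()
```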

span

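A Span represents one concrete operation or task. It can record the operation's start and end times, error information, and other metadata.
The Tracing module provides several functions for creating different kinds of spans, for example:
agent_span: creates a span for an agent.
custom_span: creates a custom span.
function_span: creates a span for a function (tool) call.
generation_span: records the details of a model generation.
response_span: records information about a model response.
guardrail_span: records whether a guardrail was triggered.
handoff_span: records a handoff between agents.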
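The span helpers listed above differ mainly in the data they attach; the lifecycle they share (start time, end time, error, parent) can be sketched with a single context manager. The `span` function, the `Span` fields, and the kind strings below are hypothetical and only illustrate the idea; they are not the SDK's span classes.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Span:
    kind: str                     # e.g. "agent", "function", "guardrail", "handoff"
    name: str
    parent: Optional[str] = None  # name of the enclosing span, if any
    started_at: float = 0.0
    ended_at: float = 0.0
    error: Optional[str] = None


_stack: list[Span] = []          # spans currently open, innermost last
finished_spans: list[Span] = []  # spans in the order they ended


@contextmanager
def span(kind: str, name: str) -> Iterator[Span]:
    parent = _stack[-1].name if _stack else None
    s = Span(kind=kind, name=name, parent=parent, started_at=time.perf_counter())
    _stack.append(s)
    try:
        yield s
    except Exception as exc:
        # Record the error on the span, then let it propagate.
        s.error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        s.ended_at = time.perf_counter()
        _stack.pop()
        finished_spans.append(s)
```

Nesting `with` blocks gives the parent/child structure; an inner span always ends before its parent.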

processors

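A processor is an interface that handles the lifecycle events of traces and spans. It is the Tracing module's extension point and lets developers plug in their own data-processing logic.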
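A processor is easiest to picture as an object with one callback per lifecycle event. The sketch below defines a stand-in `TracingProcessor` whose hook names (`on_trace_start`, `on_trace_end`, `on_span_start`, `on_span_end`, `shutdown`, `force_flush`) are assumed to follow the SDK's processor interface, an in-memory processor, and a hypothetical `MultiProcessor` that fans every event out to all registered processors.

```python
import abc
from typing import Any


class TracingProcessor(abc.ABC):
    """Stand-in processor interface: one hook per lifecycle event."""

    @abc.abstractmethod
    def on_trace_start(self, trace: Any) -> None: ...

    @abc.abstractmethod
    def on_trace_end(self, trace: Any) -> None: ...

    @abc.abstractmethod
    def on_span_start(self, span: Any) -> None: ...

    @abc.abstractmethod
    def on_span_end(self, span: Any) -> None: ...

    @abc.abstractmethod
    def shutdown(self) -> None: ...

    @abc.abstractmethod
    def force_flush(self) -> None: ...


class InMemoryProcessor(TracingProcessor):
    """Buffers events in memory; force_flush moves them to an 'exported' list."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []
        self.exported: list[tuple[str, Any]] = []

    def on_trace_start(self, trace: Any) -> None:
        self.events.append(("trace_start", trace))

    def on_trace_end(self, trace: Any) -> None:
        self.events.append(("trace_end", trace))

    def on_span_start(self, span: Any) -> None:
        self.events.append(("span_start", span))

    def on_span_end(self, span: Any) -> None:
        self.events.append(("span_end", span))

    def force_flush(self) -> None:
        # A real exporter would send the batch to a backend here.
        self.exported.extend(self.events)
        self.events.clear()

    def shutdown(self) -> None:
        self.force_flush()


class MultiProcessor:
    """Hypothetical fan-out: forwards each event to every registered processor."""

    def __init__(self, *processors: TracingProcessor) -> None:
        self.processors = list(processors)

    def emit(self, hook: str, obj: Any) -> None:
        for p in self.processors:
            getattr(p, hook)(obj)
```

According to the SDK documentation, custom processors are registered with `add_trace_processor` or `set_trace_processors`; the fan-out class here only mimics that behavior locally.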

Usage example: building several tutor agents that help a child with homework

Install the Agents SDK:

pip install openai-agents

Set your OpenAI API key in your development environment:

export OPENAI_API_KEY=sk-…

Create one agent per subject, an agent that checks whether the input is a homework question, and an asynchronous input guardrail that uses it:


from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrail,
    InputGuardrailTripwireTriggered,
    Runner,
)
from pydantic import BaseModel
import asyncio


# Pydantic model describing the result of the homework check
class HomeworkOutput(BaseModel):
    is_homework: bool  # whether the question is about homework
    reasoning: str  # why the check reached that conclusion


# "Guardrail check" agent: decides whether the user is asking about homework
guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking about homework.",
    output_type=HomeworkOutput,  # structured output type
)

# "Math Tutor" agent: answers math questions
math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Explain your reasoning at each step and include examples",
)

# "History Tutor" agent: answers history questions
history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for historical questions",
    instructions="You provide assistance with historical queries. Explain important events and context clearly.",
)


# Async guardrail function: checks whether the user input is homework-related
async def homework_guardrail(ctx, agent, input_data):
    # Run guardrail_agent on the same input, sharing the run context
    result = await Runner.run(guardrail_agent, input_data, context=ctx.context)
    # Interpret the final output as a HomeworkOutput
    final_output = result.final_output_as(HomeworkOutput)
    # Trip the wire when the question is not homework
    return GuardrailFunctionOutput(
        output_info=final_output,
        tripwire_triggered=not final_output.is_homework,
    )


# "Triage Agent": routes the question to the right tutor
triage_agent = Agent(
    name="Triage Agent",
    instructions="You determine which agent to use based on the user's homework question",
    handoffs=[history_tutor_agent, math_tutor_agent],
    input_guardrails=[
        InputGuardrail(guardrail_function=homework_guardrail),  # use homework_guardrail as the input guardrail
    ],
)


async def main():
    # A homework question: the guardrail passes and triage hands off to a tutor
    result = await Runner.run(triage_agent, "who was the first president of the united states?")
    print(result.final_output)

    # Likely not homework: if the guardrail trips, Runner.run raises instead of returning
    try:
        result = await Runner.run(triage_agent, "what is life")
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        print("Guardrail tripped: this does not look like a homework question.")


# Run main() when the script is executed directly
if __name__ == "__main__":
    asyncio.run(main())

REF

https://openai.com/index/new-tools-for-building-agents/
https://platform.openai.com/docs/guides/agents-sdk
https://x.com/OpenAIDevs/status/1899531225468969240
https://openai.github.io/openai-agents-python/
https://github.com/openai/openai-agents-python
https://openai.github.io/openai-agents-python/ref/guardrail/#agents.guardrail.input_guardrail