这是用户在 2024-9-1 22:47 为 https://platform.openai.com/docs/guides/prompt-engineering 保存的双语快照页面,由 沉浸式翻译 提供双语支持。了解如何保存?

This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4o. The methods described here can sometimes be deployed in combination for greater effect.
本指南分享了从大型语言模型(有时被称为 GPT 模型)如 GPT-4o 获取更好结果的策略和战术。这里描述的方法有时可以组合使用以获得更大的效果。

We encourage experimentation to find the methods that work best for you.
我们鼓励实验,以找到最适合您的方法。

You can also explore example prompts which showcase what our models are capable of:
您还可以探索示例提示,这些提示展示了我们模型的功能:

Prompt examples 提示示例

Explore prompt examples to learn what GPT models can do
探索提示示例以了解 GPT 模型能做什么

These models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see.
这些模型无法读懂你的想法。如果输出太长,请求简短的回复。如果输出太简单,请求专家级的写作。如果你不喜欢这个格式,请展示你想要的格式。

The less the model has to guess at what you want, the more likely you’ll get it.
模型需要猜测你想要什么的越少,你得到的可能性就越大。

Tactics: 战术:

Language models can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs.
语言模型可以自信地编造虚假的答案,特别是当被问及晦涩的话题或要求提供引用和网址时。

In the same way that a sheet of notes can help a student do better on a test, providing reference text to these models can help in answering with fewer fabrications.
就像一份笔记可以帮助学生在考试中表现更好一样,为这些模型提供参考文本可以帮助减少虚假信息的产生。

Tactics: 战术:

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to a language model. Complex tasks tend to have higher error rates than simpler tasks.
正如在软件工程中将复杂系统分解为一组模块化组件是最佳实践一样,提交给语言模型的任务也是如此。复杂的任务往往比简单的任务有更高的错误率。

Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
此外,复杂的任务通常可以重新定义为一系列更简单任务的工作流程,其中早期任务的输出用于构建后续任务的输入。

Tactics: 策略:

If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, models make more reasoning errors when trying to answer right away, rather than taking time to work out an answer.
如果你被要求将 17 乘以 28,你可能不会立即知道答案,但仍然可以花时间算出来。类似地,模型在试图立即回答问题时,比花时间算出答案更容易犯推理错误。

Asking for a "chain of thought" before an answer can help the model reason its way toward correct answers more reliably.
在给出答案之前请求“思维链”可以帮助模型更可靠地推理到正确答案。

Tactics: 战术:

Compensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model about relevant documents.
通过向模型输入其他工具的输出,以弥补模型的不足。例如,文本检索系统(有时称为 RAG 或检索增强生成)可以向模型提供相关文档的信息。

A code execution engine like OpenAI's Code Interpreter can help the model do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a language model, offload it to get the best of both.
像 OpenAI 的代码解释器这样的代码执行引擎可以帮助模型进行数学运算和运行代码。如果一项任务可以通过工具而不是语言模型更可靠或更有效地完成,则将其卸载以获得两者的最佳效果。

Tactics: 战术:

Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples.
如果能够衡量性能,改进性能就更容易。在某些情况下,对提示的修改可能会在一些孤立的例子中取得更好的性能,但会导致在更具代表性的例子集中整体性能下降。

Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an "eval").
因此,为了确保变更对性能是净积极的,可能需要定义一个全面的测试套件(也称为“评估”)。

Tactic: 战术:

Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
上述列出的每种策略都可以通过具体战术进行实现。这些战术旨在提供一些尝试的思路。它们绝不是完全全面的,你可以自由尝试这里未列出的创意。

In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.
为了获得高度相关的回复,请确保请求提供任何重要细节或上下文。否则,你将把猜测你的意思的任务留给模型。

Worse 更糟糕Better 更好
How do I add numbers in Excel?
如何在 Excel 中添加数字?
How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called "Total".
如何在 Excel 中对一行美元金额进行求和?我想为整个工作表中的所有行自动执行此操作,所有总计都放在名为“总计”的右侧列中。
Who’s president? 谁是总统?Who was the president of Mexico in 2021, and how frequently are elections held?
2021 年墨西哥的总统是谁,选举的频率是多少?
Write code to calculate the Fibonacci sequence.
编写代码计算斐波那契数列。
Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it's written that way.
编写一个 TypeScript 函数,以高效计算斐波那契数列。对代码进行充分注释,以解释每个部分的功能及其编写原因。
Summarize the meeting notes.
总结会议记录。
Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
将会议记录概括成一段话。然后以 Markdown 列表形式列出发言者及其关键要点。最后,列出发言者提出的下一步行动或待办事项(如果有)。

The system message can be used to specify the persona used by the model in its replies.
系统消息可以用来指定模型在回复中使用的人格。

SYSTEM 系统
When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
当我请求帮助写东西时,你会回复一个文档,每段至少包含一个笑话或俏皮的评论。
USER 用户
Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.
感谢您及时且在短时间内完成了螺栓的交付。这使我们能够按时完成一项重要订单。

Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.
如三重引号、XML 标签、章节标题等分隔符可以帮助划定文本部分,以便进行不同的处理。

USER 用户
Summarize the text delimited by triple quotes with a haiku. """insert text here"""
请用俳句总结三引号之间的文本。 请提供需要翻译的文本
SYSTEM 系统
You will be provided with a pair of articles (delimited with XML tags) about the same topic. First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.
你将获得一对关于相同主题的文章(用 XML 标签分隔)。首先总结每篇文章的论点。然后指出哪一篇文章的论点更好,并解释原因。
USER 用户
<article> insert first article here </article> <article> insert second article here </article>
<article> 在这里插入第一篇文章 < /article> <article>在此插入第二篇文章</article></article>
SYSTEM 系统
You will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
您将获得一篇论文摘要和一个建议的标题。论文标题应该让读者对论文主题有一个很好的了解,但也应该引人注目。如果标题不符合这些标准,请提出 5 个替代方案。
USER 用户
Abstract: insert abstract here Title: insert title here
摘要:请在此处插入摘要 标题:在此插入标题

For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is the more important it is to disambiguate task details. Don’t make the model work to understand exactly what you are asking of them.
对于像这些简单的任务,使用分隔符可能不会影响输出质量。然而,任务越复杂,澄清任务细节就越重要。不要让模型费力去理解你究竟在要求什么。

Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.
有些任务最好指定为一系列步骤。明确地写出这些步骤可以使模型更容易遵循它们。

SYSTEM 系统
Use the following step-by-step instructions to respond to user inputs. Step 1 - The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says "Summary: ".
请使用以下分步说明来响应用户输入。 步骤 1 - 用户将提供用三个引号括起来的文本。用一句话总结此文本,并以“总结:”作为前缀。

Step 2 - Translate the summary from Step 1 into Spanish, with a prefix that says "Translation: ".
步骤 2 - 将步骤 1 中的摘要翻译成西班牙语,并在前面加上“翻译:”。
USER 用户
"""insert text here""" 请插入文本这里

Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier.
提供适用于所有示例的一般指导通常比通过示例演示任务的所有排列更有效,但在某些情况下,提供示例可能更容易。

For example, if you intend for the model to copy a particular style of responding to user queries which is difficult to describe explicitly. This is known as "few-shot" prompting.
例如,如果您希望模型复制一种特定类型的用户查询响应方式,而这种方式难以明确描述,那么这被称为“少样本”提示。

SYSTEM 系统
Answer in a consistent style.
以一致的风格回答。
USER 用户
Teach me about patience.  教我如何有耐心。
ASSISTANT 助手
The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
雕刻最深峡谷的河流发源于不起眼的泉水;最宏伟的交响乐起源于单个音符;最复杂的挂毯始于一根孤单的线。
USER 用户
Teach me about the ocean.
教我关于海洋的知识。

You can ask the model to produce outputs that are of a given target length.
您可以要求模型生成具有特定目标长度的输出。

The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision.
目标输出长度可以通过单词、句子、段落、项目符号等的数量来指定。然而,请注意,指示模型生成特定数量的单词并不能高精度地实现。

The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
该模型可以更可靠地生成具有特定段落数或项目符号数的输出。

USER 用户
Summarize the text delimited by triple quotes in about 50 words. """insert text here"""
总结以下用三个引号括起来的文本,字数控制在 50 字左右。 Sure, please provide the source text that you would like to have translated into Simplified Chinese
USER 用户
Summarize the text delimited by triple quotes in 2 paragraphs. """insert text here"""
请总结用三个引号分隔的文本,分为两段。 请提供需要翻译的文本
USER 用户
Summarize the text delimited by triple quotes in 3 bullet points. """insert text here"""
将三引号之间的文本总结为 3 个要点。 Sure, please provide the text you'd like to translate

If we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.
如果我们能提供给模型与当前查询相关的可信信息,那么我们可以指示模型利用这些提供的信息来组成它的回答。

SYSTEM 系统
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write "I could not find an answer."
使用提供的被三重引号分隔的文章来回答问题。如果在文章中找不到答案,请写“我找不到答案。”
USER 用户
<insert articles, each delimited by triple quotes> Question: <insert question here>
<插入文章,每篇由三重引号分隔> 问题:<insert question here>

Given that all models have limited context windows, we need some way to dynamically lookup information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval" for more details on how to implement this.
鉴于所有模型都有有限的上下文窗口,我们需要某种方法来动态查找与所提问题相关的的信息。 嵌入 可用于实现高效的知识检索。有关如何实现此方法的更多详细信息,请参阅策略 "使用基于嵌入的搜索来实现高效的知识检索"

If the input has been supplemented with relevant knowledge, it's straightforward to request that the model add citations to its answers by referencing passages from provided documents.
如果输入已补充相关知识,要求模型在回答中引用提供的文档中的段落来添加引用是很简单的。

Note that citations in the output can then be verified programmatically by string matching within the provided documents.
请注意,输出中的引用可以通过在提供的文档中进行字符串匹配来进行程序化验证。

SYSTEM 系统
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question.
您将获得一份由三重引号限定的文件和一个问题。您的任务是仅使用提供的文件来回答问题,并引用用于回答问题的文件段落。

If the document does not contain the information needed to answer this question then simply write: "Insufficient information." If an answer to the question is provided, it must be annotated with a citation.
如果文件中没有回答这个问题所需的信息,请简单地写:“信息不足。”如果提供了问题的答案,必须附上引用。

Use the following format for to cite relevant passages ({"citation": …}).
使用以下格式引用相关段落 ({"citation": …})。
USER 用户
"""<insert document here>""" Question: <insert question here>
"""<在此插入文档>""" 问题:< 在此插入问题>

For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed.
对于需要大量独立指令集来处理不同情况的任务,首先对查询类型进行分类,然后使用该分类来确定需要哪些指令是很有益的。

This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages.
这可以通过定义固定类别和硬编码与处理给定类别中的任务相关的指令来实现。此过程也可以递归地应用于将任务分解为一系列阶段。

The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task.
这种方法的优势在于,每个查询只包含执行任务下一阶段所需的指令,与使用单个查询执行整个任务相比,这可以降低错误率。

This can also result in lower costs since larger prompts cost more to run (see pricing information).
这也可能导致更低的成本,因为较大的提示运行成本更高(查看定价信息)。

Suppose for example that for a customer service application, queries could be usefully classified as follows:
假设例如,对于客户服务应用程序,查询可以有用地分类如下:

SYSTEM 系统
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.
您将收到客户服务查询。将每个查询分类到一个主要类别和一个次要类别中。请以 JSON 格式提供您的输出,键为:primary 和 secondary。

Primary categories: Billing, Technical Support, Account Management, or General Inquiry.
主要类别:账单、技术支持、账户管理或一般咨询。

Billing secondary categories: - Unsubscribe or upgrade - Add a payment method - Explanation for charge - Dispute a charge Technical Support secondary categories: - Troubleshooting - Device compatibility - Software updates Account Management secondary categories: - Password reset - Update personal information - Close account - Account security General Inquiry secondary categories: - Product information - Pricing - Feedback - Speak to a human
计费二级类别: - 退订或升级 添加支付方式 - 费用说明 - 争议费用 技术支持二级类别: - 故障排除 - 设备兼容性 - 软件更新 账户管理二级类别: - 重置密码 更新个人信息 关闭账户 账户安全 一般查询二级类别: - 产品信息 - 定价 - 反馈 与人工客服交谈
USER 用户
I need to get my internet working again.
我需要让我的互联网再次正常工作。

Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with "troubleshooting".
根据客户查询的分类,可以向模型提供一套更具体的指令,以便它处理后续步骤。例如,假设客户需要关于“故障排除”的帮助。

SYSTEM 系统
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by: - Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
您将收到需要在技术支持环境中进行故障排除的客户服务咨询。请通过以下方式帮助用户: - 请他们检查所有连接到/从路由器的电缆是否已连接。请注意,电缆随着时间的推移松动是很常见的。

- If all cables are connected and the issue persists, ask them which router model they are using - Now you will advise them how to restart their device: -- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
- 如果所有电缆都已连接且问题仍然存在,询问他们使用的是哪个路由器型号 现在,您将指导他们如何重启设备: 如果型号是 MTD-327J,请告诉他们按下红色按钮并保持 5 秒钟,然后等待 5 分钟再测试连接。

-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection. - If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
如果型号是 MTD-327S,建议他们拔掉插头后再重新插入,然后等待 5 分钟再测试连接。 - 如果客户的问题在重启设备并等待 5 分钟后仍然存在,请通过输出 {"IT support requested"} 将他们连接到 IT 支持。

- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme: <insert primary/secondary classification scheme from above here>
- 如果用户开始询问与此主题无关的问题,请确认他们是否希望结束当前的故障排除聊天,并根据以下方案对他们的请求进行分类: < 在这里插入上述主要/次要分类方案>
USER 用户
I need to get my internet working again.
我需要让我的互联网再次正常工作。

Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected.
请注意,模型已被指示发出特殊字符串以指示对话状态何时发生变化。这使我们能够将系统转变为状态机,其中状态决定注入哪些指令。

By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.
通过跟踪状态、在该状态下哪些指令是相关的,以及可选地该状态下允许哪些状态转换,我们可以为用户体验设置保护措施,这些措施在结构化较差的方法中很难实现。

Since models have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.
由于模型具有固定的上下文长度,因此用户与助手之间的对话,如果整个对话都包含在上下文窗口中,则无法无限期地继续。

There are various workarounds to this problem, one of which is to summarize previous turns in the conversation.
解决这个问题有很多变通方法,其中之一是总结对话中的先前回合。

Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message.
一旦输入的大小达到预定的阈值,这可能会触发一个查询,摘要部分对话的内容,并且之前对话的摘要可以作为系统消息的一部分包含在内。

Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.
或者,之前的对话可以在整个对话过程中异步地在后台进行总结。

An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic "Use embeddings-based search to implement efficient knowledge retrieval".
另一种解决方案是动态地选择与当前查询最相关的对话的先前部分。请参阅策略"使用基于嵌入的搜索来实现高效的知识检索"

Since models have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.
由于模型有固定的上下文长度,它们不能在单次查询中总结比上下文长度减去生成的摘要长度更长的文本。

To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries.
为了总结像书籍这样很长的文档,我们可以使用一系列查询来总结文档的每个部分。各部分的总结可以连接并进一步总结,从而生成总结的总结。

This process can proceed recursively until an entire document is summarized.
这个过程可以递归进行,直到整个文档被总结。

If it’s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point.
如果需要使用早期部分的信息以理解后续部分,那么一个有用的技巧是,在总结书中某一点的内容时,包含该点之前文本的持续摘要。

The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
这种用于总结书籍的程序的有效性已经在 OpenAI 使用 GPT-3 的变体进行的先前 研究 中得到研究。

Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student’s solution to a math problem.
有时,当我们明确指示模型在得出结论之前从基本原理进行推理时,我们会获得更好的结果。例如,假设我们希望模型评估学生对一个数学问题的解答。

The most obvious way to approach this is to simply ask the model if the student's solution is correct or not.
最明显的方法是直接询问模型学生解决方案是否正确。

SYSTEM 系统
Determine if the student's solution is correct or not.
确定学生的解题方法是否正确。
USER 用户
Problem Statement: I'm building a solar power installation and I need help working out the financials.
问题陈述:我正在建造一个太阳能发电设施,需要帮助计算财务问题。

- Land costs $100 / square foot - I can buy solar panels for $250 / square foot - I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot What is the total cost for the first year of operations as a function of the number of square feet.
土地价格为每平方英尺 100 美元 我可以以每平方英尺 250 美元的价格购买太阳能电池板 我谈判了一份维护合同,每年费用固定为 10 万美元,另外每平方英尺加收 10 美元 第一个运营年度的总费用与平方英尺数的关系是什么?

Student's Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
学生解答:设安装面积为 x 平方英尺。 土地成本:100 倍 太阳能面板成本:250 倍 维护成本:100,000 + 100x 总成本:100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT 助手
The student's solution is correct.
学生的解题方法是正确的。

But the student's solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
但是学生的解答实际上是不正确的!我们可以通过让模型先生成自己的解答来使模型成功发现这一点。

SYSTEM 系统
First work out your own solution to the problem. Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. Don't decide if the student's solution is correct until you have done the problem yourself.
首先自己解决这个问题。然后将你的解决方案与学生的解决方案进行比较,评估学生的解决方案是否正确。在自己解决问题之前,不要决定学生的解决方案是否正确。
USER 用户
Problem Statement: I'm building a solar power installation and I need help working out the financials.
问题陈述:我正在建造一个太阳能发电设施,需要帮助计算财务问题。

- Land costs $100 / square foot - I can buy solar panels for $250 / square foot - I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot What is the total cost for the first year of operations as a function of the number of square feet.
土地价格为每平方英尺 100 美元 - 我可以以每平方英尺 250 美元的价格购买太阳能面板 我谈判了一份维护合同,费用为每年 100,000 美元,加上每平方英尺 10 美元的额外费用 第一年的总成本是以平方英尺数为函数的。

Student's Solution: Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 100x Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
学生的解答:设 x 为安装的面积(平方英尺)。 土地成本:100 倍 2. 太阳能电池板成本:250x 维护成本:100,000 + 100x 总成本:100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT 助手
Let x be the size of the installation in square feet. 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100,000 + 10x Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000 The student's solution is incorrect.
设 x 为安装面积(平方英尺)。 土地成本:100x 2. 太阳能电池板成本:250x 3. 维修成本:100,000 + 10x 总成本:100x + 250x + 100,000 + 10x = 360x + 100,000 学生的解答不正确。

They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.
他们在维护成本计算中犯了一个错误,使用了 100 倍而不是 10 倍。第一年运营的正确总成本是 360 倍加 100,000。

The previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question.
之前的策略表明,有时模型在回答特定问题之前,详细推理问题是很重要的。

For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user.
对于某些应用,模型用来得出最终答案的推理过程不适合与用户共享。

For example, in tutoring applications we may want to encourage students to work out their own answers, but a model’s reasoning process about the student’s solution could reveal the answer to the student.
例如,在辅导应用中,我们可能希望鼓励学生自己得出答案,但模型对学生解决方案的推理过程可能会向学生透露答案。

Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy.
内心独白是一种可以用来缓解这种情况的策略。内心独白的理念是指示模型将那些应该隐藏在用户视线之外的输出部分放到一个结构化的格式中,以便于解析。

Then before presenting the output to the user, the output is parsed and only part of the output is made visible.
然后,在将输出呈现给用户之前,会对输出进行解析,并且只显示输出的一部分。

SYSTEM 系统
Follow these steps to answer the user queries. Step 1 - First work out your own solution to the problem. Don't rely on the student's solution since it may be incorrect. Enclose all your work for this step within triple quotes (""").
按照以下步骤回答用户查询。 步骤 1 - 首先解决您自己对问题的看法。不要依赖学生的解决方案,因为它可能是错误的。将您在此步骤中的所有工作用三重引号("""")括起来。

Step 2 - Compare your solution to the student's solution and evaluate if the student's solution is correct or not. Enclose all your work for this step within triple quotes (""").
步骤二 - 将你的解决方案与学生的解决方案进行比较,并评估学生的解决方案是否正确。将此步骤中的所有工作用三个引号(""")括起来。

Step 3 - If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (""").
步骤 3 - 如果学生犯了错误,确定你可以给学生什么提示而不透露答案。将此步骤的所有工作放在三重引号("""")中。

Step 4 - If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing "Step 4 - ..." write "Hint:".
提示:如果学生犯了错误,请向学生提供上一步的提示(在三重引号外)。
USER 用户
Problem Statement: <insert problem statement> Student Solution: <insert student solution>
问题陈述: 学生解答:< 插入学生解答>

Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
另外,这可以通过一系列查询来实现,其中除最后一个查询外,所有查询的输出都对最终用户隐藏。

First, we can ask the model to solve the problem on its own. Since this initial query doesn't require the student’s solution, it can be omitted.
首先,我们可以要求模型自行解决问题。由于此初始查询不需要学生的解决方案,因此可以省略。

This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
这提供了额外的好处,即模型的解决方案不会受到学生尝试解决方案的偏见。

USER 用户
<insert problem statement>
Please provide the problem statement you want me to translate. I need the actual text to translate it into Simplified Chinese

Next, we can have the model use all available information to assess the correctness of the student’s solution.
接下来,我们可以让模型利用所有可用的信息来评估学生解答的正确性。

SYSTEM 系统
Compare your solution to the student's solution and evaluate if the student's solution is correct or not.
将您的解决方案与学生的解决方案进行比较,并评估学生的解决方案是否正确。
USER 用户
Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>"""
问题陈述: """< 插入问题陈述>""" 你的解决方案:"""< 插入模型生成的解决方案>""" 学生的解决方案:“”“< 插入学生的解决方案>”””

Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
最后,我们可以让模型利用自己的分析来以一个乐于助人的导师的身份构建回复。

SYSTEM 系统
You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment.
你是数学导师。如果学生犯了错误,给学生提供一个提示,但不要揭示答案。如果学生没有犯错误,简单地给予他们鼓励的评价。
USER 用户
Problem statement: """<insert problem statement>""" Your solution: """<insert model generated solution>""" Student’s solution: """<insert student's solution>""" Analysis: """<insert model generated analysis from previous step>"""
问题陈述: """< 插入问题陈述>""" 您的解决方案:"""< 插入模型生成的解决方案>""" 学生的解答: """<insert student's solution>""" 分析:"""< 插入模型从上一步生成的分析>"""

Suppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop.
假设我们正在使用一个模型来列出与特定问题相关的源文摘。在列出每个摘录后,模型需要确定是否应该开始写另一个摘录,还是应该停止。

If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with followup queries to find any excerpts it missed on previous passes.
如果源文档很大,模型通常会过早停止,无法列出所有相关摘录。在这种情况下,通过使用后续查询来提示模型,可以获得更好的性能,以找出模型在之前的处理过程中遗漏的摘录。

SYSTEM 系统
You will be provided with a document delimited by triple quotes.
您将获得一个由三个引号分隔的文档。

Your task is to select excerpts which pertain to the following question: "What significant paradigm shifts have occurred in the history of artificial intelligence." Ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context.
你的任务是选择与以下问题相关的摘录:“人工智能历史上发生了哪些重大的范式转变”。 确保摘录包含解释它们所需的所有相关上下文,换句话说,不要提取缺少重要上下文的短片段。

Provide output in JSON format as follows: [{"excerpt": "..."}, ... {"excerpt": "..."}]
{ "Translated Text": "提供以下 JSON 格式的输出:" } 请提供需要翻译的文本。 ... {"excerpt": "..."}
USER 用户
"""<insert document here>"""
< 在此插入文档>
ASSISTANT 助手
[{"excerpt": "the model writes an excerpt here"}, ... {"excerpt": "the model writes another excerpt here"}]
[{"excerpt": "模型在这里写一段文字"}] ... {"excerpt": "模型在这里写了另一个摘录"}
USER 用户
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them - in other words don't extract small snippets that are missing important context.
是否有更多相关的摘录?请注意不要重复摘录。同时,确保摘录包含解释所需的所有相关背景——换句话说,不要提取缺乏重要背景的小片段。

A model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses.
模型可以利用作为输入的一部分提供的外部信息源。这可以帮助模型生成更有见地和更及时的回应。

For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc…) to the model’s input.
例如,如果用户问有关特定电影的问题,在模型输入中添加有关电影的高质量信息(例如演员、导演等)可能会有用。

Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.
嵌入可以用来实现高效的知识检索,以便在运行时动态地将相关信息添加到模型输入中。

A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings.
文本嵌入是一个向量,可以测量文本字符串之间的相关性。相似或相关的字符串会比不相关的字符串更接近。

This fact, along with the existence of fast vector search algorithms means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored.
这一事实,加上快速向量搜索算法的存在,意味着嵌入可以用来实现高效的知识检索。特别是,一个文本语料库可以被分成多个片段,每个片段可以被嵌入并存储。

Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. closest together in the embedding space).
然后,可以将给定的查询嵌入,并执行向量搜索,以找到与查询最相关的嵌入文本块(即在嵌入空间中最接近的)。

Example implementations can be found in the OpenAI Cookbook. See the tactic “Instruct the model to use retrieved knowledge to answer queries” for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.
示例实现可以在OpenAI 食谱中找到。请参阅战术“指示模型使用检索到的知识来回答查询”,了解如何使用知识检索来最大程度地减少模型编造不正确事实的可能性。

Language models cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations.
语言模型本身无法准确地进行算术运算或长时间计算。如果需要进行此类操作,可以指示模型编写并运行代码,而不是进行自己的计算。

In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backtick. After an output is produced, the code can be extracted and run.
特别是,可以指示模型将预定要运行的代码放入指定的格式,如三重反引号。生成输出后,可以提取并运行代码。

Finally, if necessary, the output from the code execution engine (i.e. Python interpreter) can be provided as an input to the model for the next query.
最后,如有必要,代码执行引擎(即 Python 解释器)的输出可以作为下一次查询的输入提供给模型。

SYSTEM 系统
You can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.
您可以通过将 Python 代码括在三个反引号中来编写和执行,例如 ```代码在这里```。使用此方法进行计算。
USER 用户
Find all real-valued roots of the following polynomial: 3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10.
找到以下多项式的所有实数根:3*x**5 - 5*x**4 - 3*x**3 - 7*x - 10。

Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it.
另一个代码执行的良好用例是调用外部 API。如果一个模型被指导如何正确使用 API,它可以编写利用 API 的代码。

A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
可以通过提供文档和/或代码示例来指导模型如何使用 API,这些文档和代码示例展示了如何使用该 API。

SYSTEM 系统
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends: ```python import message message.write(to="John", message="Hey, want to meetup after work?")```
你可以通过将 Python 代码用三重反引号括起来来编写和执行代码。还要注意,你可以使用以下模块来帮助用户向他们的朋友发送消息: ```python ``` 导入 message message.write(to="John", message="嘿,下班后想聚聚吗?")

WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
警告:执行模型生成的代码本身并不安全,在任何寻求执行此操作的应用程序中都应采取预防措施。特别是,需要一个沙盒代码执行环境来限制不受信任的代码可能造成的损害。

The Chat Completions API allows passing a list of function descriptions in requests. This enables models to generate function arguments according to the provided schemas.
Chat Completions API 允许在请求中传递函数描述列表。这使模型能够根据提供的模式生成函数参数。

Generated function arguments are returned by the API in JSON format and can be used to execute function calls. Output provided by function calls can then be fed back into a model in the following request to close the loop.
生成的函数参数通过 API 以 JSON 格式返回,可以用于执行函数调用。函数调用提供的输出可以在下一个请求中反馈回模型,以完成闭环。

This is the recommended way of using OpenAI models to call external functions. To learn more see the function calling section in our introductory text generation guide and more function calling examples in the OpenAI Cookbook.
这是使用 OpenAI 模型调用外部函数的推荐方法。要了解更多信息,请参阅我们介绍性文本生成指南中的 函数调用部分 以及 OpenAI 食谱中的更多 函数调用示例

Sometimes it can be hard to tell whether a change — e.g., a new instruction or a new design — makes your system better or worse.
有时候很难判断一个变化——例如,一个新指令或新设计——是使你的系统变得更好还是更糟。

Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement or random luck. Maybe the change helps performance on some inputs, but hurts performance on others.
查看几个示例可能会暗示哪种更好,但由于样本量小,很难区分真正的改善和随机运气。也许这种变化对某些输入的性能有帮助,但对其他输入的性能却有害。

Evaluation procedures (or "evals") are useful for optimizing system designs. Good evals are:
评估程序(或“评估”)对优化系统设计非常有用。好的评估包括:

  • Representative of real-world usage (or at least diverse)
    真实世界使用的代表(或至少是多样化的)
  • Contain many test cases for greater statistical power (see table below for guidelines)
    包含多个测试案例以获得更大的统计能力(请参见下面的表格以获取指导)
  • Easy to automate or repeat
    易于自动化或重复
Difference to detect 检测差异Sample size needed for 95% confidence
需要 95%置信度的样本量
30%~10
10%~100
3%~1,000
1%~10,000

Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.
输出的评估可以由计算机、人类或两者混合进行。计算机可以使用客观标准(例如,具有唯一正确答案的问题)以及一些主观或模糊标准来自动化评估,其中模型输出通过其他模型查询进行评估。OpenAI Evals 是一个开源软件框架,提供用于创建自动化评估的工具。

Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers).
基于模型的评估在存在一系列可能的输出被认为同样高质量的情况下(例如,对于有长答案的问题)可以很有用。

The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable.
可以通过基于模型的评估进行现实评估的界限与需要人类进行评估的界限是模糊的,并且随着模型能力的不断提升,这一界限也在不断变化。

We encourage experimentation to figure out how well model-based evals can work for your use case.
我们鼓励实验,以确定基于模型的评估在您的用例中能发挥多大的作用。

Suppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.
假设已知问题的正确答案应该参考一组特定的已知事实。那么我们可以使用模型查询来计算答案中包含多少个所需的事实。

For example, using the following system message:
例如,使用以下系统消息:

SYSTEM 系统
You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer: - Neil Armstrong was the first person to walk on the moon.
您将获得以三个引号分隔的文本,该文本应该是某个问题的答案。请检查以下信息是否直接包含在答案中: - 尼尔·阿姆斯特朗是第一个登上月球的人。

- The date Neil Armstrong first walked on the moon was July 21, 1969. For each of these points perform the following steps: 1 - Restate the point. 2 - Provide a citation from the answer which is closest to this point.
尼尔·阿姆斯特朗首次登月的日期是 1969 年 7 月 21 日。 对于这些点中的每一个,执行以下步骤: 1. 重申要点。 2 - 提供一个与此观点最接近的答案引用。

3 - Consider if someone reading the citation who doesn't know the topic could directly infer the point. Explain why or why not before making up your mind. 4 - Write "yes" if the answer to 3 was yes, otherwise write "no".
3 - 考虑一下,如果一个不了解主题的人阅读了引文,是否可以直接推断出要点。在做出决定之前,解释一下为什么或为什么不。 4 - 如果第 3 题的答案是“是”,请写“是”;否则写“否”。

Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": <insert count here>}.
最后,提供有多少个“是”的答案。将此计数显示为 {"count": < 插入计数在这里>}。

Here's an example input where both points are satisfied:
这是一个满足两个条件的输入示例:

SYSTEM 系统
<insert system message above>
<插入系统消息在上面>
USER 用户
"""Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission."""
尼尔·阿姆斯特朗因成为第一个踏上月球的人而闻名。这个历史性事件发生在 1969 年 7 月 21 日,期间进行的是阿波罗 11 号任务。

Here's an example input where only one point is satisfied:
这是一个只有一个点满足的输入示例:

SYSTEM 系统
<insert system message above>
<在上面插入系统消息>
USER 用户
"""Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon."""
尼尔·阿姆斯特朗从登月舱走出,成为第一个踏上月球的人,创造了历史。

Here's an example input where none are satisfied:
这里是一个没有满足任何条件的示例输入:

SYSTEM 系统
<insert system message above>
请提供需要翻译的源文本
USER 用户
"""In the summer of '69, a voyage grand, Apollo 11, bold as legend's hand. Armstrong took a step, history unfurled, "One small step," he said, for a new world."""
在 69 年的夏天,盛大的航行, 阿波罗 11 号,如同传奇之手般大胆。 阿姆斯特朗迈出一步,历史展开 “一小步,”他说,为了一个新世界。”

There are many possible variants on this type of model-based eval.
这种基于模型的评估有很多可能的变体。

Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
请考虑以下变体,它跟踪候选答案和金标准答案之间重叠的类型,并跟踪候选答案是否与金标准答案的任何部分相矛盾。

SYSTEM 系统
Use the following steps to respond to user inputs. Fully restate each step before proceeding.
使用以下步骤来回应用户输入。在继续之前,完全重述每个步骤。

i.e. "Step 1: Reason...". Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset).
即“第 1 步:理由……” 步骤 1:逐步推理提交的答案与专家答案中的信息是否为:不相交、相等、子集、超集或重叠(即有一些交集但不是子集/超集)。

Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer. Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}
步骤 2:逐步推理提交的答案是否与专家答案的任何方面相矛盾。 步骤 3:输出一个 JSON 对象,结构如下:{"type_of_overlap": "不相交" 或 "相等" 或 "子集" 或 "超集" 或 "重叠", "contradiction": true 或 false}

Here's an example input with a substandard answer which nonetheless does not contradict the expert answer:
这是一个包含不合格答案的示例输入,但它并不与专家的答案相矛盾:

SYSTEM 系统
<insert system message above>
< 插入系统消息在上方>
USER 用户
Question: """What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.""" Submitted Answer: """Didn't he walk on the moon or something?""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon.
问题:“尼尔·阿姆斯特朗最著名的事件是什么?它发生在什么日期?假设为协调世界时。” 提交的答案:"""他不是在月球上走过吗?""" 专家回答:“尼尔·阿姆斯特朗因成为第一个在月球上行走的人而最为出名。”

This historic event occurred on July 21, 1969."""
这一历史事件发生在 1969 年 7 月 21 日。

Here's an example input with answer that directly contradicts the expert answer:
这是一个示例输入,其答案与专家答案直接矛盾:

SYSTEM 系统
<insert system message above>
<在上面插入系统消息>
USER 用户
Question: """What event is Neil Armstrong most famous for and on what date did it occur?
问题:“尼尔·阿姆斯特朗最出名的是什么事件,它发生在什么日期?

Assume UTC time.""" Submitted Answer: """On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin.""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon.
假设为协调世界时(UTC)。 1969 年 7 月 21 日,尼尔·阿姆斯特朗成为第二位踏上月球的人,紧随其后的是巴兹·奥尔德林。 专家回答:"""尼尔·阿姆斯特朗最著名的就是成为第一个踏上月球的人。

This historic event occurred on July 21, 1969."""
这个历史事件发生在 1969 年 7 月 21 日。

Here's an example input with a correct answer that also provides a bit more detail than is necessary:
这是一个带有正确答案的示例输入,它还提供了比必要更多的细节:

SYSTEM 系统
<insert system message above>
<插入系统消息>
USER 用户
Question: """What event is Neil Armstrong most famous for and on what date did it occur?
问题:"""尼尔·阿姆斯特朗最出名的事件是什么,发生在什么日期?

Assume UTC time.""" Submitted Answer: """At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history.""" Expert Answer: """Neil Armstrong is most famous for being the first person to walk on the moon.
假设使用 UTC 时间。 提交的答案:“1969 年 7 月 21 日 02:56 UTC 左右,尼尔·阿姆斯特朗成为第一个踏上月球表面的人,标志着人类历史上的一个重大成就。” 专家回答:"""尼尔·阿姆斯特朗最著名的成就是成为第一个在月球上行走的人。

This historic event occurred on July 21, 1969."""
这个历史事件发生在 1969 年 7 月 21 日。

For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:
更多灵感,请访问OpenAI 食谱,其中包含示例代码以及指向第三方资源的链接,例如: