July 18, 2024

GPT-4o mini: advancing cost-efficient intelligence

Introducing our most cost-efficient small model

OpenAI is committed to making intelligence as broadly accessible as possible. Today, we're announcing GPT-4o mini, our most cost-efficient small model. We expect GPT-4o mini will significantly expand the range of applications built with AI by making intelligence much more affordable. GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4¹ on chat preferences on the LMSYS leaderboard. It is priced at 15 cents per million input tokens and 60 cents per million output tokens, an order of magnitude more affordable than previous frontier models and more than 60% cheaper than GPT-3.5 Turbo.

GPT-4o mini enables a broad range of tasks with its low cost and latency, such as applications that chain or parallelize multiple model calls (e.g., calling multiple APIs), pass a large volume of context to the model (e.g., full code base or conversation history), or interact with customers through fast, real-time text responses (e.g., customer support chatbots).
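
As a rough illustration of the chaining and parallelizing patterns mentioned above, here is a minimal Python sketch. `fake_model_call` is a hypothetical stand-in for a real model request (it only sleeps briefly and echoes its prompt), and the helper names are assumptions for illustration, not part of any OpenAI library.

```python
import asyncio


async def fake_model_call(prompt: str) -> str:
    """Hypothetical stand-in for one model request; a real app would call an API client here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {prompt}"


async def run_parallel(prompts: list[str]) -> list[str]:
    # gather() runs the calls concurrently and returns results in input order.
    return list(await asyncio.gather(*(fake_model_call(p) for p in prompts)))


async def run_chain(question: str) -> str:
    # Chaining: the output of the first call becomes part of the second prompt.
    draft = await fake_model_call(question)
    return await fake_model_call(f"summarize: {draft}")


if __name__ == "__main__":
    print(asyncio.run(run_parallel(["a", "b", "c"])))
    print(asyncio.run(run_chain("q")))
```

With a low-latency, low-cost model, the parallel form keeps total wall-clock time close to that of a single call, while the chained form trades latency for a multi-step result.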

Today, GPT-4o mini supports text and vision in the API, with support for text, image, video and audio inputs and outputs coming in the future. The model has a context window of 128K tokens and knowledge up to October 2023. Thanks to the improved tokenizer shared with GPT-4o, handling non-English text is now even more cost effective.

A small model with superior textual intelligence and multimodal reasoning

GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o. It also demonstrates strong performance in function calling, which can enable developers to build applications that fetch data or take actions with external systems, and improved long-context performance compared to GPT-3.5 Turbo.
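
To make the function calling idea concrete, the sketch below shows one way an application might describe a function to the model and then run the call the model requests. The tool description follows the JSON-schema style of the Chat Completions `tools` parameter; the tool itself (`get_order_status`), the registry, and the `dispatch` helper are hypothetical and exist only for illustration. No network request is made here.

```python
import json

# Tool description in the JSON-schema style used by the Chat Completions `tools` parameter.
# The tool itself is hypothetical.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]


def get_order_status(order_id: str) -> dict:
    """Hypothetical local implementation; a real one would query an external system."""
    return {"order_id": order_id, "status": "shipped"}


# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_order_status": get_order_status}


def dispatch(name: str, arguments_json: str) -> str:
    """Run the function the model asked for and return a JSON string to send back."""
    if name not in REGISTRY:
        raise ValueError(f"unknown tool: {name}")
    kwargs = json.loads(arguments_json)  # the model supplies arguments as a JSON string
    return json.dumps(REGISTRY[name](**kwargs))


if __name__ == "__main__":
    print(dispatch("get_order_status", '{"order_id": "A1"}'))
```

In a real application, `TOOLS` would be sent along with the conversation, and `dispatch` would be called with the function name and arguments found in the model's tool call.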

GPT-4o mini has been evaluated across several key benchmarks².

Reasoning tasks: GPT-4o mini is better than other small models at reasoning tasks involving both text and vision, scoring 82.0% on MMLU, a textual intelligence and reasoning benchmark, as compared to 77.9% for Gemini Flash and 73.8% for Claude Haiku.

Math and coding proficiency: GPT-4o mini excels in mathematical reasoning and coding tasks, outperforming previous small models on the market. On MGSM, measuring math reasoning, GPT-4o mini scored 87.0%, compared to 75.5% for Gemini Flash and 71.7% for Claude Haiku. GPT-4o mini scored 87.2% on HumanEval, which measures coding performance, compared to 71.5% for Gemini Flash and 75.9% for Claude Haiku.

Multimodal reasoning: GPT-4o mini also shows strong performance on MMMU, a multimodal reasoning eval, scoring 59.4% compared to 56.1% for Gemini Flash and 50.2% for Claude Haiku.

Model Evaluation Scores

As part of our model development process, we worked with a handful of trusted partners to better understand the use cases and limitations of GPT-4o mini. We partnered with companies like Ramp and Superhuman, which found GPT-4o mini to perform significantly better than GPT-3.5 Turbo for tasks such as extracting structured data from receipt files or generating high-quality email responses when provided with thread history.

Built-in safety measures

Safety is built into our models from the beginning, and reinforced at every step of our development process. In pre-training, we filter out information that we do not want our models to learn from or output, such as hate speech, adult content, sites that primarily aggregate personal information, and spam. In post-training, we align the model’s behavior to our policies using techniques such as reinforcement learning with human feedback (RLHF) to improve the accuracy and reliability of the models’ responses.

GPT-4o mini has the same safety mitigations built in as GPT-4o, which we carefully assessed using both automated and human evaluations according to our Preparedness Framework and in line with our voluntary commitments. More than 70 external experts in fields like social psychology and misinformation tested GPT-4o to identify potential risks. We have addressed these risks and plan to share the details in the forthcoming GPT-4o system card and Preparedness scorecard. Insights from these expert evaluations have helped improve the safety of both GPT-4o and GPT-4o mini.

Building on these learnings, our teams also worked to improve the safety of GPT-4o mini using new techniques informed by our research. GPT-4o mini in the API is the first model to apply our instruction hierarchy method, which helps to improve the model’s ability to resist jailbreaks, prompt injections, and system prompt extractions. This makes the model’s responses more reliable and helps make it safer to use in applications at scale.

We’ll continue to monitor how GPT-4o mini is being used and improve the model’s safety as we identify new risks.

Availability and pricing

GPT-4o mini is now available as a text and vision model in the Assistants API, Chat Completions API, and Batch API. Developers pay 15 cents per 1M input tokens and 60 cents per 1M output tokens (roughly the equivalent of 2500 pages in a standard book). We plan to roll out fine-tuning for GPT-4o mini in the coming days.
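
As a rough illustration of what these prices mean in practice, the sketch below estimates spend from token counts. The two prices are the ones quoted above; the function name and the example workload are assumptions made for illustration only.

```python
# Prices quoted in the announcement, in US dollars per 1M tokens.
INPUT_USD_PER_1M = 0.15
OUTPUT_USD_PER_1M = 0.60


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in US dollars for the given input and output token counts."""
    total = input_tokens * INPUT_USD_PER_1M + output_tokens * OUTPUT_USD_PER_1M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, each with 2,000 input and 500 output tokens.
    requests = 10_000
    cost = estimate_cost_usd(requests * 2_000, requests * 500)
    print(f"estimated cost: ${cost:.2f}")
```

Under these prices, that hypothetical workload (20M input tokens and 5M output tokens) comes to about $6.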

In ChatGPT, Free, Plus and Team users will be able to access GPT-4o mini starting today, in place of GPT-3.5 Turbo. Enterprise users will also have access starting next week, in line with our mission to make the benefits of AI accessible to all.

What’s Next

Over the past few years, we’ve witnessed remarkable advancements in AI paired with substantial reductions in cost. For example, the cost per token of GPT-4o mini is 99% lower than that of text-davinci-003, a less capable model introduced in 2022. We’re committed to continuing this trajectory of driving down costs while enhancing model capabilities.
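
The 99% figure can be sanity-checked with a short calculation. The post does not state the text-davinci-003 price, so the sketch below assumes a list price of $0.02 per 1K tokens ($20 per 1M tokens) and compares it with the GPT-4o mini input price quoted earlier; both the assumed price and the choice of the input price as the point of comparison are assumptions, not statements from the post.

```python
def percent_reduction(old_price: float, new_price: float) -> float:
    """Return how much lower new_price is than old_price, as a percentage."""
    return (old_price - new_price) / old_price * 100


# Assumption (not stated in the post): text-davinci-003 at $0.02 per 1K tokens,
# which is $20 per 1M tokens.
ASSUMED_DAVINCI_003_USD_PER_1M = 20.00
# GPT-4o mini input price from the announcement, in US dollars per 1M tokens.
MINI_INPUT_USD_PER_1M = 0.15

if __name__ == "__main__":
    drop = percent_reduction(ASSUMED_DAVINCI_003_USD_PER_1M, MINI_INPUT_USD_PER_1M)
    print(f"input price reduction: {drop:.2f}%")
```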

We envision a future where models become seamlessly integrated in every app and on every website. GPT-4o mini is paving the way for developers to build and scale powerful AI applications more efficiently and affordably. The future of AI is becoming more accessible, reliable, and embedded in our daily digital experiences, and we’re excited to continue to lead the way.

Author

OpenAI

Acknowledgments

Leads: Jacob Menick, Kevin Lu, Shengjia Zhao, Eric Wallace, Hongyu Ren, Haitang Hu, Nick Stathas, Felipe Petroski Such

Program Lead: Mianna Chen

Contributions noted in https://openai.com/gpt-4o-contributions/

Footnotes

1. As of July 18th, 2024, an earlier version of GPT-4o mini outperforms GPT-4T 01-25.
2. Eval numbers for GPT-4o mini are computed using our simple-evals repo with the API assistant system message prompt. For competitor models, we take the maximum of their reported number (if available), the HELM leaderboard number, and our own reproduction via simple-evals.
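
The selection rule for competitor scores can be sketched as follows. This is only an illustration of taking the maximum over the available sources; the function name and signature are assumptions, and it is not code from the simple-evals repo.

```python
from typing import Optional


def competitor_score(
    reported: Optional[float],
    helm: Optional[float],
    reproduced: Optional[float],
) -> float:
    """Return the highest score among the sources that are available (not None)."""
    available = [s for s in (reported, helm, reproduced) if s is not None]
    if not available:
        raise ValueError("no score available from any source")
    return max(available)
```

Taking the maximum gives each competitor model the most favorable of the available numbers, so a missing vendor-reported score does not penalize it.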
