OpenAI debuts GPT-4o ‘omni’ model now powering ChatGPT

OpenAI announced a new flagship generative AI model on Monday that it calls GPT-4o. The “o” stands for “omni,” referring to the model’s ability to handle text, speech, and video. GPT-4o is set to roll out “iteratively” across the company’s developer and consumer-facing products over the next few weeks.

OpenAI CTO Mira Murati said that GPT-4o provides “GPT-4-level” intelligence but improves on GPT-4’s capabilities across multiple modalities and media.

“GPT-4o reasons across voice, text and vision,” Murati said during a streamed presentation at OpenAI’s offices in San Francisco on Monday. “And this is incredibly important, because we’re looking at the future of interaction between ourselves and machines.”

GPT-4 Turbo, OpenAI’s previous “most advanced” model, was trained on a combination of images and text and could analyze images and text to accomplish tasks like extracting text from images or even describing the content of those images. But GPT-4o adds speech to the mix.

What does this enable? A variety of things. 

GPT-4o greatly improves the experience in OpenAI’s AI-powered chatbot, ChatGPT. The platform has long offered a voice mode that reads the chatbot’s responses aloud using a text-to-speech model, but GPT-4o supercharges this, allowing users to interact with ChatGPT more like an assistant.

For example, users can ask the GPT-4o-powered ChatGPT a question and interrupt ChatGPT while it’s answering. The model delivers “real-time” responsiveness, OpenAI says, and can even pick up on nuances in a user’s voice, responding with voices in “a range of different emotive styles” (including singing).

GPT-4o also upgrades ChatGPT’s vision capabilities. Given a photo or a desktop screen, ChatGPT can now quickly answer related questions, on topics ranging from “What’s going on in this software code?” to “What brand of shirt is this person wearing?”

[Image: ChatGPT’s desktop app in use in a coding task.]

These features will evolve further in the future, Murati says. While today GPT-4o can look at a picture of a menu in a different language and translate it, in the future, the model could allow ChatGPT to, for instance, “watch” a live sports game and explain the rules to you.

“We know that these models are getting more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with ChatGPT,” Murati said. “For the past couple of years, we’ve been very focused on improving the intelligence of these models … But this is the first time that we are really making a huge step forward when it comes to the ease of use.”

GPT-4o is more multilingual as well, OpenAI claims, with enhanced performance in around 50 languages. And in OpenAI’s API and Microsoft’s Azure OpenAI Service, GPT-4o is twice as fast as, half the price of and has higher rate limits than GPT-4 Turbo, the company says.

At present, voice isn’t a part of the GPT-4o API for all customers. OpenAI, citing the risk of misuse, says that it plans to first launch support for GPT-4o’s new audio capabilities to “a small group of trusted partners” in the coming weeks.

GPT-4o is available in the free tier of ChatGPT starting today and to subscribers to OpenAI’s premium ChatGPT Plus and Team plans with “5x higher” message limits. (OpenAI notes that ChatGPT will automatically switch to GPT-3.5, an older and less capable model, when users hit the rate limit.) The improved ChatGPT voice experience underpinned by GPT-4o will arrive in alpha for Plus users in the next month or so, alongside enterprise-focused options.

In related news, OpenAI announced that it’s releasing a refreshed ChatGPT UI on the web with a new, “more conversational” home screen and message layout, and a desktop version of ChatGPT for macOS that lets users ask questions via a keyboard shortcut or take and discuss screenshots. ChatGPT Plus users will get access to the app first, starting today, and a Windows version will arrive later in the year.

Elsewhere, the GPT Store, OpenAI’s library of third-party chatbots built on its AI models, along with the tools for creating them, is now available to users of ChatGPT’s free tier. And free users can take advantage of ChatGPT features that were formerly paywalled, like a memory capability that allows ChatGPT to “remember” preferences for future interactions, the ability to upload files and photos, and web search for answers to timely questions.