Experiment with Gemini 2.0 Flash native image generation

MAR 12, 2025

Kat Kampf, Product Manager, Google AI Studio

Nicole Brichtova, Product Manager, Google DeepMind

In December we first introduced native image output in Gemini 2.0 Flash to trusted testers. Today, we're making it available for developer experimentation across all regions currently supported by Google AI Studio. You can test this new capability using an experimental version of Gemini 2.0 Flash (gemini-2.0-flash-exp) in Google AI Studio and via the Gemini API.

Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images.

Here are some examples of where 2.0 Flash’s multimodal outputs shine:

1. Text and images together

Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout. Give it feedback and the model will retell the story or change the style of its drawings.

Story and illustration generation in Google AI Studio

2. Conversational image editing

Gemini 2.0 Flash helps you edit images over many turns of natural language dialogue, which is great for iterating toward a perfect image or exploring different ideas together.

Multi-turn conversation image editing maintaining context throughout the conversation in Google AI Studio
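
As a rough sketch of how multi-turn editing might look through the API: the helper below sends each edit instruction as one turn on the same chat session, so later edits build on earlier ones. The helper name `edit_in_turns` is hypothetical, and the sketch assumes a chat session object like the one returned by `client.chats.create(...)` in the google-genai SDK, using only its `send_message` method.

```python
def edit_in_turns(chat, instructions):
    """Send each instruction as one turn on the same chat session.

    `chat` is assumed to be a session such as the one returned by
    client.chats.create(...) in the google-genai SDK; only its
    send_message(message) method is used. Returns one response per turn.
    """
    responses = []
    for instruction in instructions:
        # The session keeps earlier turns, so each edit builds on the last.
        responses.append(chat.send_message(instruction))
    return responses


# Usage sketch (needs an API key and network access, so left commented out):
# from google import genai
# from google.genai import types
# client = genai.Client(api_key="GEMINI_API_KEY")
# chat = client.chats.create(
#     model="gemini-2.0-flash-exp",
#     config=types.GenerateContentConfig(response_modalities=["Text", "Image"]),
# )
# edit_in_turns(chat, ["Draw a red bicycle.", "Make the bicycle blue."])
```

Keeping every turn on one session is what lets the model carry context forward; starting a new request for each edit would lose the earlier image and instructions.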

3. World understanding

Unlike many other image generation models, Gemini 2.0 Flash leverages world knowledge and enhanced reasoning to create the right image. This makes it well suited to creating detailed, realistic imagery, such as illustrations for a recipe. While it strives for accuracy, like all language models, its knowledge is broad and general, not absolute or complete.

Interleaved text and image output for a recipe in Google AI Studio

4. Text rendering

Most image generation models struggle to accurately render long sequences of text, often producing poorly formatted or illegible characters, or misspellings. Internal benchmarks show that 2.0 Flash has stronger text rendering than leading competitive models, which makes it great for creating advertisements, social posts, or even invitations.

Image outputs with long text rendering in Google AI Studio

Start making images with Gemini today

Get started with Gemini 2.0 Flash via the Gemini API. Read more about image generation in our docs.

from google import genai
from google.genai import types

# "GEMINI_API_KEY" is a placeholder; replace it with your own API key.
client = genai.Client(api_key="GEMINI_API_KEY")

response = client.models.generate_content(
    # Experimental model version with native image output.
    model="gemini-2.0-flash-exp",
    contents=(
        "Generate a story about a cute baby turtle in a 3d digital art style. "
        "For each scene, generate an image."
    ),
    config=types.GenerateContentConfig(
        # Request both text and image output in the same response.
        response_modalities=["Text", "Image"]
    ),
)
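
The request above returns text and images interleaved in a single response. Here is a minimal sketch of separating the two, assuming the google-genai SDK's response layout, where each part under `response.candidates[0].content.parts` carries either `text` or `inline_data` (with `data` bytes and a `mime_type`). The helper name `save_parts`, the default output directory, and the `scene_N` file naming are hypothetical.

```python
import mimetypes
from pathlib import Path


def save_parts(parts, out_dir="story_output"):
    """Split interleaved parts into text chunks and saved image files.

    Assumes each part exposes `text` (str or None) and `inline_data`
    (None, or an object with `data` bytes and `mime_type`).
    Returns (texts, image_paths) in the order the parts appeared.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    texts, image_paths = [], []
    for part in parts:
        text = getattr(part, "text", None)
        blob = getattr(part, "inline_data", None)
        if text:
            texts.append(text)
        elif blob is not None and blob.data:
            # Pick a file extension from the MIME type, e.g. image/png -> .png
            ext = mimetypes.guess_extension(blob.mime_type or "") or ".bin"
            path = out / f"scene_{len(image_paths) + 1}{ext}"
            path.write_bytes(blob.data)
            image_paths.append(path)
    return texts, image_paths
```

With a real response, this could be called as `texts, images = save_parts(response.candidates[0].content.parts)`.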

Whether you are building AI agents, developing apps with beautiful visuals like illustrated interactive stories, or brainstorming visual ideas in conversation, Gemini 2.0 Flash allows you to add text and image generation with just a single model. We're eager to see what developers create with native image output, and your feedback will help us finalize a production-ready version soon.