GPT Image 2 — Prompting Guide
Unbox Digital — Free Guide

OpenAI's new image model reasons before it generates — planning layout, composition, and constraints. Near-perfect text rendering. Up to 2K resolution. Here's how to get the most out of it.

  • 95%+ text rendering accuracy
  • 2K native resolution
  • 16 reference images per prompt
  • Native thinking mode: Plans composition, object placement, and constraints before rendering. Complex prompts land correctly first try.
  • Text rendering: Headlines, UI labels, packaging, non-Latin scripts. First model where text in images is genuinely reliable.
  • Web search during generation: In thinking mode, it pulls live references mid-generation. Fact-grounded infographics come out accurate, not hallucinated.
  • Multi-image referencing: Up to 16 reference images for edits, character sheets, brand consistency, and style transfer.
  • Up to 10 consistent images: 8–10 consistent outputs from a single prompt. Aspect ratios from 3:1 wide to 1:3 tall.
  • Built into GPT-4o: Not a bolt-on image engine. It reads and reasons over your full prompt before generating a single pixel.

GPT Image 2 reads natural language. It does not respond to keyword spam.

Words like "stunning", "hyper-realistic", "8k", and "masterpiece" add nothing and dilute the prompt. Describe facts the model can draw, not adjectives about quality.

Template
[Subject]: the what/who
[Action/state]: what it's doing
[Scene]: the environment
[Composition]: camera angle, framing, lens
[Lighting]: source, direction, quality, tone
[Style]: photography/illustration/design style
[Constraints]: what must NOT appear
Example
A matte black ceramic coffee mug with subtle ridge texture, sitting on a wet slate countertop next to a folded linen napkin, in a minimalist Scandinavian kitchen at sunrise, three-quarter angle, 50mm lens, shallow depth of field, soft directional window light, cool morning tone, gentle rim highlight, editorial product photography, natural film grain. No watermark, no extra objects, no logo.
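If you write many prompts, the template slots can be filled programmatically. The sketch below is one possible way to do that in plain Python. The `ImagePrompt` class and its field names are hypothetical, chosen to mirror the template, and are not part of any OpenAI SDK. It only builds the prompt string and sends nothing to a model.

```python
from dataclasses import dataclass, field


@dataclass
class ImagePrompt:
    """Hypothetical container for the template slots (not an SDK type)."""
    subject: str
    action: str = ""
    scene: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""
    constraints: list = field(default_factory=list)

    def render(self) -> str:
        # Keep the slot order from the template and skip slots left empty.
        slots = [self.subject, self.action, self.scene,
                 self.composition, self.lighting, self.style]
        body = ", ".join(s.strip() for s in slots if s.strip())
        if not self.constraints:
            return f"{body}."
        # Constraints close the prompt as a short list of "no ..." clauses.
        tail = ", ".join(f"no {c}" for c in self.constraints)
        return f"{body}. {tail[0].upper()}{tail[1:]}."


prompt = ImagePrompt(
    subject="A matte black ceramic coffee mug with subtle ridge texture",
    action="sitting on a wet slate countertop next to a folded linen napkin",
    scene="in a minimalist Scandinavian kitchen at sunrise",
    composition="three-quarter angle, 50mm lens, shallow depth of field",
    lighting="soft directional window light, cool morning tone, gentle rim highlight",
    style="editorial product photography, natural film grain",
    constraints=["watermark", "extra objects", "logo"],
)
print(prompt.render())
```

Keeping each slot as a separate field makes it easy to change one thing, say the lighting, while every other fact in the prompt stays fixed.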
Text rendering

The model's strongest feature — but it still needs explicit instruction:

  • Write the exact text you want in quotes or ALL CAPS
  • Specify font style, weight, colour, and placement explicitly
  • Add "verbatim, no extra characters, no substitutions" when accuracy is critical
  • End with "no duplicate text, no extra words"
Example
EXACT TEXT: "SPRING 2026" in bold uppercase serif, centred at the bottom third, white on charcoal background. No other text. No watermark. No duplicate text.
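The checklist above can be wrapped in a small helper so that no clause gets forgotten. This is a minimal sketch in plain Python. The function name `text_instruction` and its parameters are hypothetical, and the output is just a string to paste into a prompt.

```python
def text_instruction(text: str, font: str, placement: str, colour: str,
                     strict: bool = False) -> str:
    """Build a text-rendering clause (hypothetical helper, not an SDK call)."""
    if '"' in text:
        # Double quotes delimit the copy, so they cannot appear inside it.
        raise ValueError("text must not contain double quotes")
    clause = f'EXACT TEXT: "{text}" in {font}, {placement}, {colour}.'
    if strict:
        # Extra wording the guide suggests when accuracy is critical.
        clause += " Verbatim, no extra characters, no substitutions."
    # Close with the negative constraints.
    return clause + " No other text. No watermark. No duplicate text."


print(text_instruction(
    text="SPRING 2026",
    font="bold uppercase serif",
    placement="centred at the bottom third",
    colour="white on charcoal background",
))
```

Passing `strict=True` adds the verbatim clause for cases such as legal copy or product names.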
Editing existing images

Always tell it what changes AND what stays locked. If you don't provide a preserve list, the model drifts on faces, logos, and text you wanted to keep.

Two-column logic
Change: [exactly what should change]
Preserve: [face, identity, pose, lighting, framing, background, text, layout]
Constraints: [no extra objects, no redesign, no logo drift, no watermark]
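One way to make the preserve list hard to forget is to build edit prompts through a function that refuses to run without one. The sketch below assumes nothing about the model API. `edit_prompt` is a hypothetical helper, and the sofa example is illustrative only.

```python
from collections.abc import Sequence


def edit_prompt(change: str, preserve: Sequence[str],
                constraints: Sequence[str] = ()) -> str:
    """Format a Change / Preserve / Constraints edit instruction."""
    if not preserve:
        # An empty preserve list is the failure mode this section warns about.
        raise ValueError("list at least one element that must stay locked")
    lines = [f"Change: {change}", "Preserve: " + ", ".join(preserve)]
    if constraints:
        lines.append("Constraints: " + ", ".join(f"no {c}" for c in constraints))
    return "\n".join(lines)


print(edit_prompt(
    change="replace the grey sofa with a green velvet sofa",
    preserve=["face", "pose", "lighting", "framing", "background"],
    constraints=["extra objects", "logo drift", "watermark"],
))
```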
Multi-image referencing

Label inputs by role and reference those labels in the instruction.

Example
Image 1: base scene to preserve. Image 2: jacket style reference. Apply the jacket from Image 2 to the person in Image 1. Match lighting and scale. Preserve everything else.
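The labelling convention is easy to automate. The sketch below numbers each reference by role and checks that the instruction only mentions labels that exist. The helper name `reference_prompt` is hypothetical, and the cap of 16 is simply the figure quoted in this guide.

```python
import re

MAX_REFERENCES = 16  # limit quoted in this guide


def reference_prompt(roles: list, instruction: str) -> str:
    """Prefix an instruction with 'Image N: role.' labels."""
    if not 1 <= len(roles) <= MAX_REFERENCES:
        raise ValueError(f"expected 1 to {MAX_REFERENCES} reference images")
    # Every "Image N" mentioned in the instruction must have a label.
    for number in re.findall(r"\bImage (\d+)\b", instruction):
        if not 1 <= int(number) <= len(roles):
            raise ValueError(f"Image {number} is mentioned but not labelled")
    labels = [f"Image {i}: {role}." for i, role in enumerate(roles, start=1)]
    return " ".join(labels + [instruction])


print(reference_prompt(
    roles=["base scene to preserve", "jacket style reference"],
    instruction=("Apply the jacket from Image 2 to the person in Image 1. "
                 "Match lighting and scale. Preserve everything else."),
))
```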
Iterative refinement

Don't solve everything in one prompt. The first generation is a baseline. Then refine conversationally with short follow-up messages: "warm up the sky", "move the logo to the bottom right", "make her expression more relaxed".

Restate your invariants every turn. Without that, the model quietly redesigns things you wanted to keep.

Invariant restatement
Same character, same outfit, same lighting — only change the background to a snowy street.
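If you script a refinement loop, the invariants can be stored once and prepended to every follow-up message. This is a sketch under that assumption. `RefinementSession` is a hypothetical class that only formats messages and does not talk to a model.

```python
class RefinementSession:
    """Remembers what must stay fixed and restates it on every turn."""

    def __init__(self, invariants: list):
        if not invariants:
            raise ValueError("give at least one invariant to restate")
        self.invariants = list(invariants)

    def follow_up(self, change: str) -> str:
        # Restate every locked element, then name the single change.
        locked = ", ".join(f"same {item}" for item in self.invariants)
        return f"{locked[0].upper()}{locked[1:]}. Only {change.rstrip('.')}."


session = RefinementSession(["character", "outfit", "lighting"])
print(session.follow_up("change the background to a snowy street"))
print(session.follow_up("make her expression more relaxed"))
```

Each call produces a complete message, so a turn never depends on the model remembering an earlier restatement.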
Mistake | Fix
Stacking adjectives ("stunning", "epic", "8k") | Replace with visual facts: lens, light source, surface texture
Vague text instruction ("add a title") | Specify exactly: EXACT TEXT: "SUMMER COLLECTION", bold sans-serif, white, centred
No constraints at the end | Always close with what must NOT appear
Assuming 4K is always better | OpenAI flags above 2K as experimental; generate at high quality, upscale separately
Forgetting preserve list on edits | List every element that must stay locked
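Some of these mistakes can be caught before a prompt is sent. The sketch below is a rough, hypothetical linter that covers three of them: filler adjectives, missing closing constraints, and a missing preserve list on edits. The word list and the string checks are simple heuristics assumed for illustration, not rules the model enforces.

```python
import re

# Filler words named in this guide; extend as needed.
FILLER_WORDS = {"stunning", "epic", "8k", "masterpiece", "hyper-realistic"}


def lint_prompt(prompt: str, is_edit: bool = False) -> list:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    lowered = prompt.lower()
    words = set(re.findall(r"[a-z0-9-]+", lowered))
    found = sorted(FILLER_WORDS & words)
    if found:
        warnings.append("filler adjectives: " + ", ".join(found))
    # Heuristic: a constraint usually reads "no watermark", "no logo", ...
    if not re.search(r"\bno \w+", lowered):
        warnings.append("no closing constraints found")
    if is_edit and "preserve" not in lowered:
        warnings.append("edit prompt has no preserve list")
    return warnings


for warning in lint_prompt("A stunning, epic 8k mountain at dawn"):
    print(warning)
```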
  • Low quality: Fast, cheap, still solid. Good for iteration and high-volume generation.
  • Medium quality: Balanced. Good for most social and marketing assets.
  • High quality: Maximum fidelity. Use for hero images, posters, anything customer-facing.
  • ChatGPT (free tier available): Free tier gets ~6 images/day. Plus/Pro gets unlimited + thinking mode + multi-image.
  • Higgsfield (third party): Third-party platform with GPT Image 2 access.
  • fal.ai (API): API access for developers and high-volume generation.
  • OpenAI API (developer): Direct API access for building image generation into your own apps.

Transparent PNG: Not yet supported in GPT Image 2. Stay on GPT Image 1.5 for that use case.