gpt guide – Unbox AI | AI automation for New Zealand businesses.

GPT Image 2 — Prompting Guide

95%+

Text rendering accuracy

Native resolution

Reference images per prompt

OpenAI — Free tier available

Try GPT Image 2 in ChatGPT

chatgpt.com — free tier gets ~6 images/day

What makes it different

Native thinking mode

Plans composition, object placement, and constraints before rendering. Complex prompts land correctly first try.

Text rendering

Headlines, UI labels, packaging, non-Latin scripts. First model where text in images is genuinely reliable.

Web search during generation

In thinking mode it pulls live references mid-generation. Fact-grounded infographics come out accurate, not hallucinated.

Multi-image referencing

Up to 16 reference images for edits, character sheets, brand consistency, and style transfer.

Up to 10 consistent images

8–10 consistent outputs from a single prompt. Aspect ratios from 3:1 wide to 1:3 tall.

Built into GPT-4o

Not a bolt-on image engine. It reads and reasons over your full prompt before generating a single pixel.

The core principle

GPT Image 2 reads natural language. It does not respond to keyword spam.

Words like "stunning", "hyper-realistic", "8k", and "masterpiece" add nothing and dilute the prompt. Describe facts the model can draw, not adjectives about quality.

Prompt structure — use this order

Template

[Subject] — the what/who
[Action/state] — what it's doing
[Scene] — the environment
[Composition] — camera angle, framing, lens
[Lighting] — source, direction, quality, tone
[Style] — photography/illustration/design style
[Constraints] — what must NOT appear

Example

A matte black ceramic coffee mug with subtle ridge texture, sitting on a wet slate countertop next to a folded linen napkin, in a minimalist Scandinavian kitchen at sunrise, three-quarter angle, 50mm lens, shallow depth of field, soft directional window light, cool morning tone, gentle rim highlight, editorial product photography, natural film grain. No watermark, no extra objects, no logo.
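The template order can be enforced in code so that no slot is skipped or reordered by accident. The sketch below is only an illustration: `PromptParts` and `build_prompt` are hypothetical names, not part of any OpenAI library, and the output is a plain string you would paste or send yourself.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """One field per slot in the template, in the order the guide recommends."""
    subject: str
    action: str = ""
    scene: str = ""
    composition: str = ""
    lighting: str = ""
    style: str = ""
    constraints: list[str] = field(default_factory=list)  # things that must NOT appear


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in template order and close with the constraints."""
    ordered = [parts.subject, parts.action, parts.scene,
               parts.composition, parts.lighting, parts.style]
    body = ", ".join(p.strip() for p in ordered if p.strip())
    if not parts.constraints:
        return body + "."
    negatives = ", ".join(f"no {c}" for c in parts.constraints)
    # Capitalise only the first "no" so the constraints read as one sentence.
    return f"{body}. {negatives[0].upper()}{negatives[1:]}."


if __name__ == "__main__":
    print(build_prompt(PromptParts(
        subject="A matte black ceramic coffee mug with subtle ridge texture",
        action="sitting on a wet slate countertop next to a folded linen napkin",
        scene="in a minimalist Scandinavian kitchen at sunrise",
        composition="three-quarter angle, 50mm lens, shallow depth of field",
        lighting="soft directional window light, cool morning tone",
        style="editorial product photography, natural film grain",
        constraints=["watermark", "extra objects", "logo"],
    )))
```

Keeping constraints as a separate list makes it hard to forget the closing "what must NOT appear" sentence.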

Prompting techniques

Text rendering

The model's strongest feature — but it still needs explicit instruction:

Write the exact text you want in quotes or ALL CAPS
Specify font style, weight, colour, and placement explicitly
Add "verbatim, no extra characters, no substitutions" when accuracy is critical
End with "no duplicate text, no extra words"

Example

EXACT TEXT: "SPRING 2026" in bold uppercase serif, centred at the bottom third, white on charcoal background. No other text. No watermark. No duplicate text.
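If you generate many text-bearing images, a small helper keeps the wording of the instruction consistent. This is a minimal sketch under the assumption that the phrasing from the example above is what you want every time; `text_instruction` is a hypothetical helper, not an API.

```python
def text_instruction(text: str, font: str, placement: str, colours: str,
                     strict: bool = True) -> str:
    """Build an exact-text instruction in the style of the guide's example."""
    line = f'EXACT TEXT: "{text}" in {font}, {placement}, {colours}.'
    if strict:
        # Extra wording the guide suggests when accuracy is critical.
        line += " Verbatim, no extra characters, no substitutions."
    return line + " No other text. No duplicate text."


if __name__ == "__main__":
    print(text_instruction(
        "SPRING 2026",
        font="bold uppercase serif",
        placement="centred at the bottom third",
        colours="white on charcoal background",
    ))
```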

Editing existing images

Always tell it what changes AND what stays locked. If you don't provide a preserve list, the model drifts on faces, logos, and text you wanted to keep.

Two-column logic

Change: [exactly what should change]
Preserve: [face, identity, pose, lighting, framing, background, text, layout]
Constraints: [no extra objects, no redesign, no logo drift, no watermark]
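The change/preserve/constraints split can be turned into a helper that refuses to build an edit prompt without a preserve list. This is a sketch only; `edit_prompt` is a hypothetical name and the three-line layout simply mirrors the structure described above.

```python
def edit_prompt(change: str, preserve: list[str], constraints: list[str]) -> str:
    """Format an edit request as Change / Preserve / Constraints lines."""
    if not preserve:
        # The guide warns that edits drift without an explicit preserve list.
        raise ValueError("preserve list is empty; name what must stay locked")
    return "\n".join([
        f"Change: {change}",
        f"Preserve: {', '.join(preserve)}",
        f"Constraints: {', '.join('no ' + c for c in constraints)}",
    ])


if __name__ == "__main__":
    print(edit_prompt(
        change="replace the grey sofa with a dark green velvet sofa",
        preserve=["face", "pose", "lighting", "framing", "background"],
        constraints=["extra objects", "logo drift", "watermark"],
    ))
```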

Multi-image referencing

Label inputs by role and reference those labels in the instruction.

Example

Image 1: base scene to preserve. Image 2: jacket style reference. Apply the jacket from Image 2 to the person in Image 1. Match lighting and scale. Preserve everything else.
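Labelling by role is easy to automate, and the same step can check that the instruction only cites images you actually attached. The helper below is a hypothetical sketch; it builds text only and does not upload anything.

```python
import re


def label_references(roles: list[str], instruction: str) -> str:
    """Prefix the instruction with 'Image N: role.' labels, numbered from 1."""
    cited = {int(n) for n in re.findall(r"Image (\d+)", instruction)}
    unknown = sorted(n for n in cited if not 1 <= n <= len(roles))
    if unknown:
        raise ValueError(f"instruction cites unlabelled images: {unknown}")
    labels = [f"Image {i}: {role}." for i, role in enumerate(roles, start=1)]
    return " ".join(labels + [instruction])


if __name__ == "__main__":
    print(label_references(
        ["base scene to preserve", "jacket style reference"],
        "Apply the jacket from Image 2 to the person in Image 1. "
        "Match lighting and scale. Preserve everything else.",
    ))
```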

Iterative refinement

Don't try to solve everything in one prompt. Treat the first generation as a baseline, then refine conversationally with short follow-up messages such as "warm up the sky", "move the logo to the bottom right", or "make her expression more relaxed".

Restate your invariants every turn. Otherwise the model quietly redesigns things you wanted to keep.

Invariant restatement

Same character, same outfit, same lighting — only change the background to a snowy street.
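When you script a refinement loop, the invariants can be stored once and prepended to every follow-up so they are never dropped. A minimal sketch, with `follow_up` as a hypothetical helper name:

```python
def follow_up(invariants: list[str], change: str) -> str:
    """Restate what stays the same, then name the single change for this turn."""
    if not invariants:
        raise ValueError("list at least one invariant to restate")
    locked = ", ".join(f"same {item}" for item in invariants)
    return f"{locked[0].upper()}{locked[1:]}. Only change {change}."


if __name__ == "__main__":
    keep = ["character", "outfit", "lighting"]
    # Each turn reuses the same invariant list with a different change.
    for change in ["the background to a snowy street", "the time of day to dusk"]:
        print(follow_up(keep, change))
```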

Common mistakes

Mistake	Fix
Stacking adjectives — "stunning", "epic", "8k"	Replace with visual facts — lens, light source, surface texture
Vague text instruction — "add a title"	Specify exactly: EXACT TEXT: SUMMER COLLECTION, bold sans-serif, white, centred
No constraints at the end	Always close with what must NOT appear
Assuming 4K is always better	OpenAI flags above 2K as experimental — generate at high quality, upscale separately
Forgetting preserve list on edits	List every element that must stay locked
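Two of the mistakes in the table are mechanical enough to catch before a prompt is sent. The check below is a rough sketch: the filler word list is taken from the examples in this guide, the "no ..." test is a crude heuristic for closing constraints, and `lint_prompt` is a hypothetical name.

```python
import re

# Quality adjectives this guide calls out as doing nothing useful.
FILLER = {"stunning", "hyper-realistic", "8k", "masterpiece", "epic"}


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for filler adjectives and a missing constraints clause."""
    lowered = prompt.lower()
    warnings = []
    words = set(re.findall(r"[a-z0-9-]+", lowered))
    found = sorted(words & FILLER)
    if found:
        warnings.append(f"filler words: {', '.join(found)}")
    # Heuristic: a constraints clause contains at least one "no <something>".
    if not re.search(r"\bno \w+", lowered):
        warnings.append("no closing constraints")
    return warnings


if __name__ == "__main__":
    print(lint_prompt("A stunning 8k mug"))
    print(lint_prompt("A mug on wet slate, 50mm lens. No watermark."))
```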

Quality settings

Low

Fast, cheap, still solid. Good for iteration and high-volume generation.

Medium

Balanced. Good for most social and marketing assets.

High

Maximum fidelity. Use for hero images, posters, anything customer-facing.
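The three tiers map naturally onto a lookup by use case. In the sketch below the use-case keys are hypothetical labels chosen for illustration, and the returned strings simply mirror the tier names above; they are not confirmed API parameter values.

```python
# Use-case labels are illustrative; tier names follow the guide's headings.
QUALITY_BY_USE = {
    "iteration": "low",
    "high-volume": "low",
    "social": "medium",
    "marketing": "medium",
    "hero": "high",
    "poster": "high",
    "customer-facing": "high",
}


def pick_quality(use_case: str, default: str = "medium") -> str:
    """Look up a quality tier for a use case, falling back to the balanced tier."""
    return QUALITY_BY_USE.get(use_case.strip().lower(), default)


if __name__ == "__main__":
    for use in ["iteration", "Poster", "internal memo"]:
        print(use, "->", pick_quality(use))
```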

Where to access it

ChatGPT

Free tier gets ~6 images/day. Plus and Pro plans get unlimited images, thinking mode, and multi-image referencing.

Free tier available

Higgsfield

Third-party platform with GPT Image 2 access.

Third party

fal.ai

API access for developers and high-volume generation.

API

OpenAI API

Direct API access for building image generation into your own apps.

Developer

Transparent PNG

Not yet supported in GPT Image 2 — stay on GPT Image 1.5 for that use case.

Use GPT Image 1.5

Unbox Digital
