ChatGPT Images 2.0 vs Gemini Nano Banana 2: Hands-On Test and Will It Replace Photoshop?

Key Takeaways

OpenAI launched ChatGPT Images 2.0 on April 21, 2026, with a new ‘thinking’ step that reasons before drawing — API model gpt-image-2, priced at $0.211/image at 1024×1024.
In hands-on testing, gpt-image-2 widens OpenAI’s structural lead — editorial layouts, magazine covers, and technical infographics are near-perfect where Gemini still stumbles on geometry.
Gemini Nano Banana 2 still wins photo editing — it preserves resolution (1500+ px wide vs OpenAI’s 1024 cap), holds subject detail better, and runs 5-10× faster.
Speed trade-off is real: gpt-image-2 runs 97-149 seconds per image with thinking mode on; Gemini returns in 11-24 seconds. OpenAI’s reasoning step is not free.
Will it replace Photoshop? No — but it replaces the quick jobs most creators used Photoshop for (background swaps, style transfers, adding objects) and it reshapes publishing/design workflow around prompt-first briefs.

On April 21, 2026, OpenAI introduced ChatGPT Images 2.0 — a new image-generation model that, for the first time, uses a reasoning step before it starts drawing. Sam Altman called it “like going from GPT-3 to GPT-5 all at once.” Bold words for an image model.

The pitch: cleaner text, tighter layouts, 2K-resolution output, aspect ratios from 3:1 ultra-wide to 1:3 ultra-tall, and the ability to generate up to 8 consistent images in one prompt when thinking mode is on. API pricing runs $30 per million output tokens — roughly $0.211 per high-quality 1024×1024 image, about 60% more expensive than the prior gpt-image-1.5 model. The API name is gpt-image-2.

The obvious question: does it beat the current benchmark — Google’s Gemini 3 Flash Image (internally nicknamed “Nano Banana 2”), which has held the quality crown in the image-gen world for most of 2026? We ran them side-by-side.

How We Tested

Six identical tasks on both models — gpt-image-2 (OpenAI, hands-on API access verified April 22, 2026) and gemini-3.1-flash-image-preview (Google) — using four real source photos curated via SampleShots:

Background swap — replace an outdoor portrait’s background with a clean white studio backdrop
Style transfer — turn a minimalist architectural shot into a hand-drawn pencil sketch
Object addition — add a chocolate croissant to a cafe coffee shot, matching light and depth of field
Artistic transform — convert a vintage Rolleiflex product shot into a watercolor painting
Generate: magazine cover — editorial text-rendering test with a specific title, tagline, and issue line
Generate: infographic — the exposure triangle with labelled icons and values (composition test)

Every output in this post is a real, unretouched API response. The OpenAI calls used quality=high, size=1024×1024 for edits and 1024×1536 / 1536×1024 for the generation tasks. Gemini used its default preview-image mode. Nothing was cherry-picked — each pair is the first response the API returned.

Test 1: Background Swap (Portrait)

The classic Photoshop job: strip a subject from a busy outdoor scene and drop them onto clean studio white. Both models were given the same prompt: “Replace the background behind the subject with a clean white studio seamless backdrop. Keep the subject, clothing, pose, and lighting exactly as they are.”

Source photo of a young woman with long auburn hair standing in a sunlit park setting — Source: Photo by Christopher Campbell via Unsplash, 50mm f/2.5, 1/160s, ISO 200. Curated on SampleShots.

The results

OpenAI gpt-image-2 background swap result — OpenAI gpt-image-2 — 130.7s · clean swap, hair detail and skin tones preserved well at the 1024 px output

Gemini 3.1 Flash Image background swap result — Gemini Nano Banana 2 — 11.2s · full 1500 px resolution preserved, wispy hair detail intact

Gemini wins on resolution, OpenAI wins on latency trade-off value. Both stripped the background cleanly, but Gemini returned a file 3× the size of OpenAI’s 1024 px output. For production work where the edit feeds into a larger layout or print, the resolution gap still matters. Gpt-image-2’s result is a clear step up from gpt-image-1.5’s thumbnail-sized output — but the 1024 px cap remains a real limit, and the 130-second render time is a production planning consideration.

Test 2: Style Transfer (Pencil Sketch)

Can the model convert a photograph into a convincing hand-drawn style while keeping the subject recognizable? Same prompt for both: “Transform this photograph into a fine-pencil sketch, black and white, hand-drawn editorial illustration style with visible pencil strokes.”

Minimalist photograph of a white architectural edge against a clear blue sky — Source: Photo by Bernard Hermant via Unsplash, architectural minimalism shot at f/8, 1/900s, ISO 200. Curated on SampleShots.

The results

Abstract OpenAI output that lost the subject entirely — OpenAI gpt-image-2 — 137.8s · abandoned the subject, produced an abstract pencil-shaded shape. A real failure.

Sharp editorial pencil sketch from Gemini — Gemini Nano Banana 2 — 14.8s · editorial-grade pencil illustration with paper texture and clean architectural lines

Gemini wins decisively. This is the test where gpt-image-2’s reasoning step actively hurts — the model appears to have ‘thought’ itself away from the source image, producing an abstract composition that discarded the architecture entirely. Gemini delivered a genuinely printable illustration with hatching detail and clean edges. Style transfer on real photographs is clearly still Gemini territory.

Test 3: Object Addition (Croissant)

Adding realistic objects into an existing scene is one of the hardest tests — the new element has to match lighting, shadow, and depth of field. Prompt for both: “Add a single chocolate croissant on a small white plate placed next to the coffee cup. Match the existing warm morning light and depth of field. Subtle, photorealistic.”

Cafe lifestyle shot of a single coffee mug on a wooden table in warm morning light — Source: Photo by Annie Spratt via Unsplash, 50mm f/1.8, 1/50s, ISO 2500. Curated on SampleShots.

The results

Coffee and croissant on plate, OpenAI gpt-image-2 output — OpenAI gpt-image-2 — 149.3s · croissant added believably, lighting and shadow match close to photorealistic

Coffee and croissant, Gemini output with flaky pastry detail — Gemini Nano Banana 2 — 24.0s · convincing flaky pastry, richer bokeh on the cup, kept original composition

Tighter race — effectively a tie. Gpt-image-2 has clearly closed the gap here: the croissant is photorealistic, the plate matches the table tone, and the lighting integrates naturally. Gemini still has the edge on pastry detail (you can count flakes) and retained the original deeper bokeh on the cup. For everyday social-content editing, either is production-ready. This is the kind of edit that would have taken a Photoshop user 15 minutes of masking and compositing; both models nailed it in under three minutes.

Test 4: Artistic Transform (Watercolor)

Watercolor is the hardest of the painterly styles — it lives or dies on pigment bleed, paper texture, and knowing what to leave white. Prompt: “Transform this photograph into a traditional watercolor painting with soft color washes and visible paper texture.”

Vintage Rolleiflex twin-lens film camera, product-style shot against a plain background — Source: vintage Rolleiflex film camera by Markus Spiske via Unsplash. Curated on SampleShots.

The results

Watercolor treatment of the Rolleiflex by gpt-image-2 — OpenAI gpt-image-2 — 137.1s · kept

Detailed Gemini watercolor with crisp ROLLEIFLEX and COMPUR text — Gemini Nano Banana 2 — 13.5s · soft washes, both

Gpt-image-2 closes a gap that was wide with gpt-image-1.5. Both kept the engraved nameplate text legible — something 1.5 couldn’t do. Gemini’s version has warmer, more varied pigment; gpt-image-2’s is cooler and more restrained. Aesthetic preference will split the room. For editorial use where predictability matters, OpenAI’s tighter interpretation is arguably more useful.

Test 5: Magazine Cover (Text Rendering)

This is where OpenAI’s launch claims get tested. Both models had to generate a magazine cover titled PHOTOWORKOUT with the tagline The AI Photography Revolution, a subline April 2026 · Issue 47, and a vintage 35mm camera image — all text spelled correctly, typeset cleanly.

The results

OpenAI gpt-image-2 magazine cover with Pentax K1000 camera and correct title and issue line — OpenAI gpt-image-2 — 108.1s · near-perfect layout: title, tagline, issue line all correct; used a Pentax K1000 body (a specific real camera, not generic)

Gemini magazine cover with Leica M3 body and over-delivered coverlines — Gemini Nano Banana 2 — 15.7s · polished, uses a Leica M3, but over-delivered with invented coverlines that weren't in the prompt

OpenAI wins on prompt fidelity. Gpt-image-2 nailed exactly what was asked — the three text blocks requested, nothing added, nothing omitted. Gemini rendered beautifully but over-delivered with four invented coverlines (“AI & THE FUTURE OF FILM”, “ESSENTIAL TECHNIQUES”, etc.) that weren’t briefed. For client work where spec adherence matters, OpenAI’s stricter interpretation is the right behavior. One caveat: gpt-image-2 chose to render a specific Pentax K1000 — a real product — where Gemini generalized to a Leica silhouette. Neither is a problem in practice but it’s worth noting that Images 2.0 will produce recognizable branded products when it feels confident it can.

Test 6: Technical Infographic (Layout Fidelity)

Our hardest prompt: a labelled exposure-triangle infographic with three icons arranged in a triangle and one example value under each icon. This tests whether the model understands both the subject matter AND the spatial layout instruction.

The results

OpenAI exposure triangle infographic in proper triangle layout with connecting lines — OpenAI gpt-image-2 — 96.6s · perfect triangle layout, all three icons connected with lines, correct example values

Gemini exposure triangle as stacked icons, not triangular layout — Gemini Nano Banana 2 — 13.8s · icons and text rendered correctly, but arranged vertically — ignored the triangular geometry

OpenAI wins cleanly — this is the test where gpt-image-2 earns its price. Gemini rendered the content correctly (aperture, shutter, ISO icons with correct values) but ignored the triangular layout instruction and stacked them vertically. Gpt-image-2 nailed the geometry — icons in a triangle, connecting lines drawn between them, labels in the correct positions. This is the single clearest signal that Images 2.0’s reasoning step delivers on structural composition where Gemini still falls back to generic layouts.

Additional Capabilities: OpenAI’s Own Samples

OpenAI’s launch samples showcase capabilities beyond what our six tests exercised. A few highlights, sourced directly from the announcement:

OpenAI sample poster titled Greater Precision and Control, modernist editorial layout — OpenAI's

Three capabilities from OpenAI’s announcement that our tests didn’t cover:

Multi-image consistency: up to 8 images from a single prompt, with characters, objects, and styles staying coherent across frames (useful for storyboards, product-shot sets, social carousels)
Text rendering at scale: the wolf-magazine sample below packs 15+ distinct text blocks on one spread, all readable
Non-Latin scripts: Japanese, Korean, Chinese, Hindi and Bengali typography are all markedly better (the feature TechRadar and Engadget called out most)

OpenAI Images 2.0 sample showing a wolf-themed magazine spread with multiple text blocks, maps, and callouts — OpenAI sample: a full editorial magazine spread about wolves — 15+ text blocks rendered legibly. Sourced from the launch post.

OpenAI sample of a handwritten notebook page on Toronto baseball history — OpenAI sample: a handwritten notebook page with consistent pencil texture and legible prose.

Scoreboard: What Each Model Is Actually Good At

Task	Gemini Nano Banana 2	OpenAI gpt-image-2	Winner
Background swap	Full-res, detail preserved	Clean swap, 1024 cap	Gemini (resolution)
Pencil sketch style	Editorial-grade	Failed — abstract shape	Gemini
Add object (croissant)	Richer pastry detail	Clean integration	Tie
Watercolor transform	Warm pigment, text kept	Cooler tones, text kept	Tie
Magazine cover (text)	Polished, over-delivered	Perfect prompt adherence	OpenAI
Exposure triangle layout	Ignored geometry	Perfect triangle layout	OpenAI
Speed (avg edit)	16s	139s	Gemini (8.7×)
Output resolution cap	~1500 px side	1024 px (API standard)	Gemini
Cost per 1024×1024 high-quality	~$0.04	$0.211	Gemini (5×)

The picture is now clear: Gemini for editing photos, OpenAI for generating layouts. Images 2.0 widened the structural-layout lead meaningfully — the exposure-triangle test is the starkest example — while Gemini held onto photo-editing fidelity and speed. Neither is strictly better; the right choice depends on which bottleneck you’re trying to solve.

Will It Replace Photoshop?

No. But it replaces most of what people used Photoshop for.

Photoshop has always been two products shipped together: a pixel-precise retouching tool for professionals (frequency separation, precise dodge/burn, print-grade color management, masking a single strand of hair), and a general-purpose image manipulator for everyone else. ChatGPT Images 2.0 and Gemini Nano Banana 2 have definitively taken over that second product. The tasks in our tests — background swap, add object, style transfer, recompose — used to require 15-30 minutes of manual work. They now take 15-150 seconds and a one-sentence prompt.

What these models cannot replace, yet:

Pixel-perfect retouch — high-end beauty, product, and commercial retouching where every pore, stitch, or reflection matters
Print-grade color management — ICC profiles, soft-proofing, CMYK separations for magazines and packaging
Non-destructive layered workflows — adjustment layers, smart objects, masks that another designer can open and modify
Real-time precision — brush strokes, spot healing, clone stamping a specific 50-pixel region without affecting the rest
RAW processing — still Lightroom/Capture One territory, with AI denoising layered in

The broader truth: Adobe isn’t sitting still. Photoshop’s own Generative Fill and Generative Expand features already use Firefly behind the scenes, and the company’s pitch is increasingly “AI-first inside a layered editor,” not “no AI.” The real shift isn’t Photoshop dying; it’s Photoshop becoming the last-mile tool after an AI model has done the bulk of the work.

Impact on Photography, Publishing, and Design Workflow

For photographers

The bread-and-butter post-production edits — background cleanup, small-object removal, adding sky, swapping weather — now take seconds to minutes. That’s not a threat to the craft; it’s a return to the actual craft. Photographers who built their brand on Photoshop-heavy manipulation will need to shift. Photographers whose value is being on location at the right moment with the right eye have more time to spend on that, and less on masking.

The trickier question is stock photography. AI generation is already reshaping what editorial and marketing teams buy — something we covered in detail earlier this month.

For publishing and editorial teams

The magazine-cover test above is not a party trick — it’s a real change in how a news desk operates. A story about the Z6 III price drop used to need a stock photo license plus 30 minutes of layout work to produce a hero image. Now a news-desk editor can prompt a publishable hero in under two minutes. Time-to-publish goes down. Volume goes up. The quality bar on editorial graphics rises correspondingly — because the floor also rises.

The latency is the interesting new constraint. Gpt-image-2 at 100+ seconds per image is a production-planning consideration: it’s not a tool for interactive “let me try 20 variants” workflow. It’s a tool where you brief it once, think for two minutes, and come back to a near-final asset. Gemini Nano Banana 2’s 15-second turnaround enables the iterative workflow that Figma-style designers favor.

For designers

Two shifts to watch: the brief-to-draft cycle collapses, and the discarded-concepts pile explodes. You used to fight for three concept directions in a pitch deck. Now you can generate thirty in a couple of hours. The designer’s value moves from craft-execution to concept curation, prompt engineering, and knowing which of 30 directions actually solves the client’s problem.

The losers are mid-tier production shops whose value proposition was “we can execute faster than an in-house team.” The winners are solo designers with strong taste, and agencies that lean into the concept-volume advantage. Images 2.0’s structural-layout strength specifically benefits designers who need magazine spreads, infographics, and text-heavy compositions.

One More Example: How Small Text Rendering Has Changed

Text rendering was the single biggest complaint about AI image models in 2024 and 2025 — letters would warp, spacing would drift, and whole words would dissolve into gibberish mid-sentence. OpenAI’s “rice mound” sample shows where the bar is now: recognizable, photorealistic, with clean texture detail that would have taken a studio photographer a controlled light setup.

OpenAI Images 2.0 close-up sample of uncooked rice on burlap, photorealistic texture — OpenAI's own Images 2.0 sample: a photorealistic close-up of uncooked rice on burlap. Sourced from the launch post.

Frequently Asked Questions

What is ChatGPT Images 2.0?

OpenAI’s latest image-generation model, launched April 21, 2026. It introduces a ‘thinking’ step that reasons through a prompt before producing an image, aimed at better layout fidelity, text rendering, and multi-image consistency. The API model name is gpt-image-2.

How much does ChatGPT Images 2.0 cost?

On the API, pricing is $8 per million input image tokens and $30 per million output tokens. That works out to roughly $0.211 per high-quality 1024×1024 image — about 60% more expensive than the prior gpt-image-1.5 model at $0.133. In the ChatGPT app, it’s included in Plus, Pro, and Business tiers; Free and Go users get the new model with some limitations on thinking-mode outputs.

How long does gpt-image-2 take per image?

In our testing on April 22, 2026, gpt-image-2 ran between 97 and 149 seconds per image at quality=high, 1024×1024. By comparison, Gemini 3.1 Flash Image returned in 11-24 seconds. The thinking step is where the latency comes from — and it’s worth it for structural tasks where quality matters more than speed.

What is Gemini Nano Banana 2?

‘Nano Banana 2’ is the unofficial nickname for Google’s gemini-3-flash-image (and gemini-3.1-flash-image-preview) — Google’s current image-generation model within the Gemini 3 family. It’s been the quality leader in AI image generation for most of 2026, which is the benchmark OpenAI is explicitly targeting with Images 2.0.

Is Images 2.0 better than Gemini Nano Banana 2?

Based on our six-test hands-on comparison: Gemini wins for editing existing photographs (resolution preserved, 5-10× faster, better detail fidelity). OpenAI wins for structured generation (magazine layouts, infographics, text-heavy graphics, tight prompt adherence). The split is cleaner than it was between gpt-image-1.5 and Gemini — Images 2.0 narrowed the photo-editing gap meaningfully, while widening the structural-layout lead.

Will ChatGPT Images 2.0 replace Photoshop?

No — but it replaces most of what non-professional users did in Photoshop. For pixel-perfect retouching, print-grade color management, and non-destructive layered workflows, Photoshop (increasingly with its own AI features) remains the tool. For quick edits, stylistic transforms, and one-off graphics, AI image models have already taken over.

Can I use ChatGPT Images 2.0 for commercial work?

Yes. OpenAI grants commercial usage rights to images generated by users with paid tiers. Double-check attribution requirements if you’re publishing. Be aware that outputs may occasionally reflect copyrighted style or branded products — in our magazine-cover test, gpt-image-2 spontaneously rendered a recognizable Pentax K1000 body even though the prompt asked only for ‘a vintage 35mm camera.’ Treat outputs like a stock image source: confirm the content before publishing.

What about Adobe Firefly and Generative Fill?

Adobe’s Firefly runs Generative Fill and Generative Expand inside Photoshop. The pitch is ‘AI inside a layered editor’ rather than ‘AI instead of an editor.’ For photographers already in the Creative Cloud ecosystem, Firefly offers tighter integration with selections, masks, and layers — trading raw model quality for workflow continuity.

Why does my gpt-image-2 API call return 403?

As of the April 21, 2026 launch, OpenAI gates gpt-image-2 and chatgpt-image-latest behind organization verification. Go to platform.openai.com → Settings → Organization → Verify. Propagation takes up to 15 minutes. The older gpt-image-1.5 and gpt-image-1 remain available without verification.

Source photographs curated via SampleShots and Unsplash: Christopher Campbell, Bernard Hermant, Annie Spratt, and Markus Spiske. OpenAI sample images sourced directly from the ChatGPT Images 2.0 launch post. Hands-on test outputs generated by OpenAI gpt-image-2 and Google gemini-3.1-flash-image-preview by PhotoWorkout on April 22, 2026.

Don’t miss this week’s photography news

Every Tuesday: camera launches, lens announcements, and the photography industry moves that matter — curated before the big sites catch up. Free, one-click unsubscribe.

How We Tested

Test 1: Background Swap (Portrait)

The results

Test 2: Style Transfer (Pencil Sketch)

The results

Test 3: Object Addition (Croissant)

The results

Test 4: Artistic Transform (Watercolor)

The results

Test 5: Magazine Cover (Text Rendering)

The results

Test 6: Technical Infographic (Layout Fidelity)

The results

Additional Capabilities: OpenAI’s Own Samples

Scoreboard: What Each Model Is Actually Good At

Will It Replace Photoshop?

Impact on Photography, Publishing, and Design Workflow

For photographers

For publishing and editorial teams

For designers

One More Example: How Small Text Rendering Has Changed

Frequently Asked Questions

What is ChatGPT Images 2.0?

How much does ChatGPT Images 2.0 cost?

How long does gpt-image-2 take per image?

What is Gemini Nano Banana 2?

Is Images 2.0 better than Gemini Nano Banana 2?

Will ChatGPT Images 2.0 replace Photoshop?

Can I use ChatGPT Images 2.0 for commercial work?

What about Adobe Firefly and Generative Fill?

Why does my gpt-image-2 API call return 403?

Related Reading

Don’t miss this week’s photography news

Andreas De Rosi

Before the big sites catch up.