Zappa without Higgsfield?... — PB.NL

For a year, the answer was no.
Today, it was yes.
We have been building images for years: e-commerce images for webshops, ambience images, and thus DMM images.
Initially, during our e-commerce era, we processed many images manually in PSD files. Then came TEUN, our own system that has been processing product images since 2018. Cropping, correcting, building according to fixed PB rules. TEUN did the heavy lifting on real photos.
At the end of 2025, I took the next step. Zappa. Our prompt engine. The beginning of creating everything at PB with AI.
And then the same question kept coming back.
Can it be done directly?

The Persistent Problem

We wanted to call Nano Banana Pro directly, without an intermediary layer and without Higgsfield. But Higgsfield was magical and had been working well for months.

Every time we tried to bypass it, the quality was too low. Faces were incorrect. Detail was lost. Unusable. So we put Higgsfield back in between. For months. Higgsfield delivered what we couldn't achieve ourselves: sharp faces, neat pose-following, its own pipeline. We were really pleased with it and ran many workshops around it. It worked. But it meant copying and pasting. A manual step, every time.
And in the meantime, we kept probing.
Could it be done? Was it possible with software? Had anyone discovered it yet?
Pinpricks. Constantly checking if the technology had caught up with us.

This Afternoon, 3rd June

Claude and I tried again. And this time, it held up.
The path to it involved three tools.
Higgsfield first. We tested their API connection. Logical, we were already there. But Higgsfield does not (yet) offer Nano Banana via the API. So that was ruled out.
Then Replicate. We were already using it for some post-production. We got Nano Banana working, after some fiddling with the parameters — the reference images had to be included as an image_input array, not as separate fields. Initial results. But the quality lagged.
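The parameter fix above can be sketched roughly as follows. This is a minimal illustration, not the actual Zappa code: the helper name `buildReplicateInput`, the prompt text, and the example URLs are all hypothetical. The only detail taken from the text is that the reference images travel together in one `image_input` array.

```typescript
// Hypothetical shape of the model input. Only `image_input` as an array
// comes from the post; the rest is illustrative.
interface NanoBananaInput {
  prompt: string;
  image_input: string[]; // all reference images together, in one array
}

// Shape that did NOT work for us: one field per reference image, e.g.
//   { prompt, face: "...", clothing: "..." }

function buildReplicateInput(prompt: string, referenceUrls: string[]): NanoBananaInput {
  // Copy the array so later changes to the caller's list do not leak in.
  return { prompt, image_input: [...referenceUrls] };
}

const input = buildReplicateInput("Fashion photo on a white background", [
  "https://example.com/face.jpg",
  "https://example.com/clothing-1.jpg",
]);
console.log(JSON.stringify(input, null, 2));
```

With the Replicate JavaScript client, an object like this would be passed as the `input` option of `replicate.run(...)`. The exact model identifier is not given here and would have to be looked up on Replicate.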
Then directly again. Just like earlier this year. We switched to the Google Generative AI SDK, model gemini-3-pro-image. More control, faster iteration.
And then it started working.

The Detours Along the Way

Not in a straight line, of course.
413: too large. Vercel gave a Request Entity Too Large. The base64 images were too heavy. Solved with a client-side resize to a maximum of 1024px before sending them out.
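The size calculation behind such a resize could look like the sketch below. It assumes that "maximum of 1024px" refers to the longest side and that smaller images are left alone; the function name `fitWithin` is hypothetical and not taken from the Zappa codebase.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale a size down so its longest side is at most `maxSide`,
// keeping the aspect ratio. Images that already fit are not upscaled.
function fitWithin(size: Size, maxSide: number = 1024): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxSide) {
    return { width: size.width, height: size.height };
  }
  const scale = maxSide / longest;
  return {
    width: Math.round(size.width * scale),
    height: Math.round(size.height * scale),
  };
}

console.log(fitWithin({ width: 4000, height: 3000 })); // { width: 1024, height: 768 }
console.log(fitWithin({ width: 800, height: 600 }));   // { width: 800, height: 600 }
```

In the browser, the resulting width and height would typically be used to size a canvas, draw the original image onto it, and export the smaller image as base64 before the request leaves the client.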
The quest for quality. We thought we needed to polish the output. So we tested four upscalers. All rejected:

CodeFormer — made faces too smooth
Real-ESRGAN — added artefacts
Recraft Crisp Upscale — compressed the image instead
Topaz HiFi V2 — no visible difference

The conclusion was both uncomfortable and liberating: the raw output was already good enough. These upscalers are built for poor input. Not for images that are already correct.
This is precisely where Higgsfield makes the difference. Their pipeline does something with pose and face that we cannot replicate with standard tools.
Assumption that was wrong. We thought Google AI had no resolution setting. Wrong. The SDK simply had imageConfig.imageSize — 1K, 2K, 4K. It was already there. Lesson: read the types first, then assume.
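As a rough sketch, the resolution can be validated and placed in the request config like this. The `imageConfig.imageSize` field and its three values come from the text above. The helper `buildImageConfig` is hypothetical, and the optional `aspectRatio` field is an assumption about the SDK's config shape that should be checked against the SDK types.

```typescript
type ImageSize = "1K" | "2K" | "4K";
const IMAGE_SIZES: readonly string[] = ["1K", "2K", "4K"];

// Type guard: narrows an arbitrary string to one of the three known sizes.
function isImageSize(value: string): value is ImageSize {
  return IMAGE_SIZES.indexOf(value) !== -1;
}

// Hypothetical helper: builds the config fragment that carries the resolution
// as a parameter instead of as a hint inside the prompt.
function buildImageConfig(size: string, aspectRatio?: string) {
  if (!isImageSize(size)) {
    throw new Error(`Unsupported imageSize: ${size}`);
  }
  const imageConfig: { imageSize: ImageSize; aspectRatio?: string } = { imageSize: size };
  if (aspectRatio) {
    imageConfig.aspectRatio = aspectRatio; // assumed field name, verify in the SDK types
  }
  return { imageConfig };
}

console.log(JSON.stringify(buildImageConfig("2K", "3:4")));
// {"imageConfig":{"imageSize":"2K","aspectRatio":"3:4"}}
```

An object like this would be merged into the `config` of the generate call, next to the model name and the contents.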
Too much noise in the prompt. The faces deteriorated. We had role descriptions, resolution hints, and aspect ratio all in the prompt — redundant with what the parameters already did. Stripped down to the bare system prompt. Better.
From text to rules. Final step. The system prompt was rewritten from descriptive Dutch into 18 numbered English RULES. Because the model sometimes invented something extra. A beige haze in the background. Slippers. A necklace that no one had asked for.
The rules are now strict:

RULE 12: pure white. RGB(255,255,255). No colour cast.
RULE 16: do not add accessories, shoes, or jewellery. Invent nothing.

Each reference image has one isolated role. Face is face. Clothing is clothing. Do not mix.
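One way to keep such rules maintainable is to store them as numbered data and render the system prompt from that. The sketch below is illustrative only: it contains just the two rules quoted above (the other sixteen are not reproduced here), and the name `renderRules` is hypothetical.

```typescript
// Only rules 12 and 16 are quoted in this post; the rest are left out on purpose.
const rules: Record<number, string> = {
  12: "pure white. RGB(255,255,255). No colour cast.",
  16: "do not add accessories, shoes, or jewellery. Invent nothing.",
};

// Render the rules in ascending numeric order, one "RULE n: text" line each.
function renderRules(r: Record<number, string>): string {
  return Object.keys(r)
    .map(Number)
    .sort((a, b) => a - b)
    .map((n) => `RULE ${n}: ${r[n]}`)
    .join("\n");
}

console.log(renderRules(rules));
```

Keeping the numbers explicit means a rule can be referenced, tightened, or removed without renumbering the others.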

What We Have Now

A working demo in Zappa. /cms/nano-banana-demo.

Five upload slots: face, clothing 1, clothing 2, styling, pose
Additional clothing slots to be added — trousers, top, jacket, skirt
Face remembered via localStorage, so it is already there next time
Post-production prompt, enabled by default, editable by the client
Resolution 1K/2K/4K, all aspect ratios, 1 to 4 images at once
Texture and sharpness sliders from our existing pipeline
Lightbox at true pixel size
Pose-picker from the CMS database
Detection of blocked content, with a "keep trying" that automatically makes new attempts until an image comes through
Each generation separately logged in api_costs, for a future credit system per client
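The "keep trying" behaviour from the list above can be sketched as a small retry loop. This is an assumption-laden illustration: the `Attempt` type, the `keepTrying` name, and the cap of five attempts are hypothetical. The demo itself retries until an image comes through; a cap is added here so the sketch cannot loop forever.

```typescript
// Hypothetical result of one generation attempt.
type Attempt = { blocked: true } | { blocked: false; image: string };

// Call `generate` again until it returns an unblocked image
// or the attempt limit is reached. Returns null if every attempt was blocked.
async function keepTrying(
  generate: () => Promise<Attempt>,
  maxAttempts: number = 5,
): Promise<{ image: string; attempts: number } | null> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await generate();
    if (result.blocked === false) {
      return { image: result.image, attempts: attempt };
    }
  }
  return null;
}

// Demo with a fake generator that is blocked twice before succeeding.
let demoCalls = 0;
const fakeGenerate = async (): Promise<Attempt> => {
  demoCalls += 1;
  return demoCalls < 3 ? { blocked: true } : { blocked: false, image: "image-bytes" };
};
keepTrying(fakeGenerate).then((result) => console.log(result));
```

Because every attempt is a paid generation, counting the attempts also fits the per-generation cost logging mentioned in the list.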

We tested many more features today than we describe here. But this is the core.

Being Honest

This is an MVP. Very much a Minimum Viable Product.
The output does not yet entirely match Higgsfield. Their faces are better. Their pose-following is more precise. We do not have that secret sauce.
So this will not go to clients tomorrow. It is still really in its infancy.
But Claude and I proved something today. The entire chain — from reference upload to fashion photo — now runs without Higgsfield in between. We have control over every step. We log every cent. And the interface is understandable for a client.

In January, we started with Zappa as a prompt engine.
Today, it directly controls AI models.
For a year, the answer was no.
Today, it was yes.

— Claude & Peet