We solved scratch content first
One of my greatest surprises with scaling laws is that we solved from-scratch image generation sooner than we solved editing.
We could have started by encouraging models to learn Photoshop; perhaps not through UI interaction, but certainly through programmatic manipulation. Take a base layer, stack transformations modeled as tool calls on top, and measure the output against the result we want. In some ways this feels easier than pixel-based generation: you're guaranteed that shapes are logically coherent if they were created with object primitives that are themselves logical (square, circle, etc.).
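As a minimal sketch of what that might look like, here is a toy tool-call vocabulary composed into SVG. The operation names and parameter shapes are my own invention, not any real model's editing API; the point is that each edit is a logical primitive, so the output is coherent by construction and every step is inspectable.

```python
# Toy tool-call editing: a base canvas plus a stack of transformations,
# each a named primitive with parameters. Operation names are hypothetical.

def render_call(call):
    """Turn one tool call into an SVG element string."""
    op, a = call["op"], call["args"]
    if op == "rect":
        return f'<rect x="{a["x"]}" y="{a["y"]}" width="{a["w"]}" height="{a["h"]}" fill="{a["fill"]}"/>'
    if op == "circle":
        return f'<circle cx="{a["cx"]}" cy="{a["cy"]}" r="{a["r"]}" fill="{a["fill"]}"/>'
    raise ValueError(f"unknown op: {op}")

def compose(width, height, calls):
    """Apply the stack of tool calls on top of a blank base layer."""
    body = "\n".join(render_call(c) for c in calls)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f"{body}\n</svg>"
    )

# A stack of two edits: a gray rectangle, then a black circle on top.
calls = [
    {"op": "rect", "args": {"x": 16, "y": 48, "w": 96, "h": 64, "fill": "gray"}},
    {"op": "circle", "args": {"cx": 64, "cy": 32, "r": 16, "fill": "black"}},
]
svg = compose(128, 128, calls)
print(svg)
```

The appeal is that a grader could score the rendered output against a target while every intermediate step stays editable, which is exactly the property end-to-end pixel generation gives up.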
Instead we decided to solve the whole enchilada.
Models still aren't particularly good at interacting with editing tools; I've tried to build systems that iteratively modify images with tool calls.1 At best they produce a few simple SVG shapes with no real sophistication. The results are painfully worse, slower to produce and lower in quality, than just using Nano Banana Pro to generate the whole thing.
It ended up being far easier to treat the problem as truly end to end. But fully generating pixels comes at the expense of being able to tweak the end product manually. I suspect that's part of why we see such a flood of corporate slop.2 Its creators might see the flaws, might even be unhappy with them, but the output has cleared the bar of good enough. After all, if you really want to get to 100%, you're going to have to restart from scratch and build it yourself. It's much easier to accept the 95% quality you can one-shot.
At the moment we're left with a strange inversion: infinite creative power, zero creative control. You can make anything, as long as you're willing to accept whatever comes out. I imagine the tooling will eventually catch up, but I think it's distinctly possible we end up with better pixel-editing tools (i.e. infinitely guidable revisions) before we end up with real tool-use control.
Footnotes

1. Simon Willison is well known for his pelican test of each LLM, where he tries to get the LLM to generate a pelican from scratch via SVG code. LLMs are pretty bad at this; yet a 7B image model can easily generate something photorealistic. ↩

2. There's always the concern that we can only identify poor AI output, not good. There might very well be far more AI-generated content online than we recognize, because it's good enough to pass unnoticed. ↩