The Seed That Cannot Be Guessed

Can a language model understand a 3D shape if it can write the shortest program that rebuilds it? The idea runs into a wall named the random seed. The way around the wall is the same trick diffusion models already use.

Contents

1. The bet: short code means understanding
2. The wall: a seed you cannot guess
3. The escape hatch
4. Why the escape hatch is a trap
5. The pivot: diffusion already did this
6. The fix: learn the field, not the sample
7. The takeaway

1. The bet: short code means understanding

The real goal is hard: turn a 3D mesh into code, so a language model can reason about shape the way it reasons about text. Meshes are messy, so a cleaner stand-in works better.

Turn a Minecraft build into a program that rebuilds it.

Give an agent a small library (Box, Spline, Repeat) and let it write new functions. The build can be a 64×64×64 chunk, 262,144 blocks, too many to read one by one inside a context window, so the agent uses tools to look: crop a region, shrink it, zoom where it is unsure. The task: rebuild the exact voxels with the shortest code.

This task rules out one failure. The dumbest valid answer lists every block as a constant:

# zero understanding: longest code, perfect output
world = [
  (0, 0, 0, "stone"),
  (0, 0, 1, "stone"),
  (0, 0, 2, "oak_planks"),
  # ... 262,141 more lines ...
]
print_blocks(world)

It rebuilds the world block for block, yet knows nothing about it: a photo of the answer, not a theory of it. It is also about the longest code anyone would write for the job. Pushing for short code pushes the agent away from copying and toward ideas: a wall becomes a loop, a tower a spin, a forest noise. This is the old minimum description length (MDL) idea in a Minecraft skin: the best code is the shortest one that still rebuilds the data.

\[ P^{*} \;=\; \arg\min_{P}\; \underbrace{\lVert V - \mathrm{Exec}(P)\rVert}_{\text{rebuild error}} \;+\; \lambda \cdot \underbrace{\lvert P \rvert}_{\text{code length}} \]
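The objective above can be tried on a toy chunk. This is a minimal sketch under stated assumptions, not the author's implementation: rebuild error is taken as the count of mismatched voxels, code length as the number of characters in the program text, and \(\lambda = 1\). The helper `mdl_cost` and the three candidate "programs" are hypothetical.

```python
import numpy as np

def mdl_cost(target, rebuilt, program_text, lam=1.0):
    """Rebuild error (mismatched voxels) plus lam times code length (characters)."""
    rebuild_error = int(np.count_nonzero(target != rebuilt))
    return rebuild_error + lam * len(program_text)

# Toy target: an 8x8x8 chunk with a solid stone floor (1 = stone, 0 = air).
target = np.zeros((8, 8, 8), dtype=np.uint8)
target[:, 0, :] = 1

# Candidate A: list every floor block as a constant. Exact, but the text is long.
listing = "\n".join(f"({x}, 0, {z}, 'stone')," for x in range(8) for z in range(8))
rebuilt_listing = target.copy()

# Candidate B: one rule (illustrative syntax). Also exact, and the text is short.
rule = "Box(x=0..7, y=0, z=0..7, 'stone')"
rebuilt_rule = np.zeros_like(target)
rebuilt_rule[:, 0, :] = 1

# Candidate C: the empty program. Shortest text, but every floor block is wrong.
rebuilt_empty = np.zeros_like(target)

costs = {
    "listing": mdl_cost(target, rebuilt_listing, listing),
    "rule": mdl_cost(target, rebuilt_rule, rule),
    "empty": mdl_cost(target, rebuilt_empty, ""),
}
best = min(costs, key=costs.get)
print(costs, "->", best)
```

With these assumptions the rule wins on total cost: the listing pays for its length, and the empty program pays for its errors.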

Short code as a way to force understanding. The same correct output can sit at very different code lengths, and only the short end looks like it understood anything.

Fig 1. Three programs aimed at the same build. The constant list is exact but huge. A guessed seed is tiny but wrong. We want the middle one.

2. The wall: a seed you cannot guess

People build 3D shapes with noise all the time: terrain, walls, trails, scatter, all painted by a noise function and a seed. For many builds the shortest code is a single noise call:

paint_wall(noise(seed=??, scale=0.07), threshold=0.4)

One line, except for that seed. The builder used some seed, and there is no way to guess which. Two seeds make two different, equally valid walls. The code can make a wall, not this wall.
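A toy generator makes the point concrete. This is a sketch only: `noise_wall` is a hypothetical stand-in for the `paint_wall(noise(...))` call, using thresholded white noise from NumPy instead of a real terrain noise function.

```python
import numpy as np

def noise_wall(seed, shape=(32, 32), threshold=0.4):
    """Hypothetical stand-in for a noise-painted wall: True where a block is placed."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) < threshold

wall_a = noise_wall(seed=1)
wall_b = noise_wall(seed=2)

# Same seed: the exact wall comes back.
same_seed_matches = np.array_equal(wall_a, noise_wall(seed=1))

# Different seed: the same kind of wall (similar fill rate), but not the same blocks.
fill_a, fill_b = wall_a.mean(), wall_b.mean()
mismatch = np.mean(wall_a != wall_b)
print(same_seed_matches, round(fill_a, 2), round(fill_b, 2), round(mismatch, 2))
```

Both walls are equally valid draws from the generator, yet a large share of their blocks disagree, so a program with the wrong seed cannot pass an exact-rebuild test.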

The shortest code cannot rebuild the shape exactly unless you already know the seed. And you never do.

The shape came from a generator with a hidden coin flip:

\[ V \;=\; G(s,\, z), \qquad s = \text{seed (hidden randomness)}, \quad z = \text{settings (style, scale)} \]

Many programs come close; only one matches exactly, and that one has to store the seed, which is just copying. If a shape is not fully made of patterns, no program can both rebuild it exactly and stay short. The seed is not extra noise on the problem; the seed is the problem.

This is the same barrier as GAN inversion. A GAN learns a map from a latent vector to an image, \(z \mapsto G(z)\). Given a real image, recovering the exact \(z\) that made it is usually impossible: the map is many-to-one, so different latents land on the same image, and many real images sit just off the learned range, reachable by no \(z\) at all. The finished image carries less information than the input that produced it, so the input cannot be read back out. A Minecraft build is in the same position: the seed is the latent, the voxels are the output, and one output does not pin down the latent that made it. Recovering a hidden input from a single output is not just hard; it is undefined.

A real build is also three kinds of content at once, mixed block by block: plain geometry, noise, and hand edits.

Fig 2. How well each kind of content compresses. Plain geometry loves short code. Noise compresses only if you guess the right generator. Hand edits do not compress at all; you just store them.

Against this mix, the single-best-program search has no stable place to land: it copies the whole grid, picks a wrong noise model, or stalls between equally good programs. The shortest-program task is ill-posed.

3. The escape hatch

One way out: stop asking for a perfect program and split the answer in two parts.

Part A, the structure. The part that earns the word understanding: boxes, symmetry, repeats, noise, splines. It explains the big patterns cheaply. A castle wall is a loop, not a list.

Part B, the leftover. A short list of block fixes: decorations, hand edits, one odd torch. Things not worth a rule, just stored.

So the world is a confident guess plus a small fix:

\[ V \;=\; G(P) \;+\; \Delta, \qquad \Delta = \text{a few stored block fixes} \]

This takes the seed pressure off. If noise fits, pick any plausible seed and move on; if not, drop the seed and let the leftover hold the difference. The seed goes from a must-have to a nice-to-have, and the build becomes a loop:

flowchart LR
  A["look at a chunk"] --> B["guess a simple rule"]
  B --> C["write code for it"]
  C --> D["run, compare to truth"]
  D --> E["error map"]
  E -->|"big error"| B
  E -->|"small error"| F["store as a fix"]

Fig 3. Guess, run, compare, repair. The same loop a person uses to reverse engineer anything.
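The split into structure plus leftover, and one pass of the loop, can be sketched as follows. The assumptions are loud ones: `run_rule`, `diff_fixes`, and `apply_fixes` are hypothetical helpers, block ids are small integers, and the "+" in \(V = G(P) + \Delta\) is read as "overwrite these voxels", not arithmetic addition.

```python
import numpy as np

def run_rule(shape):
    """Hypothetical rule G(P): a solid wall on the x = 0 plane (block id 1)."""
    out = np.zeros(shape, dtype=np.uint8)
    out[0, :, :] = 1
    return out

def diff_fixes(truth, guess):
    """Error map turned into stored fixes: one (x, y, z, block) per wrong voxel."""
    coords = np.argwhere(truth != guess)
    return [(int(x), int(y), int(z), int(truth[x, y, z])) for x, y, z in coords]

def apply_fixes(guess, fixes):
    """Rebuild V by overwriting the rule output with the stored fixes."""
    out = guess.copy()
    for x, y, z, block in fixes:
        out[x, y, z] = block
    return out

# Truth: the wall, plus one hand-placed torch (block id 2) and one knocked-out block.
truth = run_rule((8, 8, 8))
truth[3, 4, 5] = 2
truth[0, 7, 7] = 0

guess = run_rule(truth.shape)          # run the rule
fixes = diff_fixes(truth, guess)       # compare to truth, keep the small error
rebuilt = apply_fixes(guess, fixes)    # rule output plus leftover
print(len(fixes), np.array_equal(rebuilt, truth))
```

Here the rule explains the wall and two stored fixes cover the rest, so the rebuild is exact without the rule ever having to be perfect.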

4. Why the escape hatch is a trap

Saying "I don't know" is not an answer. The model will call every region unknown, dump it all into the leftover, and learn nothing. That does not scale.

An open "I don't know" channel is free space, and free things get abused. If the leftover costs nothing, copying always beats thinking, so the structure part dies. Back to the constant list, now with a polite label.

This is the same trap as NeRF and inverse rendering. A NeRF fits one scene by storing a continuous function that reproduces its pixels from any view. It compresses the scene, but it learns no rules: no wall, no repeat, no object, just a smooth lookup table over space. It is the constant block list with interpolation: photographic memory, not understanding. An unpriced leftover lets the program collapse back into exactly that.

The fix is not more structure. Unknown must never be free; it has to compete with structure under one shared cost.

\[ \mathrm{Cost} \;=\; \underbrace{\lvert P \rvert}_{\text{rule cost}} \;+\; \underbrace{\lvert \Delta \rvert}_{\text{stored fix cost}} \]

Now every block must be paid for, by a rule or by an expensive stored fix. The three options stop being a menu and start being a market:

Option	Cost if a pattern exists	Cost if it does not	Result
Explain with a rule	cheap	expensive	used where real structure is
Store the blocks	wasteful	cheap for small odd bits	used for the leftover only
"I don't know", free	always cheap	always cheap	broken, must be banned

Once "unknown" has a price, the model finds a rule first and stores a block only when a rule costs more. The leftover shrinks to a few percent of the build instead of becoming a dumping ground. The pressure to compress must stay on everywhere, at the right strength: too weak and the model copies, too strong and it invents fake patterns, switched off and everything is "unknown".
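The market in the table can be sketched with made-up numbers. Everything here is an illustrative assumption: the cost units, the per-block storage price of 8, the rule price of 30, and the `choose` helper itself.

```python
def choose(rule_cost, n_blocks, store_cost_per_block, unknown_cost=None):
    """Pick the cheapest way to pay for a region; 'unknown' exists only if priced in."""
    options = {"rule": rule_cost, "store": n_blocks * store_cost_per_block}
    if unknown_cost is not None:
        options["unknown"] = unknown_cost
    return min(options, key=options.get)

# Hypothetical regions: one with a strong pattern, one with none worth a rule.
regions = {
    "castle wall": {"rule_cost": 30, "n_blocks": 400},
    "odd torch": {"rule_cost": 30, "n_blocks": 1},
}

# Shared cost, no free channel: rules win on patterns, storage wins on odd bits.
priced = {name: choose(r["rule_cost"], r["n_blocks"], 8) for name, r in regions.items()}

# Add a free "I don't know": it beats both options everywhere.
free = {
    name: choose(r["rule_cost"], r["n_blocks"], 8, unknown_cost=0)
    for name, r in regions.items()
}
print(priced)
print(free)
```

Under the shared cost the wall is explained and the torch is stored; with a free unknown channel both regions collapse into "unknown", which is the failure the section describes.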

5. The pivot: diffusion already did this

The same problem appears in image generation. Pixel diffusion handles pixel noise by sampling, latent diffusion moves the noise one level up, but neither models conceptual noise: the set of valid images is itself rough, with structure at every scale. Under a plain manifold view that set looks impossible to pin down, since noise runs through it at all levels at once.

When the data is large enough, the manifold stops being a sharp surface to rebuild and becomes a soft cloud of likelihood. Diffusion does not store where the valid points are; it learns how likely each region is. The same move fits Minecraft: stop rebuilding one sample, fit all samples at once, and let scale teach the shape of every build without a perfect single answer.

flowchart TB
  subgraph OLD["The inversion view"]
    direction LR
    s1["one build V"] --> s2["invert it"] --> s3["recover seed and program"]
    s3 --> s4["ill-posed: many seeds, one V"]
  end
  subgraph NEW["What diffusion does"]
    direction LR
    d1["the whole dataset"] --> d2["learn a score field"]
    d2 --> d3["which way is more likely"]
    d3 --> d4["sample, never invert"]
  end
  OLD -. "the missing shift" .-> NEW

Fig 4. The whole move: stop inverting one sample, start learning a field over the whole set.

Diffusion does not handle conceptual noise by modeling every noise level. It learns a score field over the data, not a way to rebuild samples or seeds.

Diffusion training is not "learn every noise level", not "recover the image", not "store the seed". It learns one thing: at any noisy point, which way moves toward more likely data. Noise is the training device that forces this field to be defined everywhere, not just at the data points.

\[ \nabla_{x} \log p(x) \;=\; \text{the score: the way toward more likely data} \]
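For a case where the score has a closed form, take one-dimensional Gaussian data. This is a worked sketch, not a diffusion model: real systems learn the score with a network, while here it is written analytically, and the values of `mu`, `data_var`, `noise_var`, and the step size are arbitrary assumptions. Adding Gaussian noise of variance `noise_var` to data distributed as \(N(\mu, \text{data\_var})\) gives \(N(\mu, \text{data\_var} + \text{noise\_var})\), whose score is \(-(x - \mu) / (\text{data\_var} + \text{noise\_var})\).

```python
import numpy as np

def log_density(x, mu, var):
    """Log of the Gaussian density N(mu, var) at x."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

def score(x, mu, var):
    """Score of N(mu, var): the derivative of log_density with respect to x."""
    return -(x - mu) / var

mu, data_var, noise_var = 3.0, 0.25, 1.0
noisy_var = data_var + noise_var  # variance of the noised data

# The score is defined far from the data, not only at the data points.
x0 = -5.0
h = 1e-5
numeric = (log_density(x0 + h, mu, noisy_var) - log_density(x0 - h, mu, noisy_var)) / (2 * h)

# Following the score walks toward more likely data; no seed is recovered anywhere.
x = x0
for _ in range(200):
    x += 0.1 * score(x, mu, noisy_var)
print(numeric, score(x0, mu, noisy_var), x)
```

The walk ends near `mu` from any starting point, which is the sense in which the field, not any single sample, carries the information.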

6. The fix: learn the field, not the sample

Swap pixels for programs.

Aspect	Diffusion	This system
object	image	program
noise over	pixels	program edits
what it learns	score field over images	score field over programs
generation	sample, never invert	sample, never invert a seed

Fitting the single best program for one build is inverse rendering of one sample, ill-posed by design. The fix is to learn a model that scores all valid builds high and bad ones low, instead of rebuilding samples. Rebuilding pushes toward copying; scoring pushes toward shape.

The seed, the wall from section 2, just dissolves:

A seed is only one way to draw from the spread.
The model never tries to recover it.
The model only learns which kinds of builds are valid.
So many programs make one build, many seeds make one build, and the model does not care. They are all fine members of the same group.

The right target was never one program but a group of equal programs, the same way one image does not pin down one diffusion path. Copying fails not because it is banned, but because it does not generalize: a block table that nails one build explains none of the others, so under a score trained on many builds it does badly. The constant list was never a real rival once the goal turned distributional.
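The claim that copying loses once the goal is distributional can be checked on toy data. This sketch swaps in a plain per-voxel likelihood for the learned score, which is an assumption made for brevity; the builds are thresholded white noise, and `eps` is an arbitrary smoothing value that keeps the logarithm finite.

```python
import numpy as np

rng = np.random.default_rng(0)
FILL = 0.4
builds = rng.random((50, 16, 16)) < FILL   # 50 toy builds from one noise family
train, held_out = builds[0], builds[1:]

def avg_log_lik(prob_filled, samples):
    """Mean per-voxel log-likelihood under independent per-voxel fill probabilities."""
    p = np.where(samples, prob_filled, 1.0 - prob_filled)
    return float(np.mean(np.log(p)))

# Copying: a block table memorised from one build, softened by eps.
eps = 0.01
table_model = np.where(train, 1.0 - eps, eps)

# Family: a single shared fill rate, estimated from that same one build.
family_model = np.full(train.shape, train.mean())

table_train = avg_log_lik(table_model, train[None])
family_train = avg_log_lik(family_model, train[None])
table_held = avg_log_lik(table_model, held_out)
family_held = avg_log_lik(family_model, held_out)
print(table_train, family_train, table_held, family_held)
```

The table wins on the one build it memorised and loses badly on the other 49, while the family model scores about the same everywhere.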

The design then writes itself as diffusion over programs, not seed inversion: a multi-scale voxel encoder, a model that proposes a spread of programs, an executor that runs them and returns an error map, and a step that spends the next look on the least sure region. A seed guesser stays only as an optional hint, never a need.

Scene(
    primitives=[
        Box(...),
        NoiseVolume(seed_range=..., scale=...),   # a family, not a fixed seed
        Symmetry(axis="x"),
    ],
    residual_budget=0.05,                         # the priced last 5 percent
)

Note seed_range, not seed. We commit to a family of generators and a latent we are allowed to be unsure about, not a magic number we were never going to find.
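One way to make the `seed_range` idea concrete is shown below. The field types, the `sample_seed` and `within_budget` methods, and the use of a Python `range` for the seed family are all assumptions for illustration; the original only names the fields.

```python
import random
from dataclasses import dataclass

@dataclass
class NoiseVolume:
    seed_range: range   # the family: any seed in this range is acceptable
    scale: float

    def sample_seed(self, rng):
        """Draw one plausible seed; no attempt is made to recover the builder's seed."""
        return rng.choice(self.seed_range)

@dataclass
class Scene:
    primitives: list
    residual_budget: float   # max fraction of voxels allowed as stored fixes

    def within_budget(self, n_fixes, n_voxels):
        """True if the stored fixes stay inside the priced leftover."""
        return n_fixes / n_voxels <= self.residual_budget

scene = Scene(
    primitives=[NoiseVolume(seed_range=range(0, 1000), scale=0.07)],
    residual_budget=0.05,
)
rng = random.Random(0)
seed = scene.primitives[0].sample_seed(rng)
print(seed, scene.within_budget(10_000, 64**3), scene.within_budget(20_000, 64**3))
```

For a 64×64×64 chunk, 10,000 stored fixes fit inside a 5 percent budget and 20,000 do not, so the leftover stays priced while the seed stays free to vary.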

7. The takeaway

The starting assumption was that understanding means finding the one true program behind a shape, and that short code forces it to appear. Both halves are wrong.

You do not learn all the samples. You learn the shape of the space of valid answers that makes all the samples short at once.

There is no true program for a built world, only a best compression. It mixes fixed rules, noise rules with guessed settings, and a few honest fixes, none ranked above the others. The seed is not a flaw in the framing; it is the framing pointing at its own mistake. Asking what the space of such builds looks like, instead of what made this one, makes the unanswerable question disappear.

Diffusion has lived in that answer all along: it never finds a seed, it learns where the world tends to be and walks toward it. Understanding was never rebuilding; it was always a field.

On mesh-to-code, MDL, and diffusion over programs. The Minecraft framing is deliberate: discrete blocks, a readable library, and a controllable generator make it a clean lab for learning programs with hidden randomness.