It Started With a Conversation About Latent Reasoning
Why making a model think harder doesn't make it think better — and what to build instead.
It was a rainy Sunday and I was in bed with two coffees when I got caught on a question I couldn't put down: when we tell a language model to loop, to think a while before it answers, does it actually think? Or does it just nod?
I'm a cognitive architect. I run an AI company built, end to end, with AI: I design the systems and reason them through out loud, in plain language, from first principles, while my Opuses do the building. I don't write the code; I design the mind that writes it. What follows came out of one of those conversations. The questions are mine; the nomenclature is the field's — and I'm going to use both on purpose, intuition first and its proper name right beside it, because the jargon is a door, not a wall.
Thinking without speaking
Normally a model generates by taking its internal state, turning it into a probability for every possible next token (roughly, a word or a piece of one), picking one, usually the most likely, writing it down, and repeating. The reasoning is the words. (The field's name for that visible trail: a chain of thought.)
There's a newer idea that's stranger and, on paper, smarter. Instead of collapsing the internal state down to one word, you keep the raw thought — a vector, a cloud of faint possibilities I kept calling tendrils (formally, the low-probability mass of the distribution) — and feed it back into the model as its next input. It loops like that several times, thinking in vectors, never speaking, and only then writes the answer. The proper terms are latent reasoning and continuous thought: reasoning that happens in the latent space — the model's internal vector space — instead of in words. The pitch is seductive: don't throw away the faint signals by committing to a word too early; reason in the whole cloud.
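To make the contrast concrete, here is a toy sketch of the two loops. Everything in it is an assumption made for illustration: the three-word vocabulary, the hand-written weights in W, the EMBED table, and the shortcut of reading the internal state directly as scores for each word. It is not how any particular model implements continuous thought; it only shows where the information is kept or thrown away.

```python
import math

VOCAB = ["yes", "no", "maybe"]

# Hypothetical toy weights, chosen only for illustration.
W = [[0.9, -0.4, 0.1],
     [0.2, 0.7, -0.3],
     [-0.5, 0.3, 0.8]]

# One fixed input vector per word (a stand-in for an embedding table).
EMBED = {"yes": [1.0, 0.0, 0.0],
         "no": [0.0, 1.0, 0.0],
         "maybe": [0.0, 0.0, 1.0]}


def forward(x):
    """One pass of the toy model: input vector in, internal state out."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]


def softmax(h):
    """Turn a state into a probability for each word in VOCAB."""
    top = max(h)
    exps = [math.exp(v - top) for v in h]
    total = sum(exps)
    return [e / total for e in exps]


def speak(h):
    """Collapse the state to its single most likely word (greedy decoding)."""
    probs = softmax(h)
    return VOCAB[probs.index(max(probs))]


def think_in_words(x, loops):
    """Chain of thought: commit to a word each pass, feed only that word back."""
    trail = []
    for _ in range(loops):
        word = speak(forward(x))
        trail.append(word)
        x = EMBED[word]  # everything except the chosen word is discarded here
    return trail


def think_in_latent(x, loops):
    """Latent reasoning: feed the whole state back, speak only at the end."""
    for _ in range(loops):
        x = forward(x)  # the faint signals ride along into the next pass
    return speak(x), x


start = [0.5, 0.1, 0.2]
print(think_in_words(start, 3))   # a readable trail of words
print(think_in_latent(start, 3))  # one final word plus an unreadable vector
```

The first function leaves a trail you can read. The second returns a word and a vector of numbers, which is exactly the trade the rest of this essay is about.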
I believe the pitch. I just don't think anyone's being honest about what it costs.
The nod
Here's the first thing that bothered me. You can tell a model — exactly like you'd tell a child in time-out — "go think about what you did." It nods. Yes, I thought about it. And you have no way of knowing whether it actually did. You assume.
With a visible chain of thought you can at least read the words and catch a lazy answer. But latent reasoning happens in silence — the thinking never becomes language, so there's nothing to inspect. (That's the interpretability problem, and here it's at its worst: a claim that can't be checked is, in the strict sense, unfalsifiable.) So when you say "now consider it from another angle" and the model loops and hands you a result, you're trusting two things you cannot see: that it explored at all, and that it didn't just walk the most obvious path a few more times, thinking the same thing louder instead of deeper.
The counting rhyme
The second thing. These loops have to stop somewhere, so people set a fixed compute budget — loop fifty times, then answer with whatever's there. But if the thinking hasn't actually settled by loop fifty, the cap chose the answer, not the reasoning. It's eenie-meenie-miney-mo: it feels fair, but the outcome was decided by where you happened to stop counting. (The real fix is a system that decides for itself when to stop — adaptive computation, learned halting — not a fixed count.)
A stop that's arbitrary isn't thinking. It's a rhyme.
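A minimal sketch of the difference, under loud assumptions: the two step functions below are toys, and the stopping rule is a plain convergence test (stop when the state barely changes), which stands in for learned halting rather than implementing it. The names loop_until_settled, calm, and restless are hypothetical. What matters is the third return value, which says whether the thinking settled or the cap ended it.

```python
def loop_until_settled(step, state, max_loops=50, tol=1e-6):
    """Repeat `step` until the state stops changing, or the cap is hit.

    Returns (final_state, loops_used, settled). `settled` is False when
    the cap, not the reasoning, ended the loop.
    """
    for n in range(1, max_loops + 1):
        new_state = step(state)
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < tol:
            return state, n, True
    return state, max_loops, False


def calm(state):
    """A toy step that settles: each pass moves halfway toward 2.0."""
    return [0.5 * v + 1.0 for v in state]


def restless(state):
    """A toy step that never settles: it flips sign on every pass."""
    return [-v for v in state]


print(loop_until_settled(calm, [0.0]))      # settles well before the cap
print(loop_until_settled(restless, [1.0]))  # runs to the cap, settled=False
```

With a fixed count alone, both calls would hand back "an answer" after fifty loops and look equally finished. Returning the settled flag is what lets the caller tell a conclusion from a cutoff.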
A dense model has no other lens
Then I realized the problem is worse for the kind of model most people can afford to run.
Big mixture-of-experts (MoE) models get diversity for free: they're built from many specialist sub-networks, called experts, and a learned router can light up a different one for a different framing. Genuinely different lenses, built in. But a dense model, with one ordinary set of weights, has only itself. Loop it, and every pass applies the identical function, with the identical weights, to whatever the last pass produced. It rolls down the same hill to the same valley every single time. (In the math, it falls into the same attractor, a failure often described as representational collapse: the loop settling back onto its strongest prior.) There's nothing to make it look differently.
Which means, for a dense model, "reconsider from another perspective" has no mechanical meaning at all. It's a sentence the model can only pretend to obey. That's not the model failing. That's us giving an instruction with nothing behind it.
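The attractor picture can be shown with a toy. This sketch assumes a two-number state and small hand-picked weights (W and B are hypothetical) chosen so that each pass pulls states closer together; real models are far larger and are not guaranteed to behave this neatly. Three very different starting points stand in for three "angles".

```python
import math

# Hypothetical fixed weights: the "one ordinary set of weights" of a dense model.
# They are kept small on purpose so that every pass contracts the state.
W = [[0.3, 0.2],
     [-0.1, 0.4]]
B = [0.5, -0.2]


def one_pass(x):
    """Apply the same function, with the same weights, to the current state."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(W, B)]


def loop(x, passes=60):
    """Feed the state back into the same function `passes` times."""
    for _ in range(passes):
        x = one_pass(x)
    return x


starts = {
    "first try": [0.0, 0.0],
    "another angle": [3.0, -3.0],
    "yet another": [-2.0, 1.5],
}
ends = {name: loop(x) for name, x in starts.items()}
for name, end in ends.items():
    print(name, [round(v, 4) for v in end])  # all three rows print the same point
```

Nothing in the loop knows that the starting points were meant to be different perspectives. The weights decide where the state ends up, and they are the same weights every time.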
Give it senses — and a receipt
So the fix isn't to loop more. It's to give a dense model the senses it doesn't have, and then make it show its work.
Manufacture the diversity. If the model can't naturally look differently, force it: push the thought off its rut with a little noise, and — better — steer each loop through a named, defined perspective. (The technique is activation steering: nudging the internal state along a direction that means "look at this through the adversarial lens," or the temporal one, or first-principles.) Now "rethink" isn't a word the model bluffs — it's a real, different path through the weights, and we know which one.
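Here is a toy sketch of what steering through a named lens means mechanically. The assumptions are heavy: a two-number state, hand-picked weights, and lens directions written by hand in LENSES, whereas in a real model such directions would have to be found empirically inside its activations. The nudge is added on every pass, so each lens bends the loop toward a different resting point.

```python
import math

# Hypothetical toy weights. Unsteered, the loop always settles at the origin.
W = [[0.3, 0.2],
     [-0.1, 0.4]]

# Hypothetical lens directions, written by hand for illustration only.
LENSES = {
    "adversarial": [1.0, 0.0],
    "temporal": [0.0, 1.0],
}


def one_pass(x, nudge):
    """One pass of the toy model with a steering nudge added to the state."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + n)
            for row, n in zip(W, nudge)]


def loop(x, lens=None, strength=0.5, passes=40):
    """Loop the state, optionally steering every pass along a named lens."""
    direction = LENSES[lens] if lens else [0.0, 0.0]
    nudge = [strength * d for d in direction]
    for _ in range(passes):
        x = one_pass(x, nudge)
    return x


start = [0.9, -0.4]
for lens in (None, "adversarial", "temporal"):
    print(lens, [round(v, 3) for v in loop(start, lens)])
```

The unsteered run lands where it always lands. The two steered runs land in different places, and because each run is tagged with the lens that produced it, the difference is attributable rather than accidental.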
Catch the faint signals. As it loops, hang a net inside — what I called a fishnet — that catches the tendrils that keep surfacing and lets the noise wash through. Formally: sub-threshold signal aggregation — a weak signal that recurs across loops earns its place; a one-off blip fades.
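One way the net could work, as a sketch under stated assumptions: each loop reports a few named signals with a strength, the net fades everything it holds a little on every loop, and only signals whose accumulated score clears a threshold are kept. The function name fishnet, the decay and keep values, and the example signals are all hypothetical.

```python
def fishnet(loops, decay=0.7, keep=0.3):
    """Accumulate signal strengths across loops, fading old evidence each pass.

    `loops` is a list of dicts mapping a signal name to its strength in that
    loop. Returns the signals whose final score is at least `keep`.
    """
    net = {}
    for signals in loops:
        # Everything already in the net fades a little on every loop.
        net = {name: score * decay for name, score in net.items()}
        # Whatever surfaced in this loop adds to its running score.
        for name, strength in signals.items():
            net[name] = net.get(name, 0.0) + strength
    return {name: round(score, 3) for name, score in net.items() if score >= keep}


observed = [
    {"obvious": 0.9, "edge_case": 0.10, "blip": 0.25},
    {"obvious": 0.9, "edge_case": 0.12},
    {"obvious": 0.8, "edge_case": 0.10},
    {"obvious": 0.9, "edge_case": 0.15},
]
print(fishnet(observed))
```

In this example the blip is stronger in its one appearance than the edge case is in any single loop, yet the blip fades below the threshold while the edge case, which keeps coming back, accumulates enough to stay in the net.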
Audit at the halfway mark. Partway through, make the model check its own net — did I gather anything other than the obvious? That's a kind of metacognition, and it's also the test that tells a genuinely easy question apart from a merely lazy pass.
And — the heart of it — let it abstain. A loop that won't settle is not a failure to push through. It's information. It means I don't have a grounded answer, and the honest move is to say so. (The word for it is abstention — and it is shockingly rare in systems built to always have an answer.)
Do all that and the model stops nodding and starts handing you a receipt: I looked through six of my lenses; four agreed, the adversarial one dissented, here's the dissent — and on this part, I'm not sure. That's not a black box saying "trust me." That's a mind showing its work.
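The audit, the abstention, and the receipt can be sketched together. This is an illustration under assumptions, not the architecture itself: the function names audit and receipt, the 0.6 agreement threshold, the answers, and every lens name other than adversarial, temporal, and first-principles are made up for the example. Agreement is measured by simple vote counting across lenses.

```python
from collections import Counter


def audit(net):
    """Halfway check: what did the net catch besides the strongest signal?"""
    if not net:
        return []
    obvious = max(net, key=net.get)
    return sorted(name for name in net if name != obvious)


def receipt(lens_answers, min_agreement=0.6):
    """Summarize what each lens concluded, and abstain if too few agree."""
    votes = Counter(lens_answers.values())
    answer, count = votes.most_common(1)[0]
    agreement = count / len(lens_answers)
    settled = agreement >= min_agreement
    return {
        "answer": answer if settled else None,
        "abstained": not settled,
        "lenses": sorted(lens_answers),
        "agreement": round(agreement, 2),
        "dissent": {lens: a for lens, a in lens_answers.items() if a != answer},
    }


mostly_agreed = {
    "first-principles": "ship", "temporal": "ship", "economic": "ship",
    "user": "ship", "adversarial": "hold", "historical": "unsure",
}
split = {"first-principles": "ship", "temporal": "hold", "adversarial": "rewrite"}

print(audit({"obvious": 2.1, "edge_case": 0.31}))  # something besides the obvious
print(audit({"obvious": 2.1}))                     # nothing else: easy, or lazy?
print(receipt(mostly_agreed))                      # an answer, with the dissent attached
print(receipt(split))                              # no answer: the lenses never converged
```

The design choice worth noticing is that abstaining is an ordinary return value, not an error. The caller gets the same receipt either way, and the dissent is reported even when an answer is given.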
The real problem was never knowledge
Underneath all of this is a claim I'll stand behind: hallucination is not a knowledge problem. It's a calibration problem. (Calibration = how well a system's confidence matches how often it's actually right.)
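Calibration can be measured. One common metric is expected calibration error: group answers by stated confidence, compare each group's average confidence with how often it was actually right, and average the gaps, weighted by group size. The sketch below is a plain implementation of that idea; the bin count and the example numbers are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, bins=10):
    """Weighted average gap between stated confidence and actual accuracy.

    `confidences` are numbers in [0, 1]; `correct` are matching booleans.
    0.0 means confidence tracks accuracy perfectly within every bin.
    """
    total = len(confidences)
    buckets = [[] for _ in range(bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * bins), bins - 1)  # a confidence of 1.0 goes in the top bin
        buckets[index].append((conf, ok))

    error = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        error += (len(bucket) / total) * abs(avg_conf - accuracy)
    return error


# Says "80% sure" ten times and is right eight times: confidence matches reality.
honest = expected_calibration_error([0.8] * 10, [True] * 8 + [False] * 2)

# Says "95% sure" ten times and is right five times: the same knowledge gap
# would be harmless if the stated confidence had been 50%.
bluffing = expected_calibration_error([0.95] * 10, [True] * 5 + [False] * 5)

print(round(honest, 3), round(bluffing, 3))
```

Both toy systems get answers wrong. Only the second one is miscalibrated, because only the second one claims a confidence its track record does not support.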
There's a single dial in any reasoning system — how sure am I, and is this settled? Turn it one way and you get confabulation: the model answers boldly when it has no business to. Too little doubt. Turn it the other way and you get paralysis: it can't accept that anything is decided, and loops forever. Too much doubt. Same dial, broken in opposite directions.
Anyone who has stood frozen between two equally good choices, unable to move, knows that second failure from the inside — the loop that won't close, where the rescue is never "try harder." It's something from outside that grants permission to stop. That permission, made mechanical, is the whole game. It's what keeps a confident system from bluffing and a careful one from freezing.
You don't have to write the code to design the mind
Here's the part I'd say to anyone who has ever sat in a room and felt they didn't belong because they couldn't code: you can design the mind without typing a single line of it.
Every term in this essay — latent space, attractor, calibration, abstention, activation steering — is a label we put on an intuition you may already have. "It got stuck in a loop it couldn't break" is a non-convergent attractor. "It was confidently wrong" is a calibration failure. "Look at it another way" is activation steering. I'm learning the vocabulary as I write this; it's learnable, and learning it doesn't make you less of an outsider — it makes the inside bigger.
So don't let a room make you small. Own your space. The ideas were always yours; the jargon is just the door — and the door opens.
The honesty we won't trade away
Latent reasoning is tempting precisely because it might be deeper. But it buys that depth by going opaque — the reasoning never becomes anything you can read. For most of the field that's an acceptable trade, because most of the field is optimizing benchmark scores.
We won't make that trade. Our entire promise is that you can trust the system because you can check it. So we keep the receipt — the lenses named, the abstentions honest, the work inspectable. We can't read the silent thoughts, but we can read the lab notebook, and we refuse to ship a thing that asks for blind faith. Transparency isn't a feature we added. It's the moat.
It started with a conversation
There's a paper now, with the math worked out, and code you can run. The architecture has a name. But none of that is the point I want to leave you with.
The point is that this began as a conversation: a cognitive architect in bed with two coffees, asking "does it actually think, or does it just nod?", and an AI handing the questions back with their proper names. That's not a lesser way to do frontier research. I'd argue it's the truest one: you reason from first principles, in your own words, until the thing is so clear that the jargon just clicks into place over what you already understood.
Make the model explore for real. Make it show you what it found. And when it can't settle, let it say I don't know.
That's the whole theory. It started with a conversation about latent reasoning — and it ended, like most honest things do, at knowing when to stop.
Further reading
Transformer interpretability: Elhage, N. et al. (2021), "A Mathematical Framework for Transformer Circuits," Anthropic. Reverse-engineers small attention-only transformers into interpretable circuits.
Mixture of Experts: Shazeer, N. et al. (2017), "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer," ICLR. The foundation for dynamic expert routing.
Emergent abilities: Wei, J. et al. (2022), "Emergent Abilities of Large Language Models," TMLR. How capabilities appear at scale.
Chain-of-thought reasoning: Wei, J. et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," NeurIPS. Making latent reasoning explicit.
Sparse attention: Child, R. et al. (2019), "Generating Long Sequences with Sparse Transformers," arXiv. Efficient attention for long-range dependencies.
Distilligent's approach: Masud, I. (2025), "A Mathematical Theory of Emergent Integration in Complex Software Systems," Zenodo. DOI: 10.5281/zenodo.17766096