Every AI coding agent promises to write correct code. None of them guarantee it — until now.

On March 16, 2026, Mistral released Leanstral, an open-source AI agent that doesn’t just generate code. It proves its code is correct against mathematical specifications. The distinction matters, and the release shot to #5 on Hacker News within hours: 586 points, 127 comments, and a lot of developers suddenly paying attention to formal verification.

The problem nobody talks about

If you’ve used any AI coding assistant in production, you know the drill. You get plausible-looking code. Sometimes it works. Sometimes it compiles but has subtle bugs. Sometimes it fails at 3 AM under load, in the worst possible edge case.

The fundamental issue: AI models generate code from statistical patterns, not correctness guarantees. A model can produce code that looks right — sensible variable names, conventional structure — but contains logic errors a compiler won’t catch and tests might miss.

This is fine for a weekend project. It’s terrifying for smart contracts, cryptographic protocols, or anything where a single bug costs real money. The usual answer is “just review it carefully,” but that pushes the burden back onto humans — exactly the bottleneck we’re trying to solve.

What Leanstral actually does

Leanstral is a sparse mixture-of-experts model with 120 billion total parameters but only 6 billion active per token. The sparse architecture means you get the capability of a much larger model without the compute cost — important when proof generation can be expensive.

But the architecture isn’t the story. The story is Lean 4.

Lean 4 is a proof assistant and programming language. Unlike Python or JavaScript, Lean doesn’t just execute code — it checks whether code satisfies a specification. When you write a function in Lean, you can also write a theorem: “this function always returns a sorted list” or “this algorithm always terminates.” Then Lean verifies that claim mathematically.

Leanstral generates both the implementation and the proof. It doesn’t write sort(list) and hope for the best. It writes sort(list) along with a machine-checked proof that the output is actually sorted for all inputs. That proof isn’t a test case; it’s a mathematical argument that Lean’s proof checker verifies.
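To make the spec-plus-proof idea concrete, here is a minimal sketch in plain Lean 4. This is not Leanstral output; `double` and `double_even` are illustrative names, and the proof assumes a recent Lean toolchain, where the `omega` arithmetic tactic ships with the core library.

```lean
-- A function together with a machine-checked claim about it.
def double (n : Nat) : Nat := n + n

-- Theorem: double always returns an even number, for every Nat.
-- Lean will not accept the file unless this proof actually checks.
theorem double_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by unfold double; omega⟩  -- witness k = n; omega closes n + n = 2 * n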

The agent integrates with Lean’s language server through MCP (Model Context Protocol), giving it real-time feedback during proof construction. It can see where proofs fail, try alternative strategies, and iterate — more like a mathematician working through a problem than a code completion engine.

How it performs

Mistral evaluated Leanstral using FLTEval, a benchmark suite focused on real proof engineering tasks rather than toy problems.

On the pass@2 metric — the model gets two attempts per problem — Leanstral scores 26.3 points. The compute cost? $36. A comparable proprietary model scored worse while costing over 15 times more.

That cost efficiency comes from the sparse architecture. You’re not burning compute on all 120 billion parameters for every token. The model routes to relevant experts and leaves the rest idle. For proof generation, which often requires many attempts before finding a valid proof, this cost difference is the gap between “affordable for daily use” and “reserve it for special occasions.”

Why this matters beyond math proofs

Formal verification sounds academic. It’s not.

Smart contracts and blockchain protocols are the obvious use case. A single bug in a DeFi contract has cost hundreds of millions of dollars. Formal verification is the gold standard for preventing those bugs, but it’s historically been slow and expensive. Leanstral makes it accessible.

Cryptographic implementations need mathematical guarantees that no amount of testing can provide. You can’t test every possible input to a hash function, but you can prove it satisfies its specification.
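As a toy illustration of that point, consider a one-bit XOR “cipher” in plain Lean 4. This is not real cryptography, and the names are invented for the example, but the statement quantifies over every input rather than a sample of them.

```lean
-- One-time-pad style XOR over a single bit.
def encrypt (key msg : Bool) : Bool := Bool.xor msg key
def decrypt (key ct : Bool) : Bool := Bool.xor ct key

-- Proved for ALL keys and messages, not tested on some:
-- decryption always inverts encryption.
theorem decrypt_encrypt (key msg : Bool) :
    decrypt key (encrypt key msg) = msg := by
  cases key <;> cases msg <;> rfl  -- exhaust the finite cases; each reduces
```

The domain here is finite, so the proof is trivial, but the same statement shape scales to functions over unbounded inputs, where exhaustive testing is impossible.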

Safety-critical software — aerospace, medical devices, autonomous systems — already uses formal methods where regulations require it. The bottleneck has always been human proof engineers. Leanstral doesn’t replace them, but it accelerates the work dramatically.

Even for regular software, the shift is meaningful. If your coding agent can prove its output meets a spec before handing it to you, the review process changes from “does this look right?” to “is this the right spec?” That’s a fundamentally better question to ask.

Getting started

Leanstral is available under Apache 2.0 on Hugging Face. To use it effectively, you need:

  • Lean 4 installed — the proof assistant that Leanstral targets. The Lean community has excellent getting-started guides.
  • Familiarity with Lean’s proof language — Leanstral generates proofs, but you need to understand them to use the output. If you’ve never used a proof assistant, expect a learning curve. It’s not like learning a new framework; it’s closer to learning a new way of thinking about code.
  • A realistic workflow — Leanstral works best when you specify what your code should do before asking it to write the code. That means thinking in terms of preconditions, postconditions, and invariants. If you’re used to prompting with “write me a function that sorts a list,” the shift to “write me a function that sorts a list and prove it returns a permutation of the input in non-decreasing order” takes some adjustment.
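As a sketch of what “prompting with a spec” can look like, here is one way to state that sorting contract in Lean 4. The names `Sorted`, `SortSpec`, and `mySort` are hypothetical, not Leanstral’s interface, and permutation is expressed via element counts to stay within the core library.

```lean
-- "The output is in non-decreasing order."
def Sorted : List Nat → Prop
  | [] => True
  | [_] => True
  | a :: b :: rest => a ≤ b ∧ Sorted (b :: rest)

-- The full contract a candidate sort must satisfy: the result is sorted
-- and has the same element counts as the input (i.e., it is a permutation).
def SortSpec (mySort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat,
    Sorted (mySort xs) ∧ ∀ x, (mySort xs).count x = xs.count x
```

Given this, the prompt becomes: produce `mySort` together with a proof of `SortSpec mySort`. The agent’s output either satisfies the contract or fails to type-check.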

The honest limitations

Leanstral is impressive, but it’s not magic.

Proof generation still fails sometimes. Formal verification is hard, even for humans. Leanstral gets better success rates than alternatives, but there are still proofs it can’t find. For complex properties, expect to intervene and guide the proof strategy.

The domain is Lean 4. If your codebase is in Rust, Python, or TypeScript, Leanstral doesn’t directly verify your code. You write the spec and implementation in Lean, get the verified version, then translate to your target language. The proof gives you confidence, but there’s a gap between “proven correct in Lean” and “correct in your production language.”

The learning curve is real. If you and your team have never used formal methods, Leanstral won’t flatten that curve. It reduces the tedium of proof construction, but you still need to understand what you’re proving and why.

It’s March 2026. This is a v1 release. The tooling, documentation, and ecosystem will mature, but right now you’re an early adopter. Expect rough edges.

The bottom line

Leanstral represents something genuinely new: an AI coding agent that can verify its own work. Not with tests. Not with vibes. With mathematical proofs that a compiler checks.

If you’re working on anything where correctness is non-negotiable — smart contracts, cryptographic code, safety-critical systems — Leanstral is worth your attention. The $36 price point for competitive proof generation makes it practical in ways formal verification tools have never been.

For general software development, it’s more a signal than a tool you’ll use tomorrow. But it signals where AI-assisted coding is heading: not just faster code generation, but verified code generation. The gap between “the AI wrote this” and “the AI proved this is correct” is exactly the kind of gap that separates useful tools from reliable ones.

Start with the Lean community resources. Try Leanstral on a small, well-specified problem. See what it feels like to receive code with a proof attached. That experience — code that comes with a guarantee — is what the next generation of AI coding tools should feel like.


Related reading: How to Build a Practical AI Workflow Without Wasting Money, ChatGPT vs Claude vs Gemini: Everyday Work in 2026, AI Agents Are Everywhere, but Which Ones Are Genuinely Useful?