Less is More. Again.

March 11, 2026 · 2 min read

Your AI has 1.5 trillion parameters. Mine has 1 trillion. Whose is better?

China's YuanLab just did something most AI companies won't: they made their model smaller on purpose. And it got better.

Yuan 3.0 Ultra started as a 1.5-trillion-parameter behemoth using a Mixture-of-Experts architecture (think: a massive committee of specialized sub-networks, each handling different types of tasks). The standard approach would be to finish training, then figure out how to slim the model down later. YuanLab didn't wait. They built a new algorithm called LAEP (Layer-Adaptive Expert Pruning) and ran it during training.

What LAEP does is genuinely clever. It watches which "experts" in the network are pulling their weight and which ones are coasting. The ones not contributing? Cut. Gone. 500 billion parameters deleted mid-training. A 33.3% reduction while the model was still learning.
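The core idea, measure which experts actually get used and drop the rest, can be sketched in a few lines. This is a toy illustration under my own assumptions (random router, a simple utilization threshold), not YuanLab's actual LAEP criterion, which they haven't published in detail:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE layer: 6 "experts", each represented here by a weight matrix.
n_experts, d = 6, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

# A router assigns each token to its highest-scoring expert.
tokens = rng.standard_normal((1000, d))
router = rng.standard_normal((d, n_experts))
choices = (tokens @ router).argmax(axis=1)

# Utilization: what fraction of tokens each expert actually handles.
util = np.bincount(choices, minlength=n_experts) / len(tokens)

# Prune experts below a utilization threshold (hypothetical cutoff).
threshold = 0.10
kept = [i for i, u in enumerate(util) if u >= threshold]
pruned = [experts[i] for i in kept]
print(f"kept {len(kept)} of {n_experts} experts: {kept}")
```

In a real training run this check would repeat periodically, so underused experts are cut while the survivors keep learning, which is the "during training" part that makes the approach unusual.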

The results are hard to argue with:

→ Pre-training efficiency up 49%
→ Only 68.8B parameters activate per prompt (out of a trillion), so it runs fast and light
→ State-of-the-art accuracy on enterprise benchmarks for data retrieval and summarization
→ A built-in mechanism that stops the model from overthinking simple tasks (yes, AIs have this problem too)
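That second point, only ~68.8B of a trillion parameters firing per prompt, comes from top-k routing: each token activates just a few experts, not all of them. Here's a minimal sketch with made-up sizes (16 experts, top-2 routing), chosen for illustration and not taken from Yuan 3.0's actual config:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: route each token to k of n_experts experts.
n_experts, d, k = 16, 32, 2
params_per_expert = d * d

tokens = rng.standard_normal((4, d))
router = rng.standard_normal((d, n_experts))

# Top-k routing: each token only runs through its k best-scoring experts.
scores = tokens @ router
topk = np.argsort(scores, axis=1)[:, -k:]

active = k * params_per_expert
total = n_experts * params_per_expert
print(f"each token touches {active:,}/{total:,} expert params ({active/total:.1%})")
```

Scale those ratios up and you get the same story as the real model: a trillion parameters on disk, a small slice of them doing the work for any given prompt.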

I keep thinking about what this says about the industry's obsession with scale. For years the pitch has been "more parameters = smarter model," and entire fundraising rounds were built on that assumption. YuanLab's approach is the opposite: figure out which parts of your network are dead weight and remove them before they're baked in.

It's the difference between hiring 1,500 people and hoping the org chart sorts itself out, versus hiring 1,000 of the right people and giving them clear roles. Anyone who's worked in a large org knows which one actually ships.

The "bigger is always better" era of AI might be ending. Not because scale doesn't matter, but because smart architecture matters more. And honestly, most of us have known this intuitively for a while. The best teams I've worked with were never the biggest ones.

500 billion parameters. Deleted. And the model got sharper.

Sometimes the best thing you can add is a subtraction.
