Guide Labs Debuts Steerling-8B: The First Truly Interpretable LLM
A San Francisco startup is solving one of AI’s biggest problems: understanding why large language models do what they do.
Guide Labs, founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B—an 8-billion-parameter LLM with a novel architecture that makes every token traceable back to its origins in the training data.
This isn’t just another model release. It’s a fundamentally different approach to building LLMs—one that could make AI safer, more controllable, and more trustworthy for regulated industries.
The Interpretability Problem
Current large language models are essentially black boxes: even their creators cannot reliably say why a given output was produced.
The fundamental challenge remains: tracing behavior through a neural network with billions of parameters isn't easy.
“If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things,” Adebayo told TechCrunch. “You can do it with current models, but it’s very fragile. It’s sort of one of the holy grail questions.”
How Steerling Works
Guide Labs flipped the script on interpretability research. Instead of doing “neuroscience on a model” after training, they engineered interpretability from the ground up.
The key innovation: A concept layer that buckets data into traceable categories during training.
| Traditional Approach | Guide Labs Approach |
|----------------------|---------------------|
| Post-hoc interpretability analysis | Built-in interpretability |
| Fragile, unreliable explanations | Every token is traceable |
| Requires reverse engineering | Concept layer provides direct mapping |
| “Neuroscience” on trained models | Engineered from the start |
This requires more up-front data annotation, but Guide Labs used other AI models to assist with the annotation process, making it scalable.
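The article doesn't detail Steerling's internals, but the "concept layer" it describes is reminiscent of concept-bottleneck designs from interpretability research. Below is a minimal NumPy sketch of that general idea, with made-up concept names and dimensions; it illustrates the approach, not Guide Labs' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "concept bottleneck": hidden activations are projected onto a small
# set of named, human-readable concepts before the output head sees them.
CONCEPTS = ["finance", "medicine", "law", "casual"]  # hypothetical labels

hidden_dim, vocab = 16, 8
W_concept = rng.normal(size=(hidden_dim, len(CONCEPTS)))  # hidden -> concepts
W_out = rng.normal(size=(len(CONCEPTS), vocab))           # concepts -> logits

def forward(h):
    """Return output logits plus the concept scores that produced them."""
    scores = h @ W_concept   # one scalar per named concept
    logits = scores @ W_out  # the output depends ONLY on the concept scores
    return logits, dict(zip(CONCEPTS, scores.round(3)))

h = rng.normal(size=hidden_dim)
logits, explanation = forward(h)
print(explanation)
```

Because the output head only ever sees the concept scores, every prediction can be attributed to a small set of named concepts, which is the traceability property the article describes.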
Emergent Behavior Still Happens
One concern with interpretable architectures: do they eliminate the emergent behaviors that make LLMs so powerful?
Adebayo says no. His team tracks “discovered concepts” that the model learned on its own—like quantum computing—that weren’t explicitly annotated during training.
Why This Matters
1. Consumer Safety
Interpretable models make it possible to audit why a system produced a harmful or biased output, rather than guessing after the fact.
2. Regulated Industries
Finance, healthcare, and legal sectors need LLMs whose outputs can be controlled and explained to auditors and regulators.
3. Scientific Research
Protein folding has been a deep learning success, but scientists still need insight into why a model flagged particular candidates as promising.
Performance Claims
Guide Labs says Steerling-8B achieves 90% of the capability of existing frontier models while using less training data—thanks to its novel architecture.
“This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier level models.”
What’s Next
Guide Labs plans to:
1. Build a larger model beyond 8B parameters
2. Offer API access to users
3. Enable agentic access for enterprise applications
The Bigger Picture
As AI systems become more powerful, interpretability transitions from a nice-to-have to a necessity:
“The way we’re currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for our role within the human race,” Adebayo said. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”
The Bottom Line
Guide Labs is betting that the future of AI isn’t just about bigger models—it’s about understandable models. As AI systems take on more consequential decisions, the ability to trace, explain, and control their behavior becomes essential.
Steerling-8B proves that interpretability doesn’t have to come at the cost of capability. The question now: will frontier labs follow suit?
---
Sources: [TechCrunch](https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/), [Guide Labs GitHub](https://github.com/guidelabs/steerling)
Tags: Interpretable AI, LLM, Open Source, AI Safety, Machine Learning