Guide Labs Debuts Steerling-8B: The First Truly Interpretable LLM

A San Francisco startup is taking on one of AI’s biggest problems: understanding why large language models do what they do.

Guide Labs, founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B—an 8-billion-parameter LLM with a novel architecture that makes every token traceable back to its origins in the training data.

This isn’t just another model release. It’s a fundamentally different approach to building LLMs—one that could make AI safer, more controllable, and more trustworthy for regulated industries.

The Interpretability Problem

Current large language models are essentially black boxes. Consider:

  • xAI struggling to fine-tune Grok’s political biases
  • ChatGPT’s issues with sycophancy
  • Run-of-the-mill hallucinations

In every case, the fundamental challenge remains: tracing behavior through a neural network with billions of parameters isn’t easy.

“If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things,” Adebayo told TechCrunch. “You can do it with current models, but it’s very fragile. It’s sort of one of the holy grail questions.”
How Steerling Works

Guide Labs flipped the script on interpretability research. Instead of doing “neuroscience on a model” after training, they engineered interpretability from the ground up.

The key innovation: A concept layer that buckets data into traceable categories during training.

| Traditional Approach | Guide Labs Approach |
| --- | --- |
| Post-hoc interpretability analysis | Built-in interpretability |
| Fragile, unreliable explanations | Every token is traceable |
| Requires reverse engineering | Concept layer provides direct mapping |
| “Neuroscience” on trained models | Engineered from the start |

This requires more up-front data annotation, but Guide Labs used other AI models to assist with the annotation process, making it scalable.
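Guide Labs hasn’t published the internals of its concept layer in this article, but the general idea resembles a concept-bottleneck architecture: hidden states are projected onto a small set of human-readable concepts before anything reaches the output head, so every output can be attributed to named concepts. The sketch below is a minimal, hypothetical illustration of that pattern — the concept names, dimensions, and weights are all invented, not Guide Labs’ actual design.

```python
import numpy as np

# Illustrative, human-readable concept labels (hypothetical, not from
# Guide Labs' code).
CONCEPTS = ["finance", "medicine", "law", "violence"]

rng = np.random.default_rng(0)
hidden_dim = 16

# Learned projection from hidden states to concept scores.
# In a real model this would be trained on annotated data; here it is random.
W_concept = rng.normal(size=(hidden_dim, len(CONCEPTS)))

def concept_scores(hidden_state):
    """Map a hidden state to softmax-normalized per-concept activations."""
    logits = hidden_state @ W_concept
    exp = np.exp(logits - logits.max())  # subtract max for numeric stability
    return exp / exp.sum()

def explain(hidden_state, top_k=2):
    """Return the top-k concepts most active for this hidden state."""
    scores = concept_scores(hidden_state)
    order = np.argsort(scores)[::-1][:top_k]
    return [(CONCEPTS[i], float(scores[i])) for i in order]

h = rng.normal(size=hidden_dim)
print(explain(h))  # e.g. the two concepts this state activates most
```

Because the bottleneck is explicit, attribution is a lookup rather than reverse engineering: steering or blocking a concept means intervening on one named dimension instead of hunting for it across billions of parameters.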

Emergent Behavior Still Happens

One concern with interpretable architectures: do they eliminate the emergent behaviors that make LLMs so powerful?

Adebayo says no. His team tracks “discovered concepts” that the model learned on its own—like quantum computing—that weren’t explicitly annotated during training.

Why This Matters

1. Consumer Safety

Interpretable models can:

  • Block use of copyrighted materials
  • Better control outputs around violence or drug abuse
  • Provide explanations for controversial responses

2. Regulated Industries

Finance, healthcare, and legal sectors need controllable LLMs:

  • Loan decisions: Evaluate financial records, not race
  • Medical advice: Trace recommendations to sources
  • Legal analysis: Show reasoning chains
3. Scientific Research

Protein folding has been a deep learning success, but scientists need insight into why the software identified promising combinations.

Performance Claims

Guide Labs says Steerling-8B achieves 90% of the capability of existing frontier models while using less training data—thanks to its novel architecture.

“This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier level models.”

Company Background

  • Founded: By Julius Adebayo (CEO) and Aya Abdelsalam Ismail (CSO)
  • Origin: Emerged from Y Combinator
  • Funding: $9 million seed round from Initialized Capital (November 2024)
  • Research roots: Adebayo’s 2018 MIT PhD paper showing existing interpretability methods were unreliable
What’s Next

Guide Labs plans to:

1. Build a larger model beyond 8B parameters
2. Offer API access to users
3. Enable agentic access for enterprise applications

The Bigger Picture

As AI systems become more powerful, interpretability transitions from a nice-to-have to a necessity:

“The way we’re currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for our role within the human race,” Adebayo said. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”

  • Steerling-8B: 8B parameter LLM with built-in interpretability
  • Novel architecture: Concept layer makes every token traceable
  • Performance: 90% of frontier models with less training data
  • Use cases: Regulated industries, scientific research, consumer safety
  • Funding: $9M seed from Initialized Capital (Nov 2024)
  • Vision: Interpretability as an engineering problem, not research
The Bottom Line

Guide Labs is betting that the future of AI isn’t just about bigger models—it’s about understandable models. As AI systems take on more consequential decisions, the ability to trace, explain, and control their behavior becomes essential.

Steerling-8B proves that interpretability doesn’t have to come at the cost of capability. The question now: will frontier labs follow suit?

Sources: [TechCrunch](https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/), [Guide Labs GitHub](https://github.com/guidelabs/steerling)

Tags: Interpretable AI, LLM, Open Source, AI Safety, Machine Learning
