Guide Labs Debuts Steerling-8B: The First Truly Interpretable LLM

A San Francisco startup is taking on one of AI’s biggest problems: understanding why large language models do what they do.

Guide Labs, founded by CEO Julius Adebayo and chief science officer Aya Abdelsalam Ismail, has open-sourced Steerling-8B—an 8-billion-parameter LLM with a novel architecture that makes every token traceable back to its origins in the training data.

This isn’t just another model release. It’s a fundamentally different approach to building LLMs—one that could make AI safer, more controllable, and more trustworthy for regulated industries.

The Interpretability Problem

Current large language models are essentially black boxes. Consider:

  • xAI struggling to fine-tune Grok’s political biases
  • ChatGPT’s issues with sycophancy
  • Run-of-the-mill hallucinations

In every case, the fundamental challenge remains: tracing behavior through a neural network with billions of parameters isn’t easy.

“If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things,” Adebayo told TechCrunch. “You can do it with current models, but it’s very fragile. It’s sort of one of the holy grail questions.”
How Steerling Works

Guide Labs flipped the script on interpretability research. Instead of doing “neuroscience on a model” after training, they engineered interpretability from the ground up.

The key innovation: A concept layer that buckets data into traceable categories during training.

| Traditional Approach | Guide Labs Approach |
| --- | --- |
| Post-hoc interpretability analysis | Built-in interpretability |
| Fragile, unreliable explanations | Every token is traceable |
| Requires reverse engineering | Concept layer provides direct mapping |
| “Neuroscience” on trained models | Engineered from the start |

This requires more up-front data annotation, but Guide Labs used other AI models to assist with the annotation process, making it scalable.
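Guide Labs hasn’t published the internals of its concept layer in this article, but the general idea resembles a concept-bottleneck architecture: hidden states are projected onto a small set of human-readable concepts before anything reaches the output head, so every output can be attributed to named concepts. The sketch below is a minimal, hypothetical illustration of that pattern — the concept names, dimensions, and weights are all invented, not Guide Labs’ actual design.

```python
import numpy as np

# Illustrative, human-readable concept labels (hypothetical, not from
# Guide Labs' code).
CONCEPTS = ["finance", "medicine", "law", "violence"]

rng = np.random.default_rng(0)
hidden_dim = 16

# Learned projection from hidden states to concept scores.
# In a real model this would be trained on annotated data; here it is random.
W_concept = rng.normal(size=(hidden_dim, len(CONCEPTS)))

def concept_scores(hidden_state):
    """Map a hidden state to softmax-normalized per-concept activations."""
    logits = hidden_state @ W_concept
    exp = np.exp(logits - logits.max())  # subtract max for numeric stability
    return exp / exp.sum()

def explain(hidden_state, top_k=2):
    """Return the top-k concepts most active for this hidden state."""
    scores = concept_scores(hidden_state)
    order = np.argsort(scores)[::-1][:top_k]
    return [(CONCEPTS[i], float(scores[i])) for i in order]

h = rng.normal(size=hidden_dim)
print(explain(h))  # e.g. the two concepts this state activates most
```

Because the bottleneck is explicit, attribution is a lookup rather than reverse engineering: steering or blocking a concept means intervening on one named dimension instead of hunting for it across billions of parameters.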

Emergent Behavior Still Happens

One concern with interpretable architectures: do they eliminate the emergent behaviors that make LLMs so powerful?

Adebayo says no. His team tracks “discovered concepts” that the model learned on its own—like quantum computing—that weren’t explicitly annotated during training.

Why This Matters

1. Consumer Safety

Interpretable models can:

  • Block use of copyrighted materials
  • Better control outputs around violence or drug abuse
  • Provide explanations for controversial responses

2. Regulated Industries

Finance, healthcare, and legal sectors need controllable LLMs:

  • Loan decisions: Evaluate financial records, not race
  • Medical advice: Trace recommendations to sources
  • Legal analysis: Show reasoning chains
3. Scientific Research

Protein folding has been a deep learning success, but scientists need insight into why the software identified promising combinations.

Performance Claims

Guide Labs says Steerling-8B achieves 90% of the capability of existing frontier models while using less training data—thanks to its novel architecture.

“This model demonstrates that training interpretable models is no longer a sort of science; it’s now an engineering problem,” Adebayo said. “We figured out the science and we can scale them, and there is no reason why this kind of model wouldn’t match the performance of the frontier level models.”

Company Background

  • Founded: By Julius Adebayo (CEO) and Aya Abdelsalam Ismail (CSO)
  • Origin: Emerged from Y Combinator
  • Funding: $9 million seed round from Initialized Capital (November 2024)
  • Research roots: Adebayo’s 2018 MIT PhD paper showing existing interpretability methods were unreliable
What’s Next

Guide Labs plans to:

1. Build a larger model beyond 8B parameters
2. Offer API access to users
3. Enable agentic access for enterprise applications

The Bigger Picture

As AI systems become more powerful, interpretability transitions from a nice-to-have to a necessity:

“The way we’re currently training models is super primitive, and so democratizing inherent interpretability is actually going to be a long-term good thing for our role within the human race,” Adebayo said. “As we’re going after these models that are going to be super intelligent, you don’t want something to be making decisions on your behalf that’s sort of mysterious to you.”

  • Steerling-8B: 8B parameter LLM with built-in interpretability
  • Novel architecture: Concept layer makes every token traceable
  • Performance: 90% of frontier models with less training data
  • Use cases: Regulated industries, scientific research, consumer safety
  • Funding: $9M seed from Initialized Capital (Nov 2024)
  • Vision: Interpretability as an engineering problem, not research
The Bottom Line

Guide Labs is betting that the future of AI isn’t just about bigger models—it’s about understandable models. As AI systems take on more consequential decisions, the ability to trace, explain, and control their behavior becomes essential.

Steerling-8B proves that interpretability doesn’t have to come at the cost of capability. The question now: will frontier labs follow suit?

Sources: [TechCrunch](https://techcrunch.com/2026/02/23/guide-labs-debuts-a-new-kind-of-interpretable-llm/), [Guide Labs GitHub](https://github.com/guidelabs/steerling)

Tags: Interpretable AI, LLM, Open Source, AI Safety, Machine Learning
