Discord’s Architecture: A Masterclass in Performance at Scale

Discord handles 19 million concurrent users and trillions of messages while delivering sub-second latency across voice, video, and text. Here is how they built one of the most performant real-time systems in consumer tech.

The Core Insight

At first glance, Discord seems like “just another chat app.” But under the hood lies a finely tuned system built on the Actor Model, a paradigm that lets Discord process millions of events per second without race conditions or data loss.

The key insight: every server, WebSocket connection, voice call, and screenshare is an actor. Actors communicate only through messages, keep their own private state, and process one message at a time. Because no two pieces of code ever touch the same state concurrently, there is nothing to lock, and the system scales by adding more actors.
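
To make "one message at a time" concrete, here is a minimal sketch of an actor written as an Elixir GenServer. The module and function names (`CounterActor`, `increment`, `value`) are hypothetical and chosen for illustration; this is not Discord's code. It shows many concurrent callers updating one piece of state without any lock, because the process mailbox serializes the updates.

```elixir
defmodule CounterActor do
  use GenServer

  # Client API (hypothetical names, for illustration only)
  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)
  def value(pid), do: GenServer.call(pid, :value)

  # Server callbacks: the count lives only inside this process
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:increment, _from, count), do: {:reply, :ok, count + 1}
  def handle_call(:value, _from, count), do: {:reply, count, count}
end

{:ok, pid} = CounterActor.start_link()

# 1,000 concurrent callers and no locks: the actor handles one message at a time
1..1_000
|> Enum.map(fn _ -> Task.async(fn -> CounterActor.increment(pid) end) end)
|> Enum.each(&Task.await/1)

IO.inspect(CounterActor.value(pid)) # prints 1000
```

Each increment is a synchronous call, so every caller has been acknowledged before the final value is read.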

Why This Matters

Discord’s journey reveals how architectural choices compound at scale. What works at 100 users becomes a disaster at 100 million. Their story illustrates both the power of the Actor Model and the unexpected bottlenecks that emerge when systems grow.

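The numbers tell the story. If every online member of a server sends a single message, notifications grow with the square of the member count:
– 10,000 members, 10,000 messages → 100 million notifications
– 100,000 members, 100,000 messages → 10 billion notifications
Each message must fan out to every online member in a server.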
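
The arithmetic can be checked with a short sketch. It assumes a simplified model in which the number of online members equals the number of messages sent (each member posts once), which is an assumption that makes the figures above consistent rather than a measured workload.

```elixir
# Notifications = messages sent * online members who must receive each one.
# Assumption: every online member sends exactly one message.
fan_out = fn online_members, messages -> online_members * messages end

IO.inspect(fan_out.(10_000, 10_000))   # prints 100000000 (100 million)
IO.inspect(fan_out.(100_000, 100_000)) # prints 10000000000 (10 billion)
```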

Key Takeaways

The Actor Model in Practice
– Each Discord server (guild) is a single Elixir process
– Users connect via WebSocket, spinning up session processes that connect to guilds
– The guild process acts as a router, fanning out messages to all connected sessions

Database Evolution
– Started with Apache Cassandra, hit hot partition problems at scale
– Migrated to ScyllaDB, achieving better per-core sharding and eliminating garbage collection pauses
– Reduced query rates by 10-50x with a Rust-based data service for request deduplication

The Thundering Herd Problem
When 100 users request the same message simultaneously, naive implementations make 100 database queries. Discord’s Rust service coalesces these: one DB request, results sent to all subscribers via gRPC.
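
The coalescing idea can be sketched in a few dozen lines. Discord's service is written in Rust and delivers results over gRPC; the Elixir sketch below only illustrates the technique, with no networking involved. The names `Coalescer`, `get`, and `slow_fetch` are hypothetical, and the "database" is a function that sleeps.

```elixir
defmodule Coalescer do
  use GenServer

  # fetch_fun stands in for the real database query (hypothetical)
  def start_link(fetch_fun), do: GenServer.start_link(__MODULE__, fetch_fun)
  def get(pid, key), do: GenServer.call(pid, {:get, key})

  @impl true
  def init(fetch_fun), do: {:ok, %{fetch: fetch_fun, waiting: %{}}}

  @impl true
  def handle_call({:get, key}, from, %{fetch: fetch, waiting: waiting} = state) do
    case Map.fetch(waiting, key) do
      {:ok, callers} ->
        # A fetch for this key is already in flight: just join the waiters
        {:noreply, %{state | waiting: Map.put(waiting, key, [from | callers])}}

      :error ->
        # First request for this key: start exactly one fetch in a separate process
        server = self()
        Task.start_link(fn -> send(server, {:fetched, key, fetch.(key)}) end)
        {:noreply, %{state | waiting: Map.put(waiting, key, [from])}}
    end
  end

  @impl true
  def handle_info({:fetched, key, result}, %{waiting: waiting} = state) do
    # Reply to every caller that was waiting on this key, then forget the key
    {callers, rest} = Map.pop(waiting, key, [])
    Enum.each(callers, &GenServer.reply(&1, result))
    {:noreply, %{state | waiting: rest}}
  end
end

# Count how many times the simulated database is actually queried
{:ok, counter} = Agent.start_link(fn -> 0 end)

slow_fetch = fn key ->
  Agent.update(counter, &(&1 + 1))
  Process.sleep(200) # simulated query latency
  "message #{key}"
end

{:ok, pid} = Coalescer.start_link(slow_fetch)

results =
  1..100
  |> Enum.map(fn _ -> Task.async(fn -> Coalescer.get(pid, 42) end) end)
  |> Enum.map(&Task.await/1)

IO.inspect(Enum.uniq(results))       # every caller gets the same result
IO.inspect(Agent.get(counter, & &1)) # number of simulated queries issued
```

All 100 requests arrive while the first fetch is still sleeping, so they should share a single simulated query. A production version would also need timeouts and error handling, since a crashed fetch here takes the coalescer down with it.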

# Fan-out inside a guild process: forward the message to every connected session
def handle_call({:publish, message}, _from, %{sessions: sessions} = state) do
  Enum.each(sessions, &send(&1.pid, message))
  {:reply, :ok, state}
end
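
The handler above only runs inside a GenServer module. The following self-contained sketch wraps it in one so the fan-out can be tried in a script. The `Guild` module, its `connect` and `publish` functions, and the throwaway session processes are assumptions added for illustration, not Discord's actual code.

```elixir
defmodule Guild do
  use GenServer

  # Hypothetical client API
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [])
  def connect(guild, session_pid), do: GenServer.call(guild, {:connect, session_pid})
  def publish(guild, message), do: GenServer.call(guild, {:publish, message})

  @impl true
  def init(_), do: {:ok, %{sessions: []}}

  @impl true
  def handle_call({:connect, pid}, _from, %{sessions: sessions} = state) do
    # Remember the session so later messages can be routed to it
    {:reply, :ok, %{state | sessions: [%{pid: pid} | sessions]}}
  end

  def handle_call({:publish, message}, _from, %{sessions: sessions} = state) do
    # Same fan-out as the snippet above: one send per connected session
    Enum.each(sessions, &send(&1.pid, message))
    {:reply, :ok, state}
  end
end

{:ok, guild} = Guild.start_link()
parent = self()

# Each "session" is a bare process that reports the first message it receives
for id <- 1..3 do
  session =
    spawn(fn ->
      receive do
        message -> send(parent, {:delivered, id, message})
      end
    end)

  :ok = Guild.connect(guild, session)
end

:ok = Guild.publish(guild, {:new_message, "hello"})

for id <- 1..3 do
  receive do
    {:delivered, ^id, message} -> IO.inspect({id, message})
  after
    1_000 -> raise "session #{id} did not receive the message"
  end
end
```

Because the guild is a single process, the cost of one publish grows linearly with the number of connected sessions, which is the fan-out pressure described earlier.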

Looking Ahead

Discord’s approach reveals something important: performance optimization is never “done.” Each bottleneck solved reveals the next. After fan-out and database layers, they tackled disk latency—leading to custom storage solutions that balance SSD speed with Persistent Disk reliability.

For developers building real-time systems, Discord’s story offers a blueprint: start with clean architecture (the Actor Model), measure relentlessly, and be willing to rewrite critical paths when scale demands it.


Based on analysis of “Discord: A Case Study in Performance Optimization” (FullStack Zip)
