Most distributed datastore systems come with a lot of technical terms that describe how the system behaves. Terms like “Leader”, “Follower”, “Replication”, and “Consistency” are widely used and helpful, but what I find missing is detail about how these terms relate to one another.
By analogy, a map of the field is great, but it is just as important to understand how the soil, water, and sunlight interact to help the plants grow.
In this blog post, I would like to explore the relationship between “Leader-Follower” and “Primary-Replica” in distributed datastore systems.
Mental model of distributed datastore systems:
Let's first build a mental model of distributed datastore systems. In general, distributed datastore systems are designed to store and manage data across multiple nodes or servers. At a very high level, these systems can be grouped into three models:
- Leader-Follower model
- Multi-Leader model
- Leaderless model
As a quick overview:
- In the Leader-Follower model, one node is designated as the “Leader” (or “Primary”) and is responsible for handling all write operations. The other nodes, called “Followers” (or “Replicas”), replicate the data from the Leader and can serve read operations. The important thing to note here is that all writes always go to the Leader.
- In the Multi-Leader model, multiple nodes can act as Leaders and handle write operations. Each Leader replicates its data to the other nodes, which can also act as Followers. This model allows for higher availability and fault tolerance, but it can also lead to conflicts if multiple Leaders write to the same data at the same time.
- In the Leaderless model, there is no designated Leader node. Instead, all nodes are equal and can handle both read and write operations. Data is typically replicated across multiple nodes to ensure availability and fault tolerance. This model can be more complex to manage, as it requires mechanisms to handle conflicts and keep the nodes consistent.
There are definitely more nuances to each of these models, but for the purpose of this blog post, we will focus on them conceptually.
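To make the three models a bit more concrete, here is a minimal Python sketch of the one question that separates them: which nodes may accept a write? The `Model` enum, the `write_targets` function, and the node names are all made up for illustration; no real datastore exposes this API.

```python
from enum import Enum


class Model(Enum):
    LEADER_FOLLOWER = "leader-follower"
    MULTI_LEADER = "multi-leader"
    LEADERLESS = "leaderless"


def write_targets(model, nodes, leaders=()):
    """Return the nodes that may accept a client write under `model`."""
    if model is Model.LEADERLESS:
        # No designated Leader: any node can take the write.
        return list(nodes)
    if model is Model.LEADER_FOLLOWER and len(leaders) != 1:
        raise ValueError("Leader-Follower expects exactly one Leader")
    # Leader-Follower and Multi-Leader: only Leaders accept writes.
    return [node for node in nodes if node in leaders]


nodes = ["n1", "n2", "n3"]
print(write_targets(Model.LEADER_FOLLOWER, nodes, leaders=["n1"]))     # ['n1']
print(write_targets(Model.MULTI_LEADER, nodes, leaders=["n1", "n3"]))  # ['n1', 'n3']
print(write_targets(Model.LEADERLESS, nodes))                          # ['n1', 'n2', 'n3']
```

The sketch deliberately ignores replication, conflict resolution, and failover; it only captures where writes are allowed to land.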
Leader-Follower vs Primary-Replica:
As you saw above, “Leader” and “Follower” are often used interchangeably with “Primary” and “Replica”. However, there are some subtle differences between these terms that are important to understand.
Is it safe to say that “Leader” is always “Primary” and “Follower” is always “Replica”? Not necessarily.
- “Leader” refers to the role of a node in the context of write operations. The Leader is responsible for handling all write requests and coordinating the replication of data to Followers.
- “Primary” refers to the role of a node in the context of data storage. The Primary is the node that holds the authoritative copy of the data.
When is it safe to say Leader = Primary and Follower = Replica?
To put it more simply, look at it through the lens of the “unit of data” handled in the system. In a Leader-Follower system (like MongoDB), all writes for a given unit of data go to a single node. A sharded MongoDB cluster looks something like the diagram below: there are shards, and each shard has a single Leader (Primary) and multiple Followers (Replicas). The important thing to note here is that each “unit of data” (here, a shard) maps to a single node for writes.

So Leader-Follower systems generally have a one-to-one relationship between Leader and Primary, and between Follower and Replica. The main factor here is that the “unit of data” is mapped to a single node.
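The one-to-one mapping can be sketched in a few lines of Python. This is a toy model, not MongoDB's actual routing: the `Shard` class, the `shard_for` and `write_node` functions, the node names, and the CRC32-based placement are all assumptions made for illustration.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str
    primary: str     # the Leader for this shard
    replicas: tuple  # the Followers for this shard


def shard_for(key, shards):
    """Pick a shard with a stable hash (illustrative, not MongoDB's scheme)."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def write_node(key, shards):
    """Every write for `key` lands on the Primary of the shard that owns it."""
    return shard_for(key, shards).primary


shards = [
    Shard("shard-a", primary="a1", replicas=("a2", "a3")),
    Shard("shard-b", primary="b1", replicas=("b2", "b3")),
]

for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", shard_for(key, shards).name, "->", write_node(key, shards))
```

Whatever shard a key lands on, the node that accepts the write (the Leader) is the same node that holds the authoritative copy (the Primary), which is why the two terms collapse into one here.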
When is Leader != Primary and Follower != Replica?
However, in some distributed datastore systems, the relationship between Leader-Follower and Primary-Replica can be more complex. Let's take leaderless systems as an example. The idea of a leaderless system is that writes don't all have to go to a single node. Leaderless systems can be further classified as:
- Truly leaderless: writes can go to any node, as in Cassandra.
- Semi-leaderless: writes go to the primary of an Elasticsearch (ES) shard or the leader of a Kafka partition. Those primaries are spread across nodes (not a single host). Unlike truly leaderless systems, writes cannot go to any node; they have to go where the Primary of the “unit of data” (shard/partition) lives.
In both these cases, roles are assigned per unit of data rather than per node, so the relationship between nodes and roles becomes many-to-many. For example, in Cassandra, any node can coordinate a write, and the write is sent to all the replica nodes that own that piece of data; no single node acts as the Leader. Similarly, in Kafka, each partition has exactly one Leader at a time plus its Follower replicas, but the Leaders of different partitions are spread across different nodes.
As the Kafka diagram illustrates, a single node can be Leader for one partition (Primary for that unit of data) and Follower for another partition (Replica for that unit of data). So, in this case, the roles are attached to the “unit of data” rather than to the node as a whole.
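A small Python sketch shows how one node ends up with both roles. The broker and partition names are made up, and `roles_by_node` is a hypothetical helper; the only assumption borrowed from Kafka is that the first replica in each partition's replica list is treated as that partition's Leader.

```python
def roles_by_node(assignment):
    """Invert a partition -> replica list map into per-node roles.

    Assumption: the first replica in each list is that partition's Leader.
    """
    roles = {}
    for partition, replicas in assignment.items():
        for position, node in enumerate(replicas):
            role = "leader" if position == 0 else "follower"
            node_roles = roles.setdefault(node, {"leader": [], "follower": []})
            node_roles[role].append(partition)
    return roles


# Hypothetical topic "orders" with 3 partitions replicated across 3 brokers.
assignment = {
    "orders-0": ["broker-1", "broker-2", "broker-3"],
    "orders-1": ["broker-2", "broker-3", "broker-1"],
    "orders-2": ["broker-3", "broker-1", "broker-2"],
}

roles = roles_by_node(assignment)
for node in sorted(roles):
    print(node, roles[node])
# broker-1 {'leader': ['orders-0'], 'follower': ['orders-1', 'orders-2']}
# broker-2 {'leader': ['orders-1'], 'follower': ['orders-0', 'orders-2']}
# broker-3 {'leader': ['orders-2'], 'follower': ['orders-0', 'orders-1']}
```

Every broker is a Leader for one partition and a Follower for the other two, so asking whether "broker-1 is the Leader" has no answer until you also name the partition.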

Conclusion:
The way to digest this is to combine two concepts:
- Understand the “unit of data” in the system, and how it maps to nodes
- Understand the model of the distributed datastore system (Leader-Follower, Multi-Leader, Leaderless)
Putting the two together:
- In Leader-Follower systems (like MongoDB), the “unit of data” maps to a single node, so Leader = Primary and Follower = Replica.
- In leaderless and semi-leaderless systems (like Cassandra and Kafka), the “unit of data” is spread across multiple nodes and roles are assigned per unit of data, so at the node level Leader != Primary and Follower != Replica.
By understanding these relationships, you can better design and manage distributed datastore systems to meet your specific needs.









