Failure Scenarios
These diagrams are generated, not drawn. Each one is produced by running Crabka's own deterministic KRaft consensus simulator (the same pure, no-IO state machine the broker runs in production) and recording every message, timeout, partition, and leader election that occurs during the run. Because the simulator is deterministic, the diagrams below reflect the real algorithm, step for step, rather than an idealized cartoon of it.
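To make the determinism claim concrete, here is a minimal sketch of the idea, not Crabka's simulator. It assumes a toy scenario in which a single seeded random number generator is the only source of randomness; `run_trace` and its parameters are hypothetical names chosen for illustration.

```python
import random

def run_trace(seed: int, steps: int = 6) -> list[str]:
    """Replay a toy scenario; the seeded RNG is the only source of randomness."""
    rng = random.Random(seed)  # no wall clock, no sockets, no global RNG
    nodes = ["N1", "N2", "N3"]
    trace = []
    for tick in range(steps):
        node = rng.choice(nodes)
        timeout_ms = rng.randint(150, 300)
        trace.append(f"tick {tick}: {node} election timeout ({timeout_ms} ms)")
    return trace

# Two runs with the same seed yield the same trace, line for line.
first = run_trace(seed=42)
second = run_trace(seed=42)
print(first == second)  # True
```

Because nothing outside the seed influences the run, a recorded trace can be regenerated exactly, which is what lets a diagram stand in for the algorithm's actual behavior.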
Split-brain prevented (leader partition)
A 3-voter cluster elects a leader, the leader is network-partitioned away from the majority, and the two-node majority elects a new leader at a higher epoch. The isolated old leader cannot reach a quorum, so it cannot commit anything, and there is never a second leader able to make progress. When the partition heals, the stale leader learns the newer epoch and steps down.
Invariant: At most one leader per epoch (election safety)
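The invariant can be stated as a check over the leader announcements in a trace. The following is a hedged sketch: `election_safety_holds` and the `(epoch, node)` event shape are assumptions made for illustration, not part of Crabka's API.

```python
def election_safety_holds(leader_events):
    """Return True if no epoch ever has two different leaders.

    leader_events is an iterable of (epoch, node_name) pairs, one per
    "became leader" event observed in a trace (hypothetical format).
    """
    leader_by_epoch = {}
    for epoch, node in leader_events:
        # setdefault records the first leader seen for this epoch.
        first_leader = leader_by_epoch.setdefault(epoch, node)
        if first_leader != node:
            return False  # a second, different leader in the same epoch
    return True

# Leaders from the quorum-protected scenario: N1 at epoch 1, N3 at epoch 2.
print(election_safety_holds([(1, "N1"), (2, "N3")]))  # True
# Leaders from a split-brain run: N1 and N2 both at epoch 1.
print(election_safety_holds([(1, "N1"), (1, "N2")]))  # False
```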
❌ Without quorum (an illustrative counterexample, not a simulator trace): what split-brain looks like:
sequenceDiagram
participant N1
participant N2
participant N3
Note over N1: 👑 leader epoch 1
Note over N1,N3: ✂ network splits {N1} | {N2,N3}
Note over N1: still thinks it is leader
N1->>N1: accepts write A (epoch 1)
Note over N2: ⏰ election timeout
Note over N2: 👑 leader epoch 1 (no quorum check!)
N2->>N3: accepts write B (epoch 1)
Note over N1,N3: 💥 two leaders, logs diverge (A vs B)
Crabka prevents this: a candidate must win a majority of votes before it can lead, and KIP-996 pre-vote stops a partitioned node from disrupting a healthy leader. With three voters, the minority side (one node) can never reach the two-vote majority, so it cannot elect itself.
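The arithmetic behind the majority rule is small enough to sketch. The helper names `majority` and `can_win_election` are hypothetical; the only assumption is that a candidate counts its own vote plus the votes of the voters it can still reach.

```python
def majority(voter_count: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voter_count // 2 + 1

def can_win_election(reachable_voters: int, voter_count: int) -> bool:
    """reachable_voters includes the candidate itself."""
    return reachable_voters >= majority(voter_count)

print(majority(3))             # 2
print(can_win_election(1, 3))  # False: the isolated node only has its own vote
print(can_win_election(2, 3))  # True: the two-node side reaches the majority
```

Any two majorities of the same voter set share at least one voter, and a voter grants at most one vote per epoch, which is why two candidates cannot both win the same epoch.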
✓ With Crabka's quorum + pre-vote — the generated trace from the simulator:
sequenceDiagram
participant N1
participant N2
participant N3
Note over N1: ⏰ election timeout
N1->>N2: PreVoteRequest
N1->>N3: PreVoteRequest
N2->>N1: VoteResponse(granted)
N3->>N1: VoteResponse(granted)
N1->>N2: VoteRequest
N1->>N3: VoteRequest
N2->>N1: VoteResponse(granted)
Note over N1: 👑 leader epoch 1
N3->>N1: VoteResponse(granted)
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N2->>N1: Fetch
N3->>N1: Fetch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
Note over N1: ✂ partitioned
Note over N2: ⏰ fetch timeout
N2->>N3: PreVoteRequest
N3->>N2: VoteResponse(denied)
Note over N3: ⏰ fetch timeout
N3->>N2: PreVoteRequest
N2->>N3: VoteResponse(granted)
N3->>N2: VoteRequest
N2->>N3: VoteResponse(granted)
Note over N3: 👑 leader epoch 2
N3->>N2: BeginQuorumEpoch
N2->>N3: Fetch
N3->>N2: BeginQuorumEpoch
Note over N1: 🔗 healed
N3->>N1: BeginQuorumEpoch
N3->>N2: BeginQuorumEpoch
N1->>N3: Fetch
N3->>N1: BeginQuorumEpoch
N3->>N2: BeginQuorumEpoch
N2->>N3: Fetch
Outcome: The majority side elected N3 at a strictly higher epoch (2). N2's first pre-vote was denied, which is consistent with N3 not yet having reached its own fetch timeout at that point. The isolated old leader N1 could not advance (no quorum), and on healing it learned the newer epoch from a BeginQuorumEpoch heartbeat and stepped down to follower. Exactly one leader remains.
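The step-down described in the outcome can be sketched as a single epoch comparison. This is a simplified illustration under stated assumptions: the `Node` class and `on_begin_quorum_epoch` method are hypothetical names, and a real KRaft node tracks more state than this.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    epoch: int
    role: str                      # "leader" or "follower" in this sketch
    leader: Optional[str] = None

    def on_begin_quorum_epoch(self, leader: str, epoch: int) -> bool:
        """Return True if the message changed this node's state."""
        if epoch <= self.epoch:
            return False           # stale or already-known epoch: ignore
        # A strictly newer epoch always wins, even over a node that
        # currently believes it is the leader.
        self.epoch = epoch
        self.role = "follower"
        self.leader = leader
        return True

# N1 still believes it leads epoch 1 when the partition heals.
n1 = Node(name="N1", epoch=1, role="leader", leader="N1")
n1.on_begin_quorum_epoch(leader="N3", epoch=2)
print(n1.role, n1.epoch, n1.leader)  # follower 2 N3
```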
Reordered message delivery
The simulator deliberately delivers a round of replication messages back-to-front (non-FIFO). Because every fetch and append carries a monotonic offset and the producing leader epoch, a node detects and ignores any message that is stale or out of order, so the replicated logs still converge to the same contents.
Invariant: Log matching under reordered delivery
sequenceDiagram
participant N1
participant N2
participant N3
Note over N1: ⏰ election timeout
N1->>N2: PreVoteRequest
N1->>N3: PreVoteRequest
N2->>N1: VoteResponse(granted)
N3->>N1: VoteResponse(granted)
N1->>N2: VoteRequest
N1->>N3: VoteRequest
N2->>N1: VoteResponse(granted)
Note over N1: 👑 leader epoch 1
N3->>N1: VoteResponse(granted)
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N2->>N1: Fetch
N3->>N1: Fetch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
Note over N1: ✏ append 3 record(s)
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N2->>N1: Fetch
N3->>N1: Fetch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
Outcome: Even though 2 in-flight messages were delivered out of order, every voter's log converged identically (4 records) and the cluster kept exactly 1 leader. Stale or late messages were ignored because each fetch and append is tagged with a monotonic offset and leader epoch.
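A minimal sketch of why the offset and epoch tags make reordering harmless follows. It is a deliberate simplification: it models replication as pushed append messages, whereas the traces above show followers pulling with Fetch, and the names `FollowerLog`, `on_append`, and `replicate` are hypothetical.

```python
class FollowerLog:
    """Toy follower log that only accepts the next expected batch."""

    def __init__(self):
        self.epoch = 0
        self.records = []

    def on_append(self, epoch, base_offset, batch):
        if epoch < self.epoch:
            return False  # produced by an older leader epoch: stale
        if base_offset != len(self.records):
            return False  # gap (arrived early) or already applied (arrived late)
        self.epoch = epoch
        self.records.extend(batch)
        return True

def replicate(deliveries):
    """Apply (epoch, base_offset, batch) messages in the given delivery order."""
    log = FollowerLog()
    for epoch, base_offset, batch in deliveries:
        log.on_append(epoch, base_offset, batch)
    return log.records

m0 = (1, 0, ["a"])
m1 = (1, 1, ["b", "c"])
in_order = replicate([m0, m1])
reordered = replicate([m1, m0, m1])  # m1 arrives early, is dropped, then is re-sent
print(in_order == reordered)  # True
```

The early copy of `m1` is rejected rather than buffered, so correctness depends only on the sender retrying, not on delivery order.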
Duplicate message delivery
The simulator delivers the same in-flight message twice. KRaft handles duplicates idempotently: a vote that was already granted and counted has no additional effect, and a BeginQuorumEpoch for an epoch the node already knows is a no-op. Nothing is applied twice and no spurious second leader emerges.
Invariant: Idempotent handling of duplicate messages
sequenceDiagram
participant N1
participant N2
participant N3
Note over N1: ⏰ election timeout
N1->>N2: PreVoteRequest
N1->>N3: PreVoteRequest
N2->>N1: VoteResponse(granted)
N3->>N1: VoteResponse(granted)
N1->>N2: VoteRequest
N1->>N3: VoteRequest
N2->>N1: VoteResponse(granted)
Note over N1: 👑 leader epoch 1
N3->>N1: VoteResponse(granted)
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N2->>N1: Fetch
N3->>N1: Fetch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N2: BeginQuorumEpoch
N1->>N3: BeginQuorumEpoch
N2->>N1: Fetch
Outcome: A message was delivered twice (duplicate injected). The duplicate was handled idempotently — a vote already counted is not counted again and an already-known epoch is a no-op — so the cluster still converged to exactly 1 leader.
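One common way to get idempotent vote counting is to track the set of voters that granted a vote instead of a running count. The sketch below assumes that approach; `VoteTally` is a hypothetical class, and the 5-voter cluster is chosen only because a duplicate would visibly change the result there.

```python
class VoteTally:
    """Counts granted votes by voter identity, so duplicates cannot inflate it."""

    def __init__(self, voters, candidate):
        self.voters = frozenset(voters)
        self.granted = {candidate}  # a candidate votes for itself

    def on_vote_granted(self, voter):
        if voter in self.voters:
            self.granted.add(voter)  # adding an existing member is a no-op
        return self.has_majority()

    def has_majority(self):
        return len(self.granted) >= len(self.voters) // 2 + 1

tally = VoteTally(voters=["N1", "N2", "N3", "N4", "N5"], candidate="N1")
tally.on_vote_granted("N2")
tally.on_vote_granted("N2")  # duplicate delivery of the same response
print(sorted(tally.granted), tally.has_majority())  # ['N1', 'N2'] False
tally.on_vote_granted("N3")
print(tally.has_majority())  # True
```

With a plain integer counter, the duplicated response from N2 would have pushed the tally to three and declared a leader on only two distinct votes.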