Failure Scenarios

These diagrams are generated, not drawn. Each one is produced by running Crabka's own deterministic KRaft consensus simulator (the same pure, no-IO state machine the broker runs in production) and recording every message, timeout, partition, and leader election that occurs during the run. Because the simulator is deterministic, the diagrams below reflect the real algorithm, step for step, rather than an idealized cartoon of it.
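The recording step described above can be sketched as follows. This is a minimal illustration, not Crabka's actual renderer: the `Message`, `Note`, and `render_mermaid` names are hypothetical, and the only thing taken from this page is the shape of the diagram lines themselves. The point is that a pure function from a recorded trace to diagram text yields the same diagram every time the same trace is replayed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Message:
    """A message sent from one node to another (hypothetical event type)."""
    src: str
    dst: str
    label: str


@dataclass(frozen=True)
class Note:
    """A local event on one node, such as a timeout (hypothetical event type)."""
    node: str
    text: str


Event = Union[Message, Note]


def render_mermaid(nodes: List[str], trace: List[Event]) -> str:
    """Turn a recorded trace into Mermaid sequence-diagram text.

    The function is pure: the same trace always renders the same diagram.
    """
    lines = ["sequenceDiagram"]
    lines += [f"    participant {n}" for n in nodes]
    for ev in trace:
        if isinstance(ev, Message):
            lines.append(f"    {ev.src}->>{ev.dst}: {ev.label}")
        else:
            lines.append(f"    Note over {ev.node}: {ev.text}")
    return "\n".join(lines)


# A three-event trace, in the order the events were recorded.
trace = [
    Note("N1", "election timeout"),
    Message("N1", "N2", "PreVoteRequest"),
    Message("N2", "N1", "VoteResponse(granted)"),
]
diagram = render_mermaid(["N1", "N2"], trace)
print(diagram)
```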

Split-brain prevented (leader partition)

A 3-voter cluster elects a leader, the leader is network-partitioned away from the majority, and the two-node majority elects a new leader at a higher epoch. The isolated old leader cannot make progress without a quorum, so even while it is cut off there is never a second leader able to commit writes, and no epoch ever has two leaders. When the partition heals, the stale leader learns the newer epoch and steps down.

Invariant: At most one leader per epoch (election safety)

❌ Without a quorum check (an illustrative counterexample of what split-brain looks like):

sequenceDiagram
    participant N1
    participant N2
    participant N3
    Note over N1: 👑 leader epoch 1
    Note over N1,N3: ✂ network splits {N1} | {N2,N3}
    Note over N1: still thinks it is leader
    N1->>N1: accepts write A (epoch 1)
    Note over N2: ⏰ election timeout
    Note over N2: 👑 leader epoch 1 (no quorum check!)
    N2->>N3: accepts write B (epoch 1)
    Note over N1,N3: 💥 two leaders, logs diverge (A vs B)

Crabka prevents this: a candidate must win a majority of votes before it can lead, and KIP-996 pre-vote stops a partitioned node from disrupting a healthy leader. With three voters, the minority side (one node) can never reach the two-vote majority, so it cannot elect itself.
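The two rules in the paragraph above can be sketched in a few lines. This is a simplified reading, not Crabka's code: the function names are hypothetical, and `grants_pre_vote` reduces the KIP-996 rule to two booleans (whether the receiver is currently following a leader it still considers live, and whether the candidate's log is at least as up to date as the receiver's). The real protocol has more cases.

```python
def majority(voter_count: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voter_count // 2 + 1


def wins_election(votes_granted: int, voter_count: int) -> bool:
    """A candidate may lead only once it holds a majority of votes."""
    return votes_granted >= majority(voter_count)


def grants_pre_vote(is_following_live_leader: bool,
                    candidate_log_up_to_date: bool) -> bool:
    """Simplified pre-vote rule (assumption: two inputs are enough).

    A node that still follows a live leader denies the pre-vote, so a
    partitioned node cannot force an election on a healthy cluster.
    """
    return (not is_following_live_leader) and candidate_log_up_to_date


# Three voters: the isolated node has only its own vote.
isolated_can_lead = wins_election(votes_granted=1, voter_count=3)
majority_side_can_lead = wins_election(votes_granted=2, voter_count=3)
```

With three voters, `majority(3)` is 2, so the isolated node's single vote is not enough, while the two-node side can elect a leader.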

✓ With Crabka's quorum + pre-vote — the generated trace from the simulator:

sequenceDiagram
    participant N1
    participant N2
    participant N3
    Note over N1: ⏰ election timeout
    N1->>N2: PreVoteRequest
    N1->>N3: PreVoteRequest
    N2->>N1: VoteResponse(granted)
    N3->>N1: VoteResponse(granted)
    N1->>N2: VoteRequest
    N1->>N3: VoteRequest
    N2->>N1: VoteResponse(granted)
    Note over N1: 👑 leader epoch 1
    N3->>N1: VoteResponse(granted)
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N2->>N1: Fetch
    N3->>N1: Fetch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    Note over N1: ✂ partitioned
    Note over N2: ⏰ fetch timeout
    N2->>N3: PreVoteRequest
    N3->>N2: VoteResponse(denied)
    Note over N3: ⏰ fetch timeout
    N3->>N2: PreVoteRequest
    N2->>N3: VoteResponse(granted)
    N3->>N2: VoteRequest
    N2->>N3: VoteResponse(granted)
    Note over N3: 👑 leader epoch 2
    N3->>N2: BeginQuorumEpoch
    N2->>N3: Fetch
    N3->>N2: BeginQuorumEpoch
    Note over N1: 🔗 healed
    N3->>N1: BeginQuorumEpoch
    N3->>N2: BeginQuorumEpoch
    N1->>N3: Fetch
    N3->>N1: BeginQuorumEpoch
    N3->>N2: BeginQuorumEpoch
    N2->>N3: Fetch

Outcome: The majority side elected N3 at a strictly higher epoch. The isolated old leader N1 could not advance (no quorum), and on healing it learned the newer epoch from a BeginQuorumEpoch heartbeat and stepped down to follower. Exactly one leader remains.
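The step-down in this outcome follows from a simple epoch comparison. The sketch below is an assumption-laden illustration, not Crabka's state machine: the `Node` class and its `on_begin_quorum_epoch` method are hypothetical names, and real KRaft nodes track more state than an epoch, a role, and a leader id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical, minimal view of a voter's election state."""
    name: str
    epoch: int
    role: str  # "leader" or "follower"
    leader: Optional[str] = None

    def on_begin_quorum_epoch(self, leader: str, epoch: int) -> bool:
        """Handle a BeginQuorumEpoch; return False if it was rejected as stale."""
        if epoch < self.epoch:
            return False  # message from an older epoch: ignore it
        if epoch > self.epoch:
            # A newer epoch exists, so whatever this node was, it now follows.
            self.epoch = epoch
            self.role = "follower"
            self.leader = leader
        # epoch == self.epoch: already known, nothing changes.
        return True


# N1 was leader of epoch 1 when it was partitioned away.
n1 = Node("N1", epoch=1, role="leader", leader="N1")

# After healing, N3's heartbeat for epoch 2 reaches N1.
accepted = n1.on_begin_quorum_epoch(leader="N3", epoch=2)
```

After the call, `n1` is a follower of N3 in epoch 2. A late message from epoch 1 would be rejected because its epoch is lower than the one N1 now holds.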

Reordered message delivery

The simulator deliberately delivers a round of replication messages back-to-front (non-FIFO). Because every fetch and append carries a monotonic offset and the producing leader epoch, a node detects and ignores any message that is stale or out of order — the replicated logs still converge to the same contents.

Invariant: Log matching under reordered delivery

sequenceDiagram
    participant N1
    participant N2
    participant N3
    Note over N1: ⏰ election timeout
    N1->>N2: PreVoteRequest
    N1->>N3: PreVoteRequest
    N2->>N1: VoteResponse(granted)
    N3->>N1: VoteResponse(granted)
    N1->>N2: VoteRequest
    N1->>N3: VoteRequest
    N2->>N1: VoteResponse(granted)
    Note over N1: 👑 leader epoch 1
    N3->>N1: VoteResponse(granted)
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N2->>N1: Fetch
    N3->>N1: Fetch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    Note over N1: ✏ append 3 record(s)
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N2->>N1: Fetch
    N3->>N1: Fetch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch

Outcome: Even though 2 in-flight messages were delivered out of order, every voter's log converged identically (4 records) and the cluster kept exactly 1 leader. Stale or late messages were ignored because each fetch and append is tagged with a monotonic offset and leader epoch.
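How offset and epoch tags make reordering harmless can be shown with a toy follower log. This is a sketch under stated assumptions rather than Crabka's replication code: `Batch`, `FollowerLog`, and `try_append` are hypothetical names, and replication is modeled as batches arriving at the follower, which simplifies KRaft's fetch-based flow.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Batch:
    """Records tagged with the leader epoch and the offset of the first record."""
    leader_epoch: int
    base_offset: int
    records: Tuple[str, ...]


@dataclass
class FollowerLog:
    epoch: int = 0
    records: List[str] = field(default_factory=list)

    def try_append(self, batch: Batch) -> bool:
        """Append only if the batch is current and lands exactly at the log end."""
        if batch.leader_epoch < self.epoch:
            return False  # produced by an older leader: stale
        if batch.base_offset != len(self.records):
            return False  # leaves a gap or repeats old offsets: out of order
        self.epoch = batch.leader_epoch
        self.records.extend(batch.records)
        return True


log = FollowerLog()
first = Batch(leader_epoch=1, base_offset=0, records=("a", "b"))
second = Batch(leader_epoch=1, base_offset=2, records=("c",))

# Deliver back-to-front, then redeliver: second, first, second, first.
results = [log.try_append(b) for b in (second, first, second, first)]
```

The early `second` and the late `first` are both ignored, and the log ends with the three records in their original order.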

Duplicate message delivery

The simulator delivers the same in-flight message twice. KRaft handles duplicates idempotently: a vote that was already granted and counted has no additional effect, and a BeginQuorumEpoch for an epoch the node already knows is a no-op. No record is applied twice and no spurious second leader emerges.

Invariant: Idempotent handling of duplicate messages

sequenceDiagram
    participant N1
    participant N2
    participant N3
    Note over N1: ⏰ election timeout
    N1->>N2: PreVoteRequest
    N1->>N3: PreVoteRequest
    N2->>N1: VoteResponse(granted)
    N3->>N1: VoteResponse(granted)
    N1->>N2: VoteRequest
    N1->>N3: VoteRequest
    N2->>N1: VoteResponse(granted)
    Note over N1: 👑 leader epoch 1
    N3->>N1: VoteResponse(granted)
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N2->>N1: Fetch
    N3->>N1: Fetch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N2: BeginQuorumEpoch
    N1->>N3: BeginQuorumEpoch
    N2->>N1: Fetch

Outcome: A message was delivered twice (duplicate injected). The duplicate was handled idempotently — a vote already counted is not counted again and an already-known epoch is a no-op — so the cluster still converged to exactly 1 leader.
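Idempotent vote counting comes down to tracking who granted a vote rather than how many responses arrived. The sketch below is illustrative only: `VoteTally` is a hypothetical class, not Crabka's, and it uses five voters so that a double-counted vote would visibly change the result.

```python
from typing import Iterable, Set


class VoteTally:
    """Counts granted votes per voter, so a duplicate response changes nothing."""

    def __init__(self, voters: Iterable[str]) -> None:
        self.voters = frozenset(voters)
        self.granted: Set[str] = set()

    def record_grant(self, voter: str) -> bool:
        """Record a granted vote; return True once a majority has been reached."""
        if voter in self.voters:
            self.granted.add(voter)  # adding to a set twice has no extra effect
        return len(self.granted) >= len(self.voters) // 2 + 1


tally = VoteTally(["N1", "N2", "N3", "N4", "N5"])
tally.record_grant("N1")                   # candidate's own vote
tally.record_grant("N2")                   # first delivery
won_after_dup = tally.record_grant("N2")   # duplicate delivery of the same vote
won_after_third = tally.record_grant("N3")
```

A plain counter would have reached three after the duplicate and declared a leader with only two real votes. The set keeps the tally at two until N3's vote arrives.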