Crabka vs Strimzi (Kubernetes)
Operator-managed, six-broker comparison on GKE. Crabka and Strimzi (Apache Kafka) run under identical Kubernetes pod resources, driven by the same Rust load driver over the Kafka wire protocol. Each scenario is run ten times and averaged. Crabka matches or beats Strimzi's throughput in every produce-and-consume scenario while using a fraction of the memory and serving fetches zero-copy: sendfile(2) on plaintext connections and kernel TLS (kTLS) on encrypted ones.
This is a Kubernetes comparison: two six-broker clusters (RF=3), one managed by
the Crabka operator and one by Strimzi, brought up on the
same GKE node pool with byte-for-byte identical pod resources. Each scenario is
driven by the same Rust load driver (crabka-bench-driver) over the Kafka wire
protocol, producing and consuming through a Kubernetes Job. Broker CPU and
container working-set memory come from cAdvisor; the JVM heap / non-heap split
comes from the Strimzi JMX exporter. Every cell is the mean of ten runs rather
than a single sample; the run-to-run spread is shown on the time-series page.
The harness lives in bench/.
For the same data over the run window — throughput, broker CPU and broker memory as time series, every individual run plus the average — see Throughput, CPU & memory over time.
Environment
| Component | Configuration |
|---|---|
| Platform | GKE, e2-standard-4 nodes (4 vCPU, 16 GiB), one broker per node |
| Storage | pd-ssd PersistentClaim, 200 GiB per broker |
| Pod resources | identical both stacks: 2–4 vCPU, 6 GiB request / 12 GiB limit |
| Crabka | crabka-operator + crabka-broker, release build, 6 brokers, RF=3 |
| Strimzi | Strimzi 1.0, Apache Kafka 4.2 (KRaft), JVM, 6 brokers, RF=3 |
| Driver | crabka-bench-driver, in-cluster Job, Kafka wire protocol |
| Repeats | 10 runs per (scenario, stack); cells are the mean |
The two stacks get the same pod resources, the same storage class, the same partition counts, and the same driver; the two clusters are benchmarked one at a time.
Produce-and-consume
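Every record produced is also consumed back through the same driver. Higher is better in every column except memory. The throughput column is the Crabka ÷ Strimzi ratio, the memory columns are container working set in MiB, and msgs/CPU-core is Crabka's advantage over Strimzi, also expressed as a ratio.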
| scenario (6 brokers, RF=3) | Crabka | Strimzi | throughput | Crabka mem | Strimzi mem | msgs/CPU-core |
|---|---|---|---|---|---|---|
| small-msg-saturate (100 B, acks=leader) | 99.4k msgs/s | 70.0k msgs/s | 1.42× | 190 MiB | 9 217 MiB | 4.1× |
| fan-out (1 KiB, 4P/4C) | 68.4k msgs/s | 47.9k msgs/s | 1.43× | 620 MiB | 7 483 MiB | 2.2× |
| fixed-rate-latency (1 KiB, paced) | 9.1k msgs/s | 6.8k msgs/s | 1.34× | 305 MiB | 5 159 MiB | 3.3× |
| large-msg (100 KiB, acks=leader) | 5.8k msgs/s | 4.2k msgs/s | 1.37× | 131 MiB | 7 443 MiB | 4.6× |
| mixed-acks (1 KiB, acks=all) | 20.0k msgs/s | 19.6k msgs/s | 1.02× | 306 MiB | 6 277 MiB | 2.6× |
| high-partition-saturate (100 part., 100 B) | 274.2k msgs/s | 235.8k msgs/s | 1.16× | 1 812 MiB | 8 106 MiB | 1.6× |
| high-partition-latency (100 part., paced) | 5.8k msgs/s | 5.3k msgs/s | 1.08× | 2 306 MiB | 7 261 MiB | 1.5× |
| high-partition-fanout (100 part., fan-out) | 52.8k msgs/s | 39.0k msgs/s | 1.36× | 5 584 MiB | 12 358 MiB | 1.5× |
On throughput, Crabka wins every steady-state and fan-out workload outright
(1.34–1.43×), roughly ties the acks=all workload (1.02×), and leads both the
saturating and paced 100-partition runs (1.16× and 1.08×). The larger gaps are
in what that throughput costs:
- Memory: a Crabka broker's container working set runs from the low hundreds of MiB up to a few GiB on the heaviest 100-partition workloads; a Strimzi broker carries 5–12 GiB, the bulk of it JVM heap — a 2.2–57× gap depending on workload, structural rather than a tuning default.
- CPU efficiency: 1.5–4.6× more messages per CPU-core in every scenario — equal-or-better throughput delivered for noticeably less CPU.
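The ratios quoted above can be re-derived from the table cells. The following is a minimal sketch, not part of the bench harness: the `Row` struct and the function names are hypothetical, and only three rows are copied in (the first row plus the two that bound the memory gap).

```rust
/// One row of the produce-and-consume table. Values are copied from the
/// table above; the struct itself is a hypothetical helper for this sketch.
struct Row {
    scenario: &'static str,
    crabka_kmsgs: f64,    // Crabka throughput, thousand msgs/s
    strimzi_kmsgs: f64,   // Strimzi throughput, thousand msgs/s
    crabka_mem_mib: f64,  // Crabka container working set, MiB
    strimzi_mem_mib: f64, // Strimzi container working set, MiB
}

/// Throughput column: Crabka divided by Strimzi.
fn throughput_ratio(r: &Row) -> f64 {
    r.crabka_kmsgs / r.strimzi_kmsgs
}

/// Memory gap: Strimzi working set divided by Crabka working set.
fn memory_gap(r: &Row) -> f64 {
    r.strimzi_mem_mib / r.crabka_mem_mib
}

/// One-line summary of a row, with ratios rounded for display.
fn describe(r: &Row) -> String {
    format!(
        "{}: {:.2}x throughput, {:.1}x memory",
        r.scenario,
        throughput_ratio(r),
        memory_gap(r)
    )
}

/// Three rows from the table: the first scenario, then the rows with the
/// widest and the narrowest memory gap.
fn rows() -> Vec<Row> {
    vec![
        Row { scenario: "small-msg-saturate", crabka_kmsgs: 99.4, strimzi_kmsgs: 70.0, crabka_mem_mib: 190.0, strimzi_mem_mib: 9217.0 },
        Row { scenario: "large-msg", crabka_kmsgs: 5.8, strimzi_kmsgs: 4.2, crabka_mem_mib: 131.0, strimzi_mem_mib: 7443.0 },
        Row { scenario: "high-partition-fanout", crabka_kmsgs: 52.8, strimzi_kmsgs: 39.0, crabka_mem_mib: 5584.0, strimzi_mem_mib: 12358.0 },
    ]
}
```

Because the table cells are already rounded, ratios recomputed this way can differ from the published column in the last digit (large-msg comes out at about 1.38 rather than 1.37).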
Failover
A ninth scenario kills partition 0's leader mid-run. Strimzi fails over transparently, with no measured message loss. Crabka recovers in tens of seconds: producers re-discover the new leader after a latency spike and a small number of dropped records (a fraction of a percent of the messages in flight), then resume producing. Recovery is repeatable: every run in the matrix re-converges within that window. Crabka's whole-run failover throughput trails Strimzi's (0.89×), the one scenario where Strimzi leads outright, though even here a Crabka broker holds a fraction of the memory (2.3× leaner) and turns 1.6× the messages per CPU-core.
Zero-copy fetch
Crabka serves the records portion of a Fetch response without copying it
through userspace: on Linux it sendfile(2)s the log-segment bytes straight
from the page cache to the socket (page-cache → NIC), exactly as Apache Kafka
does. The produce path likewise appends the producer's record batch verbatim,
without a decode/re-encode round trip. macOS and the BSDs use their native
sendfile; other platforms fall back to a buffered copy.
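As an illustration of the plaintext path, here is a minimal std-only sketch of sending a byte range of a segment file to a socket. It is not Crabka's code: `send_segment_range` is a hypothetical name, and it relies on the documented platform-specific behavior of `std::io::copy`, which on Linux may use sendfile(2) or splice(2) when moving data between file descriptors and otherwise falls back to a buffered copy.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::net::TcpStream;

/// Send `len` bytes of `segment`, starting at byte `offset`, to `socket`.
/// Returns the number of bytes actually transferred.
///
/// Assumption: on Linux, `io::copy` may hand this File -> TcpStream transfer
/// to the kernel (sendfile/splice), so the bytes need not pass through a
/// userspace buffer. On other platforms it is an ordinary buffered copy.
fn send_segment_range(
    segment: &mut File,
    offset: u64,
    len: u64,
    socket: &mut TcpStream,
) -> io::Result<u64> {
    // Position the file at the first byte of the requested range.
    segment.seek(SeekFrom::Start(offset))?;
    // `take` caps the transfer at `len` bytes, so the copy stops at the
    // end of the range rather than the end of the file.
    let mut range = segment.by_ref().take(len);
    io::copy(&mut range, socket)
}
```

A broker would send the Fetch response header from a normal buffer first and then call something like this for the records portion. A production implementation would more likely call sendfile(2) directly with an explicit offset so that concurrent fetches do not share a file cursor.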
On encrypted connections Crabka uses Linux kTLS (kernel TLS): after the
rustls handshake, record encryption is offloaded to the kernel, so sendfile
runs through TLS — the kernel encrypts the page-cache pages into TLS records on
the way to the NIC. Encryption stays zero-copy, so a Crabka broker's TLS fetch
throughput tracks its plaintext throughput. Every wire byte is identical across
the plaintext and TLS paths; kTLS only moves where encryption happens, not what
crosses the wire.
Methodology notes
- Both stacks get identical pod resource requests/limits, the same pd-ssd storage class, the same partition counts, and the same in-cluster driver.
- Container memory is the cgroup working set (container_memory_working_set_bytes) summed over the broker pods; CPU is the cAdvisor CPU-seconds over the measured window. The Strimzi JVM heap / non-heap split is read from its JMX exporter.
- Each (scenario, stack) cell is run ten times; the table shows the mean, and the time-series page shows every run plus the average with run-to-run spread. Shared-cloud infrastructure has meaningful variance, so the inter-stack ratio is the most reliable read.
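The aggregation described in these notes can be sketched as follows. This is an assumption about how such figures are typically derived, not the harness's actual code: all function names are hypothetical, and msgs/CPU-core is taken to mean total messages divided by broker CPU-seconds over the measured window.

```rust
/// Mean of the per-run samples for one (scenario, stack) cell.
/// Expects at least one sample.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Sample standard deviation (n - 1 denominator), a simple measure of
/// run-to-run spread. Expects at least two samples.
fn sample_std_dev(samples: &[f64]) -> f64 {
    let m = mean(samples);
    let sum_sq: f64 = samples.iter().map(|x| (x - m).powi(2)).sum();
    (sum_sq / (samples.len() as f64 - 1.0)).sqrt()
}

/// Messages per CPU-core: messages handled in the window divided by the
/// CPU-seconds the brokers consumed in that window. This equals
/// (msgs/s) / (average cores in use).
fn msgs_per_cpu_core(messages: f64, cpu_seconds: f64) -> f64 {
    messages / cpu_seconds
}

/// Working set summed over the broker pods, converted from bytes to MiB.
fn working_set_mib(per_pod_bytes: &[u64]) -> f64 {
    let total: u64 = per_pod_bytes.iter().sum();
    total as f64 / (1024.0 * 1024.0)
}
```

The Crabka advantage in the msgs/CPU-core column would then be the Crabka value divided by the Strimzi value for the same scenario, which is why the ratio is less sensitive to shared-cloud noise than either absolute number.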
Reproduce
The GKE cluster is defined in Terraform, checked into the repo at
bench/terraform/gke/; its README is the full end-to-end recipe: provision the
cluster, install both operators + Prometheus, run the matrix, aggregate.
```shell
# provision the e2-standard-4 / pd-ssd cluster and point kubectl at it:
cd bench/terraform/gke && terraform init && terraform apply
eval "$(terraform output -raw get_credentials_command)"
# return to the repo root; the paths below are relative to it:
cd ../../..
# install both operators + Prometheus:
just -f bench/justfile install-all
# run the full 6-broker matrix, 10 repeats, on both stacks:
RUNS=10 bash bench/run-matrix.sh 6broker-rf3
# aggregate into SUMMARY.md, CSVs, a standalone report.html, and the
# website charts fragment (per-run + averaged throughput/CPU/memory):
just -f bench/justfile bench-report
```