Crabka vs Strimzi (Kubernetes)

Operator-managed, six-broker comparison on GKE. Crabka and Strimzi (Apache Kafka) run under identical Kubernetes pod resources, driven by the same Rust load driver over the Kafka wire protocol. Each scenario is run ten times and averaged. Crabka matches or beats Strimzi's throughput in all eight produce-and-consume scenarios while using a fraction of the memory and serving fetches zero-copy: sendfile(2) on plaintext connections and kernel TLS (kTLS) on encrypted ones.

This is a Kubernetes comparison: two six-broker clusters (RF=3), one managed by the Crabka operator and one by Strimzi, brought up on the same GKE node pool with identical pod resource requests and limits. Each scenario is driven by the same Rust load driver (crabka-bench-driver), which runs as a Kubernetes Job and produces and consumes over the Kafka wire protocol. Broker CPU and container working-set memory come from cAdvisor; the JVM heap / non-heap split comes from the Strimzi JMX exporter. Every cell is the mean of ten runs rather than a single sample; the run-to-run spread is shown on the time-series page. The harness lives in bench/.

For the same data over the run window — throughput, broker CPU and broker memory as time series, every individual run plus the average — see Throughput, CPU & memory over time.

Environment

Platform: GKE, e2-standard-4 nodes (4 vCPU, 16 GiB), one broker per node
Storage: pd-ssd PersistentClaim, 200 GiB per broker
Pod resources: identical on both stacks; 2–4 vCPU, 6 GiB request / 12 GiB limit
Crabka: crabka-operator + crabka-broker, release build, 6 brokers, RF=3
Strimzi: Strimzi 1.0, Apache Kafka 4.2 (KRaft), JVM, 6 brokers, RF=3
Driver: crabka-bench-driver, in-cluster Job, Kafka wire protocol
Repeats: 10 runs per (scenario, stack); cells are the mean

The two stacks get the same pods, the same storage class, the same partition counts, and the same driver. Only one cluster's brokers run at a time, so the stacks never compete for the same nodes.

Produce-and-consume

Every record produced is also consumed back through the same driver. Higher is better in every column except memory, where lower is better. The throughput column is the Crabka ÷ Strimzi ratio, and the msgs/CPU-core column is Crabka's advantage over Strimzi as a multiple.

scenario (6 brokers, RF=3)                  Crabka         Strimzi        throughput  Crabka mem  Strimzi mem  msgs/CPU-core
small-msg-saturate (100 B, acks=leader)     99.4k msgs/s   70.0k msgs/s   1.42×       190 MiB     9 217 MiB    4.1×
fan-out (1 KiB, 4P/4C)                      68.4k msgs/s   47.9k msgs/s   1.43×       620 MiB     7 483 MiB    2.2×
fixed-rate-latency (1 KiB, paced)           9.1k msgs/s    6.8k msgs/s    1.34×       305 MiB     5 159 MiB    3.3×
large-msg (100 KiB, acks=leader)            5.8k msgs/s    4.2k msgs/s    1.37×       131 MiB     7 443 MiB    4.6×
mixed-acks (1 KiB, acks=all)                20.0k msgs/s   19.6k msgs/s   1.02×       306 MiB     6 277 MiB    2.6×
high-partition-saturate (100 part., 100 B)  274.2k msgs/s  235.8k msgs/s  1.16×       1 812 MiB   8 106 MiB    1.6×
high-partition-latency (100 part., paced)   5.8k msgs/s    5.3k msgs/s    1.08×       2 306 MiB   7 261 MiB    1.5×
high-partition-fanout (100 part., fan-out)  52.8k msgs/s   39.0k msgs/s   1.36×       5 584 MiB   12 358 MiB   1.5×

On throughput, Crabka wins every steady-state and fan-out workload outright (1.34–1.43×), ties the acks=all workload (1.02×), and leads both the saturating and paced 100-partition runs (1.16× and 1.08×). The larger gaps are in what that throughput costs:

  • Memory: a Crabka broker's container working set runs from the low hundreds of MiB up to a few GiB on the heaviest 100-partition workloads; a Strimzi broker carries 5–12 GiB, the bulk of it JVM heap — a 2.2–57× gap depending on workload, structural rather than a tuning default.
  • CPU efficiency: 1.5–4.6× more messages per CPU-core in every scenario — equal-or-better throughput delivered for noticeably less CPU.
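
The throughput and memory ratios quoted above can be re-derived from the table. The Python sketch below does that using the rounded table values, so a throughput ratio may differ from the published column in the last digit (the published figures were presumably computed from unrounded means). The names ROWS, ratios, RATIOS and MEM_GAPS are illustrative and are not part of the bench harness.

```python
# Rounded values copied from the produce-and-consume table:
# (scenario, Crabka k msgs/s, Strimzi k msgs/s, Crabka MiB, Strimzi MiB)
ROWS = [
    ("small-msg-saturate", 99.4, 70.0, 190, 9217),
    ("fan-out", 68.4, 47.9, 620, 7483),
    ("fixed-rate-latency", 9.1, 6.8, 305, 5159),
    ("large-msg", 5.8, 4.2, 131, 7443),
    ("mixed-acks", 20.0, 19.6, 306, 6277),
    ("high-partition-saturate", 274.2, 235.8, 1812, 8106),
    ("high-partition-latency", 5.8, 5.3, 2306, 7261),
    ("high-partition-fanout", 52.8, 39.0, 5584, 12358),
]


def ratios(rows):
    """Return {scenario: (throughput ratio, memory gap)}.

    Throughput ratio is Crabka / Strimzi (higher favours Crabka).
    Memory gap is Strimzi / Crabka (higher favours Crabka).
    """
    return {
        name: (crabka_tp / strimzi_tp, strimzi_mem / crabka_mem)
        for name, crabka_tp, strimzi_tp, crabka_mem, strimzi_mem in rows
    }


RATIOS = ratios(ROWS)
MEM_GAPS = [mem_gap for _, mem_gap in RATIOS.values()]

if __name__ == "__main__":
    for name, (tp, mem_gap) in RATIOS.items():
        print(f"{name:<26} throughput {tp:4.2f}x   memory {mem_gap:5.1f}x")
    print(f"memory gap range: {min(MEM_GAPS):.1f}x to {max(MEM_GAPS):.1f}x")
```

The smallest memory gap comes from high-partition-fanout and the largest from large-msg, which is where the 2.2–57× range in the text originates.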

Failover

A ninth scenario kills partition 0's leader mid-run. Strimzi fails over transparently — no measured message loss. Crabka recovers in tens of seconds: producers re-discover the new leader after a latency spike and a small number of dropped records (a fraction of a percent of the messages in flight), then resume producing. Recovery is repeatable — every run in the matrix re-converges within that window. Its whole-run failover throughput trails Strimzi's (0.89×) — the one scenario where Strimzi leads outright — though even here a Crabka broker holds a fraction of the memory (2.3× leaner) and turns 1.6× the messages per CPU-core.

Zero-copy fetch

Crabka serves the records portion of a Fetch response without copying it through userspace: on Linux it sendfile(2)s the log-segment bytes straight from the page cache to the socket (page-cache → NIC), exactly as Apache Kafka does. The produce path likewise appends the producer's record batch verbatim, without a decode/re-encode round trip. macOS and the BSDs use their native sendfile; other platforms fall back to a buffered copy.
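
As an illustration of the mechanism only, and not Crabka's Rust implementation, here is a minimal Python sketch of a sendfile-based send on Linux. It assumes a temporary file standing in for a log segment and a local socket pair standing in for the broker-to-client connection; the 4-byte length header and the helper names sendfile_all and demo are invented for the example. The small header is written from userspace, while the record bytes are handed to the kernel with os.sendfile and never enter a userspace buffer on the sending side.

```python
import os
import socket
import tempfile


def sendfile_all(sock, fd, offset, count):
    """Send `count` bytes of file `fd`, starting at `offset`, via sendfile."""
    sent = 0
    while sent < count:
        # The kernel copies from the page cache to the socket; a short
        # write is possible, so loop until everything has been sent.
        n = os.sendfile(sock.fileno(), fd, offset + sent, count - sent)
        if n == 0:  # reached end of file before `count` bytes
            break
        sent += n
    return sent


def demo():
    records = b"record-batch-bytes" * 64        # stand-in for segment data
    header = len(records).to_bytes(4, "big")    # hypothetical response header
    with tempfile.NamedTemporaryFile() as segment:
        segment.write(records)
        segment.flush()
        broker_side, client_side = socket.socketpair()
        with broker_side, client_side:
            broker_side.sendall(header)         # header goes through userspace
            sent = sendfile_all(broker_side, segment.fileno(), 0, len(records))
            broker_side.shutdown(socket.SHUT_WR)
            received = bytearray()
            while chunk := client_side.recv(4096):
                received += chunk
    return sent, bytes(received), header + records


if __name__ == "__main__":
    sent, received, expected = demo()
    print(f"sendfile moved {sent} bytes; client got {len(received)} bytes")
```

The client cannot tell which path the bytes took: the stream is the same as it would be with a read-then-write loop, only without the intermediate copy on the sender.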

On encrypted connections Crabka uses Linux kTLS (kernel TLS): after the rustls handshake, record encryption is offloaded to the kernel, so sendfile runs through TLS and the kernel encrypts the page-cache pages into TLS records on the way to the NIC. Encryption stays zero-copy, so a Crabka broker's TLS fetch throughput tracks its plaintext throughput. The Kafka protocol bytes are the same on the plaintext and TLS paths, and the client still sees standard TLS records; kTLS changes only where encryption happens, not what crosses the wire.

Methodology notes

  • Both stacks get identical pod resource requests/limits, the same pd-ssd storage class, the same partition counts, and the same in-cluster driver.
  • Container memory is the cgroup working set (container_memory_working_set_bytes) summed over the broker pods; CPU is the cAdvisor CPU-seconds over the measured window. The Strimzi JVM heap / non-heap split is read from its JMX exporter.
  • Each (scenario, stack) cell is run ten times; the table shows the mean, and the time-series page shows every run plus the average with run-to-run spread. Shared-cloud infrastructure has meaningful variance, so the inter-stack ratio is the most reliable read.
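
The aggregation described in these notes can be sketched in a few lines of Python. This is one plausible reading, not the harness's actual code: it assumes the inter-stack ratio is the ratio of the two ten-run means, and that msgs/CPU-core is total messages divided by the CPU-seconds consumed over the measured window. The per-run numbers are made up for illustration and are not measured data; stack_a, stack_b, summarize and msgs_per_cpu_core are hypothetical names.

```python
from statistics import mean, stdev


def summarize(runs):
    """Return (mean, sample standard deviation) of per-run throughput."""
    return mean(runs), stdev(runs)


def msgs_per_cpu_core(total_msgs, cpu_seconds):
    """Messages handled per CPU-second, i.e. msgs/s per fully busy core."""
    return total_msgs / cpu_seconds


# Made-up per-run throughputs in k msgs/s (ten runs each), NOT measured data.
stack_a = [98.0, 101.0, 99.5, 100.5, 97.0, 102.0, 100.0, 99.0, 101.5, 101.5]
stack_b = [69.0, 71.0, 70.5, 68.5, 70.0, 72.0, 69.5, 70.5, 71.0, 68.0]

mean_a, spread_a = summarize(stack_a)
mean_b, spread_b = summarize(stack_b)
ratio = mean_a / mean_b  # the inter-stack ratio reported in the table

if __name__ == "__main__":
    print(f"stack A: {mean_a:.1f} +/- {spread_a:.1f} k msgs/s")
    print(f"stack B: {mean_b:.1f} +/- {spread_b:.1f} k msgs/s")
    print(f"ratio of means: {ratio:.2f}x")
```

Because both stacks run on the same shared infrastructure, noise that shifts every run up or down tends to affect both means, which is why the ratio is a steadier figure than either absolute number.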

Reproduce

The GKE cluster is defined in Terraform, checked into the repo at bench/terraform/gke/. Its README is the full end-to-end recipe: provision the cluster, install both operators + Prometheus, run the matrix, aggregate.

# provision the e2-standard-4 / pd-ssd cluster and point kubectl at it:
cd bench/terraform/gke && terraform init && terraform apply
eval "$(terraform output -raw get_credentials_command)"
cd ../../..  # back to the repo root; the bench/ paths below are relative to it

# install both operators + Prometheus:
just -f bench/justfile install-all

# run the full 6-broker matrix, 10 repeats, on both stacks:
RUNS=10 bash bench/run-matrix.sh 6broker-rf3

# aggregate into SUMMARY.md, CSVs, a standalone report.html, and the
# website charts fragment (per-run + averaged throughput/CPU/memory):
just -f bench/justfile bench-report