Crabka vs Apache Kafka 4.3

Single-box, like-for-like comparison: same host, same load driver, same wire protocol. On a full produce-and-consume round-trip, Crabka matches Apache Kafka 4.3's throughput within a few percent (ahead on both 1 KiB workloads, about 6% behind on the 100 B run) while staying resident in 32–43× less memory, sustaining 1.16–1.17× more messages per CPU-core, and starting in 1–2 s instead of 8–9 s.

This is a Kubernetes-free, single-machine comparison run with bench/local/run-local-bench.sh. Each scenario runs once per stack against a freshly-formatted single-node broker, driven by the same Rust load driver (crabka-bench-driver) over the Kafka wire protocol on localhost:9092. Broker CPU-seconds and peak RSS are scraped from /proc and folded into the report. Every record produced is also consumed back through the same driver, so the producer and consumer columns track each other.

Environment

CPU     Intel Xeon @ 2.10 GHz, 4 vCPU
RAM     15 GiB
Crabka  crabka-broker v0.2.0 @ commit cdbb379, release build
Kafka   Apache Kafka 4.3.0 (KRaft combined mode), latest release
JVM     OpenJDK 21.0.10, default -Xmx1G -Xms1G heap

Both the broker and the load driver share the same 4-vCPU box, so the absolute throughput figures are laptop-class, not datacenter numbers, and each cell is a single measurement run. The Crabka-vs-Kafka comparison is apples-to-apples: identical load, identical host, identical driver, brokers run one at a time. Crabka's broker is pinned to RUST_LOG=warn so per-request logging doesn't inflate its CPU.

Produce-and-consume round-trip

Producer and consumer rows match within each scenario because every record produced is also consumed back through the same driver. Higher is better except for latency, memory, and startup.

scenario (1 broker, RF=1, 6 partitions)                    metric               Crabka    Kafka 4.3  comparison
small-msg-saturate (100 B, acks=leader, 1P/1C)             producer msgs/s      5 549     5 892      0.94×
                                                           consumer msgs/s      5 549     5 892      0.94×
                                                           p99 producer ack     0.552 ms  0.479 ms   Kafka 1.15× lower
                                                           p99 consumer e2e     0.873 ms  0.812 ms   Kafka 1.08× lower
                                                           msgs/s per CPU-core  4 114     3 528      1.17×
                                                           broker peak RSS      24 MiB    1 027 MiB  43× lighter
local-1kb-saturate (1 KiB, acks=leader, 2P/2C)             producer msgs/s      11 253    10 924     1.03×
                                                           consumer msgs/s      11 253    10 924     1.03×
                                                           p99 producer ack     0.367 ms  0.448 ms   1.22× lower
                                                           p99 consumer e2e     0.639 ms  0.785 ms   1.23× lower
                                                           msgs/s per CPU-core  6 631     5 670      1.17×
                                                           broker peak RSS      32 MiB    1 040 MiB  32× lighter
fixed-rate-latency (1 KiB, acks=all + idempotence, 1P/1C)  producer msgs/s      4 289     4 232      1.01×
                                                           consumer msgs/s      4 289     4 232      1.01×
                                                           p99 producer ack     0.441 ms  0.477 ms   1.08× lower
                                                           p99 consumer e2e     0.565 ms  0.622 ms   1.10× lower
                                                           msgs/s per CPU-core  3 943     3 387      1.16×
                                                           broker peak RSS      32 MiB    1 039 MiB  32× lighter
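
The comparison column can be recomputed from the two measured columns. The sketch below is illustrative only: the function names are hypothetical and are not part of the bench harness. It assumes throughput-style metrics are reported as Crabka divided by Kafka, and cost-style metrics (latency, RSS) as Kafka divided by Crabka, which is consistent with the figures in the table.

```rust
/// Ratio for metrics where higher is better (msgs/s, msgs/s per CPU-core):
/// Crabka divided by Kafka. Values above 1.0 favour Crabka.
fn higher_is_better(crabka: f64, kafka: f64) -> f64 {
    crabka / kafka
}

/// Ratio for metrics where lower is better (latency, peak RSS):
/// Kafka divided by Crabka. Values above 1.0 mean Crabka is lower.
fn lower_is_better(crabka: f64, kafka: f64) -> f64 {
    kafka / crabka
}

/// Round to two decimals, the precision used in the comparison column.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For example, `round2(higher_is_better(5549.0, 5892.0))` reproduces the 0.94× on the 100 B run, and `lower_is_better(24.0, 1027.0)` rounds to the 43× memory figure.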

On raw throughput the two stacks trade blows within a few percent: Crabka is ahead on both 1 KiB workloads (1.01–1.03×) and a touch behind on the 100 B saturation run (0.94×). Where Crabka pulls clearly ahead is everything around the throughput:

  • Memory: resident in 24–32 MiB versus Kafka's ~1 GiB — 32–43× lighter. The JVM heap is fixed at -Xms1G, but even the live working set dwarfs Crabka's.
  • CPU efficiency: 1.16–1.17× more messages per CPU-core in every scenario, so equal-or-better throughput is delivered for noticeably less CPU.
  • Tail latency: at p99, Kafka is slightly lower on the small-message run (1.08–1.15×) and Crabka is lower on both 1 KiB runs (1.08–1.23×). Crabka's p99.9 and max are consistently lower: on local-1kb-saturate, producer p99.9 is 0.933 ms vs 1.717 ms and max is 11.2 ms vs 37.8 ms; on fixed-rate-latency, max is 19.2 ms vs 42.5 ms.
  • Startup: ready in 1–2 s versus Kafka's 8–9 s, and first ack lands sooner.
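
To make the p99 / p99.9 / max distinction concrete, here is a minimal nearest-rank percentile sketch. It is an assumption for illustration: the report does not say how crabka-bench-driver computes its percentiles, and `percentile_permille` is a hypothetical name. Percentiles are expressed in permille (990 = p99, 999 = p99.9, 1000 = max) so the rank is computed in integer arithmetic.

```rust
/// Nearest-rank percentile over latency samples (any unit, e.g. microseconds).
/// `permille` is the percentile times ten: 990 = p99, 999 = p99.9, 1000 = max.
/// Returns None for an empty input or a permille outside 1..=1000.
fn percentile_permille(samples: &[u64], permille: u64) -> Option<u64> {
    if samples.is_empty() || permille == 0 || permille > 1000 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len() as u64;
    // ceil(n * permille / 1000); always within 1..=n given the checks above.
    let rank = (n * permille + 999) / 1000;
    Some(sorted[(rank - 1) as usize])
}
```

With 1 000 samples, p99 is the 990th smallest value, so up to ten slower outliers are invisible to it; p99.9 and max are where a handful of long pauses show up.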

The "saturate" scenarios are latency-bound rather than bandwidth-bound: the driver awaits each send's ack before issuing the next per producer task, so a single task tops out around its round-trip rate. Both stacks are driven identically, so the ratio is still meaningful — it just isn't a raw MB/s ceiling.
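
The latency-bound ceiling follows from Little's law: with a fixed number of sends in flight, throughput is at most concurrency divided by round-trip time. The sketch below assumes "1P" and "2P" mean one and two producer tasks, each with a single send in flight; the function names are hypothetical and not taken from the driver.

```rust
/// Upper bound on msgs/s for a closed-loop driver: each task keeps at most one
/// send in flight, so it completes at most 1 / round_trip sends per second.
fn closed_loop_ceiling(tasks: u32, round_trip_secs: f64) -> f64 {
    f64::from(tasks) / round_trip_secs
}

/// Inverse view: the mean per-send cycle time implied by a measured rate.
/// This includes driver-side overhead, not just broker latency.
fn implied_round_trip_secs(tasks: u32, msgs_per_sec: f64) -> f64 {
    f64::from(tasks) / msgs_per_sec
}
```

Under those assumptions, the 5 549 msgs/s measured with one producer task implies a mean cycle of roughly 0.18 ms per send, which is why adding producer tasks, rather than bandwidth, is what raises the number.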

How low can the JVM go?

The memory gap above is measured against Kafka's default -Xms1G -Xmx1G — is that just an unfairly fat default? To check, we reran the 1 KiB saturation workload against Kafka with progressively smaller heaps (-Xmx = -Xms), same box, same driver. Crabka holds this workload in ~32 MiB.

Kafka heap          boots?  producer msgs/s  p99 ack   p99.9 ack  broker RSS  verdict
1024 MiB (default)  yes     12 988           0.287 ms  1.17 ms    1 011 MiB   competitive
512 MiB             yes     13 758           0.258 ms  1.14 ms    694 MiB     competitive
256 MiB             yes     13 645           0.272 ms  1.59 ms    463 MiB     competitive
224 MiB             yes     13 595           0.265 ms  2.14 ms    422 MiB     competitive, tail fraying
192 MiB             yes     12 790           0.322 ms  3.19 ms    397 MiB     runs, tail degraded
≤ 160 MiB           no      -                -         -          -           OOM at startup

  • Throughput survives down to a ~192 MiB heap (~397 MiB RSS): Kafka's footprint isn't all default-heap fat — you can shrink the heap dramatically with little throughput loss.
  • Latency degrades before throughput does. From 224 MiB down, G1 pauses push the worst-case ack out (p99.9 climbs from ~1 ms to ~3 ms) while broker CPU rises on identical work — that extra CPU is GC.
  • The hard floor is ~176–192 MiB just to boot. At ≤160 MiB the KRaft broker dies during startup with java.lang.OutOfMemoryError: Java heap space in LogManager / MetadataLoader; it never serves a request.
  • Even squeezed to its minimum, the JVM is ~12× Crabka. The minimum viable heap (~192 MiB) still resides in ~397 MiB, because RSS also carries the JVM's non-heap floor — metaspace, code cache, thread stacks, direct buffers — which alone exceeds Crabka's entire 32 MiB process. The gap is structural, not a tuning default.
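
The structural-gap argument can be checked with simple arithmetic on the sweep figures. The heap can contribute at most its configured size to RSS, so RSS minus heap size is a lower bound on non-heap resident memory. The sketch below is illustrative; the constant and function names are hypothetical, and the numbers are copied from the heap-sweep table.

```rust
/// (heap MiB, broker RSS MiB) rows copied from the heap-sweep table.
const SWEEP: [(u32, u32); 5] = [(1024, 1011), (512, 694), (256, 463), (224, 422), (192, 397)];

/// Crabka's resident size on the same workload, in MiB.
const CRABKA_RSS_MIB: u32 = 32;

/// Lower bound on non-heap resident memory in MiB. Returns 0 when RSS is below
/// the heap size, in which case the bound says nothing (the heap is simply not
/// fully touched).
fn non_heap_floor_mib(rss_mib: u32, heap_mib: u32) -> u32 {
    rss_mib.saturating_sub(heap_mib)
}

/// How many times larger a Kafka RSS figure is than Crabka's.
fn rss_multiple(kafka_rss_mib: u32) -> f64 {
    f64::from(kafka_rss_mib) / f64::from(CRABKA_RSS_MIB)
}
```

At the 192 MiB heap this gives at least 397 − 192 = 205 MiB outside the heap, and 397 / 32 ≈ 12.4×; every sweep row at 512 MiB or below yields a non-heap bound well above Crabka's 32 MiB.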

Methodology notes

  • Broker CPU is the user+system delta over the measured window; memory is the process peak RSS (VmHWM).
  • A freshly-started broker returns transient COORDINATOR_LOAD_IN_PROGRESS / NOT_COORDINATOR until its coordinators load. The harness warms them symmetrically on both stacks with the JDK clients before measuring, so the measured window reflects broker steady state regardless of client retry behavior.
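
The CPU and memory measurements described above can be approximated with a few lines of /proc parsing. This is a sketch under stated assumptions, not the harness's actual code: the function names are hypothetical, the clock-tick rate is passed in rather than queried (100 ticks per second is typical on Linux but should be confirmed with `getconf CLK_TCK`), and "msgs/s per CPU-core" is assumed to mean messages divided by broker CPU-seconds. The parsers take file contents as strings so they can be exercised without a live process.

```rust
/// Parse (utime, stime) in clock ticks from the contents of /proc/<pid>/stat.
fn parse_cpu_ticks(stat: &str) -> Option<(u64, u64)> {
    // Field 2 (comm) is parenthesised and may contain spaces, so resume
    // parsing after the last ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    let utime: u64 = fields.get(11)?.parse().ok()?;
    let stime: u64 = fields.get(12)?.parse().ok()?;
    Some((utime, stime))
}

/// Parse peak RSS (the VmHWM line) in KiB from /proc/<pid>/status contents.
fn parse_vm_hwm_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmHWM:"))
        .and_then(|value| value.split_whitespace().next())
        .and_then(|number| number.parse().ok())
}

/// User+system CPU-seconds consumed between two (utime, stime) samples.
fn cpu_seconds(before: (u64, u64), after: (u64, u64), ticks_per_sec: u64) -> f64 {
    let delta = (after.0 + after.1).saturating_sub(before.0 + before.1);
    delta as f64 / ticks_per_sec as f64
}

/// Assumed definition of "msgs/s per CPU-core": messages per broker CPU-second.
fn msgs_per_cpu_second(messages: u64, cpu_secs: f64) -> f64 {
    messages as f64 / cpu_secs
}
```

A caller would read `/proc/<pid>/stat` once before and once after the measured window with `std::fs::read_to_string`, and read `/proc/<pid>/status` at the end, since VmHWM already tracks the peak.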

Interop with the JVM

The load driver is built on Crabka's own crabka-client-producer / crabka-client-consumer crates and runs unmodified against either broker. Its consumer decodes Kafka 4.3's Fetch response and drains every produced record from the JVM broker; the idempotent / acks=all producer completes cleanly against Kafka with zero producer errors; and the consumer locates the coordinator and joins the group on every stack without manual intervention. Beyond the driver, Crabka is validated against the JVM via differential byte-equality tests on every encode/decode and a JVM acceptance suite that drives the official kafka-*.sh admin tools against a live Crabka broker.

Reproduce

cargo build --release -p crabka-cli -p crabka-broker -p crabka-bench-driver
# unpack Apache Kafka 4.3.0, then:
KAFKA_HOME=/path/to/kafka_2.13-4.3.0 bench/local/run-local-bench.sh
# results + SUMMARY.md land in bench/local/results/