# Benchmarks
Kubex is dramatically faster than kubernetes-asyncio, the most widely used async Kubernetes client for Python.
## Summary
Benchmarks were run against a K3s 1.35.4 cluster (a K3s testcontainer; same hardware and same API server for every adapter):
| Scenario | kubernetes-asyncio | kubex (aiohttp) | kubex (httpx) | Speedup (aiohttp) |
|---|---|---|---|---|
| Single GET | 61 ms | 6 ms | 26 ms | 10× |
| List 100 pods | 2,813 ms | 73 ms | 102 ms | 38× |
| List 500 pods | 14,441 ms | 351 ms | 410 ms | 41× |
| Watch 50 events | 3,957 ms | 562 ms | 1,764 ms | 7× |
Kubex also uses ~47% less heap memory and makes up to ~5× fewer allocations, reducing GC pressure in long-running controllers and operators.
## Detailed results

The results below come from `benchmarks/report.md` in the repository. All numbers use a K3s 1.35.4 testcontainer. See the caveats section for measurement details.
### Single GET

A single pod GET against a pre-seeded namespace.
| metric | k8s-asyncio | kubex-aiohttp | kubex-httpx | kubex-httpx-trio |
|---|---|---|---|---|
| wall p50 (ms) | 60.8 | 5.7 | 25.6 | 27.0 |
| wall p95 (ms) | 66.9 | 7.4 | 26.9 | 27.7 |
| steady heap (MB) | 55.4 | 27.9 | 31.7 | 30.2 |
| total allocations | 15,517,716 | 4,152,111 | 3,226,073 | 3,268,915 |
### List 100 pods

List ~100 pods in the `bench` namespace.
| metric | k8s-asyncio | kubex-aiohttp | kubex-httpx | kubex-httpx-trio |
|---|---|---|---|---|
| wall p50 (ms) | 2,813 | 73 | 102 | 100 |
| wall p95 (ms) | 2,920 | 79 | 107 | 109 |
| steady heap (MB) | 52.1 | 27.5 | 31.7 | 30.2 |
| total allocations | 7,934,267 | 3,619,184 | 3,936,894 | 3,870,615 |
### List 500 pods

List ~500 pods in the `bench` namespace.
| metric | k8s-asyncio | kubex-aiohttp | kubex-httpx | kubex-httpx-trio |
|---|---|---|---|---|
| wall p50 (ms) | 14,441 | 351 | 410 | 390 |
| wall p95 (ms) | 14,574 | 618 | 533 | 456 |
| steady heap (MB) | 52.2 | 27.6 | 31.8 | 30.3 |
| total allocations | 29,948,177 | 6,506,349 | 6,940,526 | 6,893,850 |
### Watch 50 events

Receive 50 watch events driven by a sibling create/delete burst (a sketch of the baseline watch loop follows the table).
| metric | k8s-asyncio | kubex-aiohttp | kubex-httpx | kubex-httpx-trio |
|---|---|---|---|---|
| wall p50 (ms) | 3,957 | 562 | 1,764 | 1,863 |
| evt p50 (µs) | 24,069 | 1,611 | 3,581 | 4,729 |
| evt p99 (µs) | 56,655 | 9,163 | 5,855 | 15,563 |
| steady heap (MB) | 52.2 | 27.6 | 31.7 | 30.2 |
| total allocations | 4,977,137 | 3,472,840 | 4,685,643 | 4,714,213 |
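For reference, the baseline side of this scenario is roughly the following kubernetes-asyncio consumer loop. This is a minimal sketch, not the benchmark harness itself; the kubeconfig loading, `bench` namespace, and event count are assumptions for illustration.

```python
# Minimal sketch of the baseline watch loop; not the benchmark harness itself.
import asyncio

from kubernetes_asyncio import client, config, watch


async def consume_events(namespace: str = "bench", count: int = 50) -> None:
    await config.load_kube_config()  # assumption: kubeconfig-based auth
    async with client.ApiClient() as api:
        v1 = client.CoreV1Api(api)
        seen = 0
        # Every ADDED/MODIFIED/DELETED event is deserialized into a V1Pod,
        # which is where most of the per-event latency difference comes from.
        async with watch.Watch().stream(v1.list_namespaced_pod, namespace) as stream:
            async for event in stream:
                seen += 1
                if seen >= count:
                    break


if __name__ == "__main__":
    asyncio.run(consume_events())
```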
### PartialObjectMetadata (asymmetric)

A kubex metadata-only list vs a kubernetes-asyncio full list. These scenarios are asymmetric: kubernetes-asyncio has no metadata-only equivalent, so its numbers reflect a full object list for contrast. The metadata adapter uses the aiohttp backend, so the comparison isolates the `?as=PartialObjectMetadata` saving from any HTTP-stack speed difference (a hedged sketch of the underlying request follows the table).
| metric | k8s-asyncio (full list, 100 pods) | kubex-metadata-aiohttp |
|---|---|---|
| wall p50 (ms) | 2,813 | 14.0 |
| wall p95 (ms) | 3,274 | 15.6 |
| steady heap (MB) | 52.1 | 27.6 |
| total allocations | 7,934,232 | 3,061,371 |
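For context, metadata-only responses are a Kubernetes API server feature driven by content negotiation: the client asks for a `PartialObjectMetadataList` in the `Accept` header and the server serializes only `apiVersion`, `kind`, and `metadata` for each item, which is why the response is so much smaller and cheaper to parse. Below is a minimal raw aiohttp sketch; the endpoint, namespace, and skipped auth/TLS handling are assumptions purely for illustration, and this is not kubex's API.

```python
# Illustrative raw-HTTP sketch; the endpoint, namespace, and skipped auth/TLS
# verification are assumptions for illustration, not kubex's API.
import asyncio

import aiohttp

ACCEPT_PARTIAL = "application/json;as=PartialObjectMetadataList;g=meta.k8s.io;v=v1"


async def list_pod_metadata() -> dict:
    async with aiohttp.ClientSession(headers={"Accept": ACCEPT_PARTIAL}) as session:
        # The API server strips spec/status and returns a PartialObjectMetadataList.
        async with session.get(
            "https://127.0.0.1:6443/api/v1/namespaces/bench/pods",  # assumed endpoint
            ssl=False,  # illustration only; a real client verifies the cluster CA and sends a token
        ) as resp:
            resp.raise_for_status()
            return await resp.json()


if __name__ == "__main__":
    body = asyncio.run(list_pod_metadata())
    print(body["kind"])  # expected: "PartialObjectMetadataList"
```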
## Why the gap is so large
kubernetes-asyncio deserializes every response into Python dicts, validates fields with hand-written code, and constructs V1* objects via keyword arguments — an extremely allocation-heavy path. Kubex uses Pydantic v2's Rust-backed validator, which parses JSON directly into typed Python objects in a single pass with far fewer intermediate allocations.
The list scenario magnifies this: deserializing 500 pods iterates the kubernetes-asyncio path 500 times. Kubex parses the entire list response in one Pydantic call.
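To make the single-pass path concrete, here is a minimal Pydantic v2 sketch; the models are simplified stand-ins for illustration, not kubex's actual type definitions.

```python
# Simplified stand-in models for illustration; not kubex's actual types.
from pydantic import BaseModel


class ObjectMeta(BaseModel):
    name: str
    namespace: str | None = None


class Pod(BaseModel):
    metadata: ObjectMeta


class PodList(BaseModel):
    items: list[Pod] = []


def parse_list_response(raw: bytes) -> PodList:
    # One call hands the raw response bytes to pydantic-core (Rust), which
    # parses the JSON and builds the typed objects in a single pass -- no
    # intermediate dict tree and no per-field constructor call for every pod.
    return PodList.model_validate_json(raw)
```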
## Reproducing the benchmarks
Requirements: Docker (for the K3s testcontainer).
```bash
# Install the benchmark dependency group
uv sync --group benchmark --python 3.13

# Run the full suite (starts K3s, seeds pods, measures)
uv run --group benchmark python -m benchmarks.run \
    --report benchmarks/report.md \
    --csv benchmarks/report.csv
```
Run only specific adapters or scenarios:
```bash
uv run --group benchmark python -m benchmarks.run \
    --adapters kubex-aiohttp-asyncio k8s-asyncio \
    --scenarios single_get list_large \
    --report benchmarks/report.md
```
Skip memory instrumentation for faster CPU-focused numbers:
```bash
uv run --group benchmark python -m benchmarks.run \
    --no-memory --cpu-profile --report benchmarks/report.md
```
## Measurement methodology

- Each `(adapter, scenario)` pair runs in a fresh subprocess so library imports never mix (the two libraries have very different import-time footprints).
- Warm-up: 3 untimed iterations (1 for streaming scenarios), then 10 measured (3 for streaming).
- Wall-time: `time.perf_counter_ns` per iteration; reported as p50 / p95 / p99.
- Memory: `memray.Tracker` wraps the measured loop and provides peak RSS, total bytes allocated, and the allocation count.
- Steady heap: `gc.collect()` + `tracemalloc.get_traced_memory()` after the loop.
- CPU: `time.process_time()` delta over the measured loop (always captured, cheap).
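A condensed sketch of how these primitives fit together in a single measured run; this is illustrative only, not the actual harness code.

```python
# Illustrative measurement loop, not the actual benchmark harness.
import gc
import time
import tracemalloc

import memray


def measure(run_once, iterations: int = 10, warmup: int = 3, memray_path: str = "run.memray.bin"):
    for _ in range(warmup):
        run_once()  # untimed warm-up iterations

    tracemalloc.start()
    cpu_start = time.process_time()
    wall_ns: list[int] = []

    # memray records every allocation made while the measured loop runs; peak RSS,
    # total bytes, and allocation counts are read from the output file afterwards.
    with memray.Tracker(memray_path):
        for _ in range(iterations):
            t0 = time.perf_counter_ns()
            run_once()
            wall_ns.append(time.perf_counter_ns() - t0)

    cpu_seconds = time.process_time() - cpu_start
    gc.collect()  # drop garbage so the heap sample reflects steady state
    steady_heap_bytes, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    wall_ns.sort()
    return {
        "wall_p50_ms": wall_ns[len(wall_ns) // 2] / 1e6,
        "cpu_s": cpu_seconds,
        "steady_heap_mb": steady_heap_bytes / 2**20,
    }
```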
## Caveats

- Memory numbers are Linux-only; peak RSS accounting differs on macOS.
- `--cpu-profile` (pyinstrument) slightly inflates wall-time; omit it for clean wall-time numbers.
- Asymmetric scenarios (`list_metadata_only`, `single_get_metadata`) compare non-equivalent code paths; kubernetes-asyncio rows in those scenarios show a full list/get for reference only.
- K3s boot takes ~20 s per session; this cost is paid once per `python -m benchmarks.run` invocation.
- For mutation scenarios (`single_create_delete`, `watch_n_events`), the K3s API server is the common bottleneck; differences between adapters show up mainly in CPU time and allocations, not wall-time.