Errata
Corrections applied to the published posts after simulator fixes. This page documents what changed between the original runs and the current data backing each post. The qualitative conclusions are intact; the quantitative knees moved to slightly more pessimistic values.
The corrections below were consolidated on 2026-04-17 and rolled into both posts in the 2026-06-09 correction.
Scope
- Simulating Catalog and Table Conflicts (published 2026-03-09) — exp1, exp2, exp3a, exp3b.
- Simulating Multi-Table Contention in Catalog Formats (published 2026-03-23) — exp4a, exp4b (including Zipfian variants), and exp4c.
Simulator fixes that changed the numbers
Fixes landed in Endive between publication and the post-fix re-runs. Each one has an identifiable, directional effect on the curves.
Per-attempt I/O cost model
This is the biggest source of change: the per-attempt cost grew from 3 S3 operations (buggy) to 5 S3 operations (correct, non-inlined).
| Commit | Date | Change | Net direction |
|---|---|---|---|
| 9d1d8e1 | 2026-04-08 | Remove the manifest-file PUT from the per-attempt cost — that write was part of transaction runtime, not the commit protocol. | faster commits |
| 5c8aa30 | 2026-04-08 | Add the table_metadata read+write pair to the per-attempt cost. The TM was missing from the model entirely; every FA/VO attempt was one full RTT too cheap. | slower commits (dominant effect) |
| ec383ff | 2026-04-08 | Failure-path TM read for overlap detection; free retry for cross-table/disjoint-partition conflicts (no manifest I/O); CAS size cap for inlined metadata. | neutral for single-table; lowers latency for multi-table cross-table contention |
Net change at S3 median (non-inlined, per attempt):
old (buggy): MF_write(100K) + ML_read + ML_write ≈ 43 + 27 + 60 = 130 ms
new: TM_read + ML_read + ML_write + TM_write + CAS ≈ 27 + 27 + 60 + 60 + 1 = 175 ms
The published blog ran on the buggy ~130 ms-per-attempt model. Its throughput was artificially deflated by double-counting a manifest-file write that the protocol doesn’t require, and artificially inflated by skipping the table-metadata pair. The net effect of the fix is slower commits overall: the per-table ceiling drops from ~7.7 c/s to ~5.7 c/s at S3 medians.
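The arithmetic above can be checked with a short Python sketch. It assumes, as the 1/(5L) bound does, that commits to one table serialize and each success costs one full attempt, so the per-table ceiling is simply 1 / (per-attempt latency). The operation labels are illustrative names, not Endive identifiers.

```python
# Median S3 latencies (ms) from the breakdown above; op labels are illustrative.
OLD_BUGGY = {"MF_write_100K": 43, "ML_read": 27, "ML_write": 60}
NEW_NON_INLINED = {"TM_read": 27, "ML_read": 27, "ML_write": 60, "TM_write": 60, "CAS": 1}


def attempt_ms(ops: dict[str, float]) -> float:
    """Per-attempt latency with every storage op issued serially."""
    return sum(ops.values())


def per_table_ceiling(ops: dict[str, float]) -> float:
    """Commits/sec bound for one table, assuming one success per full attempt."""
    return 1000.0 / attempt_ms(ops)


for name, ops in [("old", OLD_BUGGY), ("new", NEW_NON_INLINED)]:
    print(f"{name}: {attempt_ms(ops):.0f} ms/attempt, {per_table_ceiling(ops):.2f} c/s")
# old: 130 ms/attempt, 7.69 c/s
# new: 175 ms/attempt, 5.71 c/s
```

Suppose an inlined attempt consisted only of ML_read + ML_write + CAS (88 ms). Under that assumption, the same function gives about 11.4 c/s, in line with the inlined ceiling quoted under Config drift.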
Timing / information leaks
| Commit | Date | Change |
|---|---|---|
edbe6da | 2026-03-20 | CAS version check now evaluates at half-RTT (server-side), not at full RTT. Removes a physically-impossible fast path where a client’s CAS could succeed against a catalog state that hadn’t yet propagated back to it. |
a6e908e | 2026-04-09 | catalog.read() also split-yields: the snapshot returned is what the catalog held at half-RTT, not at the client’s wall-clock time. Symmetric to edbe6da. |
These two fixes are why the saturation ceiling now sits on the 1/(5L) theoretical bound (5.7 c/s observed vs. 5.71 c/s predicted) rather than above it. Before these fixes, the simulator could produce commits faster than message delays physically allow.
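Here is a minimal Python sketch of the split-yield timing rule. The CatalogHistory class and the read/cas helpers are hypothetical stand-ins, not Endive's API. The sketch shows only that both the CAS version check and the read snapshot are evaluated at issue time + RTT/2, while the client learns the result at issue time + RTT.

```python
import bisect


class CatalogHistory:
    """Hypothetical record of catalog versions over simulated time (ms)."""

    def __init__(self) -> None:
        self.times: list[float] = []
        self.versions: list[int] = []

    def record(self, t: float, version: int) -> None:
        # Assumes events are recorded in non-decreasing time order,
        # as a discrete-event simulator would process them.
        self.times.append(t)
        self.versions.append(version)

    def version_at(self, t: float) -> int:
        """Latest version installed at or before time t (0 if none)."""
        i = bisect.bisect_right(self.times, t)
        return self.versions[i - 1] if i else 0


def read(history: CatalogHistory, issue_t: float, rtt: float) -> tuple[int, float]:
    """Split-yield read: state is sampled at half-RTT, delivered at full RTT."""
    snapshot = history.version_at(issue_t + rtt / 2)
    return snapshot, issue_t + rtt


def cas(history: CatalogHistory, issue_t: float, rtt: float,
        expected: int, new_version: int) -> tuple[bool, float]:
    """Split-yield CAS: the version check runs server-side at half-RTT."""
    server_t = issue_t + rtt / 2
    ok = history.version_at(server_t) == expected
    if ok:
        history.record(server_t, new_version)
    return ok, issue_t + rtt


h = CatalogHistory()
h.record(0.0, 1)
print(cas(h, 0.0, 10.0, expected=1, new_version=2))  # (True, 10.0): installed at t=5
print(cas(h, 1.0, 10.0, expected=1, new_version=3))  # (False, 11.0): checked at t=6
print(read(h, 2.0, 10.0))                            # (2, 12.0): sampled at t=7
```

The second client's CAS fails even though it was issued before the first client heard back. By the time its request reaches the server at t=6, version 2 is already installed.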
VO IO-convoy accounting
| Commit | Date | Change |
|---|---|---|
47cbb11 | 2026-03-19 | Convoy reads N−1 historical manifest lists where N is the table version delta, not the global catalog sequence delta. |
092e489 | 2026-03-19 | Deduplicate the per-attempt ML read from the convoy’s N ML reads (was charged twice per attempt). |
fa51753 | 2026-04-08 | Convoy decomposed per-table: Σ_table (V_table − 1) · M_table instead of a global V_global · M. |
Effect is strongest on multi-table VO tails (exp4b). On a single table (exp2 FA=0.0), the 47cbb11 N−1 correction and the 092e489 de-dup lower pure-VO P99 modestly (219→203 s at IA=200 ms); the fa51753 per-table decomposition is a no-op with only one table. The separate per-attempt TM pair raises per-attempt cost, but that effect is accounted for under Per-attempt I/O cost above, not here.
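The two convoy formulas from fa51753 can be written down directly. This sketch makes two assumptions: M_table is the cost of one historical manifest-list read for that table, and each table commit advances the global sequence by one. The function and table names are hypothetical.

```python
def convoy_cost_global(v_global: int, m: float) -> float:
    """Pre-fa51753 form: one global version delta times a single per-read cost."""
    return v_global * m


def convoy_cost_per_table(tables: dict[str, tuple[int, float]]) -> float:
    """Post-fa51753 form: sum over tables of (V_table - 1) * M_table.

    `tables` maps a table name to (V_table, M_table).
    """
    return sum(max(v - 1, 0) * m for v, m in tables.values())


# Two tables moved 4 and 1 versions since the snapshot; 27 ms per ML read.
deltas = {"orders": (4, 27.0), "events": (1, 27.0)}
print(convoy_cost_per_table(deltas))  # 81.0
print(convoy_cost_global(5, 27.0))    # 135.0
```

In this example, the lightly touched table contributes no convoy reads under the per-table form. The global form charges every commit across the catalog.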
Config drift (pre-run correction, 2026-04-13)
The blog’s exp1–4 configs had table_metadata_inlined = true set, which was a no-op until commit f1ad9ef (2026-03-26) made the flag actually do something. From that point forward, every re-run silently used inlined metadata (1/(3L) bound, ~11.4 c/s ceiling). Fixed pre-run by d55d3ce (2026-04-13), which flips the flag to false across exp1–4 templates. The post-fix runs use non-inlined metadata, matching the blog’s original intent.
The drift was undetected for ~18 days because expctl’s staleness check compared stored cfg.toml to the directory hash and simulator code, not to the template. Addressed by template-hash stamping (0e961fd, 5351c79).
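Template-hash stamping can be sketched as follows. The stamp format and function names are assumptions for illustration, not expctl's actual implementation. The idea is that a config records the hash of the template it was rendered from, so a template edit marks it stale even when the directory hash and simulator code are unchanged.

```python
import hashlib

STAMP_PREFIX = "# template-sha256 = "  # hypothetical stamp format


def template_hash(template_text: str) -> str:
    return hashlib.sha256(template_text.encode("utf-8")).hexdigest()


def stamp(cfg_text: str, template_text: str) -> str:
    """Prepend a comment recording which template produced this config."""
    return f"{STAMP_PREFIX}{template_hash(template_text)}\n{cfg_text}"


def is_stale(stamped_cfg: str, template_text: str) -> bool:
    """Stale if unstamped or if the template has changed since stamping."""
    first_line = stamped_cfg.split("\n", 1)[0]
    if not first_line.startswith(STAMP_PREFIX):
        return True
    return first_line[len(STAMP_PREFIX):] != template_hash(template_text)


old_template = "table_metadata_inlined = true\n"
cfg = stamp(old_template, old_template)
print(is_stale(cfg, old_template))                        # False
print(is_stale(cfg, "table_metadata_inlined = false\n"))  # True
```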
Simulator is strictly serial
The parallel-I/O footnote in the 2026-03-09 post (“up to 4 I/O operations can run in parallel”) was inaccurate: the simulator issues all storage I/O serially. The max_parallel = 4 knob in the configs was never consumed by endive/. The footnote and surrounding sentence have been removed.
Quantitative deltas
Simulating Catalog and Table Conflicts (2026-03-09)
| Metric | Published | Corrected |
|---|---|---|
| FA single-table throughput ceiling | ~7.7 c/s (19% success @ 7.8 c/s) | 5.7 c/s (14% success) |
| FA practical ceiling (99%+ success) | ~2.7 c/s | 2.0 c/s |
| FA P50 at low load | 320 ms | 410 ms |
| FA P99 at saturation | 1.89 s | 2.58 s |
| VO P99 at IA=200 ms (pure VO) | 219 s | 203 s (convoy fixes) |
| FA success @ 50 ms IA | 42% | 33% |
The “3–4 commits/sec” language in the original post has been changed to “2–3 commits/sec”. The per-attempt cost explanation was rewritten from “~300 ms retry” to “five S3 round-trips per attempt, ~175 ms at S3 median latencies”.
Simulating Multi-Table Contention (2026-03-23)
Single-table knees at >95% VO success:
| Provider | Published FA-only | Corrected FA-only | Published 90/10 | Corrected 90/10 |
|---|---|---|---|---|
| S3 Express | 14.6 c/s | 14.6 c/s | 7.5 c/s | 7.4 c/s |
| S3 Standard | 2.4 c/s | 1.8 c/s | 1.8 c/s | 1.8 c/s |
| Azure Premium | 2.5 c/s | 2.4 c/s | 1.9 c/s | 1.8 c/s |
| Azure Standard | 2.4 c/s | 1.8 c/s | 1.5 c/s | 1.5 c/s |
| GCP | 0.7 c/s | 0.4 c/s | 0.4 c/s | 0.4 c/s |
The biggest structural change is the multi-table FA-only knee for S3 Standard and Azure Premium at 5–20 tables:
| Provider | Tables | Published | Corrected | Δ |
|---|---|---|---|---|
| S3 Standard | 5–20 | 7.2–7.4 c/s | 3.7 c/s | −50% |
| S3 Standard | 50 | 7.4 c/s | 7.2 c/s | −3% |
| Azure Premium | 5–20 | 7.2–7.4 c/s | 3.7 c/s | −50% |
| Azure Premium | 50 | 7.4 c/s | 3.7 c/s | −50% |
| Azure Standard | 10–50 | 3.7 c/s | 3.7 c/s | — |
| GCP | 50 | 3.6 c/s | 0.7 c/s | −81% |
| S3 Express | 50 | 14.9 c/s | 36.0 c/s | +142% (free-retry for cross-table CAS failures) |
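The Δ column is the relative change from the published value to the corrected value. Below is a quick Python check of the single-value rows. Range rows such as 7.2–7.4 are omitted because their percentage depends on which endpoint is used.

```python
def pct_change(published: float, corrected: float) -> int:
    """Relative change from published to corrected, in whole percent."""
    return round((corrected - published) / published * 100)


rows = {
    "S3 Standard, 50 tables": (7.4, 7.2),
    "Azure Premium, 50 tables": (7.4, 3.7),
    "GCP, 50 tables": (3.6, 0.7),
    "S3 Express, 50 tables": (14.9, 36.0),
}
for label, (published, corrected) in rows.items():
    print(f"{label}: {pct_change(published, corrected):+d}%")
# S3 Standard, 50 tables: -3%
# Azure Premium, 50 tables: -50%
# GCP, 50 tables: -81%
# S3 Express, 50 tables: +142%
```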
For S3/Azure at 5–20 tables the published numbers were catalog-CAS-bound at ~7.4 c/s; with the non-inlined per-table ceiling now at 5.7 c/s, the per-table bound binds first and the knee flattens to 3.7 c/s. Only at 50 tables does catalog CAS again become the bottleneck (for S3 Standard FA-only; Azure Premium sits at the per-table bound through 50 tables).
S3 Express at 50 tables jumps to 36 c/s because its ~10 ms per-op latency keeps the per-table bound very high; with the ec383ff free-retry fix, cross-table CAS failures no longer charge manifest I/O, so the catalog handles more commits.
GCP’s multi-table scaling is weaker than presented: 0.4 c/s (1 table) → 0.7 c/s (50 tables) FA-only, versus the published 0.7 c/s → 3.6 c/s. The per-op latency is high enough that the per-table bound binds through 50 tables.
What survives unchanged
- Every qualitative conclusion in both posts.
- “Sustained commit rates above 1–2 commits/sec are unattainable” — now exactly right rather than hedged.
- “Storage I/O is the primary bottleneck for single-table workloads.”
- “Catalog CAS latency up to 120 ms adds only modest overhead for single-table.”
- “IO convoys serialize VO commit attempts; P99 reaches minutes at moderate rates.”
- “More tables move the bottleneck from per-table metadata I/O to catalog contention” (qualitatively true; the crossover is now at ~50 tables for S3 Standard instead of ~5, and Azure Premium stays per-table-bound through 50 tables).
- “Zipf concentration limits the benefit of adding tables; rank-1 table dominates.”
- “S3 Express is in a different class for catalog-as-file workloads.”
- “GCS is not viable for catalog-as-file workloads.”
- The Iceberg write-path diagram and metadata-size assumptions (~1 MiB TM, ~100 KiB manifest list).
- The per-provider latency distributions (S3, S3 Express, Azure, Azure Premium, GCS).
Full companion reports
- EXP1-3_REPORT.md — post-fix validation and cell-by-cell comparison for exp1, exp2, exp3a, exp3b.
- EXP4_REPORT.md — post-fix validation for exp4a, exp4b (uniform and Zipf), and exp4c across five storage providers.
