Simulating Multi-Table Contention in Catalog Formats


tl;dr

[Part 1] [Part 2]

Table formats like Apache Iceberg were designed before conditional operations were widely available in object stores. These operations are sufficient to support Iceberg’s linearizable table update protocol, but how would they perform? Simulating multi-table commit contention at the catalog suggests:

  1. Partitions help throughput and VO tail latency. Distributing a uniform workload across 20+ tables improves aggregate throughput by 2-10x for slower providers and compresses VO P99 by 6-17x. With zipfian skew, the most popular table converges to single-table performance, but other tables are mostly unaffected by the hot table’s contention.

  2. Provider choice is a larger lever than table count. S3 Express One Zone (S3x) sustains 14.6 c/s on a single table, more per-table throughput than spreading a workload across 50 tables on S3 Standard, even with an “instant” catalog. The entire commit pipeline (CAS + manifest I/O) compresses with faster storage; adding tables only relieves catalog contention.

  3. Longer-tailed distributions compound under contention. Each attempt requires multiple reads and writes in the object store. Variability extends the hazard window and makes workloads less stable overall. For example, S3 and Azure Premium have similar median CAS latency (61ms and 64ms), but Azure Standard’s longer tails produce more failures as it approaches saturation.

  4. GCS is not viable for catalog-as-file workloads. This follows from raw GCS CAS latency. Commit success degrades above ~0.7 commits/sec; with 10% ValidatedOverwrite transactions, the sustainable rate drops to 0.4 c/s.

Usable throughput (>95% success) vs table count for all providers at FA/VO ratios of 100/0, 90/10, and 50/50. S3x is flat at ~14.9 c/s across all table counts and mixes. S3 and Azure Premium plateau at ~7.4 c/s by 10 tables. Azure Standard reaches ~3.7 c/s. GCP climbs slowly to ~3.6 c/s at 50 tables. Adding VO transactions drops single-table throughput for all providers but is recovered with 2+ tables for S3x and 10+ tables for others.
Simulated provider performance distributed over 1 to 50 tables (uniform). These rates are 20-50% of measured CAS saturation for these providers, due to commit protocol overhead.

The commit protocol bottleneck is well-known among table format developers; lifting it whole or in part into a dedicated service is a popular solution. Now that we’ve measured and characterized the protocol, we can explore those tradeoffs in a later post.

Commit Contention in Catalog Files

Previously we simulated single-table commit rates. Now we add another dimension: multiple tables in the catalog. These are not multi-table transactions, but rather independent table updates that physically conflict at the catalog. For example, if transactions T1 and T2 update tables A and B respectively, T1 successfully updating the catalog reference for A could cause T2 to fail its commit to B. However, repairing T2’s commit is cheaper than what we measured last time: T2 only needs to retry at the catalog, not rewrite the table metadata or its manifest list.

This models the “catalog as file” case where the entire catalog is conditionally replaced on every commit. Note that as the number of tables increases, the inter-arrival time is distributed across all the tables in the catalog; be careful not to read it as the arrival rate for a single table, which we measured before.
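The failure mode can be made concrete with a toy model of the catalog object. This is an illustrative sketch, not any real client API: `Catalog`, `commit`, and the version counter stand in for a conditional PUT (e.g. an `If-Match` precondition) on the catalog file.

```python
import itertools

class CasConflict(Exception):
    """The catalog object changed since it was read."""

class Catalog:
    """Catalog-as-file: one version guards the pointers for every table."""
    def __init__(self, tables):
        self.version = 0
        self.refs = {t: 0 for t in tables}  # table name -> metadata pointer

    def compare_and_swap(self, expected_version, table, new_ref):
        # Models a conditional PUT that replaces the whole catalog file.
        if self.version != expected_version:
            raise CasConflict()
        self.refs[table] = new_ref
        self.version += 1

def commit(catalog, table, new_ref, max_attempts=5):
    """Retry only the catalog step: no table metadata is rewritten."""
    for attempt in itertools.count(1):
        seen = catalog.version  # re-read the catalog reference
        try:
            catalog.compare_and_swap(seen, table, new_ref)
            return attempt
        except CasConflict:
            if attempt >= max_attempts:
                raise

cat = Catalog(["A", "B"])
v = cat.version                  # T2 reads the catalog...
cat.compare_and_swap(v, "A", 1)  # ...then T1 commits to table A first.
try:
    cat.compare_and_swap(v, "B", 1)  # T2 fails even though B is unchanged.
except CasConflict:
    pass
assert commit(cat, "B", 1) == 1  # repair needs only a fresh catalog read
```

The point of the sketch is the last line: after a catalog conflict, the losing transaction already holds valid table metadata, so a single fresh read of the catalog is enough to retry.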

Experiment Summary

The workload mix is the same as in the single-table experiments, composed of “light” FastAppend (FA) and “heavy” ValidatedOverwrite (VO) transactions. The salient difference between FA and VO is the I/O necessary to retry a transaction: an FA transaction needs to re-read only the latest manifest list, while a VO transaction needs to read the manifest lists of all new snapshots of that table.

The workload is steady, but optimistic: it assumes no transaction needs to read beyond the manifest list to investigate or repair a conflict before retrying. Real workloads include commit attempts that do more work between attempts, increasing the chance of conflict.
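The FA/VO asymmetry amounts to a simple per-retry read-cost model. A sketch with illustrative names, defaulting to the experiments’ 10KiB manifest-list size:

```python
def retry_read_cost(kind, new_snapshots, manifest_list_kib=10):
    """KiB a transaction must re-read before its next commit attempt.

    FastAppend re-reads only the latest manifest list; ValidatedOverwrite
    must read the manifest list of every snapshot committed to the table
    since its read snapshot in order to re-validate.
    """
    if kind == "FA":
        return manifest_list_kib
    if kind == "VO":
        return new_snapshots * manifest_list_kib
    raise ValueError(kind)

# With 8 snapshots landed since the read snapshot:
assert retry_read_cost("FA", 8) == 10   # one 10KiB manifest list
assert retry_read_cost("VO", 8) == 80   # eight manifest lists
```

This is why VO retry cost grows with per-table commit rate while FA retry cost stays flat, and why spreading commits across tables helps VO disproportionately.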

Tables are selected from either uniform or Zipfian distributions, as annotated.

| Exp | Description | Fixed | Swept | Configs |
|-----|-------------|-------|-------|---------|
| 4a | Multi-table contention (FA) | 1 group, FA=100%, S3, conflicts=0% | num_tables, catalog_latency_ms, inter_arrival_scale | 240 |
| 4b | Multi-table contention (mix) | 1 group, FA=90%/VO=10%, S3, conflicts=0% | num_tables, catalog_latency_ms, inter_arrival_scale | 240 |
| 4c | Multi-table, real providers | 1 group, conflicts=0%, backend=storage | provider, num_tables, fast_append_ratio, inter_arrival_scale | 900 |
| Parameter | Values | Description |
|-----------|--------|-------------|
| inter_arrival_scale | [20, 50, 100, 200, 300, 400, 500, 1000, 2000, 5000] ms | Scale parameter for the exponential distribution of transaction inter-arrival times. Lower values correspond to higher transaction rates. |
| fast_append_ratio | [1.0, 0.9, 0.8, 0.7, 0.5, 0.3, 0.1, 0.0] | Ratio of FastAppend (light) transactions to ValidatedOverwrite (heavy) transactions in the workload mix. 1.0 means all transactions are FastAppend; 0.0 means all are ValidatedOverwrite. |
| catalog_latency_ms | [1, 10, 50, 120] | Latency of the catalog’s compare-and-set (CAS) operation in milliseconds. This models the time it takes for a transaction to attempt a commit and receive a response from the catalog. |
| num_tables | [1, 2, 5, 10, 20, 50] | Number of tables in the catalog. This models the contention at the catalog when multiple tables are being updated concurrently. |
| provider | [s3x, s3, azurex, azure, gcp] | Cloud storage provider used for the catalog. Each provider has different CAS latency distributions, which affect commit success rates and latencies. |

In all experiments, the manifest list and table metadata sizes are fixed (10KiB and 100KiB, respectively). Manifest and metadata I/O uses unconditional GET and PUT operations, not the conditional operations measured earlier. We use the same S3 Standard latencies for experiments 4a/4b as we used in the single-table experiments. We use provider distributions for experiment 4c (i.e., unconditional reads/writes for metadata, conditional writes for the catalog).
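Under these parameters, offered load is a Poisson-style process: exponential inter-arrival times with scale `inter_arrival_scale`, uniform table selection, and a Bernoulli draw for the FA/VO mix. A minimal sketch (illustrative, not the simulator’s actual code):

```python
import random

def generate_workload(n, inter_arrival_scale_ms, num_tables,
                      fast_append_ratio, seed=0):
    """Yield (time_ms, table, kind) commit events.

    Inter-arrival times are exponential with mean inter_arrival_scale_ms,
    tables are chosen uniformly, and the FA/VO split is a Bernoulli draw.
    """
    rng = random.Random(seed)
    t = 0.0
    for _ in range(n):
        t += rng.expovariate(1.0 / inter_arrival_scale_ms)
        table = rng.randrange(num_tables)
        kind = "FA" if rng.random() < fast_append_ratio else "VO"
        yield (t, table, kind)

# 90/10 mix over 10 tables at inter_arrival_scale=200ms (~5 commits/sec offered)
events = list(generate_workload(10_000, 200, 10, 0.9))
mean_gap_ms = events[-1][0] / len(events)  # close to 200ms by construction
```

Note that the scale governs arrivals across the whole catalog; each of the 10 tables sees roughly one tenth of the offered rate.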

Latency distributions for S3 Standard (click to expand)

Distribution Parameters

GET (unconditional read)

Modeled as Lognormal(mu=ln(median), sigma), floored at min_latency_ms.

| Operation | median (ms) | sigma | min_latency (ms) |
|-----------|-------------|-------|------------------|
| GET | 27 | 0.62 | 10 |

GET operations don’t include sizes because latency is dominated by fixed overheads at these sizes.

PUT (unconditional write)

Modeled as Lognormal(mu=ln(base + rate * size_MiB), sigma), floored at min_latency_ms.

| Operation | base (ms) | rate (ms/MiB) | sigma | min_latency (ms) |
|-----------|-----------|---------------|-------|------------------|
| PUT | 60 | 20 | 0.29 | 10 |

Percentiles

| Operation | p5 | p10 | p25 | p50 | p75 | p90 | p95 | p99 |
|-----------|----|-----|-----|-----|-----|-----|-----|-----|
| GET | 10 | 12 | 18 | 27 | 41 | 60 | 75 | 114 |
| PUT | 37 | 42 | 50 | 60 | 73 | 87 | 97 | 118 |
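The percentile rows follow analytically from the lognormal parameters: quantile = median · exp(z_p · σ), floored at min_latency. A quick check against the S3 Standard GET row:

```python
from math import exp
from statistics import NormalDist

def latency_percentile(median_ms, sigma, min_latency_ms, p):
    """Quantile of Lognormal(mu=ln(median), sigma), floored at min_latency."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return max(min_latency_ms, median_ms * exp(z * sigma))

# S3 Standard GET: median=27ms, sigma=0.62, floor=10ms
row = [round(latency_percentile(27, 0.62, 10, p))
       for p in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99)]
# row == [10, 12, 18, 27, 41, 60, 75, 114], matching the GET row above
```

The floor is only active at the low end (the unfloored p5 is ~9.7ms); everything from p10 up is the raw lognormal quantile.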

Multi-table scaling (4a, 4b)

Before measuring real providers, we sweep the number of tables (1-50) and catalog CAS latency (1-120ms) to establish how much multi-table scaling can buy. Experiments 4a (FA-only) and 4b (90/10 FA/VO) use simulated S3 Standard latencies for manifest I/O, with synthetic CAS latencies. Full heatmaps are in Appendix A.

Uniform distribution: tables move the bottleneck to the catalog

With a fast catalog (1-10ms CAS), distributing FA transactions uniformly across 20+ tables is sufficient to eliminate per-table metadata contention: even at 20ms inter-arrival, success rates exceed 99%. At 10 tables, success is still 94-98% depending on CAS latency.

Exp 4a: Heatmap of FA success rate by number of tables (1-50) and inter-arrival scale, 10ms CAS. More tables dramatically improve success: 50 tables achieve 99.5% even at 20ms inter-arrival vs 18.7% with 1 table. 10+ tables reach 100% at 50ms+ inter-arrival. Exp 4a: Heatmap of mean commit latency by table count and inter-arrival scale, 10ms CAS. Latency drops sharply with more tables: 50 tables at 20ms is 422ms vs 1089ms for 1 table. Baseline converges to ~350ms at 5000ms inter-arrival. Hatched cells indicate low success rates.
Exp 4a: FA success rate and latency by table count and inter-arrival time (10ms, 1ms CAS). More tables shift the bottleneck from per-table metadata I/O to the catalog.

At CAS latencies closer to real providers (50-120ms), the frontier tightens. At 50ms CAS, the knee (>95% success rate) falls to ~6 c/s even with 50 tables. At 120ms, the ceiling drops to ~3 c/s; the catalog round-trip dominates retry cost at every table count.

Exp 4a: Heatmap of FA success rate by table count and inter-arrival scale, 50ms CAS. Worse than 10ms: 1 table at 20ms is 14.6%, 50 tables at 20ms is only 47.7%. Even 20-50 tables need 100ms+ inter-arrival for 99%+ success. Exp 4a: Heatmap of mean commit latency by table count and inter-arrival scale, 50ms CAS. Higher baseline than 10ms: 1 table at 5000ms is 481ms vs 354ms. At 20ms, 1 table reaches 1565ms and 50 tables 984ms. Nearly all low-arrival cells are hatched.
Exp 4a: FA success rate and latency by table count (50ms, 120ms CAS). At realistic CAS latencies, table count provides less relief.

Adding VO: table partitioning reduces per-table retry cost

Adding 10% VO transactions barely changes FA success rates. VO success improves dramatically with table count. Each VO retry reads a manifest list for each snapshot committed to that table since the read snapshot; with more tables, each table sees fewer commits, reducing the per-table retry cost. At 10ms CAS with 50 tables, VO reaches 99.2% success even at 20ms inter-arrival; at 50ms CAS, VO and FA success rates converge above 10 tables. At 120ms CAS, the catalog limits both FA and VO equally.

Exp 4b: Heatmap of FA success rate (90/10 FA/VO mix) by table count and inter-arrival scale, 10ms CAS. Similar to exp4a FA-only: 50 tables at 20ms is 99.5%, 1 table at 20ms is 20.5%. VO presence barely affects FA success. Exp 4b: Heatmap of VO success rate (90/10 FA/VO mix) by table count and inter-arrival scale, 10ms CAS. VO benefits dramatically from table partitioning: 1 table at 20ms is 0.2%, but 50 tables at 20ms reaches 99.2%. 10 tables at 20ms is 68.1%; 5 tables at 100ms is 99.7%. VO converges to FA success rates with enough tables.
Exp 4b: FA and VO success rates and VO latency (90/10 mix, 10ms and 120ms CAS). FA success is nearly identical to 4a; VO success improves dramatically with table count.

Zipfian distribution: the hot table dominates

Under a Zipfian (α = 1.5) distribution, the probability of selecting the kth-ranked table is proportional to 1/k^1.5. The rank-1 table absorbs ~50% of writes regardless of how many tables exist; rank-2 gets ~18%, rank-3 ~10%, and the distribution falls off steeply. The effective table count tops out at ~4.5 even with 50 physical tables.
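The effective table count can be derived as an inverse Simpson index over the selection probabilities. A sketch assuming the Zipf weights are normalized over the physical tables (the simulator’s exact normalization may differ slightly):

```python
def zipf_probs(num_tables, alpha=1.5):
    """Selection probability by rank for Zipf(alpha) truncated to the
    physical tables (an assumed normalization)."""
    weights = [k ** -alpha for k in range(1, num_tables + 1)]
    total = sum(weights)
    return [w / total for w in weights]

probs = zipf_probs(50)
# Inverse Simpson index: the number of uniformly-selected tables that
# would produce the same table-collision rate.
effective_tables = 1 / sum(p * p for p in probs)  # ≈ 4.5 with 50 tables
```

Because the weights fall off as k^-1.5, adding a 51st table shifts a vanishing fraction of the load, so the effective count saturates no matter how many physical tables exist.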

Exp 4a: Zipfian FA success rate by table count and inter-arrival, 50ms CAS. Much worse than uniform: 50 tables at 20ms is only 43.4% (vs 47.7% uniform). 10 tables at 100ms is 88.4%. Adding tables beyond 10 barely helps- Zipf 50 tables approximates uniform 5 tables. Exp 4a: Conflict type breakdown by table rank at 50 tables, ias=100ms, Zipf 50ms CAS. Rank-1 table dominates with ~50% of writes and mostly same-table (tblptn) conflicts. Cold tables (rank 10+) have more catalog conflicts than table conflicts.
Exp 4a: Zipfian table selection FA-only (50ms CAS). The rank-1 table dominates, collapsing the benefit of additional tables.

The rank-1 table behaves approximately like a single table at half the global arrival rate, with a small penalty from catalog conflicts. At low load, rank-1 success rates and latencies converge to the single-table baseline; at high load, catalog conflicts from other tables’ writes consume part of the retry budget, degrading success rates below the single-table equivalent.

Under Zipf, 70% of retries are same-table conflicts (requiring manifest I/O), compared to ~2% under a uniform distribution with 50 tables. Adding physical tables beyond 10 barely helps: Zipf with 50 tables performs like uniform with ~5 tables.1 This is unsurprising, given that additional tables shift diminishing fractions of the workload.

Exp 4b: Zipfian VO success rate (90/10 mix) by table count and inter-arrival, 50ms CAS. VO benefits from table partitioning but less than uniform: 50 tables at 100ms is 80.1%, 50 tables at 20ms is 34.2%. 1 table at 100ms is 14.9%. The hot table concentrates per-table conflicts, limiting the benefit of additional tables. Exp 4b: Conflict type breakdown by table rank at 50 tables, ias=100ms, Zipf 50ms CAS, 90/10 mix. Rank-1 table dominates with ~4.4 FA table/partition conflicts and ~1.4 VO table/partition conflicts per transaction. Catalog conflicts (~1.6) are uniform across all ranks. Cold tables (rank 10+) have mostly catalog conflicts.
Exp 4b: Zipfian table selection 90/10 FA/VO (50ms CAS). The rank-1 table dominates, collapsing the benefit of additional tables.

Adding VO transactions back into the Zipfian distribution, we see a similar effect: the most popular table converges to single-table performance, VO transactions are more sensitive to catalog conflicts (particularly at high load), and sustainable single-table throughput with VO transactions is much lower. Catalog conflicts are evenly distributed across tables, but the most popular table also accumulates per-table conflicts, where VO transactions struggle to complete at high load.

Takeaway: Under a uniform distribution, partitioning is effective until CAS latency becomes the bottleneck. When the distribution is skewed (Zipfian), retries from popular tables have minimal impact on other tables. These results also suggest that catalog and table conflicts should be handled separately by the retry policy. While these simulations assume a steady arrival rate following a distribution, many real workloads burst on a particular table. Exponential backoff for table conflicts and immediate retry for catalog conflicts would be more effective for that workload.

4c. Multiple tables, varied workload ratio, measured CAS distributions

Real provider CAS latencies (22-170ms) fall well above the 1-10ms sweet spot from 4a/4b, so most workloads will operate in the regime where CAS latency limits throughput. Now we substitute the CAS latencies measured for each provider and published unconditional read/write latencies. The full results are in Appendix B.

We’re assigning labels to these distributions matching observations from each provider, but this is still a model of the commit protocol. We’re interested less in hitting the moving target of real provider performance and more in learning from the model: when does CAS latency become the bottleneck? (When) does storage variability (modeled as a lognormal distribution) impact commit success rates?

The synthetic parameter sweeps in 4a/4b varied CAS latency and workload to measure commit success rates/latency holding the provider (S3) constant. Now we want to see how different provider profiles interact with workload mixes and table counts.

Latency distributions for storage providers (click to expand)

Distribution Parameters

GET (unconditional read)

Modeled as Lognormal(mu=ln(median), sigma), floored at min_latency_ms.

| Provider | median (ms) | sigma | min_latency (ms) |
|----------|-------------|-------|------------------|
| S3 Express | 2.5 | 0.57 | 1 |
| S3 Standard | 27 | 0.62 | 10 |
| Azure Premium | 35 | 0.08 | 20 |
| Azure Standard | 38 | 0.66 | 20 |
| GCS | 200 | 0.30 | 80 |

PUT (unconditional write)

Modeled as Lognormal(mu=ln(base + rate * size_MiB), sigma), floored at min_latency_ms.

| Provider | base (ms) | rate (ms/MiB) | sigma | min_latency (ms) |
|----------|-----------|---------------|-------|------------------|
| S3 Express | 6.5 | 10 | 0.24 | 1 |
| S3 Standard | 60 | 20 | 0.29 | 10 |
| Azure Premium | 41 | 15 | 0.10 | 20 |
| Azure Standard | 45 | 25 | 0.50 | 20 |
| GCS | 200 | 17 | 0.30 | 80 |

GET Percentiles

| Provider | p5 | p10 | p25 | p50 | p75 | p90 | p95 | p99 |
|----------|----|-----|-----|-----|-----|-----|-----|-----|
| S3 Express | 1 | 1 | 2 | 2 | 4 | 5 | 6 | 9 |
| S3 Standard | 10 | 12 | 18 | 27 | 41 | 60 | 75 | 114 |
| Azure Premium | 31 | 32 | 33 | 35 | 37 | 39 | 40 | 42 |
| Azure Standard | 20 | 20 | 24 | 38 | 59 | 89 | 113 | 176 |
| GCS | 122 | 136 | 163 | 200 | 245 | 294 | 328 | 402 |

PUT Percentiles

| Provider | p5 | p10 | p25 | p50 | p75 | p90 | p95 | p99 |
|----------|----|-----|-----|-----|-----|-----|-----|-----|
| S3 Express | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 |
| S3 Standard | 37 | 42 | 50 | 60 | 73 | 87 | 97 | 118 |
| Azure Premium | 35 | 36 | 39 | 41 | 44 | 47 | 49 | 52 |
| Azure Standard | 20 | 24 | 32 | 45 | 63 | 86 | 103 | 145 |
| GCS | 122 | 136 | 164 | 200 | 245 | 294 | 328 | 402 |

Provider summary

| Provider | CAS median (ms) | CAS σ | Read base (ms) | Read σ | Write base (ms) | Write σ | Min latency (ms) |
|----------|-----------------|-------|----------------|--------|-----------------|---------|------------------|
| S3 Express | 22 | 0.22 | 2.5 | 0.57 | 6.5 | 0.24 | 1 |
| S3 | 61 | 0.14 | 27 | 0.62 | 60 | 0.29 | 10 |
| Azure Premium | 64 | 0.73 | 35 | 0.08 | 41 | 0.10 | 20 |
| Azure | 93 | 0.82 | 38 | 0.66 | 45 | 0.50 | 20 |
| GCP | 170 | 0.91 | 200 | 0.30 | 200 | 0.30 | 80 |


Single-table Provider Performance

| Provider | 100/0 (c/s) | 100/0 (FA/VO lat) | 90/10 (c/s) | 90/10 (FA/VO lat) | 50/50 (c/s) | 50/50 (FA/VO lat) |
|----------|-------------|-------------------|-------------|-------------------|-------------|-------------------|
| s3x | 14.6 | 0.15s / — | 7.5 | 0.10s / 6.6s | 7.4 | 0.10s / 6.7s |
| s3 | 2.4 | 0.84s / — | 1.8 | 0.68s / 19.7s | 1.8 | 0.67s / 19.6s |
| azurex | 2.5 | 0.72s / — | 1.9 | 0.60s / 21.2s | 1.8 | 0.59s / 20.9s |
| azure | 2.4 | 1.18s / — | 1.5 | 0.87s / 23.3s | 1.8 | 0.95s / 28.7s |
| gcp | 0.7 | 3.99s / — | 0.4 | 2.63s / 26.9s | 0.4 | 2.58s / 29.2s |
Provider throughput and mean FA/VO latencies for a single table where over 95% of VO transactions succeed

In the commit path, I/O latency for metadata is the dominant factor across providers. S3 Express One Zone is in its own class on this workload, delivering 3-6x the throughput of the next tier (S3 Standard, Azure Premium, Azure Standard) and up to 20x the throughput of GCS. Its low latency and low variance for both reads and writes compress the entire commit pipeline.

To accommodate 10% VO transactions, even S3x requires a 2x reduction in throughput to keep success rates above 95%.

Exp 4c: Single-table 90/10 FA/VO provider metrics. Four panels show FA success rate, VO success rate, FA mean latency, and VO mean latency vs inter-arrival scale. S3x sustains high FA success to ~70ms inter-arrival; S3, Azure Premium, and Azure Standard degrade below 200ms; GCP degrades below 1000ms. VO success drops much earlier than FA for all providers.
Single-table provider performance in 90/10 FA/VO workloads

Multiple tables

With uniform table selection, partitioned workloads fall into four performance tiers:

s3x (14.9 c/s) » s3 / azurex (7.4) » azure (3.7) » gcp (3.6 simulated, ~0.8 actual)

These rates plateau at 10+ tables, except for Azure Standard, which reaches 7.2 c/s only at 20-50 tables. Distributing the same load across multiple tables improves throughput for all providers. In all cases save GCP, simulated commit throughput is well below the measured maximum conditional write throughput.

This suggests more throughput is available at higher arrival rates, if it is uniformly distributed over the same number of tables, i.e., we could push throughput higher by tolerating more catalog conflicts.

Outliers: S3 Express One Zone (S3x) and GCP

S3x benefits immediately from partitioning. Its single-table throughput drops from 14.6 c/s to 7.5 c/s when VO transactions are added, but with 2 or more tables it sustains 14.9 c/s for all workload mixes. It also has the lowest mean latencies, often 2-6x lower than the next tier.

The simulated GCP rate (3.6 c/s) is too high, because the server-side throttle is not modeled. GCS measured CAS throughput saturates at 0.8-1.4 op/s; the commit protocol adds overhead on top of that, so the real multi-table rate is at most ~0.8 c/s.

Main tier: S3 Standard, Azure Premium, Azure Standard

The other three stores are more interesting. S3 and Azure Premium have similar CAS medians (61 vs 64ms), but Azure Premium’s CAS sigma is 5x larger (0.73 vs 0.14). Azure Standard is worse on both axes: higher median (93ms) and higher sigma (0.82).

| Provider | CAS median | CAS σ | Read σ | Write σ |
|----------|------------|-------|--------|---------|
| S3 | 61ms | 0.14 | 0.62 | 0.29 |
| Azure Premium | 64ms | 0.73 | 0.08 | 0.10 |
| Azure | 93ms | 0.82 | 0.66 | 0.50 |
Provider lognormal distribution parameters

In this model, both Azure Premium and Standard have higher CAS variance, but Premium has very tight read/write variance; its I/O is predictable even if the CAS is noisy. Azure Standard has high variance everywhere: its tail latency keeps it from reaching 7.2-7.4 c/s until 20 tables, a rate S3 and Azure Premium sustain at 10+ tables in all workload mixes.

| Provider | 100/0 (c/s) | 100/0 (FA/VO lat) | 90/10 (c/s) | 90/10 (FA/VO lat) | 50/50 (c/s) | 50/50 (FA/VO lat) |
|----------|-------------|-------------------|-------------|-------------------|-------------|-------------------|
| s3x | 14.9 | 0.11s / — | 14.9 | 0.11s / 1.5s | 14.9 | 0.11s / 1.4s |
| s3 | 7.4 | 0.75s / — | 7.3 | 0.75s / 8.2s | 7.3 | 0.74s / 8.1s |
| azurex | 7.4 | 0.75s / — | 7.4 | 0.75s / 8.8s | 7.3 | 0.75s / 8.7s |
| azure | 3.7 | 0.91s / — | 3.7 | 0.91s / 6.5s | 3.7 | 0.90s / 6.5s |
| gcp | 2.4 | 3.57s / — | 2.4 | 3.54s / 20.3s | 2.4 | 3.48s / 20.0s |
Provider throughput and mean FA/VO latencies for 10 tables where over 95% of VO transactions succeed
Exp 4c: Ten-table 90/10 FA/VO provider metrics. Four panels show FA success rate, VO success rate, FA mean latency, and VO mean latency vs inter-arrival scale. VO success rates improve dramatically vs single-table: S3x, S3, and Azure Premium reach 95%+ at 100ms inter-arrival. VO mean latency drops significantly with table partitioning. GCP remains worst but also benefits.
Ten-table provider performance in 90/10 FA/VO workloads

Put another way: when Azure Standard retries take 5-10x the median, they are almost certainly wasted effort. In settings with high variance, commit protocols need to minimize how often a retry attempt samples from a fat-tailed distribution. Distributed over enough tables, Azure Standard can sustain throughput similar to Azure Premium and S3 Standard, albeit with higher mean latency.

Takeaway: S3x is in a different class for catalog-as-file workloads. Distributing load across tables is effective, but store variance can drive failure rates up even at low arrival rates.

Conclusion

We tested two levers for improving commit throughput under contention: adding tables and choosing a faster storage provider. Both help.

Provider choice matters more than table count. S3 Express sustains 14.6 c/s (FA-only) on a single table, more per-table throughput than distributing across 10+ tables on S3 Standard (7.4 c/s aggregate, ~0.74 c/s per table). Fast storage compresses the entire commit pipeline (CAS + manifest I/O), while adding tables only relieves catalog contention. For providers in the 2-3 c/s tier (S3, Azure Standard, Azure Premium), 20+ tables yield 1.5-3x aggregate scaling.

Table partitioning reduces VO retry cost. Each VO retry reads manifest lists proportional to the snapshots committed to that table since the read snapshot. With more tables, each table sees fewer commits, and VO tail latency drops accordingly. At S3’s 20-table knee (7.4 c/s aggregate), P99 is 11.5s, down from 69.5s at the single-table knee. S3 Express drops to 2.1s at its 20-table knee. Under Zipfian skew, the hot table still converges to single-table performance, but catalog contention does not impact the less popular tables.

The protocol still matters at single-table scale. On a single table, VO P99 reaches tens of seconds at moderate throughput regardless of provider. S3 Express’s 35.8s single-table P99 is the lowest of any provider, though still impractical for most production workloads.

Take the simulated provider experiments with a grain of salt: the labels we’re putting on the storage distributions are drawn from measurements of real systems, but these parameters do not completely describe reality. These are optimistic models of provider performance. It is unlikely that real workloads could sustain these rates without external coordination.

One practical takeaway: catalog and table conflicts should be handled separately by the retry policy. Catalog conflicts are cheap to retry (re-read the catalog, re-apply the CAS) while table conflicts require re-reading manifest lists. Immediate retry for catalog conflicts and exponential backoff for table conflicts would better match the cost structure.
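A retry policy matching that cost structure might look like the following sketch. The base delay and cap are assumed tuning values, not measured ones:

```python
import random

def backoff_ms(conflict_kind, attempt, base_ms=50, cap_ms=5000,
               rng=random.random):
    """Delay before the next commit attempt, by conflict type.

    Catalog conflicts: re-read the catalog and retry immediately.
    Table conflicts: real contention on that table's metadata, so back
    off exponentially with full jitter (base_ms and cap_ms are assumed
    tuning knobs, not measured values).
    """
    if conflict_kind == "catalog":
        return 0
    if conflict_kind == "table":
        return rng() * min(cap_ms, base_ms * 2 ** (attempt - 1))
    raise ValueError(conflict_kind)

# Jitter windows grow per table-conflict attempt: [0, 50), [0, 100),
# [0, 200), ... capped at 5s; catalog conflicts always retry at once.
```

Full jitter spreads retries across the whole window, which matters most on stores with fat tails, where synchronized retries otherwise pile onto the same hazard window.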

More broadly, these simulations may be sufficient to indict the commit protocol. Writing copy-on-write objects to storage on every commit attempt is a self-imposed obstacle to scaling commit throughput, more than writing to the same catalog object in every commit.

Appendix A: Full 4a/4b results

Full heatmaps for experiments 4a (FA-only) and 4b (90/10 FA/VO mix) across table counts, inter-arrival times, and catalog CAS latencies. Both uniform and Zipfian table selection distributions are included.

4a: FA-only, uniform (10ms, 1ms CAS)
Exp 4a: Heatmap of FA success rate by number of tables (1-50) and inter-arrival scale, 10ms CAS. More tables dramatically improve success: 50 tables achieve 99.5% even at 20ms inter-arrival vs 18.7% with 1 table. 10+ tables reach 100% at 50ms+ inter-arrival. Exp 4a: Heatmap of mean commit latency by table count and inter-arrival scale, 10ms CAS. Latency drops sharply with more tables: 50 tables at 20ms is 422ms vs 1089ms for 1 table. Baseline converges to ~350ms at 5000ms inter-arrival. Hatched cells indicate low success rates.
Exp 4a: FA success rate and latency by table count and inter-arrival time (10ms, 1ms CAS).
4a: FA-only, uniform (50ms, 120ms CAS)
Exp 4a: Heatmap of FA success rate by table count and inter-arrival scale, 50ms CAS. Worse than 10ms: 1 table at 20ms is 14.6%, 50 tables at 20ms is only 47.7%. Even 20-50 tables need 100ms+ inter-arrival for 99%+ success. Exp 4a: Heatmap of mean commit latency by table count and inter-arrival scale, 50ms CAS. Higher baseline than 10ms: 1 table at 5000ms is 481ms vs 354ms. At 20ms, 1 table reaches 1565ms and 50 tables 984ms. Nearly all low-arrival cells are hatched.
Exp 4a: FA success rate and latency by table count (50ms, 120ms CAS).
4a: FA-only, Zipfian (50ms CAS)
Exp 4a: Zipfian FA success rate by table count and inter-arrival, 50ms CAS. Much worse than uniform: 50 tables at 20ms is only 43.4% (vs 47.7% uniform). 10 tables at 100ms is 88.4%. Adding tables beyond 10 barely helps- Zipf 50 tables approximates uniform 5 tables. Exp 4a: Conflict type breakdown by table rank at 50 tables, ias=100ms, Zipf 50ms CAS. Rank-1 table dominates with ~50% of writes and mostly same-table (tblptn) conflicts. Cold tables (rank 10+) have more catalog conflicts than table conflicts.
Exp 4a: FA success rate and latency with Zipfian table selection (50ms CAS).
4b: 90/10 FA/VO mix, uniform (10ms, 1ms CAS)
Exp 4b: Heatmap of FA success rate (90/10 FA/VO mix) by table count and inter-arrival scale, 10ms CAS. Similar to exp4a FA-only: 50 tables at 20ms is 99.5%, 1 table at 20ms is 20.5%. VO presence barely affects FA success. Exp 4b: Heatmap of VO success rate (90/10 FA/VO mix) by table count and inter-arrival scale, 10ms CAS. VO benefits dramatically from table partitioning: 1 table at 20ms is 0.2%, but 50 tables at 20ms reaches 99.2%. 10 tables at 20ms is 68.1%; 5 tables at 100ms is 99.7%. VO converges to FA success rates with enough tables.
Exp 4b: FA and VO success rates (90/10 mix, 10ms and 1ms CAS).
4b: 90/10 FA/VO mix, uniform (50ms, 120ms CAS)
Exp 4b: Heatmap of FA success rate (90/10 mix) by table count and inter-arrival scale, 50ms CAS. 1 table at 20ms is 16.1%, 50 tables at 20ms is 47.9%. At 100ms, 50 tables reach 99.5%, 1 table is 61.7%. Exp 4b: Heatmap of VO success rate (90/10 mix) by table count and inter-arrival, 50ms CAS. VO improves substantially with table count: 50 tables at 100ms is 99.4%, 10 tables at 50ms is 80.6%. 1 table at 100ms is 14.6%, 50 tables at 20ms is 45.4%. FA and VO converge at high table counts.
Exp 4b: FA and VO success rates (90/10 mix, 50ms and 120ms CAS).
4b: 90/10 FA/VO mix, Zipfian (50ms CAS)
Exp 4b: Zipfian VO success rate (90/10 mix) by table count and inter-arrival, 50ms CAS. VO benefits from table partitioning but less than uniform: 50 tables at 100ms is 80.1%, 50 tables at 20ms is 34.2%. 1 table at 100ms is 14.9%. The hot table concentrates per-table conflicts, limiting the benefit of additional tables. Exp 4b: Conflict type breakdown by table rank at 50 tables, ias=100ms, Zipf 50ms CAS, 90/10 mix. Rank-1 table dominates with ~4.4 FA table/partition conflicts and ~1.4 VO table/partition conflicts per transaction. Catalog conflicts (~1.6) are uniform across all ranks. Cold tables (rank 10+) have mostly catalog conflicts.
Exp 4b: VO success rate and conflict type distribution with Zipfian table selection (50ms CAS).


Appendix B: Full 4c results

Galleries of success rate and latency heatmaps for all 5 storage providers, across all table counts, inter-arrival times, and workload mixes. Click on an image to view the gallery and flip through them.

S3
Exp 4c: S3 FA=100% success rate. 1 table at 20ms is 13.8%, 50 tables is 42.3%. Reaches 100% by 200ms for 10+ tables. Very similar profile to standard Azure. Exp 4c: S3 FA=90% FastAppend success rate. 1 table at 20ms is 15.2%, 50 tables is 42.5%. At 100ms, 50 tables reach 99.1%. Similar to Azure at high table counts.
Exp 4c: S3 Standard success rates. Heatmaps for FA=100%, FA=90% (FA and VO), and FA=50% (FA and VO) across table counts and inter-arrival times.
Exp 4c: S3 FA=90% FastAppend mean latency. 1 table at 5000ms is ~243ms, 50 tables at 5000ms is ~159ms. At 20ms inter-arrival, latencies range 469-577ms for 20-50 tables. Hatched cells cover the left portion. Exp 4c: S3 FA=90% ValidatedOverwrite mean latency. VO latency higher than FA but benefits from table partitioning: at 5000ms, ranges from ~4930ms (1 table) to ~380ms (50 tables). Hatched cells in the left region indicate low success rates.
Exp 4c: S3 Standard commit latency. FA/VO mean latency heatmaps for FA=90% and FA=50% mixes. Hatched cells indicate low success rates.
S3 Express One Zone
Exp 4c: S3 Express FA=100% success rate. Dramatically better than all other providers. 1 table at 20ms is 60.6%, 50 tables is 89.0%. Only degradation is at 20ms; 50ms+ is 98%+ everywhere. Exp 4c: S3 Express FA=90% FastAppend success rate. Nearly perfect: 1 table at 20ms is 63.2%, 50 tables at 20ms is 89.0%. At 50ms+, all configurations reach 98.5%+. Only the 20ms column shows any degradation.
Exp 4c: S3 Express success rates. Heatmaps for FA=100%, FA=90% (FA and VO), and FA=50% (FA and VO) across table counts and inter-arrival times.
Exp 4c: S3 Express FA=90% FastAppend mean latency. Very low: 1 table at 20ms is 232ms, 50 tables is 167ms. At 5000ms, baseline is 73-83ms. Only 1-2 tables at 20ms show hatching. Exp 4c: S3 Express FA=90% ValidatedOverwrite mean latency. VO latency benefits from table partitioning: 50 tables at 20ms is 747ms, 1 table at 20ms is 18247ms. At 5000ms, ranges 73-214ms. Low CAS latency helps VO when combined with multiple tables.
Exp 4c: S3 Express commit latency. FA/VO mean latency heatmaps for FA=90% and FA=50% mixes. Hatched cells indicate low success rates.
Azure Standard
Exp 4c: Azure FA=100% success rate by table count and inter-arrival. 1 table at 20ms is 16.1%, 50 tables at 20ms is 52.2%. Reaches 100% by 200ms for 10+ tables. Similar profile to S3 Standard. Exp 4c: Azure FA=90% FastAppend success rate. 1 table at 20ms is 17.6%, 50 tables is 52.4%. Very similar to FA=100%; FA success is insensitive to 10% VO in the mix.
Exp 4c: Azure Standard success rates. Heatmaps for FA=100%, FA=90% (FA and VO), and FA=50% (FA and VO) across table counts and inter-arrival times.
Exp 4c: Azure FA=90% FastAppend mean latency. 1 table at 5000ms is ~248ms, 50 tables at 5000ms is ~169ms. At 20ms, latencies range 504-970ms. Hatched cells cover the left half. Exp 4c: Azure FA=90% ValidatedOverwrite mean latency. VO latency higher than FA: at 5000ms, ranges from ~248ms (1 table) to ~169ms (50 tables). At high load, 1 table latency reaches tens of seconds. Hatched cells cover the left region.
Exp 4c: Azure Standard commit latency. FA/VO mean latency heatmaps for FA=90% and FA=50% mixes. Hatched cells indicate low success rates.
Azure Premium
Exp 4c: Azure Premium FA=100% success rate. Better than standard Azure: 1 table at 20ms is 16.7%, 50 tables is 60.6%. Reaches 100% by 200ms for 10+ tables. 1 table at 100ms is 64.4% vs Azure's 58.2%. Exp 4c: Azure Premium FA=90% FastAppend success rate. 1 table at 20ms is 18.4%, 50 tables is 60.9%. At 50ms, 50 tables is 90.9%. Noticeably better than standard Azure FA=90% at high table counts.
Exp 4c: Azure Premium success rates. Heatmaps for FA=100%, FA=90% (FA and VO), and FA=50% (FA and VO) across table counts and inter-arrival times.
Exp 4c: Azure Premium FA=90% FastAppend mean latency. 1 table at 5000ms is ~147ms, 50 tables at 5000ms is ~74ms. At 20ms, 50 tables is ~452ms. Hatched cells cover the left portion. Exp 4c: Azure Premium FA=90% ValidatedOverwrite mean latency. VO latency higher than FA but benefits from table partitioning: at 5000ms, ranges from ~3155ms (1 table) to ~1715ms (50 tables). Hatched cells cover the left region.
Exp 4c: Azure Premium commit latency. FA/VO mean latency heatmaps for FA=90% and FA=50% mixes. Hatched cells indicate low success rates.
Google Cloud Storage (GCS)
Exp 4c: GCS FA=100% success rate. Worst-performing provider. 1 table at 20ms is 4.4%, 50 tables is 27.0%. Does not reach 100% until inter-arrival 2000 for 1-2 tables. Degradation extends much further right than other providers. Exp 4c: GCS FA=90% FastAppend success rate. Much worse than all other providers. 1 table at 20ms is 4.8%, 200ms is 38.2%, 500ms is 74.9%. 50 tables at 20ms is 27.2%.
Exp 4c: GCS success rates. Heatmaps for FA=100%, FA=90% (FA and VO), and FA=50% (FA and VO) across table counts and inter-arrival times.
Exp 4c: GCS FA=90% FastAppend mean latency. Very high: 1 table at 5000ms is ~1795ms, 50 tables at 5000ms is ~932ms. GCS's high base CAS latency inflates all commit latencies. Hatched cells cover most of the left region. Exp 4c: GCS FA=90% ValidatedOverwrite mean latency. Very high due to GCS's high base CAS latency. At 5000ms, ranges ~430ms (uniform across table counts). At high load, 50 tables at 20ms reaches ~27853ms. Most cells are hatched.
Exp 4c: GCS commit latency. FA/VO mean latency heatmaps for FA=90% and FA=50% mixes. Hatched cells indicate low success rates.
  1. The full set of plots for these simulations is here