Real-World Config Benchmarks And Large Inputs

The benchmark examples parse checked-in or generated YAML without timing file I/O. They report aggregate cost so small files do not dominate the signal. These benchmark and conformance commands are source-checkout-only: the published crate package ships this document, but it intentionally excludes the dev-dependency examples and fixture corpora used to regenerate the tables.

cargo run --locked --release --example real_world_benchmark
YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example large_input_benchmark
YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark

Environment for the latest captured run:

  • Reference crates: serde-saphyr 0.0.27 with deserialize only, yaml-rust2 0.11.0, saphyr 0.0.6
  • Small fixture set: 33 files / 39 YAML documents / 25,362 bytes
  • Large fixture set: pinned downstream fixtures plus generated 1 MiB inputs
  • Captured: 2026-06-06 with Cargo’s release profile and --locked

The linked serde-saphyr repository was ahead of crates.io at the time of this capture (0.0.28 in Git, latest published 0.0.27). The benchmark pins the published crate so the checked-in Cargo.lock and package checks remain registry-reproducible.

The serde-saphyr rows use benchmark options rather than the crate defaults: strict_booleans: true plus relaxed event, alias, document, node, scalar, and merge budgets so the generated corpora are comparable throughput inputs. Because serde-saphyr does not expose a native YAML value tree, the matched generic Serde lane deserializes through both libraries into serde_yaml::Value. The preflight normalizes two public-contract differences before asserting equality: serde_saphyr::from_multiple_with_options skips empty/null-like documents, and serde-saphyr treats YAML tags as transparent for this target while saneyaml preserves them.

The README overview graphic is a static summary of selected benchmark and feature rows. Its source notes and update checklist live at docs/assets/saneyaml-overview.md; update that note with this file whenever the graphic changes.

The large benchmark’s peak retained bytes and peak retained heap objects columns are safe retained-output estimates from parsed tree container and string capacities after a single parse. They are not allocator instrumentation and do not include transient parser scratch. For multi-fixture corpora, they report the peak retained output for one fixture because each fixture is parsed and dropped independently.
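As a rough illustration of a capacity-based estimate, the sketch below walks a tree and sums container and string capacities. The Node enum and the retained_estimate, buffer, and add helpers are hypothetical stand-ins, not the crate's real types or the example's actual code.

```rust
use std::mem::size_of;

/// Hypothetical tree shape; the crate's real node types are not shown in this document.
enum Node {
    Scalar(String),
    Sequence(Vec<Node>),
    Mapping(Vec<(String, Node)>),
}

/// Returns (retained bytes, retained heap objects), estimated from capacities alone.
fn retained_estimate(node: &Node) -> (usize, usize) {
    match node {
        Node::Scalar(text) => buffer(text.capacity()),
        Node::Sequence(items) => {
            let own = buffer(items.capacity() * size_of::<Node>());
            items.iter().map(retained_estimate).fold(own, add)
        }
        Node::Mapping(entries) => {
            let own = buffer(entries.capacity() * size_of::<(String, Node)>());
            entries.iter().fold(own, |total, (key, value)| {
                add(add(total, buffer(key.capacity())), retained_estimate(value))
            })
        }
    }
}

/// A zero-capacity buffer owns no heap allocation, so it counts as no object.
fn buffer(bytes: usize) -> (usize, usize) {
    (bytes, usize::from(bytes > 0))
}

fn add(a: (usize, usize), b: (usize, usize)) -> (usize, usize) {
    (a.0 + b.0, a.1 + b.1)
}

fn main() {
    let tree = Node::Mapping(vec![(
        "ports".to_string(),
        Node::Sequence(vec![Node::Scalar("80".to_string())]),
    )]);
    let (bytes, objects) = retained_estimate(&tree);
    println!("about {bytes} retained bytes in {objects} heap objects");
}
```

Because the walk only reads capacities of the returned tree, it cannot see scratch buffers the parser freed before returning, which is why such an estimate is not a substitute for allocator instrumentation.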

The 2026-06-01 zero-copy line slice removed transient per-line raw/content text allocations from this crate’s parser by storing one resident source buffer and per-line byte ranges. That allocation drop is visible in the parser code path rather than in the retained-output columns, because preprocessed lines are dropped before the parsed tree is returned.
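A minimal sketch of the idea, assuming a hypothetical LineTable type (the parser's real structure is not shown in this document): the source is stored once, and each line is a byte range into it rather than its own String.

```rust
use std::ops::Range;

/// Hypothetical line table: one resident source buffer plus a byte range per line.
struct LineTable {
    source: String,
    lines: Vec<Range<usize>>,
}

impl LineTable {
    fn new(source: String) -> Self {
        let mut lines = Vec::new();
        let mut start = 0;
        // `\n` is ASCII, so every range boundary is a valid UTF-8 boundary.
        for (idx, byte) in source.bytes().enumerate() {
            if byte == b'\n' {
                lines.push(start..idx);
                start = idx + 1;
            }
        }
        if start < source.len() {
            lines.push(start..source.len());
        }
        Self { source, lines }
    }

    /// Borrows a line from the resident buffer; no per-line String is allocated.
    fn line(&self, index: usize) -> Option<&str> {
        self.lines.get(index).map(|range| &self.source[range.clone()])
    }
}

fn main() {
    let table = LineTable::new("a: 1\nb:\n  - x\n".to_string());
    for index in 0..table.lines.len() {
        println!("{index}: {:?}", table.line(index));
    }
}
```

This sketch ignores `\r\n` line endings and any raw/content distinction; it only shows why the per-line cost drops to one range entry.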

The 2026-06-01 no-merge fast path records whether each parsed document contains a semantic merge key and skips the post-parse merge traversal when none was seen. In the same-session target capture, generated_multi_doc_stream_1mib saneyaml::parse_documents moved from 25.87 to 23.98 ns/byte while saphyr moved from 24.86 to 24.89 ns/byte. Retained output estimates are unchanged because the returned tree shape is unchanged; the removed work is transient per-document merge scanning and its scratch stack.
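The fast path can be pictured as a flag check in front of the merge traversal. The sketch below uses hypothetical Node, ParsedDocument, resolve_merges, and finish names, and it only handles a `<<` key whose value is a single mapping; it is not the crate's implementation.

```rust
/// Hypothetical parsed tree; the real saneyaml node types are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Scalar(String),
    Mapping(Vec<(String, Node)>),
}

struct ParsedDocument {
    root: Node,
    /// Assumed to be set by the parser whenever it emits a `<<` merge key.
    saw_merge_key: bool,
}

/// Expands `<<` entries whose value is a mapping; explicitly written keys win.
fn resolve_merges(node: &mut Node) {
    if let Node::Mapping(entries) = node {
        for (_, child) in entries.iter_mut() {
            resolve_merges(child);
        }
        let mut own = Vec::new();
        let mut inherited = Vec::new();
        for (key, value) in entries.drain(..) {
            if key == "<<" {
                // Non-mapping merge values are simply dropped in this sketch.
                if let Node::Mapping(base) = value {
                    inherited.extend(base);
                }
            } else {
                own.push((key, value));
            }
        }
        for (key, value) in inherited {
            if !own.iter().any(|(existing, _)| *existing == key) {
                own.push((key, value));
            }
        }
        *entries = own;
    }
}

/// The fast path: documents that never saw a merge key skip the traversal entirely.
fn finish(mut doc: ParsedDocument) -> Node {
    if doc.saw_merge_key {
        resolve_merges(&mut doc.root);
    }
    doc.root
}

fn main() {
    let doc = ParsedDocument {
        root: Node::Mapping(vec![("a".into(), Node::Scalar("1".into()))]),
        saw_merge_key: false,
    };
    println!("{:?}", finish(doc));
}
```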

The 2026-06-01 plain-scalar continuation slice delays String allocation until a plain scalar is proven to span multiple lines. In the next same-session target capture, generated_multi_doc_stream_1mib saneyaml::parse_documents moved from 23.98 to 21.87 ns/byte while saphyr measured 24.42 ns/byte. Retained output estimates are unchanged because single-line scalar output is identical; the removed work is transient short String allocation before the scalar falls back to the inline parse path.
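One way to express "allocate only once a continuation line is proven" is std::borrow::Cow. The fold_plain_scalar helper below is a hypothetical sketch under that assumption; it folds each line break into a single space and ignores blank lines, which the real parser would have to handle.

```rust
use std::borrow::Cow;

/// Hypothetical helper: joins the lines of a plain scalar, allocating a String
/// only when there is more than one line.
fn fold_plain_scalar<'a>(lines: &[&'a str]) -> Cow<'a, str> {
    match lines {
        [] => Cow::Borrowed(""),
        // Single-line scalar: borrow straight from the input, no allocation.
        [single] => Cow::Borrowed((*single).trim()),
        // Continuation proven: only now build an owned String.
        [first, rest @ ..] => {
            let mut folded = String::from((*first).trim());
            for line in rest {
                folded.push(' ');
                folded.push_str(line.trim());
            }
            Cow::Owned(folded)
        }
    }
}

fn main() {
    let single = fold_plain_scalar(&["nginx:1.27"]);
    let multi = fold_plain_scalar(&["first part", "  second part"]);
    println!("{single:?} borrowed={}", matches!(single, Cow::Borrowed(_)));
    println!("{multi:?} borrowed={}", matches!(multi, Cow::Borrowed(_)));
}
```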

The 2026-06-01 retained-capacity slice trims completed document, sequence, and mapping vectors before returning parsed trees. In the large-input capture below, saneyaml::parse_documents retained bytes moved from 703,340 to 486,188 on the Stackable peak, from 23,031,972 to 13,006,500 on the generated multi-document stream, and from 13,040,211 to 9,893,619 on the generated 1 MiB wide mapping. The same-run speed lead over saphyr remains intact for the default spanful parser on every large-input row.
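The trimming step can be sketched with Vec::shrink_to_fit applied bottom-up. The Node enum and trim_retained_capacity function are hypothetical; the standard library only promises that capacity stays at least the length and shrinks as close to it as the allocator allows.

```rust
/// Hypothetical tree shape; the crate's real node types are not shown in this document.
enum Node {
    Scalar(String),
    Sequence(Vec<Node>),
    Mapping(Vec<(String, Node)>),
}

/// Releases spare vector capacity left over from parsing, children first.
fn trim_retained_capacity(node: &mut Node) {
    match node {
        Node::Scalar(_) => {}
        Node::Sequence(items) => {
            items.iter_mut().for_each(trim_retained_capacity);
            items.shrink_to_fit();
        }
        Node::Mapping(entries) => {
            entries
                .iter_mut()
                .for_each(|(_, value)| trim_retained_capacity(value));
            entries.shrink_to_fit();
        }
    }
}

fn main() {
    // Simulate a parser that over-reserved while pushing items.
    let mut items = Vec::with_capacity(64);
    items.push(Node::Scalar("a".to_string()));
    items.push(Node::Scalar("b".to_string()));
    let mut root = Node::Mapping(vec![("list".to_string(), Node::Sequence(items))]);

    trim_retained_capacity(&mut root);
    if let Node::Mapping(entries) = &root {
        if let Node::Sequence(items) = &entries[0].1 {
            println!("len {} capacity {}", items.len(), items.capacity());
        }
    }
}
```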

The same milestone adds saneyaml::parse_borrowed_documents, an explicit spanless retained tree that can borrow scalar strings from the caller’s input buffer. This is an additive load path, not a silent change to saneyaml::parse_documents; the retained-output estimate counts the borrowed tree heap and, like the saphyr row, does not count the caller-owned source buffer. That row closes the retained-memory axis against saphyr 0.0.6 across the large-input corpus while preserving the owning parser’s spans and scalar-source behavior.

Real-World Config Corpus

Latest same-run capture after adding the matched serde-saphyr lane, using YAML_BENCH_ITERS=1000:

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 1,000 | 25,362 | 39 | 431.314 | 17.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 25,362 | 39 | 572.477 | 22.57 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 578.399 | 22.81 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,165.027 | 45.94 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,055.332 | 41.61 |
| serde_yaml::Value stream | 1,000 | 25,362 | 39 | 644.638 | 25.42 |
| yaml_rust2::YamlLoader | 1,000 | 25,362 | 39 | 539.894 | 21.29 |
| saphyr::Yaml::load_from_str | 1,000 | 25,362 | 39 | 502.288 | 19.80 |

On this corpus, the matched generic Serde value lane measured saneyaml at 22.81 ns/byte versus serde-saphyr at 41.61 ns/byte. The private event-backed prototype measured 45.94 ns/byte on the same target, so it is not a replacement for the tree-backed Serde path yet. The raw tree-load rows are shown for context but are a different contract from serde-saphyr’s Serde-only API.
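For readers checking the tables by hand, the ns/byte column is consistent with elapsed time divided by total bytes parsed (iterations times bytes per iteration). The ns_per_byte helper below is a hypothetical name used only to show that arithmetic on two rows from the table above.

```rust
/// Hypothetical helper: converts one table row into its ns/byte figure.
fn ns_per_byte(elapsed_ms: f64, iterations: u64, bytes_per_iteration: u64) -> f64 {
    let total_bytes = (iterations * bytes_per_iteration) as f64;
    elapsed_ms * 1_000_000.0 / total_bytes
}

fn main() {
    // saneyaml::parse_documents row: 431.314 ms over 1,000 iterations of 25,362 bytes.
    let parse = ns_per_byte(431.314, 1_000, 25_362);
    // serde_saphyr::from_multiple_with_options row: 1,055.332 ms over the same work.
    let serde_saphyr = ns_per_byte(1_055.332, 1_000, 25_362);
    println!("{parse:.2} ns/byte vs {serde_saphyr:.2} ns/byte");
}
```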

Same-turn pre-optimization baseline, captured before this milestone with the default 200 iterations:

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 200 | 19,727 | 33 | 126.377 | 32.03 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 141.211 | 35.79 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 153.465 | 38.90 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 100.959 | 25.59 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 92.393 | 23.42 |

Post zero-copy line-slice re-capture with 1,000 iterations (independent run, 2026-06-01):

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 1,000 | 19,727 | 33 | 355.271 | 18.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 19,727 | 33 | 422.618 | 21.42 |
| serde_yaml::Value stream | 1,000 | 19,727 | 33 | 523.390 | 26.53 |
| yaml_rust2::YamlLoader | 1,000 | 19,727 | 33 | 434.222 | 22.01 |
| saphyr::Yaml::load_from_str | 1,000 | 19,727 | 33 | 402.909 | 20.42 |

Result: after the zero-copy line slice, saneyaml::parse_documents was faster than the pinned reference loaders on this small corpus in that 2026-06-01 1,000-iteration same-run capture (18.01 ns/byte vs saphyr at 20.42 and yaml_rust2 at 22.01). The owning Value path also remains ahead of the serde_yaml Value stream and roughly ties yaml_rust2 on this corpus.

Post no-merge and plain-scalar fast-path re-capture with the default 200 iterations (independent run, 2026-06-01):

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 200 | 19,727 | 33 | 72.598 | 18.40 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 84.585 | 21.44 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 116.031 | 29.41 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 82.792 | 20.98 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 78.606 | 19.92 |

Methodology caveat: the pre-optimization table above was captured at 200 iterations and the zero-copy re-capture at 1,000, so part of the ns/byte drop between those two tables reflects warm-up rather than optimization. The trustworthy signal is the same-run cross-loader comparison within each table, plus the larger, lower-noise inputs below, not the delta between tables.

Large Inputs

Command:

cargo run --locked --release --example large_input_benchmark

Default iterations: 20, controlled by YAML_LARGE_BENCH_ITERS.

external_downstream_all

20 pinned downstream files / 245,062 bytes / 20 YAML documents.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 245,062 | 20 | 31.887 | 6.51 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 245,062 | 20 | 30.256 | 6.17 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 245,062 | 20 | 40.035 | 8.17 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 41.333 | 8.43 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 77.032 | 15.72 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 245,062 | 20 | 67.763 | 13.83 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 245,062 | 20 | 51.019 | 10.41 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 245,062 | 20 | 38.470 | 7.85 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 245,062 | 20 | 36.419 | 7.43 | 534,786 | 3,780 |

stackable_dummy_cluster

One pinned Stackable CRD / 177,556 bytes / 1 YAML document.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 177,556 | 1 | 20.570 | 5.79 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 177,556 | 1 | 19.159 | 5.40 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 177,556 | 1 | 24.524 | 6.91 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 25.203 | 7.10 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 47.657 | 13.42 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 177,556 | 1 | 41.445 | 11.67 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 177,556 | 1 | 32.343 | 9.11 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 177,556 | 1 | 24.245 | 6.83 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 177,556 | 1 | 23.095 | 6.50 | 534,786 | 3,780 |

generated_multi_doc_stream_1mib

Generated multi-document service stream / 1,048,680 bytes / 8,020 YAML documents.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 1,048,680 | 8,020 | 367.321 | 17.51 | 13,006,500 | 128,321 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,680 | 8,020 | 361.229 | 17.22 | 4,106,240 | 32,081 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,680 | 8,020 | 495.592 | 23.63 | 4,729,860 | 112,281 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 538.155 | 25.66 | 9,862,660 | 112,281 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,116.471 | 53.23 | 11,607,364 | 112,281 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,223.597 | 58.34 | 11,607,364 | 112,281 |
| serde_yaml::Value stream | 20 | 1,048,680 | 8,020 | 661.464 | 31.54 | 11,607,364 | 112,281 |
| yaml_rust2::YamlLoader | 20 | 1,048,680 | 8,020 | 545.253 | 26.00 | 10,386,948 | 112,281 |
| saphyr::Yaml::load_from_str | 20 | 1,048,680 | 8,020 | 534.700 | 25.49 | 14,770,560 | 112,281 |

generated_wide_mapping_256kib

Generated one-document wide service mapping / 262,176 bytes.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 262,176 | 1 | 78.226 | 14.92 | 2,484,775 | 23,932 |
| saneyaml::parse_borrowed_documents | 20 | 262,176 | 1 | 73.898 | 14.09 | 765,792 | 2,994 |
| saneyaml::from_documents_str::<Value> | 20 | 262,176 | 1 | 107.359 | 20.47 | 938,236 | 17,950 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 106.085 | 20.23 | 1,895,476 | 17,950 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 223.815 | 42.68 | 1,895,692 | 17,950 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 262,176 | 1 | 212.831 | 40.59 | 1,895,692 | 17,950 |
| serde_yaml::Value stream | 20 | 262,176 | 1 | 114.758 | 21.89 | 1,895,692 | 17,950 |
| yaml_rust2::YamlLoader | 20 | 262,176 | 1 | 102.762 | 19.60 | 1,704,220 | 17,950 |
| saphyr::Yaml::load_from_str | 20 | 262,176 | 1 | 97.245 | 18.55 | 2,393,312 | 17,950 |

generated_wide_mapping_1mib

Generated one-document wide service mapping / 1,048,661 bytes.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 1,048,661 | 1 | 309.172 | 14.74 | 9,893,619 | 95,236 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,661 | 1 | 282.013 | 13.45 | 3,047,520 | 11,907 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,661 | 1 | 419.435 | 20.00 | 3,739,059 | 71,428 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 417.003 | 19.88 | 7,548,459 | 71,428 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 878.512 | 41.89 | 7,548,675 | 71,428 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 843.638 | 40.22 | 7,548,675 | 71,428 |
| serde_yaml::Value stream | 20 | 1,048,661 | 1 | 477.817 | 22.78 | 7,548,675 | 71,428 |
| yaml_rust2::YamlLoader | 20 | 1,048,661 | 1 | 420.142 | 20.03 | 6,786,771 | 71,428 |
| saphyr::Yaml::load_from_str | 20 | 1,048,661 | 1 | 401.299 | 19.13 | 9,523,712 | 71,428 |

Large-input story: after zero-copy line storage, the no-merge fast path, delayed plain-scalar continuation allocation, and retained vector right-sizing, saneyaml::parse_documents beats yaml_rust2 and saphyr on every large parser path in the latest capture on an unloaded machine. In the matched Serde value lane, saneyaml::from_documents_str::<serde_yaml::Value> is faster than serde_saphyr::from_multiple_with_options::<serde_yaml::Value> on every large-input row. The hidden event-backed Serde prototype only wins against serde-saphyr on the generated multi-document stream in this capture and remains slower on the other large rows; it also retains the same serde_yaml::Value output shape, so it does not improve retained output memory yet. The smallest corpus (external_downstream_all) is the most contention-sensitive, so its ordering is the first to wobble under load; the larger corpora hold a clearer margin. The retained-memory story is now split by output contract: the default spanful tree keeps spans and scalar-source spellings and is faster than saphyr, while the additive saneyaml::parse_borrowed_documents tree drops spans/source spellings and borrows sliceable scalars, retaining less heap than saphyr on every large-input row (for example, 3,047,520 vs 9,523,712 bytes on the 1 MiB wide mapping).

Streaming And Compact Line-Table Milestone

This milestone adds a compact per-line table, a fused line-preprocessing scan, source-backed borrowed scalars, and a lazy streaming line buffer that reclaims consumed lines as DocumentStream/EventStream advance. Batch loaders keep an eager line table for speed; only the streaming entrypoints reclaim. The input string itself stays fully resident, so these are bounded-retention streaming paths, not constant-memory readers.
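The bounded-retention idea can be sketched as a reader that keeps line entries only for the document in progress and drops them at each boundary, while the input string itself stays resident. DocumentLines and its methods are hypothetical; the sketch only recognizes a bare `---` line as a boundary and is not how DocumentStream is implemented.

```rust
use std::ops::Range;

/// Hypothetical streaming line buffer over a fully resident input.
struct DocumentLines<'a> {
    source: &'a str,
    cursor: usize,
    /// Line ranges of the document currently being read.
    buffered: Vec<Range<usize>>,
    peak_buffered: usize,
}

impl<'a> DocumentLines<'a> {
    fn new(source: &'a str) -> Self {
        Self { source, cursor: 0, buffered: Vec::new(), peak_buffered: 0 }
    }

    /// Returns the next document's line count, reclaiming its line entries.
    fn next_document(&mut self) -> Option<usize> {
        while self.cursor < self.source.len() {
            let rest = &self.source[self.cursor..];
            let len = rest.find('\n').map_or(rest.len(), |idx| idx + 1);
            let range = self.cursor..self.cursor + len;
            self.cursor += len;
            let is_boundary = self.source[range.clone()].trim_end() == "---";
            if is_boundary {
                if !self.buffered.is_empty() {
                    return Some(self.reclaim());
                }
                continue;
            }
            self.buffered.push(range);
            self.peak_buffered = self.peak_buffered.max(self.buffered.len());
        }
        if self.buffered.is_empty() { None } else { Some(self.reclaim()) }
    }

    /// Drops the consumed line entries so the working set stays bounded.
    fn reclaim(&mut self) -> usize {
        let count = self.buffered.len();
        self.buffered.clear();
        count
    }
}

fn main() {
    let mut stream = DocumentLines::new("a: 1\n---\nb: 2\nc: 3\n---\nd: 4\n");
    while let Some(lines) = stream.next_document() {
        println!("document with {lines} line(s)");
    }
    println!("peak buffered lines: {}", stream.peak_buffered);
}
```

In this sketch the peak number of buffered line entries tracks the largest single document rather than the whole stream, which is the property the streaming entrypoints rely on.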

Captured in a single release session against the in-repo harnesses:

YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example dhat_memory -- --all
cargo run --locked --release --example conformance_compare

Real-world config corpus (1,000 iterations)

33 files / 39 YAML documents / 25,362 bytes. This is a distinct, later capture from a separate same-session run of this milestone, not the same measurement as the “Real-World Config Corpus” table above; the corpus is identical but the per-loader ns/byte figures differ run to run (for example saphyr reads 21.42 here versus 19.80 there), which is the run-to-run noise the methodology caveat describes.

| parser/load path | ns/byte |
|---|---|
| saneyaml::parse_documents | 15.03 |
| saneyaml::from_documents_str::<Value> | 21.19 |
| saphyr::Yaml::load_from_str | 21.42 |
| yaml_rust2::YamlLoader | 23.11 |
| serde_yaml::Value stream | 24.98 |

On this corpus saneyaml::parse_documents is the fastest load path; the owning Value path ties saphyr and stays ahead of yaml_rust2 and serde_yaml.

Allocator-backed memory (dhat), 1 MiB multi-document stream

8,020 documents. retained blocks is the count of live allocations still held while the parsed output is retained.

| path | allocs | bytes allocated | peak | retained blocks |
|---|---|---|---|---|
| saneyaml stream docs | 184,466 | 13.64 MB | 2.10 MB | 4 |
| saneyaml stream events | 232,594 | 49.28 MB | 2.11 MB | 6 |
| saneyaml borrowed | 80,219 | 17.29 MB | 6.21 MB | 32,081 |
| saneyaml owned | 200,519 | 16.05 MB | 15.12 MB | 128,321 |
| saneyaml Value | 449,140 | 25.07 MB | 15.12 MB | 112,281 |
| yaml-rust2 | 585,478 | 29.29 MB | 17.15 MB | 192,481 |
| saneyaml as serde_yaml::Value | 465,180 | 39.73 MB | 20.79 MB | 136,341 |
| saneyaml event-backed as serde_yaml::Value | 1,114,806 | 175.38 MB | 22.83 MB | 136,341 |
| serde-saphyr as serde_yaml::Value | 577,465 | 59.71 MB | 21.79 MB | 136,344 |
| serde_yaml | 721,821 | 84.73 MB | 21.84 MB | 136,341 |
| saphyr | 216,559 | 22.77 MB | 22.30 MB | 192,481 |

On a multi-document stream the streaming loaders hold a bounded working set (retained blocks stay at 4–6 regardless of stream length) and post the lowest peak; the borrowed batch tree has the lowest peak among the non-streaming loaders. The event-backed Serde prototype is allocation-heavy here because it still consumes parser-recorded event frames rather than a direct parser-to-Serde stream.

Allocator-backed memory (dhat), 1 MiB wide single document

| path | peak | retained blocks |
|---|---|---|
| serde-saphyr as serde_yaml::Value | 10.73 MB | 83,337 |
| yaml-rust2 | 10.98 MB | 130,951 |
| saphyr | 14.10 MB | 130,951 |
| saneyaml borrowed | 15.32 MB | 11,907 |
| saneyaml owned | 16.16 MB | 95,236 |
| saneyaml stream docs | 16.16 MB | 4 |
| saneyaml Value | 16.39 MB | 71,428 |
| saneyaml as serde_yaml::Value | 19.91 MB | 83,334 |
| serde_yaml | 23.42 MB | 83,334 |
| saneyaml stream events | 62.22 MB | 6 |
| saneyaml event-backed as serde_yaml::Value | 78.54 MB | 83,334 |

Streaming only helps when there are document boundaries to reclaim at. On a single wide document there is nothing to reclaim mid-parse, so yaml-rust2 and saphyr post lower peaks than saneyaml on this shape, and the event-streaming path is expensive here because it buffers per-event output for one large document. The matched serde-saphyr value row posts a low wide-document peak, while the event-backed Serde prototype is the highest peak in this capture. Streaming is a multi-document memory win, not a universal one.

Conformance (402 curated cases)

| library | spec accept/reject (400) | tree policy (2) |
|---|---|---|
| saneyaml | 400/400 | 2/2 |
| yaml-rust2 | 400/400 | 2/2 |
| saphyr | 400/400 | 0/2 |
| serde_yaml | 333/400 | 2/2 |

saneyaml ties yaml-rust2 and saphyr at 400/400 on the neutral spec set; it is not a sole leader there. Its differentiation is the combination of full spec conformance with tree-policy rejection of the duplicate-key/tree-error cases that saphyr accepts, while serde_yaml trails the spec set at 333/400.

Reproduction & Tooling

Every number in this document comes from an in-repo example, run under Cargo’s release profile. The commands below regenerate each captured table from a source checkout of this repository; absolute values vary by machine, but the same-run cross-loader ordering is the trustworthy signal on an otherwise-idle machine. The harness is a hand-rolled Instant::now() loop with no warm-up or statistics, so under heavy machine load even that ordering can invert; treat any single capture as indicative rather than authoritative.
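To make the limits of such a harness concrete, here is a minimal sketch of an Instant::now() loop with an environment-controlled iteration count. The iterations_from_env and time_loop helpers, the stand-in loader closure, and the use of std::hint::black_box are assumptions for illustration; they are not copied from the real examples.

```rust
use std::env;
use std::hint::black_box;
use std::time::Instant;

/// Reads an iteration count from the environment, falling back to `default`.
fn iterations_from_env(var: &str, default: u64) -> u64 {
    env::var(var)
        .ok()
        .and_then(|raw| raw.parse::<u64>().ok())
        .filter(|&count| count > 0)
        .unwrap_or(default)
}

/// Times `iterations` calls of `load` and returns elapsed milliseconds.
/// There is no warm-up pass and no statistics, matching the caveat above.
fn time_loop<T>(iterations: u64, input: &str, mut load: impl FnMut(&str) -> T) -> f64 {
    let started = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the parse result.
        black_box(load(black_box(input)));
    }
    started.elapsed().as_secs_f64() * 1_000.0
}

fn main() {
    let iterations = iterations_from_env("YAML_BENCH_ITERS", 200);
    let input = "name: demo\nreplicas: 3\n";
    // Stand-in loader: a real run would call a YAML parser here.
    let elapsed_ms = time_loop(iterations, input, |text| text.lines().count());
    let ns_per_byte = elapsed_ms * 1_000_000.0 / (iterations as f64 * input.len() as f64);
    println!("{iterations} iterations, {elapsed_ms:.3} ms, {ns_per_byte:.2} ns/byte");
}
```

A single wall-clock measurement like this absorbs scheduler noise and cache state directly into the result, which is why only the same-run ordering is treated as meaningful.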

| captured section | checkout-only command |
|---|---|
| Real-World Config Corpus | cargo run --locked --release --example real_world_benchmark |
| Real-world corpus (1,000 iterations) | YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark |
| Large Inputs (all corpora) | cargo run --locked --release --example large_input_benchmark |
| Large Inputs (custom iteration count) | YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark |
| Allocator-backed memory (dhat) | cargo run --locked --release --example dhat_memory -- --all |
| dhat single (library, corpus) pair | cargo run --locked --release --example dhat_memory -- saneyaml-borrowed multidoc |
| Conformance (402 curated cases) | cargo run --locked --release --example conformance_compare |

Iteration counts default to 200 for real_world_benchmark (YAML_BENCH_ITERS) and 20 for large_input_benchmark (YAML_LARGE_BENCH_ITERS). The dhat_memory example installs a global allocator and must measure one library per process; -- --all sweeps every (library, corpus) pair for you, and -- <library> <corpus> profiles a single pair.
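The one-library-per-process rule follows from how a global allocator works: its counters are process-wide, so two libraries measured in one process would share totals and peak. The sketch below shows this with a std-only counting allocator, which is a simplified stand-in and not dhat or the dhat_memory example; CountingAlloc and the counter names are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

static TOTAL_ALLOCS: AtomicUsize = AtomicUsize::new(0);
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);
static PEAK_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical counting allocator that forwards to the system allocator.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            TOTAL_ALLOCS.fetch_add(1, Ordering::Relaxed);
            let live = LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK_BYTES.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Exactly one global allocator exists per process, so every allocation made by
// any library in this process lands in the same counters.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = TOTAL_ALLOCS.load(Ordering::Relaxed);
    let retained = black_box(vec![0u8; 4096]);
    println!(
        "allocs: {}, peak bytes: {}, retained len: {}",
        TOTAL_ALLOCS.load(Ordering::Relaxed) - before,
        PEAK_BYTES.load(Ordering::Relaxed),
        retained.len()
    );
}
```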

Reference-crate versions

The captured comparison numbers were produced against these pinned dev-dependency versions (see Cargo.toml):

| crate | version |
|---|---|
| serde-saphyr | 0.0.27 (default-features = false, deserialize) |
| serde_yaml | 0.9.34 |
| saphyr | 0.0.6 |
| saphyr-parser | 0.0.6 |
| yaml-rust2 | 0.11.0 |
| dhat | 0.3.3 |

To reproduce against the exact pinned set, build with the checked-in Cargo.lock and pass --locked, as the commands above do; with --locked, Cargo fails instead of updating the lockfile when it is out of date. Bumping any reference crate can shift its numbers, so re-capture the whole comparison table when upgrading.