saneyaml documentation

Serde-first YAML for Rust, with real YAML 1.2 semantics. Read this as a hosted guide at https://jskoiz.github.io/saneyaml/, or start with Getting started in the repo and open a topic page when you need it.

Learn the basics

  • Getting started — install, parse into a struct, emit. ~5 minutes.
  • Cookbook — copy-paste recipes for the common tasks.

Topic guides

  • Schema modes — YAML 1.2 vs 1.1, and why NO stays the string "NO".
  • Diagnostics — line/column, key paths, and source carets in errors.
  • Untrusted input — resource limits for hostile YAML.
  • Editing files — change values in place without losing comments, anchors, or ordering.
  • Streaming — pull events/documents with bounded memory.

Migrating

Reference

Project

Changelog · Security policy · Contributing · License

Getting started

Install, parse YAML into your own types, and emit it back. About five minutes.

Snippets below elide the enclosing function and use ?; assume each runs in a function returning saneyaml::Result<()>.

Install

[dependencies]
saneyaml = "0.3.0"
serde = { version = "1", features = ["derive"] }

Pure Rust, no C bindings, #![forbid(unsafe_code)]. MSRV 1.88.

Parse into a struct

The common case — load a config straight into your own types with #[derive(Deserialize)]:

#![allow(unused)]
fn main() {
use serde::Deserialize;

#[derive(Deserialize)]
struct Config {
    name: String,
    port: u16,
    tags: Vec<String>,
}

let cfg: Config = saneyaml::from_str("\
name: web
port: 8080
tags: [http, public]
")?;

assert_eq!(cfg.port, 8080);
assert_eq!(cfg.tags, ["http", "public"]);
}

Three entry points, same behavior — pick by what you hold:

#![allow(unused)]
fn main() {
let a: Config = saneyaml::from_str(text)?;     // &str
let b: Config = saneyaml::from_slice(bytes)?;  // &[u8]
let c: Config = saneyaml::from_reader(file)?;  // impl std::io::Read
}

Parse into a dynamic value

When you don’t know the shape ahead of time, deserialize into Value and walk it:

#![allow(unused)]
fn main() {
let v: saneyaml::Value = saneyaml::from_str("name: web\nport: 8080\n")?;

assert_eq!(v["name"].as_str(), Some("web"));
assert_eq!(v["port"].as_u64(), Some(8080));
}

Value mirrors serde_yaml::Value: as_str, as_i64, as_bool, as_sequence, as_mapping, indexing by key or position, and get for a non-panicking lookup. See the Cookbook for mutation and patching.

Emit

Serialize any Serialize value to YAML:

#![allow(unused)]
fn main() {
use serde::Serialize;

#[derive(Serialize)]
struct Config { name: String, port: u16 }

let cfg = Config { name: "web".into(), port: 8080 };

let text = saneyaml::to_string(&cfg)?;         // -> String
// name: web
// port: 8080

saneyaml::to_writer(std::io::stdout(), &cfg)?; // writes to any impl std::io::Write
}

The default writer produces clean, deterministic YAML. To tune layout — sort keys, force quoting, flow vs block — use EmitOptions and to_string_with_options; see the Cookbook.

Entry-point cheat sheet

| You have… | You want… | Call |
|---|---|---|
| &str / &[u8] / reader | one typed value | from_str / from_slice / from_reader |
| a multi-document stream | a Vec<T> | from_documents_str / _slice / _reader |
| a multi-document stream | one document at a time | Deserializer::from_str(...), then iterate |
| a Serialize value | a String or writer | to_string / to_writer |
| YAML text | the raw structure | parse_str → Node, or read into Value |
| non-default schema or limits | any of the above | LoadOptions::…().from_str(...) |

Cookbook

Short, copy-paste recipes for the tasks that come up most. For the basics, read Getting started first.

Snippets elide the enclosing function and use ?; assume each runs in a function returning saneyaml::Result<()>.

Multi-document streams

Kubernetes-style streams separated by ---. Get everything at once:

#![allow(unused)]
fn main() {
let stream = "\
kind: Service
---
kind: Deployment
";

let docs: Vec<Manifest> = saneyaml::from_documents_str(stream)?;
assert_eq!(docs.len(), 2);
}

Or process one document at a time — useful when an early document is valid but a later one fails, and you want the good ones first:

#![allow(unused)]
fn main() {
use serde::Deserialize;

for doc in saneyaml::Deserializer::from_str(stream) {
    let manifest = Manifest::deserialize(doc)?;
    // handle manifest…
}
}

from_documents_str is all-or-error; the iterator yields parsed documents up to the first error. For bounded memory on large streams, see Streaming.
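The difference between the two shapes can be sketched with std only. This is an illustration, not saneyaml code: parsing a `u16` stands in for deserializing one document, and `all_or_error` / `until_first_error` are hypothetical names.

```rust
use std::num::ParseIntError;

/// All-or-error: one bad item discards every earlier success.
fn all_or_error(items: &[&str]) -> Result<Vec<u16>, ParseIntError> {
    items.iter().map(|s| s.parse::<u16>()).collect()
}

/// Iterator style: keep the items parsed before the first failure,
/// and hand the error back alongside them.
fn until_first_error(items: &[&str]) -> (Vec<u16>, Option<ParseIntError>) {
    let mut good = Vec::new();
    for s in items {
        match s.parse::<u16>() {
            Ok(n) => good.push(n),
            Err(e) => return (good, Some(e)),
        }
    }
    (good, None)
}
```

With the input `["80", "443", "oops", "8080"]`, the first function returns only an error, while the second returns `[80, 443]` plus the error, which is the behavior you want when early documents should still be handled.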

Work with Value

Read, mutate, and patch a dynamic document. Indexing returns a null sentinel for missing paths, so chains don’t panic:

#![allow(unused)]
fn main() {
let mut v: saneyaml::Value = saneyaml::from_str("\
services:
  api:
    image: nginx:1.25
    ports: [80]
")?;

// read
assert_eq!(v["services"]["api"]["image"].as_str(), Some("nginx:1.25"));

// patch in place
v["services"]["api"]["image"] = saneyaml::Value::from("nginx:1.27");

// mutate a sequence
if let Some(ports) = v["services"]["api"]["ports"].as_sequence_mut() {
    ports.push(saneyaml::Value::from(443));
}
}

Value::from accepts strings, bools, every integer/float width, Mapping, and Vec<Value>. Build maps with Mapping, which keeps insertion order and offers the full entry / get / insert / remove API.

To convert between typed values and Value without going through text, use to_value and from_value:

#![allow(unused)]
fn main() {
let value = saneyaml::to_value(&cfg)?;        // T -> Value
let cfg: Config = saneyaml::from_value(value)?; // Value -> T
}

from_value is spanless: it won’t coerce a number or bool into a String target, because the original spelling is gone after the value is built. Read with from_str / from_slice when a string field must preserve source text like 1_000 or FALSE.

Enums and singleton maps

Data-carrying enums round-trip as YAML tags by default (!Variant value). For the serde_yaml single-key-map shape (variant: value), annotate the field:

#![allow(unused)]
fn main() {
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum Action { Run(String), Skip }

#[derive(Serialize, Deserialize)]
struct Job {
    #[serde(with = "saneyaml::with::singleton_map")]
    action: Action,
}
}

Use saneyaml::with::singleton_map_recursive when nested enum payloads also need the one-entry-map shape. To emit every enum as a singleton map without annotating each field, set the emitter option globally:

#![allow(unused)]
fn main() {
use saneyaml::{EmitOptions, EnumRepresentation};

let opts = EmitOptions::structural()
    .with_enum_representation(EnumRepresentation::SingletonMap);
let text = saneyaml::to_string_with_options(&job, opts)?;
}

Anchors and merge keys

&anchor / *alias and the << merge key are expanded for you when loading into Node or Value — you read the effective, merged result:

#![allow(unused)]
fn main() {
let v: saneyaml::Value = saneyaml::from_str("\
defaults: &defaults
  retries: 3
service:
  <<: *defaults
  name: api
")?;

assert_eq!(v["service"]["retries"].as_u64(), Some(3));
assert_eq!(v["service"]["name"].as_str(), Some("api"));
}

Explicit keys override merged ones, and earlier entries in a merge list win. Value::apply_merge() is also available as an explicit in-place helper.
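The precedence rule can be sketched with std only. This is not saneyaml's implementation: `effective_mapping` is a hypothetical helper, values are plain strings, and a `BTreeMap` is used so the sketch makes no claim about the key order the library produces.

```rust
use std::collections::BTreeMap;

/// Compute the effective mapping for one node:
/// explicit keys first, then each merge source in list order.
fn effective_mapping<'a>(
    explicit: &[(&'a str, &'a str)],
    merge_list: &[&[(&'a str, &'a str)]],
) -> BTreeMap<&'a str, &'a str> {
    let mut out = BTreeMap::new();
    // Explicit keys always win, so they go in first.
    for &(key, value) in explicit {
        out.insert(key, value);
    }
    // A merged key is only added if nothing earlier defined it,
    // so earlier entries in the merge list beat later ones.
    for source in merge_list {
        for &(key, value) in source.iter() {
            out.entry(key).or_insert(value);
        }
    }
    out
}
```

For `<<: [*overrides, *defaults]` next to an explicit `name: api`, the explicit `name` survives, `overrides` supplies any key it shares with `defaults`, and `defaults` fills in the rest.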

If you need the raw << syntax and anchor/alias graph identity (not the expanded result), parse with parse_events / EventStream or the lossless graph.

Numbers, timestamps, and binary

Number widens to i128 / u128; the usual helpers are range-checked:

#![allow(unused)]
fn main() {
let v: saneyaml::Value = saneyaml::from_str("count: 9000000000000000000\n")?;
assert_eq!(v["count"].as_u64(), Some(9_000_000_000_000_000_000));
}

Timestamps and !!binary are YAML 1.1 features and need an explicit schema or tag. !!timestamp scalars are read via as_timestamp() / typed saneyaml::Timestamp fields; !!binary decodes into byte targets like Vec<u8>. See Schema modes for enabling YAML 1.1 typing.

Control emitted YAML

The default (EmitOptions::structural()) is insertion-order, plain-where-safe, block layout. Tune it with builder methods:

#![allow(unused)]
fn main() {
use saneyaml::{EmitOptions, KeyOrder, ScalarQuoteStyle};

let opts = EmitOptions::structural()
    .with_key_order(KeyOrder::Sort)                    // Preserve | Sort
    .with_scalar_quote_style(ScalarQuoteStyle::DoubleQuoted); // PlainWhereSafe | SingleQuoted | DoubleQuoted

let text = saneyaml::to_string_with_options(&cfg, opts)?;
}

Other knobs: with_collection_style (Block | Flow), with_block_scalar_style (Literal | Folded), with_enum_representation (Tag | SingletonMap), and with_yaml_1_1_safe_strings(true) to quote strings like no / 12:34:56 so older YAML 1.1 readers don’t reinterpret them.
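To see why strings like no or 12:34:56 are risky for YAML 1.1 readers, here is a std-only sketch of the kind of check involved. It is an assumption-laden simplification, not the rule with_yaml_1_1_safe_strings applies: the function names are hypothetical, the word list is the YAML 1.1 boolean set, and the sexagesimal test ignores signs, underscores, and the under-60 constraint.

```rust
/// Words a YAML 1.1 reader resolves to a boolean.
fn is_yaml_1_1_bool_word(s: &str) -> bool {
    matches!(
        s,
        "y" | "Y" | "yes" | "Yes" | "YES" | "n" | "N" | "no" | "No" | "NO"
            | "true" | "True" | "TRUE" | "false" | "False" | "FALSE"
            | "on" | "On" | "ON" | "off" | "Off" | "OFF"
    )
}

fn all_digits(part: &str) -> bool {
    !part.is_empty() && part.bytes().all(|b| b.is_ascii_digit())
}

/// Rough shape test for base-60 values such as 1:20 or 12:34:56.
fn looks_sexagesimal(s: &str) -> bool {
    let mut parts = s.split(':');
    let head = parts.next().unwrap_or("");
    let tail: Vec<&str> = parts.collect();
    !tail.is_empty()
        && all_digits(head)
        && tail.iter().all(|part| all_digits(part) && part.len() <= 2)
}

/// True when a plain string should be quoted to survive a YAML 1.1 reader.
fn needs_quotes_for_yaml_1_1(s: &str) -> bool {
    is_yaml_1_1_bool_word(s) || looks_sexagesimal(s)
}
```

Under these rules `no` and `12:34:56` would be quoted, while `web` and `nginx:1.25` would stay plain.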

For byte-for-byte serde_yaml writer output on the supported structural corpus, use EmitOptions::byte_compatible(). Comments and source formatting are not a writer concern — use in-place editing to preserve them.

Custom tags

Application tags (!Ref, !Env, CloudFormation intrinsics, …) are preserved in Value and visible to enum dispatch. For ordinary typed reads they’re transparent metadata:

#![allow(unused)]
fn main() {
// !Env prod          -> String "prod"
// !Ports [80, 443]   -> Vec<u16>
// !Maybe null        -> Option<T> (None)
}

Inspect a tag explicitly with value.as_tagged(), which exposes the tag and the inner Value.

Schema modes

A schema decides how a plain scalar like NO, on, or 0123 becomes a typed value. saneyaml defaults to YAML 1.2, where those stay strings — so you don’t get the “Norway problem.”

The Norway problem

In YAML 1.1 (and the archived serde_yaml), NO resolves to the boolean false. A country-code list [NO, SE, FI] silently becomes [false, SE, FI].

YAML 1.2 fixed this: only true / false are booleans. saneyaml follows 1.2 by default.

#![allow(unused)]
fn main() {
let codes: Vec<String> = saneyaml::from_str("[NO, SE, FI]")?;
assert_eq!(codes, ["NO", "SE", "FI"]); // strings, as written
}

Default resolution (YAML 1.2)

| Plain scalar | Resolves to |
|---|---|
| null, Null, NULL, ~, missing value | null |
| true, false, True, FALSE | bool |
| yes, no, on, off, y, n, NO | string |
| 123, +12, 0123, 1_000 | integer (decimal; 0123 is 123) |
| 0x7B, 0b1010, 0o77 | string |
| 1.5, .inf, -.Inf, .NAN | float |
| 1:20, 1:20:30 | string |
| 2026-05-24, datetimes | string |

The full table for every mode is in COMPATIBILITY.md → Scalar resolution.
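As a rough illustration of the default table, the rows can be encoded as a small std-only classifier. This is a sketch, not the crate's resolver: `Resolved` and `resolve_plain` are hypothetical names, only the spellings shown in the table are covered, and the handling of `_` separators is an assumption based on the `1_000` row.

```rust
/// Hypothetical result type for this sketch; not saneyaml's Value.
#[derive(Debug, PartialEq)]
enum Resolved {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Classify a plain scalar following the rows of the default table.
fn resolve_plain(s: &str) -> Resolved {
    match s {
        "" | "~" | "null" | "Null" | "NULL" => return Resolved::Null,
        "true" | "True" | "TRUE" => return Resolved::Bool(true),
        "false" | "False" | "FALSE" => return Resolved::Bool(false),
        ".nan" | ".NaN" | ".NAN" => return Resolved::Float(f64::NAN),
        _ => {}
    }

    // Split off an optional sign so "+12" and "-.Inf" share the checks below.
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let starts_with_digit = body.starts_with(|c: char| c.is_ascii_digit());

    // Decimal integers only: 0123 is 123, and 0x7B falls through to a string.
    if starts_with_digit && body.chars().all(|c| c.is_ascii_digit() || c == '_') {
        let digits: String = body.chars().filter(|c| *c != '_').collect();
        if let Ok(n) = digits.parse::<i64>() {
            return Resolved::Int(if negative { -n } else { n });
        }
    }

    if matches!(body, ".inf" | ".Inf" | ".INF") {
        return Resolved::Float(if negative { f64::NEG_INFINITY } else { f64::INFINITY });
    }

    // Only try a float parse for number-shaped text with a '.' or exponent.
    let float_shaped = (starts_with_digit || body.starts_with('.'))
        && body.contains(|c: char| matches!(c, '.' | 'e' | 'E'));
    if float_shaped {
        if let Ok(f) = body.parse::<f64>() {
            return Resolved::Float(if negative { -f } else { f });
        }
    }

    // Everything else (NO, yes, 1:20, 2026-05-24, 0o77, ...) stays a string.
    Resolved::Str(s.to_string())
}
```

The point of the sketch is the fall-through: anything that is not an exact null, boolean, decimal integer, or float spelling ends up as a string, which is why `NO` and `0x7B` survive as written.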

Choosing a mode

Pass a configured LoadOptions instead of calling from_str directly:

#![allow(unused)]
fn main() {
use saneyaml::LoadOptions;

let cfg: Config = LoadOptions::core().from_str(text)?;          // YAML 1.2 (the default)
let cfg: Config = LoadOptions::json().from_str(text)?;          // JSON booleans/null/numbers only
let cfg: Config = LoadOptions::failsafe().from_str(text)?;      // every scalar stays a string
let cfg: Config = LoadOptions::legacy_serde_yaml().from_str(text)?; // YAML 1.1 / serde_yaml-style
}

| Mode | Use it when |
|---|---|
| core() (default) | You want correct YAML 1.2 behavior. |
| json() | Input is JSON-ish and you want strict JSON scalar typing. |
| failsafe() | You want raw strings and will type things yourself. |
| legacy_serde_yaml() / yaml_1_1() | You have an existing corpus that depends on no → false, octal 0123, sexagesimals, !!timestamp typing, etc. |

Schema::{Core, Json, Failsafe, LegacySerdeYaml} are the enum equivalents; Schema::Yaml12 and Schema::Yaml11 are retained aliases for Core and LegacySerdeYaml.

Per-document %YAML directives

To let each document’s %YAML header pick the mode — %YAML 1.1 gets legacy construction, everything else stays 1.2 — use directive mode:

#![allow(unused)]
fn main() {
let docs = saneyaml::LoadOptions::yaml_version_directive().from_documents_str(stream)?;
}

What YAML 1.1 mode turns on

legacy_serde_yaml() / yaml_1_1() resolves the legacy forms: boolean words (yes/no/on/off), octal 0123, 0x/0b radix integers, base-60 sexagesimals, and !!timestamp-shaped scalars (read via saneyaml::Timestamp). Numeric spellings that overflow Number stay strings.

Even in YAML 1.2 mode you can still opt into individual YAML 1.1 types with explicit tags (!!int 0o77, !!binary …, !!timestamp …) without switching the whole schema. See COMPATIBILITY.md for the exact tag rules.

Diagnostics

Errors carry where and what — line/column, an in-document key path, and an opt-in source caret — so a bad config tells the user how to fix it.

Snippets elide the enclosing function; assume a function returning saneyaml::Result<()>.

Location

Error::line() and column() mirror the common serde_yaml convenience path; location() returns the same position as a Location:

#![allow(unused)]
fn main() {
let err = saneyaml::from_str::<Config>("name: [\n").unwrap_err();

if let (Some(line), Some(col)) = (err.line(), err.column()) {
    eprintln!("error at {line}:{col}");
}
}

Key path

path() reports where in the document the error is, using familiar traversal syntax — server.port, ports[1], bracket-quoted non-identifier keys:

#![allow(unused)]
fn main() {
let err = saneyaml::from_str::<Config>("server:\n  port: not-a-number\n").unwrap_err();

if let Some(path) = err.path() {
    eprintln!("at {path}"); // at server.port
}
}
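The traversal syntax can be sketched as a small renderer. This is a std-only illustration with hypothetical names (`Segment`, `render_path`); in particular, the exact quoting used for non-identifier keys is an assumption, shown here as a double-quoted string inside brackets.

```rust
/// Hypothetical path segment type for this sketch.
enum Segment {
    Key(String),
    Index(usize),
}

/// True for keys that can be written bare, like `server` or `port_2`.
fn is_identifier(key: &str) -> bool {
    let mut chars = key.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Render segments as `server.port`, `ports[1]`, or `labels["a.b/c"]`.
fn render_path(segments: &[Segment]) -> String {
    let mut out = String::new();
    for segment in segments {
        match segment {
            Segment::Index(i) => out.push_str(&format!("[{i}]")),
            Segment::Key(key) if is_identifier(key) => {
                // Dots only separate bare keys; the first one has no prefix.
                if !out.is_empty() {
                    out.push('.');
                }
                out.push_str(key);
            }
            // Assumed quoting style for keys with dots, slashes, spaces, etc.
            Segment::Key(key) => out.push_str(&format!("[{key:?}]")),
        }
    }
    out
}
```

Bracket-quoting matters because a key such as `app.kubernetes.io/name` would otherwise be indistinguishable from three nested keys.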

Source caret

render_source(input) returns a Display that points at the offending span, rustc-style — great for CLI output:

#![allow(unused)]
fn main() {
let input = "name: web\nport: [\n";
let err = saneyaml::from_str::<Config>(input).unwrap_err();

eprintln!("{}", err.render_source(input));
// 2 | port: [
//   |       ^ …
}

Use render_source_with_options(input, SourceRenderOptions) to control the number of context lines.
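For intuition, a caret line of this shape can be produced from a line/column pair with std alone. This is a minimal sketch, not the crate's renderer: `render_caret` is a hypothetical helper, and it assumes 1-based lines and columns and one display cell per character.

```rust
/// Render the offending line with a caret under `column`.
/// Returns None when `line` is 0 or past the end of the input.
fn render_caret(input: &str, line: usize, column: usize) -> Option<String> {
    let text = input.lines().nth(line.checked_sub(1)?)?;
    let gutter = line.to_string();
    // The caret row reuses the gutter width so the bars line up.
    let pad = " ".repeat(gutter.len());
    let indent = " ".repeat(column.saturating_sub(1));
    Some(format!("{gutter} | {text}\n{pad} | {indent}^"))
}
```

Calling it with the input from the example, line 2 and column 7, yields the two-line layout shown in the comments there, without the trailing message.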

Categorize

category() returns an ErrorCategory for branching — e.g. distinguishing a parse/syntax failure from a type mismatch. document_index() reports which document in a stream failed (zero-based).

#![allow(unused)]
fn main() {
use saneyaml::ErrorCategory;

match err.category() {
    ErrorCategory::Syntax => { /* malformed YAML */ }
    ErrorCategory::Data   => { /* type/shape mismatch */ }
    _ => { /* Limit, Reference, DuplicateKey, Io, … */ }
}
}

What carries spans

| Source | Line/column? |
|---|---|
| from_str / from_slice / from_node | yes |
| Deserializer::from_str / from_slice (incl. stream items) | yes |
| from_reader (after buffering) | yes |
| from_value and direct Value reads | no (Value is spanless) |

The flat Display string stays compatible with serde_yaml; everything above is additive.

Untrusted input

YAML from the network, user uploads, or a CI job is hostile until proven otherwise. saneyaml applies structural resource limits by default, and lets you tune them per call site.

Snippets elide the enclosing function; assume a function returning saneyaml::Result<()>.

Defaults

Every parser, loader, streaming, lossless, and Serde entry point enforces these out of the box:

| Limit | Default | Rejects |
|---|---|---|
| Input size | 64 MiB | Oversized payloads, before parsing |
| Nesting depth | 128 | Deeply nested block/flow bombs |
| Scalar size | 1 MiB | Single giant scalars |
| Collection size | 16,384 entries | Wide sequence/mapping bombs |
| Alias expansion | input-derived budget | Billion-laughs alias bombs |
| Recursive aliases | always rejected | |

The defaults accept real-world config (Kubernetes CRDs, OpenAPI, Compose) while rejecting compact bombs that sit under the byte ceiling.
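The arithmetic behind an alias bomb shows why a byte ceiling alone is not enough. The sketch below is std-only and illustrative: `expanded_leaves` and `within_budget` are hypothetical names, and the budget value is a caller-supplied number, not the input-derived budget saneyaml actually computes.

```rust
/// Leaf count after expansion when each of `levels` anchored sequences
/// holds `fan_out` aliases to the previous level. None means u64 overflow.
fn expanded_leaves(levels: u32, fan_out: u64) -> Option<u64> {
    fan_out.checked_pow(levels)
}

/// True when full expansion stays within `budget` nodes.
/// Overflow is treated as over budget.
fn within_budget(levels: u32, fan_out: u64, budget: u64) -> bool {
    expanded_leaves(levels, fan_out).is_some_and(|n| n <= budget)
}
```

Nine levels of nine aliases each fit in a few hundred bytes of YAML but expand to 9^9 = 387,420,489 leaves, so any expansion budget tied to the input size rejects the document long before that.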

Tighten for a specific call

Lower a limit when you know your inputs are small:

#![allow(unused)]
fn main() {
use saneyaml::LoadOptions;

let cfg: Config = LoadOptions::new()
    .max_input_bytes(256 * 1024)        // cap at 256 KiB
    .max_nesting_depth(32)
    .max_collection_items(1_000)
    .from_str(text)?;
}

All knobs: max_input_bytes, max_alias_expansion_nodes, max_nesting_depth, max_scalar_bytes, max_collection_items.

Relax — only when you’ve bounded the source yourself

Each without_* opt-out transfers that part of the bound to you. Use them only when the source is already trusted or size-checked upstream:

#![allow(unused)]
fn main() {
let node = saneyaml::LoadOptions::new()
    .without_input_limit()   // also: without_nesting_depth_limit,
    .parse_str(local_file)?;  //       without_scalar_limit, without_collection_limit
}

What the limits are — and aren’t

These are structural construction limits, not wall-clock or resident-memory guarantees:

  • Reader entry points fully buffer bounded input before parsing.
  • Raw event/lossless streams validate alias references but don’t expand them, so they don’t spend the alias budget.
  • Your own Deserialize impls can still allocate after the YAML layer hands them bounded values.
  • saneyaml validates YAML structure, not application schemas (Kubernetes, OpenAPI, …).

For the full threat model and reporting process, see COMPATIBILITY.md → Threat model and SECURITY.md.

Editing files

Change values in an existing YAML file while keeping every comment, anchor, blank line, and untouched byte exactly where it was. This is the part a load-then-re-emit round trip can’t do.

Snippets elide the enclosing function; assume a function returning saneyaml::Result<()>.

Edit by path

saneyaml::edit opens a ConfigEditor. Address values by path, then finish:

#![allow(unused)]
fn main() {
let source = "\
# service stack
services:
  web:
    image: nginx:1.25
    ports:
      - \"80:80\"
";

let mut editor = saneyaml::edit(source)?;
editor
    .set(saneyaml::ConfigPath::keys(["services", "web", "image"]), "nginx:1.27")?
    .push(saneyaml::ConfigPath::keys(["services", "web", "ports"]), "8080:80")?;

let edited = editor.finish()?;
assert!(edited.contains("# service stack"));   // comment preserved
assert!(edited.contains("image: nginx:1.27")); // value updated
assert!(edited.contains("- 8080:80"));         // item appended
}

Operations: set, insert, remove, rename, push (append to a sequence), and insert_item (insert at an index). Each returns &mut Self, so chain them; the editor reparses between operations so later paths see current source.

Addressing paths

#![allow(unused)]
fn main() {
use saneyaml::{ConfigPath, PathSegment};

// string keys (most common)
ConfigPath::keys(["metadata", "labels", "app"]);

// mixed keys and sequence indices
ConfigPath::new([
    PathSegment::from("jobs"),
    PathSegment::from("test"),
    PathSegment::from("steps"),
    PathSegment::from(0usize),
    PathSegment::from("uses"),
]);

// JSON Pointer — handles keys containing "/" or "~"
ConfigPath::json_pointer("/metadata/labels/app.kubernetes.io~1name")?;
}
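The `~1` in the pointer above is JSON Pointer (RFC 6901) escaping for `/`, and `~0` is the escape for `~`. A std-only sketch of the decoding step, using a hypothetical `decode_pointer` helper rather than anything from saneyaml:

```rust
/// Split an RFC 6901 JSON Pointer into unescaped reference tokens.
/// Returns None if a non-empty pointer does not start with '/'.
fn decode_pointer(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        // The empty pointer addresses the whole document.
        return Some(Vec::new());
    }
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Order matters: decode "~1" before "~0" so "~01" becomes "~1".
            .map(|token| token.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}
```

So `/metadata/labels/app.kubernetes.io~1name` addresses three keys, the last of which is the literal key `app.kubernetes.io/name`.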

Read and write files directly

#![allow(unused)]
fn main() {
let mut editor = saneyaml::edit_file("compose.yaml")?;
editor.set(saneyaml::ConfigPath::keys(["version"]), "3.9")?;
editor.finish_to_file()?; // writes back to compose.yaml
}

Inspect without editing

Drop to LosslessStream when you need to read source-level detail — comments, exact scalar spelling, anchor/alias graph identity — that the semantic Value tree discards:

#![allow(unused)]
fn main() {
let stream = saneyaml::parse_lossless(source)?;

for comment in stream.comments() {
    println!("{}", comment.text());
}
}

LosslessStream also exposes effective_mapping_entries(node) — the merged view of a mapping with << provenance kept — and source_fragment(span) to recover the original bytes for any node. It’s the surface for tools that must preserve or analyze source, not just values.

A runnable end-to-end example (Docker Compose, Kubernetes, GitHub Actions) lives in examples/config_refactor.rs.

Streaming

Pull-based iterators that process YAML without holding every parsed document at once. Use them for large multi-document streams; for small configs, from_str is simpler.

Snippets elide the enclosing function; assume a function returning saneyaml::Result<()>.

Two levels

| Stream | Yields | Use when |
|---|---|---|
| DocumentStream | one semantic Node per document | You want documents one at a time, with merge expansion and schema applied. |
| EventStream | low-level parser events | You want raw structure (scalar style, flow vs block, anchors, the literal <<) without building a tree. |

Both construct from &str, &[u8], or a reader, and both are plain Iterators.

Documents one at a time

#![allow(unused)]
fn main() {
for doc in saneyaml::stream::DocumentStream::from_str(stream)? {
    let node = doc?; // saneyaml::Node
    // handle one document, then it can be dropped before the next is parsed
}
}

Raw events

#![allow(unused)]
fn main() {
use saneyaml::Event;

for event in saneyaml::stream::EventStream::from_str(source)? {
    match event? {
        Event::Scalar { .. }       => { /* value, with its style + tag */ }
        Event::MappingStart { .. } => { /* … */ }
        Event::Alias { .. }        => { /* a raw *alias, not expanded */ }
        _ => {}
    }
}
}

Events expose what the semantic tree throws away: scalar quote style, block vs flow collection style, anchors/aliases as distinct events, tags, and document directives. Aliases are not expanded here — that’s what makes events the right tool for preserving or analyzing the original document.

What “bounded memory” means

Streaming bounds the retained parsed representation: DocumentStream keeps one document live at a time instead of a whole Vec. The source bytes are still fully buffered — these are synchronous pull APIs over an in-memory input, not constant-memory async readers.

The memory win is real on multi-document streams (the working set stays flat as the stream grows) and negligible on a single large document, where there’s nothing to reclaim mid-parse. The benchmarks quantify both.

Need it all at once?

parse_documents / parse_events are the all-or-error collectors over the same parser, returning a Vec. Reach for the streams only when you want to act on items as they arrive or cap retained documents.

Migrating from serde_yaml

serde_yaml is archived. For config-shaped Serde code, saneyaml is close to a drop-in: the read API, Value, and the with::singleton_map helpers keep the same spelling — now with YAML 1.2 scalar resolution and richer diagnostics.

This page is the call-site cookbook. The exhaustive support matrix, divergences, and threat model live in COMPATIBILITY.md; the scalar-typing differences are in Schema modes.

Scope. saneyaml is an adoption candidate for config-shaped Serde reads plus structural writes. It is not a blanket drop-in for every YAML document, every emitter byte, or full YAML 1.1 / libyaml behavior.

Two ways to switch

Keep serde_yaml::… spellings — alias the package in Cargo, change nothing in source:

[dependencies]
serde_yaml = { package = "saneyaml", version = "0.3.0" }

serde_yaml::from_str, serde_yaml::Value, serde_yaml::with::singleton_map, and friends keep compiling against this crate.

Or import directly and rewrite the prefix:

[dependencies]
saneyaml = "0.3.0"

#![allow(unused)]
fn main() {
// mechanical rename
let cfg: saneyaml::Value = saneyaml::from_str(input)?;

// …or alias one file at a time
use saneyaml as serde_yaml;
let cfg: serde_yaml::Value = serde_yaml::from_str(input)?;
}

The shipped examples/serde_yaml_migration.rs compiles the full alias surface end to end.

Cookbook

Each recipe shows the call site. Under a Cargo/source alias, keep the serde_yaml:: spelling; with a direct import, swap the prefix to saneyaml::.

Typed reads — unchanged:

#![allow(unused)]
fn main() {
let config: Config = saneyaml::from_str(input)?;
let config: Config = saneyaml::from_slice(bytes)?;
let config: Config = saneyaml::from_reader(reader)?;
}

Value indexing and patching — unchanged:

#![allow(unused)]
fn main() {
let mut value: saneyaml::Value = saneyaml::from_str(input)?;
value["services"]["api"]["image"] = saneyaml::Value::from("nginx:latest");
let ports = value["services"]["api"]["ports"].as_sequence();
}

Tagged enums / singleton maps — same helpers:

#![allow(unused)]
fn main() {
#[derive(serde::Serialize, serde::Deserialize)]
struct Job {
    #[serde(with = "saneyaml::with::singleton_map")]
    action: Action,
}
}

Use singleton_map_recursive for nested enum payloads.

Multi-document streams — iterate, or collect:

#![allow(unused)]
fn main() {
use serde::Deserialize; // brings Config::deserialize into scope

let docs = saneyaml::Deserializer::from_str(stream)
    .map(Config::deserialize)
    .collect::<Result<Vec<_>, _>>()?;

let docs: Vec<Config> = saneyaml::from_documents_str(stream)?; // additive convenience
}

Structural writes — unchanged:

#![allow(unused)]
fn main() {
let text = saneyaml::to_string(&config)?;
saneyaml::to_writer(&mut writer, &config)?;
}

Errors: line() / column() still work; span(), category(), path(), and render_source() are additive (Diagnostics):

#![allow(unused)]
fn main() {
let err = saneyaml::from_str::<Config>("name: [").unwrap_err();
if let Some(loc) = err.location() {
    eprintln!("{}:{}", loc.line(), loc.column());
}
}

What behaves differently

Five things a migrator should know — most code never touches them:

| Change | What it means for you |
|---|---|
| YAML 1.2 by default | no / on / NO stay strings, not booleans. Opt into the old behavior per call with LoadOptions::legacy_serde_yaml(). See Schema modes. |
| Merge keys expand by default | Loaded Node/Value give you the merged result. serde_yaml::Value kept the literal << until apply_merge(). To see raw <<, use events or the lossless graph. |
| Value is spanless | It won’t coerce a number/bool into a String target, and it doesn’t carry comments, anchors, or graph identity. Read with from_str/from_node when source text matters; use the lossless graph for formatting. |
| Structural writer | to_string emits clean deterministic YAML, not byte-identical serde_yaml. Pass EmitOptions::byte_compatible() for the supported byte corpus. |
| Resource limits on by default | Untrusted input is bounded (64 MiB, depth, scalar, collection, alias). Tune or opt out via LoadOptions. See Untrusted input. |

Support matrix

All of the following resolve under both rename paths and are covered by the swap harness and downstream smokes:

| serde_yaml surface | Status |
|---|---|
| from_str / from_slice / from_reader | Covered for typed config reads and Value |
| Deserializer::{from_str, from_slice, from_reader} | Covered, incl. multi-document iteration |
| Value / Mapping / Number | Covered: reads, mutation, indexing, helpers, traits |
| value::{to_value, Serializer} | Covered for config-shaped serialization |
| to_string / to_writer / Serializer | Structural output covered; byte_compatible() matches bytes on the supported corpus |
| with::singleton_map / singleton_map_recursive | Covered for read and write |
| Error / Result / Location | Covered; richer diagnostics are additive |

The indexing traits (Index, mapping::Index) are sealed, as they were upstream — use the built-in string / usize / Value lookups.

Proof

The migration claims are executable, not aspirational:

  • tests/serde_yaml_swap_harness.rs — the same call sites run against serde_yaml 0.9.34 and against this crate under the serde_yaml dependency name.
  • tests/downstream_migration_harness.rs and tests/external_downstream_migration.rs — pinned real-world configs and reduced fixtures from real serde_yaml users (Pingora, rust-i18n, cfn-guard, navi, Stackable).
  • scripts/downstream-build-trials.sh — packages this crate and builds those downstreams with their serde_yaml dependency rewritten to it.

cargo test --test serde_yaml_swap_harness --test downstream_migration_harness
cargo test --test external_downstream_migration
scripts/downstream-build-trials.sh smoke-only

Real-world gates currently cover 33 files / 39 documents across GitHub Actions, Docker Compose, Kubernetes, Helm, OpenAPI, Wrangler, Ansible, CloudFormation/SAM, Symfony, GitLab CI, CircleCI, and Azure Pipelines. They prove the selected corpus — not a substitute for testing your own YAML.

Migration impact ledger

| Area | Migration impact |
|---|---|
| Default merge expansion | Loaded Node/Value and Serde reads expand untagged and explicit merge-tag << by default. Code that inspected merge syntax should switch to parse_events or LosslessStream. Explicit !!str << and custom-tagged << stay literal. |
| YAML 1.1 compatibility | Legacy scalar/merge behavior is opt-in via schema modes; default entrypoints stay YAML 1.2-oriented, so corpora that need 1.1 typing need opt-in tests. |
| Alias graph identity | Semantic trees clone acyclic aliases and reject recursion; graph-sensitive callers should use LosslessStream. |
| Lossless formatting | Comments, anchors, directives, and source style are preserved only by LosslessStream / ConfigEditor, not the semantic Value tree. |
| Parser acceptance differences | Some YAML 1.2 inputs libyaml rejects are accepted, and some malformed libyaml-tolerated inputs are rejected. Per-case detail lives in the divergence records. |
| Package status | Cargo.toml declares saneyaml 0.3.0 under the MIT license. |

Known follow-up

  • Keep the named external crate build trials current before broadening ecosystem replacement claims.
  • Keep divergence records and migration-impact wording current as behavior changes.
  • Treat full YAML compatibility and arbitrary source-preserving emission as future work until they are fixture-backed.

Compatibility

This is the exhaustive compatibility and divergence reference. For everyday use you don’t need it — see Schema modes, Untrusted input, and Migrating from serde_yaml.

What this crate targets

  • Primary API: serde_yaml read-side ergonomics for config-shaped YAML — from_str / from_slice / from_reader, plus Value and structural writes.
  • Parser reference: YAML 1.2 tree/event acceptance comparable to yaml-rust2 and saphyr for supported syntax.
  • Documented divergence: libyaml / YAML 1.1-era behavior is version-pinned against a Ruby Psych 3.1.0 / libyaml 0.2.1 probe. Default loading is YAML 1.2-oriented; YAML 1.1 typing is opt-in.

Every divergence record under tests/fixtures/divergences/records/ carries a migration_impact field, and tests/divergence_manifest.rs fails any record that omits caller-facing impact. That registry is the source of truth for intentional behavior splits.

serde_yaml 0.9 rename support matrix

“Supported” means the name resolves under both serde_yaml = { package = "saneyaml", ... } and use saneyaml as serde_yaml;. “Intentionally divergent” means it resolves but behaves differently by policy. “Not preservable” means it isn’t a stable surface this crate emulates.

| serde_yaml 0.9 surface | Status |
|---|---|
| from_str, from_slice, from_reader | Supported |
| from_value, to_value | Supported |
| to_string, to_writer | Supported (byte-identical output is an opt-in tier) |
| Deserializer::{from_str,from_slice,from_reader} | Supported, incl. multi-document iteration |
| Serializer::{new,flush,into_inner} | Supported |
| Value, Sequence, Mapping, Number | Supported |
| value::*, mapping::* | Supported |
| with::singleton_map, with::singleton_map_recursive | Supported |
| Default tag-style enum input/output | Supported |
| Error, Result, Location | Supported (richer diagnostics are additive) |
| Value merge-key retention | Intentionally divergent: loaded Value expands << by default; raw events / lossless preserve it |
| Default YAML 1.1 scalar construction | Intentionally divergent: default is YAML 1.2; use LoadOptions schema modes for legacy typing |
| Exact Number private representation | Not preservable (public helpers kept; integers widened) |
| Downstream impl Index | Not preservable (sealed here, as upstream) |
| Byte-identical libyaml emitter output | Not preservable (writer is structural; bytes covered for the documented corpus) |
| Comments / anchors / graph identity in Value | Not preservable (use LosslessStream) |

Reproducible loader matrix

Generated from tests/fixtures/compatibility-matrix/manifest.toml and checked by tests/compatibility_matrix.rs. Cross-ecosystem entries are pinned offline vectors; the Rust test validates their metadata and does not execute Go, Python, or C++ runtimes.

| Behavior family | Proof source | yaml policy | yaml | serde_yaml | yaml-rust2 | saphyr | Cross-ecosystem vector | Divergence / migration impact |
|---|---|---|---|---|---|---|---|---|
| Typed Serde config entrypoints | tests/compatibility_matrix.rs typed AppConfig probe | YAML 1.2 default typed reads preserve common config scalars. | accept | accept | n/a | n/a | n/a | Serde-only Rust API row; parser-only loaders are intentionally marked n/a instead of given adapter shims. |
| Registered real-world fixtures | tests/fixtures/real-world/SOURCE.toml, 33 files / 39 documents | Every registered fixture must parse with the three Rust reference loaders. | accept | accept | accept | accept | n/a | Config migration smoke coverage includes CloudFormation/SAM, Symfony, GitLab CI, CircleCI, Azure Pipelines, GitHub Actions, Docker Compose, Kubernetes, Helm, OpenAPI, Wrangler, and Ansible without compatibility fallbacks. |
| CI expression and script scalars | GitHub Actions, CircleCI, and Azure Pipelines synthetic scalar shapes | Treat CI expressions as plain or quoted strings under the default schema. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept; PyYAML 6.0.2: accept; yaml-cpp 0.8.0: accept | CI users can migrate expression-heavy config without enabling YAML 1.1 compatibility or expression-specific parsing. |
| Anchors, aliases, and merge keys | GitLab CI-style defaults and merge expansion fixture | Semantic loaders expand acyclic merge keys; raw/lossless surfaces preserve anchor and merge syntax. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept; PyYAML 6.0.2: accept; yaml-cpp 0.8.0: accept | tests/fixtures/divergences/records/merge-keys.toml; Graph-sensitive callers should use lossless graph APIs; semantic config callers get effective merged mappings. |
| Application custom tags | CloudFormation/SAM and Symfony short-form tags | Retain application tags in this crate’s Value/event/lossless surfaces while allowing common loader acceptance. | accept | accept | accept | accept | n/a | tests/fixtures/divergences/records/custom-tags.toml; Tagged config users should assert tag-retention behavior directly because some reference trees accept syntax while dropping or reshaping tag metadata. |
| Multi-document streams | Kubernetes-style explicit document stream | Explicit stream boundaries are accepted and document counts stay stable. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept; PyYAML 6.0.2: accept; yaml-cpp 0.8.0: accept | Stream-processing callers should keep asserting document counts when migrating Kubernetes-style manifests. |

Behavior by area

| Area | saneyaml policy | libyaml / YAML 1.1 | yaml-rust2 / saphyr | serde_yaml |
| --- | --- | --- | --- | --- |
| on, off, yes, no | Strings by default; booleans only under explicit YAML 1.1 | Often booleans | Per schema | Data-model dependent |
| Duplicate keys | Rejected after alias expansion (1 and "1" are distinct keys) | Often last-wins | yaml-rust2 rejects some; saphyr accepts X38W | Rejects duplicate scalar keys |
| Merge key << | Expanded by default in loaded trees and Serde reads; raw events and LosslessStream keep it literal; Value::apply_merge() is an explicit helper | Expanded, earlier merges win | Preserved literally | Literal in Value until apply_merge() |
| Anchors and aliases | Semantic trees clone acyclic aliases (no graph identity); LosslessStream keeps alias-to-anchor identity | Sometimes graph identity | Clone-on-alias | Accepted in read paths |
| Custom tags | Retained in Value/events/lossless; transparent for typed reads; %TAG handles resolved; undeclared handles rejected | Supported | Supported | Partial/lossy |
| Comments / formatting | Discarded by semantic loaders; retained by LosslessStream for byte-stable replay and edits | Not semantic | Not semantic | Discarded |
| Emission | structural() is deterministic default; byte_compatible() matches serde_yaml bytes for the documented corpus; document-marker policy matches serde_yaml | Manual comparison only | Manual comparison only | Marker policy matched; bytes for the supported corpus |
| Numbers / timestamps / binary | Decimals + underscores + special floats + 0123 (decimal) by default; octal/hex/binary/sexagesimal, !!timestamp, and !!binary under explicit YAML 1.1 or tags | Broad YAML 1.1 typing | Varies by crate | Data-model dependent |
| Directives | %YAML / %TAG accepted as syntax; yaml_version_directive() lets %YAML 1.1 pick legacy construction | May affect schema | Exposed by parser | Usually not a value |
| Explicit core tags | !!int, !!float, !!bool, !!null, !!str, !!timestamp, !!binary preserved and coerced for typed reads (verbatim, canonical URI, or %TAG handle) | Common | Varies | Partial/lossy |
| YAML 1.1 collection/structural tags | !!set, !!omap, !!pairs, !!seq, !!map, !!value retained and mapped to typed targets; malformed payloads rejected with spans | Lossy recovery | Tag info available; contracts differ | Not retained |
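
The merge-key row turns on precedence. As a sketch of the rule from the YAML 1.1 merge-key type (a mapping's own keys are never overridden, and among several merged mappings the earlier one wins), the function below expands a merge over plain string maps. `expand_merge` and the map-of-strings model are assumptions made for illustration; saneyaml's tree types and `Value::apply_merge()` are not shown here, and the table does not state how saneyaml orders multiple merge sources.

```rust
use std::collections::BTreeMap;

type Map = BTreeMap<String, String>;

/// Expand `<<: [merges...]` into a copy of `own` (hypothetical helper).
fn expand_merge(own: &Map, merges: &[Map]) -> Map {
    let mut out = own.clone();
    for source in merges {
        for (key, value) in source {
            // `or_insert_with` keeps a value that is already present, so
            // explicit keys and earlier merge sources take precedence.
            out.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }
    out
}
```

With `own = {image: app:2}` and merge sources `{image: app:1, retries: 3}` then `{retries: 5, timeout: 30}`, the result keeps `image: app:2`, takes `retries: 3` from the first source, and adds `timeout: 30` from the second.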

Scalar Resolution Modes

Schema::Yaml12 is the retained spelling for Schema::Core (the default); Schema::Yaml11 is the retained spelling for Schema::LegacySerdeYaml. Schema::Json resolves only JSON lowercase booleans/null and JSON numbers, then keeps other scalar text as strings. Schema::Failsafe keeps every scalar a string. Missing mapping values and empty documents are null in every mode.

| Plain scalar | Core / Yaml12 | Json | Failsafe | LegacySerdeYaml / Yaml11 |
| --- | --- | --- | --- | --- |
| missing value | null | null | null | null |
| ~ | null | string | string | null |
| null, Null, NULL | null | null only is null; other spellings string | string | null |
| true, false | bool | bool | string | bool |
| True, TRUE, False, FALSE | bool | string | string | bool |
| yes, no, on, off, y, n, NO | string | string | string | bool |
| 123, +12, 0123, 1_000 | decimal number | JSON number only; +12, 0123, and underscores string | string | number; 0123 is octal |
| 0x7B, 0b1010, 0o77 | string | string | string | hex and binary numbers; 0o77 string |
| 1:20, 1:20:30.5 | string | string | string | sexagesimal number |
| 1.5 | float | JSON float | string | float |
| .inf, -.Inf, .NAN | float | string | string | float |
| 2026-05-24, timestamp datetimes | string | string | string | retained !!timestamp string with Timestamp typed reads |
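
The null and boolean rows of the table can be read as a small decision function. The sketch below is a standalone illustration using only the standard library; `Mode`, `Resolved`, and `resolve_plain` are hypothetical names for this example, not saneyaml's `Schema` API or its actual resolver, and only the spellings listed in the table are handled.

```rust
/// Hypothetical stand-ins for the four resolution modes in the table.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Core,
    Json,
    Failsafe,
    Legacy,
}

#[derive(Debug, PartialEq)]
enum Resolved {
    Null,
    Bool(bool),
    Str(String),
}

/// Resolve a plain scalar for the null/bool rows only. Numeric, float, and
/// timestamp rows are out of scope and fall through to `Str` in this sketch.
fn resolve_plain(mode: Mode, text: &str) -> Resolved {
    let as_str = || Resolved::Str(text.to_string());
    match mode {
        // Failsafe keeps every scalar a string.
        Mode::Failsafe => as_str(),
        // Json accepts only the lowercase JSON spellings.
        Mode::Json => match text {
            "null" => Resolved::Null,
            "true" => Resolved::Bool(true),
            "false" => Resolved::Bool(false),
            _ => as_str(),
        },
        Mode::Core | Mode::Legacy => match text {
            "~" | "null" | "Null" | "NULL" => Resolved::Null,
            "true" | "True" | "TRUE" => Resolved::Bool(true),
            "false" | "False" | "FALSE" => Resolved::Bool(false),
            // YAML 1.1 style words are booleans only in the legacy mode.
            "yes" | "on" | "y" if mode == Mode::Legacy => Resolved::Bool(true),
            "no" | "off" | "n" | "NO" if mode == Mode::Legacy => Resolved::Bool(false),
            _ => as_str(),
        },
    }
}
```

Under this reading, `resolve_plain(Mode::Core, "NO")` stays the string `"NO"`, while the same text in `Mode::Legacy` becomes `Bool(false)`.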

Tree-shape divergences

A few YAML 1.2 inputs parse cleanly as events but load into different tree shapes across the reference loaders. saneyaml keeps these in event parity and shared-reference acceptance, excludes them from loaded-tree value-shape parity, and pins a divergence record for each:

  • PW8X and 6KGN — anchors on empty scalar nodes.
  • S4JQ — an explicit non-specific tag shape on an empty node.
  • C4HZ — a custom tag plus a schema scalar divergence.
  • FH7J — tags on empty scalar nodes.

tests/parity_manifest.rs gates these terms and the event/tree/shared-reference ledgers; cargo test --test conformance_dashboard -- --nocapture prints the full 402-case selected-suite dashboard with divergence overlays.

Event and streaming contracts

  • EventStream is the stable pull-based parser-event surface; parse_events is the all-or-error collector over the same events. Events carry scalar style, block/flow collection style, tags, anchors, alias events, and DocumentStart directives. Aliases are not expanded here.
  • DocumentStream is the semantic pull stream — one merge-expanded Node per document, same schema/limits/spans as parse_documents.
  • Reader constructors fully buffer bounded input before yielding, so streaming bounds the retained parsed representation, not the source bytes.

Raw scalar spelling and graph identity are not exposed by events; recover them with LosslessStream::source_fragment(span) and the lossless graph.

Threat Model and Resource Guarantees

The defended input is untrusted YAML at every load entrypoint. With default LoadOptions, the crate rejects:

  • input above 64 MiB before parsing,
  • alias-expansion bombs (input-derived budget) and recursive aliases,
  • nesting beyond 128, scalars above 1 MiB, and collections above 16,384 entries,

with span-bearing diagnostics. Raw event and lossless streams validate alias references but don’t expand them. Callers can tighten or relax each limit through LoadOptions; a without_*_limit() opt-out transfers that bound to the caller.

These are structural construction limits, not wall-clock or resident-memory guarantees: reader entrypoints fully buffer bounded input, and your own Deserialize impls can allocate after the YAML layer hands them bounded values. saneyaml validates YAML structure, not application schemas. See Untrusted input for the how-to and SECURITY.md for reporting.
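
To make the default bounds concrete, here is a minimal sketch of structural limit checks using only the standard library. `Limits`, `LimitError`, `check_input`, and `check_depth` are hypothetical names for this illustration; they mirror the documented default numbers but are not saneyaml's `LoadOptions` API, and the crate's real diagnostics carry spans that this sketch omits.

```rust
/// Hypothetical mirror of the documented default bounds (not LoadOptions).
#[derive(Debug, Clone, Copy)]
struct Limits {
    max_input_bytes: usize,
    max_depth: usize,
    max_scalar_bytes: usize,
    max_collection_len: usize,
}

impl Default for Limits {
    fn default() -> Self {
        Limits {
            max_input_bytes: 64 * 1024 * 1024, // 64 MiB
            max_depth: 128,
            max_scalar_bytes: 1024 * 1024, // 1 MiB
            max_collection_len: 16_384,
        }
    }
}

#[derive(Debug, PartialEq)]
enum LimitError {
    InputTooLarge { len: usize, max: usize },
    TooDeep { depth: usize, max: usize },
}

/// Reject oversized input before any parsing work happens.
fn check_input(limits: &Limits, input: &[u8]) -> Result<(), LimitError> {
    if input.len() > limits.max_input_bytes {
        return Err(LimitError::InputTooLarge {
            len: input.len(),
            max: limits.max_input_bytes,
        });
    }
    Ok(())
}

/// Called on each descent into a nested collection.
fn check_depth(limits: &Limits, depth: usize) -> Result<(), LimitError> {
    if depth > limits.max_depth {
        return Err(LimitError::TooDeep { depth, max: limits.max_depth });
    }
    Ok(())
}
```

Checks of this kind bound what the parser constructs. As the paragraph above notes, they say nothing about wall-clock time or about allocations made later by application code.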

Public API stability

The crate is pre-1.0 (MSRV Rust 1.88), but the preview surface is SemVer-visible: public exports, enum variants, struct fields, constants, and the package-vs-library name split are commitments. docs/PUBLIC_API.txt is the committed snapshot checked for drift; intentional changes must update it along with this file and MIGRATION.md.

saneyaml::Error keeps a flat Display compatible with the preview contract and exposes additive category(), path(), document_index(), and render_source(...) diagnostics. saneyaml::Index and saneyaml::mapping::Index are sealed.

Architecture

Package and Library Names

The crates.io package name and the Rust library target are both saneyaml, so downstream code imports this crate as saneyaml::...:

[dependencies]
saneyaml = "0.3.0"

For drop-in serde_yaml migration, Cargo dependency renaming keeps existing source imports intact:

[dependencies]
serde_yaml = { package = "saneyaml", version = "0.3.0" }

or a local source alias:

#![allow(unused)]
fn main() {
use saneyaml as serde_yaml;
}

Monolith Decision

saneyaml is one crate for the first public release. It is not a crate family and does not publish separate yaml-core, yaml-value, yaml-serde, or yaml-edit packages.

The parser, tree model, deserializer, emitter, and lossless source model are tightly coupled today. In particular, parse.rs and de.rs share reader ingestion, limits, schema construction, merge behavior, and span diagnostics. Splitting those seams before real downstream adoption would create compatibility and versioning surfaces without reducing meaningful user complexity.

The monolith keeps one SemVer contract, one feature map, one diagnostics model, and one package alias story for serde_yaml migration.

Feature Facade

The only optional crate feature currently exposed is:

default = ["lossless"]
lossless = []

lossless controls source-backed graph inspection and format-preserving edit helpers. Serde integration, Value, structural writers, and emitter controls are always part of the package contract because serde is a runtime dependency and the migration surface is not currently split into optional subfeatures. Future feature narrowing should add real cfg(feature = "...") boundaries, tests, and documentation instead of naming facade features that do not alter compiled API.

Stability Boundary

The crate is pre-1.0. Public exports, public enum variants, public struct fields, feature names, package metadata, MSRV, and the package-vs-library name split are still SemVer-visible for adopter trust. Intentional changes to those surfaces must update docs/PUBLIC_API.txt, docs/MIGRATION.md, docs/COMPATIBILITY.md, and the baseline evidence rather than relying on silent drift.

Real-World Config Benchmarks And Large Inputs

The benchmark examples parse checked-in or generated YAML without timing file I/O. They report aggregate cost so small files do not dominate the signal. These benchmark and conformance commands are source-checkout-only: the published crate package ships this document, but it intentionally excludes the dev-dependency examples and fixture corpora used to regenerate the tables.

cargo run --locked --release --example real_world_benchmark
YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example large_input_benchmark
YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark

Environment for the latest captured run:

  • Reference crates: serde-saphyr 0.0.27 with deserialize only, yaml-rust2 0.11.0, saphyr 0.0.6
  • Small fixture set: 33 files / 39 YAML documents / 25,362 bytes
  • Large fixture set: pinned downstream fixtures plus generated 1 MiB inputs
  • Captured: 2026-06-06 with Cargo’s release profile and --locked

The linked serde-saphyr repository was ahead of crates.io at the time of this capture (0.0.28 in Git, latest published 0.0.27). The benchmark pins the published crate so the checked-in Cargo.lock and package checks remain registry-reproducible.

The serde-saphyr rows use benchmark options rather than the crate defaults: strict_booleans: true plus relaxed event, alias, document, node, scalar, and merge budgets so the generated corpora are comparable throughput inputs. Because serde-saphyr does not expose a native YAML value tree, the matched generic Serde lane deserializes both libraries into serde_yaml::Value. The preflight normalizes two public-contract differences before asserting equality: serde-saphyr::from_multiple_with_options skips empty/null-like documents, and serde-saphyr treats YAML tags as transparent for this target while saneyaml preserves them.

The README overview graphic is a static summary of selected benchmark and feature rows. Its source notes and update checklist live at docs/assets/saneyaml-overview.md; update that note with this file whenever the graphic changes.

The large benchmark’s peak retained bytes and peak retained heap objects columns are safe retained-output estimates from parsed tree container and string capacities after a single parse. They are not allocator instrumentation and do not include transient parser scratch. For multi-fixture corpora, they report the peak retained output for one fixture because each fixture is parsed and dropped independently.
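
As a rough picture of what a capacity-based estimate means, the sketch below walks a toy tree and sums container and string capacities. `Node` and `retained` are hypothetical and much simpler than saneyaml's real tree; the point is only that the figure comes from capacities held by the returned value, not from allocator instrumentation.

```rust
/// Toy tree for illustration; not saneyaml's node type.
enum Node {
    Scalar(String),
    Seq(Vec<Node>),
}

/// Returns (bytes, heap_objects) estimated from capacities.
fn retained(node: &Node) -> (usize, usize) {
    match node {
        // A String with zero capacity owns no heap block.
        Node::Scalar(s) => {
            if s.capacity() == 0 {
                (0, 0)
            } else {
                (s.capacity(), 1)
            }
        }
        Node::Seq(items) => {
            // The vector's own buffer is sized by capacity, not length.
            let own = items.capacity() * std::mem::size_of::<Node>();
            let mut total = (own, if items.capacity() == 0 { 0 } else { 1 });
            for child in items {
                let (bytes, objects) = retained(child);
                total.0 += bytes;
                total.1 += objects;
            }
            total
        }
    }
}
```

Transient scratch that is freed before the tree is returned never shows up in such an estimate, which is why the document treats it separately from the dhat captures.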

The 2026-06-01 zero-copy line slice removed transient per-line raw/content text allocations from this crate’s parser by storing one resident source buffer and per-line byte ranges. That allocation drop is visible in the parser code path rather than in the retained-output columns, because preprocessed lines are dropped before the parsed tree is returned.
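
The idea of one resident buffer plus per-line byte ranges can be sketched as follows. `LineTable` is a hypothetical type for illustration, not the crate's internal structure, and it splits on `\n` only (a trailing `\r` would stay in the line).

```rust
use std::ops::Range;

/// One owned copy of the source plus a byte range per line; no per-line String.
struct LineTable {
    source: String,
    lines: Vec<Range<usize>>,
}

impl LineTable {
    fn new(source: String) -> Self {
        let mut lines = Vec::new();
        let mut start = 0;
        for (i, byte) in source.bytes().enumerate() {
            // b'\n' is always a UTF-8 character boundary, so slicing is safe.
            if byte == b'\n' {
                lines.push(start..i);
                start = i + 1;
            }
        }
        // Keep a final line that has no trailing newline.
        if start < source.len() {
            lines.push(start..source.len());
        }
        LineTable { source, lines }
    }

    /// Borrow a line's text from the resident buffer.
    fn line(&self, index: usize) -> Option<&str> {
        self.lines.get(index).map(|r| &self.source[r.clone()])
    }
}
```

Each line costs one `Range<usize>` in the table instead of a separately allocated string, and the text is borrowed on demand.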

The 2026-06-01 no-merge fast path records whether each parsed document contains a semantic merge key and skips the post-parse merge traversal when none was seen. In the same-session target capture, generated_multi_doc_stream_1mib saneyaml::parse_documents moved from 25.87 to 23.98 ns/byte while saphyr moved from 24.86 to 24.89 ns/byte. Retained output estimates are unchanged because the returned tree shape is unchanged; the removed work is transient per-document merge scanning and its scratch stack.

The 2026-06-01 plain-scalar continuation slice delays String allocation until a plain scalar is proven to span multiple lines. In the next same-session target capture, generated_multi_doc_stream_1mib saneyaml::parse_documents moved from 23.98 to 21.87 ns/byte while saphyr measured 24.42 ns/byte. Retained output estimates are unchanged because single-line scalar output is identical; the removed work is transient short String allocation before the scalar falls back to the inline parse path.
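
Delaying allocation until a scalar is known to span several lines is a natural fit for `std::borrow::Cow`. The sketch below is an assumption about the general technique, not the crate's code: `plain_scalar` is a hypothetical helper, and its folding is simplified (lines are trimmed and joined with a single space, blank lines are not treated specially).

```rust
use std::borrow::Cow;

/// Join the lines of a plain scalar, allocating only when there is more than one.
fn plain_scalar<'a>(lines: &[&'a str]) -> Cow<'a, str> {
    match lines {
        &[] => Cow::Borrowed(""),
        // Single line: borrow straight from the input, no String is created.
        &[single] => Cow::Borrowed(single.trim()),
        // Multiple lines: only now pay for an owned buffer.
        many => {
            let mut folded = String::new();
            for (i, line) in many.iter().enumerate() {
                if i > 0 {
                    folded.push(' ');
                }
                folded.push_str(line.trim());
            }
            Cow::Owned(folded)
        }
    }
}
```

Because most plain scalars in config files sit on one line, the common path returns `Cow::Borrowed` and performs no allocation.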

The 2026-06-01 retained-capacity slice trims completed document, sequence, and mapping vectors before returning parsed trees. In the large-input capture below, saneyaml::parse_documents retained bytes moved from 703,340 to 486,188 on the Stackable peak, from 23,031,972 to 13,006,500 on the generated multi-document stream, and from 13,040,211 to 9,893,619 on the generated 1 MiB wide mapping. The same-run speed lead over saphyr remains intact for the default spanful parser on every large-input row.
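
Growable vectors usually end up with spare capacity after a run of pushes, and that slack is what a capacity-based estimate counts. A minimal sketch of trimming it, assuming a plain `Vec` (the hypothetical `finish` helper is not the crate's function):

```rust
/// Trim spare capacity from a completed collection before handing it back.
fn finish<T>(mut items: Vec<T>) -> Vec<T> {
    // shrink_to_fit drops capacity as close to len as the allocator allows;
    // the standard library does not promise capacity == len afterwards.
    items.shrink_to_fit();
    items
}
```

The trade-off is one possible reallocation per completed collection in exchange for a smaller retained tree.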

The same milestone adds saneyaml::parse_borrowed_documents, an explicit spanless retained tree that can borrow scalar strings from the caller’s input buffer. This is an additive load path, not a silent change to saneyaml::parse_documents; the retained-output estimate counts the borrowed tree heap and, like the saphyr row, does not count the caller-owned source buffer. That row closes the retained-memory axis against saphyr 0.0.6 across the large-input corpus while preserving the owning parser’s spans and scalar-source behavior.

Real-World Config Corpus

Latest same-run capture after adding the matched serde-saphyr lane, using YAML_BENCH_ITERS=1000:

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
| --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 1,000 | 25,362 | 39 | 431.314 | 17.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 25,362 | 39 | 572.477 | 22.57 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 578.399 | 22.81 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,165.027 | 45.94 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,055.332 | 41.61 |
| serde_yaml::Value stream | 1,000 | 25,362 | 39 | 644.638 | 25.42 |
| yaml_rust2::YamlLoader | 1,000 | 25,362 | 39 | 539.894 | 21.29 |
| saphyr::Yaml::load_from_str | 1,000 | 25,362 | 39 | 502.288 | 19.80 |

On this corpus, the matched generic Serde value lane measured saneyaml at 22.81 ns/byte versus serde-saphyr at 41.61 ns/byte. The private event-backed prototype measured 45.94 ns/byte on the same target, so it is not a replacement for the tree-backed Serde path yet. The raw tree-load rows are shown for context but are a different contract from serde-saphyr’s Serde-only API.
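
The ns/byte column appears to be elapsed time divided by the total bytes parsed (iterations times bytes per iteration); the figures in the tables are consistent with that reading. A small sketch of the arithmetic, with `ns_per_byte` as a hypothetical helper name:

```rust
/// ns/byte = elapsed nanoseconds / total bytes parsed across all iterations.
fn ns_per_byte(elapsed_ms: f64, iterations: u64, bytes_per_iteration: u64) -> f64 {
    let elapsed_ns = elapsed_ms * 1_000_000.0;
    let total_bytes = (iterations * bytes_per_iteration) as f64;
    elapsed_ns / total_bytes
}
```

For the first row, 431.314 ms over 1,000 iterations of 25,362 bytes works out to about 17.01 ns/byte, matching the table.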

Same-turn pre-optimization baseline, captured before this milestone with the default 200 iterations:

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
| --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 200 | 19,727 | 33 | 126.377 | 32.03 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 141.211 | 35.79 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 153.465 | 38.90 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 100.959 | 25.59 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 92.393 | 23.42 |

Post zero-copy line-slice re-capture with 1,000 iterations (independent run, 2026-06-01):

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
| --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 1,000 | 19,727 | 33 | 355.271 | 18.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 19,727 | 33 | 422.618 | 21.42 |
| serde_yaml::Value stream | 1,000 | 19,727 | 33 | 523.390 | 26.53 |
| yaml_rust2::YamlLoader | 1,000 | 19,727 | 33 | 434.222 | 22.01 |
| saphyr::Yaml::load_from_str | 1,000 | 19,727 | 33 | 402.909 | 20.42 |

Result: after the zero-copy line slice, saneyaml::parse_documents was faster than the pinned reference loaders on this small corpus in that 2026-06-01 1,000-iteration same-run capture (18.01 ns/byte vs saphyr at 20.42 and yaml_rust2 at 22.01). The owning Value path also remains ahead of the serde_yaml Value stream and roughly ties yaml_rust2 on this corpus.

Post no-merge and plain-scalar fast-path re-capture with the default 200 iterations (independent run, 2026-06-01):

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
| --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 200 | 19,727 | 33 | 72.598 | 18.40 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 84.585 | 21.44 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 116.031 | 29.41 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 82.792 | 20.98 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 78.606 | 19.92 |

Methodology caveat: the pre-optimization table above was captured at 200 iterations and the post-optimization table at 1,000, so part of the across-table ns/byte drop reflects warm-up rather than optimization. The trustworthy signal is the same-run cross-loader comparison within each table, plus the larger, lower-noise inputs below — not the across-table delta.

Large Inputs

Command:

cargo run --locked --release --example large_input_benchmark

Default iterations: 20, controlled by YAML_LARGE_BENCH_ITERS.

external_downstream_all

20 pinned downstream files / 245,062 bytes / 20 YAML documents.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 20 | 245,062 | 20 | 31.887 | 6.51 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 245,062 | 20 | 30.256 | 6.17 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 245,062 | 20 | 40.035 | 8.17 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 41.333 | 8.43 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 77.032 | 15.72 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 245,062 | 20 | 67.763 | 13.83 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 245,062 | 20 | 51.019 | 10.41 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 245,062 | 20 | 38.470 | 7.85 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 245,062 | 20 | 36.419 | 7.43 | 534,786 | 3,780 |

stackable_dummy_cluster

One pinned Stackable CRD / 177,556 bytes / 1 YAML document.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 20 | 177,556 | 1 | 20.570 | 5.79 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 177,556 | 1 | 19.159 | 5.40 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 177,556 | 1 | 24.524 | 6.91 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 25.203 | 7.10 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 47.657 | 13.42 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 177,556 | 1 | 41.445 | 11.67 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 177,556 | 1 | 32.343 | 9.11 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 177,556 | 1 | 24.245 | 6.83 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 177,556 | 1 | 23.095 | 6.50 | 534,786 | 3,780 |

generated_multi_doc_stream_1mib

Generated multi-document service stream / 1,048,680 bytes / 8,020 YAML documents.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 20 | 1,048,680 | 8,020 | 367.321 | 17.51 | 13,006,500 | 128,321 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,680 | 8,020 | 361.229 | 17.22 | 4,106,240 | 32,081 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,680 | 8,020 | 495.592 | 23.63 | 4,729,860 | 112,281 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 538.155 | 25.66 | 9,862,660 | 112,281 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,116.471 | 53.23 | 11,607,364 | 112,281 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,223.597 | 58.34 | 11,607,364 | 112,281 |
| serde_yaml::Value stream | 20 | 1,048,680 | 8,020 | 661.464 | 31.54 | 11,607,364 | 112,281 |
| yaml_rust2::YamlLoader | 20 | 1,048,680 | 8,020 | 545.253 | 26.00 | 10,386,948 | 112,281 |
| saphyr::Yaml::load_from_str | 20 | 1,048,680 | 8,020 | 534.700 | 25.49 | 14,770,560 | 112,281 |

generated_wide_mapping_256kib

Generated one-document wide service mapping / 262,176 bytes.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 20 | 262,176 | 1 | 78.226 | 14.92 | 2,484,775 | 23,932 |
| saneyaml::parse_borrowed_documents | 20 | 262,176 | 1 | 73.898 | 14.09 | 765,792 | 2,994 |
| saneyaml::from_documents_str::<Value> | 20 | 262,176 | 1 | 107.359 | 20.47 | 938,236 | 17,950 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 106.085 | 20.23 | 1,895,476 | 17,950 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 223.815 | 42.68 | 1,895,692 | 17,950 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 262,176 | 1 | 212.831 | 40.59 | 1,895,692 | 17,950 |
| serde_yaml::Value stream | 20 | 262,176 | 1 | 114.758 | 21.89 | 1,895,692 | 17,950 |
| yaml_rust2::YamlLoader | 20 | 262,176 | 1 | 102.762 | 19.60 | 1,704,220 | 17,950 |
| saphyr::Yaml::load_from_str | 20 | 262,176 | 1 | 97.245 | 18.55 | 2,393,312 | 17,950 |

generated_wide_mapping_1mib

Generated one-document wide service mapping / 1,048,661 bytes.

| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
| --- | --- | --- | --- | --- | --- | --- | --- |
| saneyaml::parse_documents | 20 | 1,048,661 | 1 | 309.172 | 14.74 | 9,893,619 | 95,236 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,661 | 1 | 282.013 | 13.45 | 3,047,520 | 11,907 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,661 | 1 | 419.435 | 20.00 | 3,739,059 | 71,428 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 417.003 | 19.88 | 7,548,459 | 71,428 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 878.512 | 41.89 | 7,548,675 | 71,428 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 843.638 | 40.22 | 7,548,675 | 71,428 |
| serde_yaml::Value stream | 20 | 1,048,661 | 1 | 477.817 | 22.78 | 7,548,675 | 71,428 |
| yaml_rust2::YamlLoader | 20 | 1,048,661 | 1 | 420.142 | 20.03 | 6,786,771 | 71,428 |
| saphyr::Yaml::load_from_str | 20 | 1,048,661 | 1 | 401.299 | 19.13 | 9,523,712 | 71,428 |

Large-input story: after zero-copy line storage, the no-merge fast path, delayed plain-scalar continuation allocation, and retained vector right-sizing, saneyaml::parse_documents beats yaml_rust2 and saphyr on every large parser path in the latest capture on an unloaded machine. In the matched Serde value lane, saneyaml::from_documents_str::<serde_yaml::Value> is faster than serde_saphyr::from_multiple_with_options::<serde_yaml::Value> on every large-input row.

The hidden event-backed Serde prototype only wins against serde-saphyr on the generated multi-document stream in this capture and remains slower on the other large rows; it also retains the same serde_yaml::Value output shape, so it does not improve retained output memory yet. The smallest corpus (external_downstream_all) is the most contention-sensitive, so its ordering is the first to wobble under load; the larger corpora hold a clearer margin.

The retained-memory story is now split by output contract: the default spanful tree keeps spans and scalar-source spellings and is faster than saphyr, while the additive saneyaml::parse_borrowed_documents tree drops spans/source spellings and borrows sliceable scalars, retaining less heap than saphyr on every large-input row (for example, 3,047,520 vs 9,523,712 bytes on the 1 MiB wide mapping).

Streaming And Compact Line-Table Milestone

This milestone adds a compact per-line table, a fused line-preprocessing scan, source-backed borrowed scalars, and a lazy streaming line buffer that reclaims consumed lines as DocumentStream/EventStream advance. Batch loaders keep an eager line table for speed; only the streaming entrypoints reclaim. The input string itself stays fully resident, so these are bounded-retention streaming paths, not constant-memory readers.
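
The bounded-retention idea can be illustrated with a deliberately naive document splitter. This is a sketch under strong assumptions, not the crate's `DocumentStream`: the hypothetical `DocStream` treats only a bare `---` line as a boundary, skips empty documents, and ignores block scalars and `...` end markers. What it does show is the shape of the trade-off: the source string stays resident, but only the lines of the current document are held at once.

```rust
/// Naive pull-based splitter over a resident source string (illustration only).
struct DocStream<'a> {
    lines: std::str::Lines<'a>,
}

impl<'a> DocStream<'a> {
    fn new(source: &'a str) -> Self {
        DocStream { lines: source.lines() }
    }
}

impl<'a> Iterator for DocStream<'a> {
    /// The lines of one document, borrowed from the source.
    type Item = Vec<&'a str>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut doc = Vec::new();
        // Lines are pulled lazily; nothing past the next boundary is touched.
        for line in self.lines.by_ref() {
            if line == "---" {
                if doc.is_empty() {
                    continue; // leading marker or empty document: skip
                }
                return Some(doc);
            }
            doc.push(line);
        }
        if doc.is_empty() {
            None
        } else {
            Some(doc)
        }
    }
}
```

A caller that processes and drops each yielded `Vec` before asking for the next one never holds more than one document's worth of line references, which is the property the streaming entrypoints aim for on multi-document input.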

Captured in a single release session against the in-repo harnesses:

YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example dhat_memory -- --all
cargo run --locked --release --example conformance_compare

Real-world config corpus (1,000 iterations)

33 files / 39 YAML documents / 25,362 bytes. This is a distinct, later capture from a separate same-session run of this milestone, not the same measurement as the “Real-World Config Corpus” table above; the corpus is identical but the per-loader ns/byte figures differ run to run (for example saphyr reads 21.42 here versus 19.80 there), which is the run-to-run noise the methodology caveat describes.

| parser/load path | ns/byte |
| --- | --- |
| saneyaml::parse_documents | 15.03 |
| saneyaml::from_documents_str::<Value> | 21.19 |
| saphyr::Yaml::load_from_str | 21.42 |
| yaml_rust2::YamlLoader | 23.11 |
| serde_yaml::Value stream | 24.98 |

On this corpus saneyaml::parse_documents is the fastest load path; the owning Value path ties saphyr and stays ahead of yaml_rust2 and serde_yaml.

Allocator-backed memory (dhat), 1 MiB multi-document stream

8,020 documents. retained blocks is the count of live allocations still held while the parsed output is retained.

| path | allocs | bytes allocated | peak | retained blocks |
| --- | --- | --- | --- | --- |
| saneyaml stream docs | 184,466 | 13.64 MB | 2.10 MB | 4 |
| saneyaml stream events | 232,594 | 49.28 MB | 2.11 MB | 6 |
| saneyaml borrowed | 80,219 | 17.29 MB | 6.21 MB | 32,081 |
| saneyaml owned | 200,519 | 16.05 MB | 15.12 MB | 128,321 |
| saneyaml Value | 449,140 | 25.07 MB | 15.12 MB | 112,281 |
| yaml-rust2 | 585,478 | 29.29 MB | 17.15 MB | 192,481 |
| saneyaml as serde_yaml::Value | 465,180 | 39.73 MB | 20.79 MB | 136,341 |
| saneyaml event-backed as serde_yaml::Value | 1,114,806 | 175.38 MB | 22.83 MB | 136,341 |
| serde-saphyr as serde_yaml::Value | 577,465 | 59.71 MB | 21.79 MB | 136,344 |
| serde_yaml | 721,821 | 84.73 MB | 21.84 MB | 136,341 |
| saphyr | 216,559 | 22.77 MB | 22.30 MB | 192,481 |

On a multi-document stream the streaming loaders hold a bounded working set (retained blocks stay at 4–6 regardless of stream length) and post the lowest peak; the borrowed batch tree has the lowest peak among the non-streaming loaders. The event-backed Serde prototype is allocation-heavy here because it still consumes parser-recorded event frames rather than a direct parser-to-Serde stream.

Allocator-backed memory (dhat), 1 MiB wide single document

| path | peak | retained blocks |
| --- | --- | --- |
| serde-saphyr as serde_yaml::Value | 10.73 MB | 83,337 |
| yaml-rust2 | 10.98 MB | 130,951 |
| saphyr | 14.10 MB | 130,951 |
| saneyaml borrowed | 15.32 MB | 11,907 |
| saneyaml owned | 16.16 MB | 95,236 |
| saneyaml stream docs | 16.16 MB | 4 |
| saneyaml Value | 16.39 MB | 71,428 |
| saneyaml as serde_yaml::Value | 19.91 MB | 83,334 |
| serde_yaml | 23.42 MB | 83,334 |
| saneyaml stream events | 62.22 MB | 6 |
| saneyaml event-backed as serde_yaml::Value | 78.54 MB | 83,334 |

Streaming only helps when there are document boundaries to reclaim at. On a single wide document there is nothing to reclaim mid-parse, so yaml-rust2 and saphyr post lower peaks than saneyaml on this shape, and the event-streaming path is expensive here because it buffers per-event output for one large document. The matched serde-saphyr value row posts a low wide-document peak, while the event-backed Serde prototype is the highest peak in this capture. Streaming is a multi-document memory win, not a universal one.

Conformance (402 curated cases)

| library | spec accept/reject (400) | tree policy (2) |
| --- | --- | --- |
| saneyaml | 400/400 | 2/2 |
| yaml-rust2 | 400/400 | 2/2 |
| saphyr | 400/400 | 0/2 |
| serde_yaml | 333/400 | 2/2 |

saneyaml ties yaml-rust2 and saphyr at 400/400 on the neutral spec set; it is not a sole leader there. Its differentiation is the combination of full spec conformance with tree-policy rejection of the duplicate-key/tree-error cases that saphyr accepts, while serde_yaml trails the spec set at 333/400.

Reproduction & Tooling

Every number in this document comes from an in-repo example, run under Cargo’s release profile. The commands below regenerate each captured table from a source checkout of this repository; absolute values vary by machine, but the same-run cross-loader ordering is the trustworthy signal on an otherwise-idle machine. The harness is a hand-rolled Instant::now() loop with no warm-up or statistics, so under heavy machine load even that ordering can invert; treat any single capture as indicative rather than authoritative.
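
For readers unfamiliar with this style of harness, a hand-rolled timing loop looks roughly like the sketch below. It is an assumption about the general shape, not a copy of the in-repo examples: `bench_ns_per_byte` is a hypothetical name, and the use of `std::hint::black_box` is this sketch's choice.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Time `iterations` calls of `parse` over `input`, returning ns per input byte.
/// No warm-up and no statistics: one wall-clock span over the whole loop.
fn bench_ns_per_byte<F, T>(input: &str, iterations: u32, mut parse: F) -> f64
where
    F: FnMut(&str) -> T,
{
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the parse result.
        black_box(parse(black_box(input)));
    }
    let elapsed_ns = start.elapsed().as_nanos() as f64;
    elapsed_ns / (iterations as f64 * input.len() as f64)
}
```

Because a single span covers every iteration, anything else running on the machine during the loop lands directly in the result, which is why the same-run ordering is only trustworthy on an idle machine.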

| captured section | checkout-only command |
| --- | --- |
| Real-World Config Corpus | cargo run --locked --release --example real_world_benchmark |
| Real-world corpus (1,000 iterations) | YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark |
| Large Inputs (all corpora) | cargo run --locked --release --example large_input_benchmark |
| Large Inputs (custom iteration count) | YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark |
| Allocator-backed memory (dhat) | cargo run --locked --release --example dhat_memory -- --all |
| dhat single (library, corpus) pair | cargo run --locked --release --example dhat_memory -- saneyaml-borrowed multidoc |
| Conformance (402 curated cases) | cargo run --locked --release --example conformance_compare |

Iteration counts default to 200 for real_world_benchmark (YAML_BENCH_ITERS) and 20 for large_input_benchmark (YAML_LARGE_BENCH_ITERS). The dhat_memory example installs a global allocator and must measure one library per process; -- --all sweeps every (library, corpus) pair for you, and -- <library> <corpus> profiles a single pair.

Reference-crate versions

The captured comparison numbers were produced against these pinned dev-dependency versions (see Cargo.toml):

| crate | version |
| --- | --- |
| serde-saphyr | 0.0.27 (default-features = false, deserialize) |
| serde_yaml | 0.9.34 |
| saphyr | 0.0.6 |
| saphyr-parser | 0.0.6 |
| yaml-rust2 | 0.11.0 |
| dhat | 0.3.3 |

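To reproduce against the exact pinned set, build from the checked-in Cargo.lock. Cargo uses an existing lockfile by default, and the --locked flag in the commands above makes it fail instead of updating the lockfile when it no longer matches Cargo.toml. Bumping any reference crate can shift its numbers, so re-capture the whole comparison table when upgrading.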