saneyaml documentation
Serde-first YAML for Rust, with real YAML 1.2 semantics. Read this as a hosted guide at https://jskoiz.github.io/saneyaml/, or start with Getting started in the repo and open a topic page when you hit it.
Learn the basics
- Getting started — install, parse into a struct, emit. ~5 minutes.
- Cookbook — copy-paste recipes for the common tasks.
Topic guides
- Schema modes — YAML 1.2 vs 1.1, and why NO stays the string "NO".
- Diagnostics — line/column, key paths, and source carets in errors.
- Untrusted input — resource limits for hostile YAML.
- Editing files — change values in place without losing comments, anchors, or ordering.
- Streaming — pull events/documents with bounded memory.
Migrating
- From serde_yaml — drop-in alias and a call-site cookbook.
Reference
- Compatibility — scalar resolution table, divergences, threat model.
- Architecture — crate layout and design decisions.
- Benchmarks — throughput and memory vs other crates.
- API reference (docs.rs) — full generated docs.
Project
Changelog · Security policy · Contributing · License
Getting started
Install, parse YAML into your own types, and emit it back. About five minutes.
Snippets below elide the enclosing function and use ?; assume each runs in a function returning saneyaml::Result<()>.
Install
[dependencies]
saneyaml = "0.3.0"
serde = { version = "1", features = ["derive"] }
Pure Rust, no C bindings, #![forbid(unsafe_code)]. MSRV 1.88.
Parse into a struct
The common case — load a config straight into your own types with
#[derive(Deserialize)]:
use serde::Deserialize;
#[derive(Deserialize)]
struct Config {
    name: String,
    port: u16,
    tags: Vec<String>,
}
let cfg: Config = saneyaml::from_str("\
name: web
port: 8080
tags: [http, public]
")?;
assert_eq!(cfg.port, 8080);
assert_eq!(cfg.tags, ["http", "public"]);
Three entry points, same behavior — pick by what you hold:
let a: Config = saneyaml::from_str(text)?;    // &str
let b: Config = saneyaml::from_slice(bytes)?; // &[u8]
let c: Config = saneyaml::from_reader(file)?; // impl std::io::Read
Parse into a dynamic value
When you don’t know the shape ahead of time, deserialize into Value and walk
it:
let v: saneyaml::Value = saneyaml::from_str("name: web\nport: 8080\n")?;
assert_eq!(v["name"].as_str(), Some("web"));
assert_eq!(v["port"].as_u64(), Some(8080));
Value mirrors serde_yaml::Value: as_str, as_i64, as_bool,
as_sequence, as_mapping, indexing by key or position, and get for a
non-panicking lookup. See the Cookbook for
mutation and patching.
Emit
Serialize any Serialize value to YAML:
use serde::Serialize;
#[derive(Serialize)]
struct Config { name: String, port: u16 }
let cfg = Config { name: "web".into(), port: 8080 };
let text = saneyaml::to_string(&cfg)?; // -> String
// name: web
// port: 8080
saneyaml::to_writer(std::io::stdout(), &cfg)?; // writes to any impl std::io::Write
The default writer produces clean, deterministic YAML. To tune layout — sort
keys, force quoting, flow vs block — use EmitOptions and
to_string_with_options; see the Cookbook.
Entry-point cheat sheet
| You have… | You want… | Call |
|---|---|---|
| &str / &[u8] / reader | one typed value | from_str / from_slice / from_reader |
| a multi-document stream | a Vec<T> | from_documents_str / _slice / _reader |
| a multi-document stream | one document at a time | Deserializer::from_str(...), then iterate |
| a Serialize value | a String or writer | to_string / to_writer |
| YAML text | the raw structure | parse_str → Node, or read into Value |
| non-default schema or limits | any of the above | LoadOptions::…().from_str(...) |
Where to next
- Cookbook — multi-document streams, enums, anchors, Value patching, emitter options.
- Schema modes — control how scalars like NO, on, and 0123 resolve.
- Migrating from serde_yaml — if you’re replacing it.
Cookbook
Short, copy-paste recipes for the tasks that come up most. For the basics, read Getting started first.
Snippets elide the enclosing function and use ?; assume each runs in a function returning saneyaml::Result<()>.
- Multi-document streams
- Work with Value
- Enums and singleton maps
- Anchors and merge keys
- Numbers, timestamps, and binary
- Control emitted YAML
- Custom tags
Multi-document streams
Kubernetes-style streams, with documents separated by --- lines. Get everything at once:
// Manifest is your own Deserialize type (definition elided).
let stream = "\
kind: Service
---
kind: Deployment
";
let docs: Vec<Manifest> = saneyaml::from_documents_str(stream)?;
assert_eq!(docs.len(), 2);
Or process one document at a time — useful when an early document is valid but a later one fails, and you want the good ones first:
use serde::Deserialize;
for doc in saneyaml::Deserializer::from_str(stream) {
    let manifest = Manifest::deserialize(doc)?;
    // handle manifest…
}
from_documents_str is all-or-error; the iterator yields parsed documents up to
the first error. For bounded memory on large streams, see
Streaming.
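To see the two shapes side by side without any YAML involved, here is a std-only analogy (not saneyaml code). Each input line stands in for one document, and the hypothetical helpers parse_all and parse_until_error mirror the all-or-error collector and the iterate-until-first-error loop.

```rust
use std::num::ParseIntError;

/// All-or-error: one bad item discards everything parsed before it.
fn parse_all(input: &str) -> Result<Vec<i32>, ParseIntError> {
    input.lines().map(|line| line.trim().parse::<i32>()).collect()
}

/// Incremental: keep the items parsed before the first failure,
/// and report that failure separately.
fn parse_until_error(input: &str) -> (Vec<i32>, Option<ParseIntError>) {
    let mut good = Vec::new();
    for line in input.lines() {
        match line.trim().parse::<i32>() {
            Ok(n) => good.push(n),
            Err(e) => return (good, Some(e)),
        }
    }
    (good, None)
}
```

With the input "1\n2\nx\n4", parse_all returns only the error, while parse_until_error hands back the first two values along with it.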
Work with Value
Read, mutate, and patch a dynamic document. Indexing returns a null sentinel for missing paths, so chains don’t panic:
let mut v: saneyaml::Value = saneyaml::from_str("\
services:
  api:
    image: nginx:1.25
    ports: [80]
")?;
// read
assert_eq!(v["services"]["api"]["image"].as_str(), Some("nginx:1.25"));
// patch in place
v["services"]["api"]["image"] = saneyaml::Value::from("nginx:1.27");
// mutate a sequence
if let Some(ports) = v["services"]["api"]["ports"].as_sequence_mut() {
    ports.push(saneyaml::Value::from(443));
}
Value::from accepts strings, bools, every integer/float width, Mapping, and
Vec<Value>. Build maps with Mapping, which keeps insertion order and offers
the full entry / get / insert / remove API.
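One way a null sentinel can make chained indexing safe is sketched below. This is a toy type written for illustration and assumes nothing about saneyaml's internals; Toy, NULL, and the read-only Index impl are all hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Index;

/// A toy dynamic value; not saneyaml's `Value`.
#[derive(Debug, PartialEq)]
enum Toy {
    Null,
    Str(String),
    Map(BTreeMap<String, Toy>),
}

/// Shared sentinel handed out for every missing path.
static NULL: Toy = Toy::Null;

impl Index<&str> for Toy {
    type Output = Toy;

    fn index(&self, key: &str) -> &Toy {
        match self {
            // Present key: return the child. Missing key: return the sentinel.
            Toy::Map(map) => map.get(key).unwrap_or(&NULL),
            // Indexing a scalar or null yields the sentinel instead of panicking.
            _ => &NULL,
        }
    }
}

impl Toy {
    fn as_str(&self) -> Option<&str> {
        match self {
            Toy::Str(s) => Some(s),
            _ => None,
        }
    }
}
```

Because every miss returns a reference to the same Null value, a chain such as v["missing"]["deeper"] keeps indexing into Null and ends in a helper that returns None.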
To convert between typed values and Value without going through text, use
to_value and from_value:
let value = saneyaml::to_value(&cfg)?;          // T -> Value
let cfg: Config = saneyaml::from_value(value)?; // Value -> T
from_value is spanless: it won’t coerce a number or bool into a String target, because the original spelling is gone after the value is built. Read with from_str / from_slice when a string field must preserve source text like 1_000 or FALSE.
Enums and singleton maps
Data-carrying enums round-trip as YAML tags by default (!Variant value). For
the serde_yaml single-key-map shape (variant: value), annotate the field:
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
enum Action { Run(String), Skip }
#[derive(Serialize, Deserialize)]
struct Job {
    #[serde(with = "saneyaml::with::singleton_map")]
    action: Action,
}
Use saneyaml::with::singleton_map_recursive when nested enum payloads also need
the one-entry-map shape. To emit every enum as a singleton map without
annotating each field, set the emitter option globally:
use saneyaml::{EmitOptions, EnumRepresentation};
let opts = EmitOptions::structural()
    .with_enum_representation(EnumRepresentation::SingletonMap);
let text = saneyaml::to_string_with_options(&job, opts)?;
Anchors and merge keys
&anchor / *alias and the << merge key are expanded for you when loading
into Node or Value — you read the effective, merged result:
let v: saneyaml::Value = saneyaml::from_str("\
defaults: &defaults
  retries: 3
service:
  <<: *defaults
  name: api
")?;
assert_eq!(v["service"]["retries"].as_u64(), Some(3));
assert_eq!(v["service"]["name"].as_str(), Some("api"));
Explicit keys override merged ones, and earlier entries in a merge list win.
Value::apply_merge() is also available as an explicit in-place helper.
If you need the raw << syntax and anchor/alias graph identity (not the
expanded result), parse with parse_events / EventStream or
the lossless graph.
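The precedence rule (explicit keys first, then merge sources in order) can be written down in a few lines. The sketch below illustrates the stated rule only; it is not the crate's merge code, effective_mapping is a hypothetical helper, and values are plain strings for brevity.

```rust
/// Resolve one mapping: `explicit` holds the keys written in the mapping itself,
/// `merges` holds the mappings listed under `<<`, in source order.
fn effective_mapping(
    explicit: &[(&str, &str)],
    merges: &[&[(&str, &str)]],
) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    // Explicit keys are visited first, so nothing merged can displace them.
    // Merge sources follow in order, so an earlier source beats a later one.
    let sources = std::iter::once(explicit).chain(merges.iter().copied());
    for source in sources {
        for (key, value) in source {
            if !out.iter().any(|(k, _)| k == key) {
                out.push((key.to_string(), value.to_string()));
            }
        }
    }
    out
}
```

Given explicit keys name and retries, and a merge list of [overrides, defaults] that both define timeout, the result keeps the explicit retries and takes timeout from overrides, the earlier entry.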
Numbers, timestamps, and binary
Number widens to i128 / u128; the usual helpers are range-checked:
let v: saneyaml::Value = saneyaml::from_str("count: 9000000000000000000\n")?;
assert_eq!(v["count"].as_u64(), Some(9_000_000_000_000_000_000));
Timestamps and !!binary are YAML 1.1 features and need an explicit schema or
tag. !!timestamp scalars are read via as_timestamp() / typed
saneyaml::Timestamp fields; !!binary decodes into byte targets like
Vec<u8>. See Schema modes for enabling YAML 1.1 typing.
Control emitted YAML
The default (EmitOptions::structural()) is insertion-order, plain-where-safe,
block layout. Tune it with builder methods:
use saneyaml::{EmitOptions, KeyOrder, ScalarQuoteStyle};
let opts = EmitOptions::structural()
    .with_key_order(KeyOrder::Sort) // Preserve | Sort
    .with_scalar_quote_style(ScalarQuoteStyle::DoubleQuoted); // PlainWhereSafe | SingleQuoted | DoubleQuoted
let text = saneyaml::to_string_with_options(&cfg, opts)?;
Other knobs: with_collection_style (Block | Flow), with_block_scalar_style
(Literal | Folded), with_enum_representation (Tag | SingletonMap), and
with_yaml_1_1_safe_strings(true) to quote strings like no / 12:34:56 so
older YAML 1.1 readers don’t reinterpret them.
For byte-for-byte serde_yaml writer output on the supported structural corpus,
use EmitOptions::byte_compatible(). Comments and source formatting are not a
writer concern — use in-place editing to preserve them.
Custom tags
Application tags (!Ref, !Env, CloudFormation intrinsics, …) are preserved in
Value and visible to enum dispatch. For ordinary typed reads they’re
transparent metadata:
// !Env prod -> String "prod"
// !Ports [80, 443] -> Vec<u16>
// !Maybe null -> Option<T> (None)
Inspect a tag explicitly with value.as_tagged(), which exposes the tag and the
inner Value.
Schema modes
A schema decides how a plain scalar like NO, on, or 0123 becomes a typed
value. saneyaml defaults to YAML 1.2, where those stay strings — so you don’t
get the “Norway problem.”
The Norway problem
In YAML 1.1 (and the archived serde_yaml), NO resolves to the boolean
false. A country-code list [NO, SE, FI] silently becomes [false, SE, FI].
YAML 1.2 fixed this: only true and false, in their lowercase, capitalized, or uppercase spellings, are booleans. saneyaml follows 1.2 by
default.
let codes: Vec<String> = saneyaml::from_str("[NO, SE, FI]")?;
assert_eq!(codes, ["NO", "SE", "FI"]); // strings, as written
Default resolution (YAML 1.2)
| Plain scalar | Resolves to |
|---|---|
| null, Null, NULL, ~, missing value | null |
| true, false, True, FALSE | bool |
| yes, no, on, off, y, n, NO | string |
| 123, +12, 0123, 1_000 | integer (decimal; 0123 is 123) |
| 0x7B, 0b1010, 0o77 | string |
| 1.5, .inf, -.Inf, .NAN | float |
| 1:20, 1:20:30 | string |
| 2026-05-24, datetimes | string |
The full table for every mode is in COMPATIBILITY.md → Scalar resolution.
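As a rough illustration of how the default table reads as code, the sketch below classifies a plain scalar using only the rows listed above. It is a simplified stand-in, not saneyaml's resolver: Resolved and resolve_plain are hypothetical names, and cases the table does not mention (exponents, overflow) simply follow Rust's own number parsers here.

```rust
#[derive(Debug, PartialEq)]
enum Resolved {
    Null,
    Bool(bool),
    Int(i128),
    Float(f64),
    Str(String),
}

/// Classify a plain (unquoted) scalar following the default table.
fn resolve_plain(s: &str) -> Resolved {
    match s {
        "" | "~" | "null" | "Null" | "NULL" => return Resolved::Null,
        "true" | "True" | "TRUE" => return Resolved::Bool(true),
        "false" | "False" | "FALSE" => return Resolved::Bool(false),
        ".nan" | ".NaN" | ".NAN" => return Resolved::Float(f64::NAN),
        _ => {}
    }
    // Strip one optional sign so "-.Inf" and "+12" can be inspected.
    let unsigned = s
        .strip_prefix('-')
        .or_else(|| s.strip_prefix('+'))
        .unwrap_or(s);
    if matches!(unsigned, ".inf" | ".Inf" | ".INF") {
        let inf = if s.starts_with('-') { f64::NEG_INFINITY } else { f64::INFINITY };
        return Resolved::Float(inf);
    }
    // Only digit-led text is numeric, so words like "yes", "NO", or "inf" stay strings.
    if unsigned.starts_with(|c: char| c.is_ascii_digit()) {
        // Decimal only: 0123 is 123, while 0x7B fails both parses and stays a string.
        let digits: String = s.chars().filter(|&c| c != '_').collect();
        if let Ok(n) = digits.parse::<i128>() {
            return Resolved::Int(n);
        }
        if let Ok(f) = s.parse::<f64>() {
            return Resolved::Float(f);
        }
    }
    Resolved::Str(s.to_string())
}
```

Sexagesimal and date shapes such as 1:20 or 2026-05-24 fall through both numeric parses and come back as strings, which matches the last two rows of the table.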
Choosing a mode
Pass a configured LoadOptions instead of calling from_str directly:
use saneyaml::LoadOptions;
let cfg: Config = LoadOptions::core().from_str(text)?; // YAML 1.2 (the default)
let cfg: Config = LoadOptions::json().from_str(text)?; // JSON booleans/null/numbers only
let cfg: Config = LoadOptions::failsafe().from_str(text)?; // every scalar stays a string
let cfg: Config = LoadOptions::legacy_serde_yaml().from_str(text)?; // YAML 1.1 / serde_yaml-style
| Mode | Use it when |
|---|---|
| core() (default) | You want correct YAML 1.2 behavior. |
| json() | Input is JSON-ish and you want strict JSON scalar typing. |
| failsafe() | You want raw strings and will type things yourself. |
| legacy_serde_yaml() / yaml_1_1() | You have an existing corpus that depends on no → false, octal 0123, sexagesimals, !!timestamp typing, etc. |
Schema::{Core, Json, Failsafe, LegacySerdeYaml} are the enum equivalents;
Schema::Yaml12 and Schema::Yaml11 are retained aliases for Core and
LegacySerdeYaml.
Per-document %YAML directives
To let each document’s %YAML header pick the mode — %YAML 1.1 gets legacy
construction, everything else stays 1.2 — use directive mode:
let docs = saneyaml::LoadOptions::yaml_version_directive().from_documents_str(stream)?;
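The selection rule itself is small enough to sketch. The function below is a simplified illustration, not the crate's directive handling: Mode and mode_for_document are hypothetical names, and it only inspects the %-prefixed lines that precede a document's --- marker.

```rust
#[derive(Debug, PartialEq)]
enum Mode {
    Yaml12,
    Yaml11Legacy,
}

/// Choose a mode from one document's directive lines.
/// Directives are the `%`-prefixed lines before the document's `---` marker.
fn mode_for_document(doc: &str) -> Mode {
    for line in doc.lines() {
        let line = line.trim_end();
        if line == "---" || line.starts_with("--- ") {
            break; // directives end where the document starts
        }
        let mut parts = line.split_whitespace();
        if parts.next() == Some("%YAML") && parts.next() == Some("1.1") {
            return Mode::Yaml11Legacy;
        }
    }
    Mode::Yaml12 // no directive, or any version other than 1.1
}
```

A document that merely contains the text %YAML 1.1 inside a quoted scalar after its --- marker is not affected, because scanning stops at the marker.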
What YAML 1.1 mode turns on
legacy_serde_yaml() / yaml_1_1() resolves the legacy forms: boolean words
(yes/no/on/off), octal 0123, 0x/0b radix integers, base-60
sexagesimals, and !!timestamp-shaped scalars (read via saneyaml::Timestamp).
Numeric spellings that overflow Number stay strings.
Even in YAML 1.2 mode you can still opt into individual YAML 1.1 types with
explicit tags (!!int 0o77, !!binary …, !!timestamp …) without switching the
whole schema. See COMPATIBILITY.md for the exact tag rules.
Diagnostics
Errors carry where and what — line/column, an in-document key path, and an opt-in source caret — so a bad config tells the user how to fix it.
Snippets elide the enclosing function; assume a function returning
saneyaml::Result<()>.
Location
Error::line() and column() mirror the common serde_yaml convenience path
(location() returns the same as a Location):
let err = saneyaml::from_str::<Config>("name: [\n").unwrap_err();
if let (Some(line), Some(col)) = (err.line(), err.column()) {
    eprintln!("error at {line}:{col}");
}
Key path
path() reports where in the document the error is, using familiar traversal
syntax — server.port, ports[1], bracket-quoted non-identifier keys:
let err = saneyaml::from_str::<Config>("server:\n port: not-a-number\n").unwrap_err();
if let Some(path) = err.path() {
    eprintln!("at {path}"); // at server.port
}
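For a sense of how such a path string can be assembled, here is a small sketch. It is an assumption-laden illustration rather than the crate's formatter: Segment, is_identifier, and render_path are hypothetical, and the ["…"] quoting used for non-identifier keys is a guess at a reasonable style.

```rust
enum Segment {
    Key(String),
    Index(usize),
}

/// True for keys that can be written bare after a dot.
fn is_identifier(key: &str) -> bool {
    let mut chars = key.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn render_path(path: &[Segment]) -> String {
    let mut out = String::new();
    for segment in path {
        match segment {
            Segment::Key(key) if is_identifier(key) => {
                if !out.is_empty() {
                    out.push('.');
                }
                out.push_str(key);
            }
            // Assumed quoting style for keys that are not plain identifiers.
            Segment::Key(key) => out.push_str(&format!("[{key:?}]")),
            Segment::Index(i) => out.push_str(&format!("[{i}]")),
        }
    }
    out
}
```

Under these assumptions, the segments server and port render as server.port, and ports followed by index 1 renders as ports[1].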
Source caret
render_source(input) returns a Display that points at the offending span,
rustc-style — great for CLI output:
let input = "name: web\nport: [\n";
let err = saneyaml::from_str::<Config>(input).unwrap_err();
eprintln!("{}", err.render_source(input));
// 2 | port: [
//   | ^ …
Use render_source_with_options(input, SourceRenderOptions) to control the
number of context lines.
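The idea behind a source caret is simple to sketch from a 1-based line and column. The helper below is hypothetical and its layout is an assumption; the real render_source output may differ in gutter width, context lines, and message text.

```rust
/// Render one source line with a caret under a 1-based (line, column) position.
/// Returns None when the position does not exist in `input`.
fn render_caret(input: &str, line: usize, column: usize) -> Option<String> {
    let text = input.lines().nth(line.checked_sub(1)?)?;
    let gutter = line.to_string();
    let pad = " ".repeat(gutter.len());
    // Columns are counted in characters here; tabs and wide glyphs are ignored.
    let indent = " ".repeat(column.checked_sub(1)?);
    Some(format!("{gutter} | {text}\n{pad} | {indent}^"))
}
```

Calling render_caret("name: web\nport: [\n", 2, 7) yields the second line followed by a caret placed under the opening bracket.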
Categorize
category() returns an ErrorCategory for branching — e.g. distinguishing a
parse/syntax failure from a type mismatch. document_index() reports which
document in a stream failed (zero-based).
use saneyaml::ErrorCategory;
match err.category() {
    ErrorCategory::Syntax => { /* malformed YAML */ }
    ErrorCategory::Data => { /* type/shape mismatch */ }
    _ => { /* Limit, Reference, DuplicateKey, Io, … */ }
}
What carries spans
| Source | Line/column? |
|---|---|
| from_str / from_slice / from_node | yes |
| Deserializer::from_str / from_slice (incl. stream items) | yes |
| from_reader (after buffering) | yes |
| from_value and direct Value reads | no (Value is spanless) |
The flat Display string stays compatible with serde_yaml; everything above
is additive.
Untrusted input
YAML from the network, user uploads, or a CI job is hostile until proven otherwise. saneyaml applies structural resource limits by default, and lets you tune them per call site.
Snippets elide the enclosing function; assume a function returning
saneyaml::Result<()>.
Defaults
Every parser, loader, streaming, lossless, and Serde entry point enforces these out of the box:
| Limit | Default | Rejects |
|---|---|---|
| Input size | 64 MiB | Oversized payloads, before parsing |
| Nesting depth | 128 | Deeply nested block/flow bombs |
| Scalar size | 1 MiB | Single giant scalars |
| Collection size | 16,384 entries | Wide sequence/mapping bombs |
| Alias expansion | input-derived budget | Billion-laughs alias bombs |
| Recursive aliases | — | always rejected |
The defaults accept real-world config (Kubernetes CRDs, OpenAPI, Compose) while rejecting compact bombs that sit under the byte ceiling.
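To make the nesting-depth row concrete, the sketch below shows how a depth limit can reject a compact flow bomb while scanning, before any tree is built. It is deliberately naive and is not how saneyaml enforces its limits: check_flow_depth is a hypothetical helper that tracks quotes but ignores comments, escapes, and block structure.

```rust
/// Reject flow collections nested deeper than `max_depth`.
/// On success, returns the deepest nesting level seen.
fn check_flow_depth(input: &str, max_depth: usize) -> Result<usize, String> {
    let mut depth = 0usize;
    let mut deepest = 0usize;
    let mut quote: Option<char> = None;
    for (offset, c) in input.char_indices() {
        match (quote, c) {
            (Some(q), _) if c == q => quote = None, // closing quote
            (Some(_), _) => {}                       // inside a quoted scalar
            (None, '"') | (None, '\'') => quote = Some(c),
            (None, '[') | (None, '{') => {
                depth += 1;
                if depth > max_depth {
                    // Fail immediately: the remaining input is never examined.
                    return Err(format!(
                        "nesting depth {depth} exceeds limit {max_depth} at byte {offset}"
                    ));
                }
                deepest = deepest.max(depth);
            }
            (None, ']') | (None, '}') => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(deepest)
}
```

A payload of a thousand opening brackets is only 1,000 bytes, far below any byte ceiling, yet it fails at bracket 129 with a limit of 128.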
Tighten for a specific call
Lower a limit when you know your inputs are small:
use saneyaml::LoadOptions;
let cfg: Config = LoadOptions::new()
    .max_input_bytes(256 * 1024) // cap at 256 KiB
    .max_nesting_depth(32)
    .max_collection_items(1_000)
    .from_str(text)?;
All knobs: max_input_bytes, max_alias_expansion_nodes, max_nesting_depth,
max_scalar_bytes, max_collection_items.
Relax — only when you’ve bounded the source yourself
Each without_* opt-out transfers that part of the bound to you. Use them only
when the source is already trusted or size-checked upstream:
let node = saneyaml::LoadOptions::new()
    .without_input_limit()   // also: without_nesting_depth_limit,
    .parse_str(local_file)?; // without_scalar_limit, without_collection_limit
What the limits are — and aren’t
These are structural construction limits, not wall-clock or resident-memory guarantees:
- Reader entry points fully buffer bounded input before parsing.
- Raw event/lossless streams validate alias references but don’t expand them, so they don’t spend the alias budget.
- Your own Deserialize impls can still allocate after the YAML layer hands them bounded values.
- saneyaml validates YAML structure, not application schemas (Kubernetes, OpenAPI, …).
For the full threat model and reporting process, see COMPATIBILITY.md → Threat model and SECURITY.md.
Editing files
Change values in an existing YAML file while keeping every comment, anchor, blank line, and untouched byte exactly where it was. This is the part a load-then-re-emit round trip can’t do.
Snippets elide the enclosing function; assume a function returning
saneyaml::Result<()>.
Edit by path
saneyaml::edit opens a ConfigEditor. Address values by path, then finish:
let source = "\
# service stack
services:
  web:
    image: nginx:1.25
    ports:
      - \"80:80\"
";
let mut editor = saneyaml::edit(source)?;
editor
    .set(saneyaml::ConfigPath::keys(["services", "web", "image"]), "nginx:1.27")?
    .push(saneyaml::ConfigPath::keys(["services", "web", "ports"]), "8080:80")?;
let edited = editor.finish()?;
assert!(edited.contains("# service stack")); // comment preserved
assert!(edited.contains("image: nginx:1.27")); // value updated
assert!(edited.contains("- 8080:80")); // item appended
Operations: set, insert, remove, rename, push (append to a sequence),
and insert_item (insert at an index). Each returns &mut Self, so chain them;
the editor reparses between operations so later paths see current source.
Addressing paths
use saneyaml::{ConfigPath, PathSegment};
// string keys (most common)
ConfigPath::keys(["metadata", "labels", "app"]);
// mixed keys and sequence indices
ConfigPath::new([
    PathSegment::from("jobs"),
    PathSegment::from("test"),
    PathSegment::from("steps"),
    PathSegment::from(0usize),
    PathSegment::from("uses"),
]);
// JSON Pointer: handles keys containing "/" or "~"
ConfigPath::json_pointer("/metadata/labels/app.kubernetes.io~1name")?;
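The ~1 in the last path is JSON Pointer escaping as defined by RFC 6901: ~1 stands for / and ~0 stands for ~. A minimal decoder, written here as a standalone sketch with the hypothetical name json_pointer_tokens (it is not the crate's parser), looks like this:

```rust
/// Split an RFC 6901 JSON Pointer into unescaped reference tokens.
fn json_pointer_tokens(pointer: &str) -> Option<Vec<String>> {
    if pointer.is_empty() {
        return Some(Vec::new()); // the empty pointer addresses the whole document
    }
    // Every non-empty pointer must start with '/'.
    let rest = pointer.strip_prefix('/')?;
    Some(
        rest.split('/')
            // Order matters: decode ~1 first so "~01" becomes "~1", not "/".
            .map(|token| token.replace("~1", "/").replace("~0", "~"))
            .collect(),
    )
}
```

Applied to /metadata/labels/app.kubernetes.io~1name, this produces three tokens, the last being the key app.kubernetes.io/name.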
Read and write files directly
let mut editor = saneyaml::edit_file("compose.yaml")?;
editor.set(saneyaml::ConfigPath::keys(["version"]), "3.9")?;
editor.finish_to_file()?; // writes back to compose.yaml
Inspect without editing
Drop to LosslessStream when you need to read source-level detail — comments,
exact scalar spelling, anchor/alias graph identity — that the semantic Value
tree discards:
let stream = saneyaml::parse_lossless(source)?;
for comment in stream.comments() {
    println!("{}", comment.text());
}
LosslessStream also exposes effective_mapping_entries(node) — the merged view
of a mapping with << provenance kept — and source_fragment(span) to recover
the original bytes for any node. It’s the surface for tools that must preserve or
analyze source, not just values.
A runnable end-to-end example (Docker Compose, Kubernetes, GitHub Actions) lives
in examples/config_refactor.rs.
Streaming
Pull-based iterators that process YAML without holding every parsed document at
once. Use them for large multi-document streams; for small configs,
from_str is simpler.
Snippets elide the enclosing function; assume a function returning
saneyaml::Result<()>.
Two levels
| Stream | Yields | Use when |
|---|---|---|
| DocumentStream | one semantic Node per document | You want documents one at a time, with merge expansion and schema applied. |
| EventStream | low-level parser events | You want raw structure (scalar style, flow vs block, anchors, the literal <<) without building a tree. |
Both construct from &str, &[u8], or a reader, and both are plain Iterators.
Documents one at a time
for doc in saneyaml::stream::DocumentStream::from_str(stream)? {
    let node = doc?; // saneyaml::Node
    // handle one document, then it can be dropped before the next is parsed
}
Raw events
use saneyaml::Event;
for event in saneyaml::stream::EventStream::from_str(source)? {
    match event? {
        Event::Scalar { .. } => { /* value, with its style + tag */ }
        Event::MappingStart { .. } => { /* … */ }
        Event::Alias { .. } => { /* a raw *alias, not expanded */ }
        _ => {}
    }
}
Events expose what the semantic tree throws away: scalar quote style, block vs flow collection style, anchors/aliases as distinct events, tags, and document directives. Aliases are not expanded here — that’s what makes events the right tool for preserving or analyzing the original document.
What “bounded memory” means
Streaming bounds the retained parsed representation: DocumentStream keeps one
document live at a time instead of a whole Vec. The source bytes are still
fully buffered — these are synchronous pull APIs over an in-memory input, not
constant-memory async readers.
The memory win is real on multi-document streams (the working set stays flat as the stream grows) and negligible on a single large document, where there’s nothing to reclaim mid-parse. The benchmarks quantify both.
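The shape of that trade-off can be shown with std alone. In the sketch below the whole input is already buffered, yet only one "parsed" document is alive per loop iteration. It is an analogy with hypothetical helpers (documents, count_kinds), and the splitter is naive: it ignores ... markers, directives, and --- lines inside block scalars.

```rust
/// Lazily yield the documents of an already-buffered stream, one at a time.
fn documents(buffered: &str) -> impl Iterator<Item = &str> {
    buffered
        .strip_prefix("---\n")
        .unwrap_or(buffered)
        .split("\n---\n")
        .filter(|doc| !doc.trim().is_empty())
}

/// Process a stream while holding one parsed item at a time.
fn count_kinds(buffered: &str, wanted: &str) -> usize {
    let mut hits = 0;
    for doc in documents(buffered) {
        // Stand-in for "parse one document": an owned copy that is dropped
        // at the end of each iteration, so the working set stays flat.
        let parsed: Vec<String> = doc.lines().map(str::to_owned).collect();
        if parsed.iter().any(|line| line == wanted) {
            hits += 1;
        }
    }
    hits
}
```

Collecting every parsed document into a Vec before the loop would give the same count, but retained memory would then grow with the number of documents.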
Need it all at once?
parse_documents / parse_events are the all-or-error collectors over the same
parser, returning a Vec. Reach for the streams only when you want to act on
items as they arrive or cap retained documents.
Migrating from serde_yaml
serde_yaml is archived. For config-shaped Serde code, saneyaml is close to a
drop-in: the read API, Value, and the with::singleton_map helpers keep the
same spelling — now with YAML 1.2 scalar resolution and richer diagnostics.
This page is the call-site cookbook. The exhaustive support matrix, divergences, and threat model live in COMPATIBILITY.md; the scalar-typing differences are in Schema modes.
Scope. saneyaml is an adoption candidate for config-shaped Serde reads plus structural writes. It is not a blanket drop-in for every YAML document, every emitter byte, or full YAML 1.1 / libyaml behavior.
Two ways to switch
Keep serde_yaml::… spellings — alias the package in Cargo, change nothing
in source:
[dependencies]
serde_yaml = { package = "saneyaml", version = "0.3.0" }
serde_yaml::from_str, serde_yaml::Value, serde_yaml::with::singleton_map,
and friends keep compiling against this crate.
Or import directly and rewrite the prefix:
[dependencies]
saneyaml = "0.3.0"
// mechanical rename
let cfg: saneyaml::Value = saneyaml::from_str(input)?;
// …or alias one file at a time
use saneyaml as serde_yaml;
let cfg: serde_yaml::Value = serde_yaml::from_str(input)?;
The shipped examples/serde_yaml_migration.rs
compiles the full alias surface end to end.
Cookbook
Each recipe shows the call site. Under a Cargo/source alias, keep the
serde_yaml:: spelling; with a direct import, swap the prefix to saneyaml::.
Typed reads — unchanged:
let config: Config = saneyaml::from_str(input)?;
let config: Config = saneyaml::from_slice(bytes)?;
let config: Config = saneyaml::from_reader(reader)?;
Value indexing and patching — unchanged:
let mut value: saneyaml::Value = saneyaml::from_str(input)?;
value["services"]["api"]["image"] = saneyaml::Value::from("nginx:latest");
let ports = value["services"]["api"]["ports"].as_sequence();
Tagged enums / singleton maps — same helpers:
#[derive(serde::Serialize, serde::Deserialize)]
struct Job {
    #[serde(with = "saneyaml::with::singleton_map")]
    action: Action,
}
Use singleton_map_recursive for nested enum payloads.
Multi-document streams — iterate, or collect:
use serde::Deserialize; // brings Config::deserialize into scope
let docs = saneyaml::Deserializer::from_str(stream)
    .map(Config::deserialize)
    .collect::<Result<Vec<_>, _>>()?;
let docs: Vec<Config> = saneyaml::from_documents_str(stream)?; // additive convenience
Structural writes — unchanged:
let text = saneyaml::to_string(&config)?;
saneyaml::to_writer(&mut writer, &config)?;
Errors — line() / column() still work; span(), category(), path(),
and render_source() are additive (Diagnostics):
let err = saneyaml::from_str::<Config>("name: [").unwrap_err();
if let Some(loc) = err.location() {
    eprintln!("{}:{}", loc.line(), loc.column());
}
What behaves differently
Five things a migrator should know — most code never touches them:
| Change | What it means for you |
|---|---|
| YAML 1.2 by default | no / on / NO stay strings, not booleans. Opt into the old behavior per call with LoadOptions::legacy_serde_yaml(). See Schema modes. |
| Merge keys expand by default | Loaded Node/Value give you the merged result. serde_yaml::Value kept the literal << until apply_merge(). To see raw <<, use events or the lossless graph. |
| Value is spanless | It won’t coerce a number/bool into a String target, and it doesn’t carry comments, anchors, or graph identity. Read with from_str/from_node when source text matters; use the lossless graph for formatting. |
| Structural writer | to_string emits clean deterministic YAML, not byte-identical serde_yaml. Pass EmitOptions::byte_compatible() for the supported byte corpus. |
| Resource limits on by default | Untrusted input is bounded (64 MiB, depth, scalar, collection, alias). Tune or opt out via LoadOptions. See Untrusted input. |
Support matrix
All of the following resolve under both rename paths and are covered by the swap harness and downstream smokes:
| serde_yaml surface | Status |
|---|---|
| from_str / from_slice / from_reader | Covered for typed config reads and Value |
| Deserializer::{from_str, from_slice, from_reader} | Covered, incl. multi-document iteration |
| Value / Mapping / Number | Covered: reads, mutation, indexing, helpers, traits |
| value::{to_value, Serializer} | Covered for config-shaped serialization |
| to_string / to_writer / Serializer | Structural output covered; byte_compatible() matches bytes on the supported corpus |
| with::singleton_map / singleton_map_recursive | Covered for read and write |
| Error / Result / Location | Covered; richer diagnostics are additive |
The indexing traits (Index, mapping::Index) are sealed, as they were
upstream — use the built-in string / usize / Value lookups.
Proof
The migration claims are executable, not aspirational:
- tests/serde_yaml_swap_harness.rs: the same call sites run against serde_yaml 0.9.34 and against this crate under the serde_yaml dependency name.
- tests/downstream_migration_harness.rs and tests/external_downstream_migration.rs: pinned real-world configs and reduced fixtures from real serde_yaml users (Pingora, rust-i18n, cfn-guard, navi, Stackable).
- scripts/downstream-build-trials.sh: packages this crate and builds those downstreams with their serde_yaml dependency rewritten to it.
cargo test --test serde_yaml_swap_harness --test downstream_migration_harness
cargo test --test external_downstream_migration
scripts/downstream-build-trials.sh smoke-only
Real-world gates currently cover 33 files / 39 documents across GitHub Actions, Docker Compose, Kubernetes, Helm, OpenAPI, Wrangler, Ansible, CloudFormation/SAM, Symfony, GitLab CI, CircleCI, and Azure Pipelines. They prove the selected corpus — not a substitute for testing your own YAML.
Migration impact ledger
| Area | Migration impact |
|---|---|
| Default merge expansion | Loaded Node/Value and Serde reads expand untagged and explicit merge-tag << by default. Code that inspected merge syntax should switch to parse_events or LosslessStream. Explicit !!str << and custom-tagged << stay literal. |
| YAML 1.1 compatibility | Legacy scalar/merge behavior is opt-in via schema modes; default entrypoints stay YAML 1.2-oriented, so corpora that need 1.1 typing need opt-in tests. |
| Alias graph identity | Semantic trees clone acyclic aliases and reject recursion; graph-sensitive callers should use LosslessStream. |
| Lossless formatting | Comments, anchors, directives, and source style are preserved only by LosslessStream / ConfigEditor, not the semantic Value tree. |
| Parser acceptance differences | Some YAML 1.2 inputs libyaml rejects are accepted, and some malformed libyaml-tolerated inputs are rejected. Per-case detail lives in the divergence records. |
| Package status | Cargo.toml declares saneyaml 0.3.0 under the MIT license. |
Known follow-up
- Keep the named external crate build trials current before broadening ecosystem replacement claims.
- Keep divergence records and migration-impact wording current as behavior changes.
- Treat full YAML compatibility and arbitrary source-preserving emission as future work until they are fixture-backed.
Compatibility
This is the exhaustive compatibility and divergence reference. For everyday use you don’t need it — see Schema modes, Untrusted input, and Migrating from serde_yaml.
What this crate targets
- Primary API: serde_yaml read-side ergonomics for config-shaped YAML (from_str / from_slice / from_reader), plus Value and structural writes.
- Parser reference: YAML 1.2 tree/event acceptance comparable to yaml-rust2 and saphyr for supported syntax.
- Documented divergence: libyaml / YAML 1.1-era behavior is version-pinned against a Ruby Psych 3.1.0 / libyaml 0.2.1 probe. Default loading is YAML 1.2-oriented; YAML 1.1 typing is opt-in.
Every divergence record under tests/fixtures/divergences/records/ carries a
migration_impact field, and tests/divergence_manifest.rs fails any record
that omits caller-facing impact. That registry is the source of truth for
intentional behavior splits.
serde_yaml 0.9 rename support matrix
“Supported” means the name resolves under both serde_yaml = { package = "saneyaml", ... } and use saneyaml as serde_yaml;. “Intentionally divergent”
means it resolves but behaves differently by policy. “Not preservable” means it
isn’t a stable surface this crate emulates.
| serde_yaml 0.9 surface | Status |
|---|---|
| from_str, from_slice, from_reader | Supported |
| from_value, to_value | Supported |
| to_string, to_writer | Supported (byte-identical output is an opt-in tier) |
| Deserializer::{from_str,from_slice,from_reader} | Supported, incl. multi-document iteration |
| Serializer::{new,flush,into_inner} | Supported |
| Value, Sequence, Mapping, Number | Supported |
| value::*, mapping::* | Supported |
| with::singleton_map, with::singleton_map_recursive | Supported |
| Default tag-style enum input/output | Supported |
| Error, Result, Location | Supported (richer diagnostics are additive) |
| Value merge-key retention | Intentionally divergent: loaded Value expands << by default; raw events / lossless preserve it |
| Default YAML 1.1 scalar construction | Intentionally divergent: default is YAML 1.2; use LoadOptions schema modes for legacy typing |
| Exact Number private representation | Not preservable (public helpers kept; integers widened) |
| Downstream impl Index | Not preservable (sealed here, as upstream) |
| Byte-identical libyaml emitter output | Not preservable (writer is structural; bytes covered for the documented corpus) |
| Comments / anchors / graph identity in Value | Not preservable (use LosslessStream) |
Reproducible loader matrix
Generated from tests/fixtures/compatibility-matrix/manifest.toml and checked by
tests/compatibility_matrix.rs. Cross-ecosystem entries are pinned offline
vectors; the Rust test validates their metadata and does not execute Go,
Python, or C++ runtimes.
| Behavior family | Proof source | yaml policy | yaml | serde_yaml | yaml-rust2 | saphyr | Cross-ecosystem vector | Divergence / migration impact |
|---|---|---|---|---|---|---|---|---|
| Typed Serde config entrypoints | tests/compatibility_matrix.rs typed AppConfig probe | YAML 1.2 default typed reads preserve common config scalars. | accept | accept | n/a | n/a | n/a | Serde-only Rust API row; parser-only loaders are intentionally marked n/a instead of given adapter shims. |
| Registered real-world fixtures | tests/fixtures/real-world/SOURCE.toml, 33 files / 39 documents | Every registered fixture must parse with the three Rust reference loaders. | accept | accept | accept | accept | n/a | Config migration smoke coverage includes CloudFormation/SAM, Symfony, GitLab CI, CircleCI, Azure Pipelines, GitHub Actions, Docker Compose, Kubernetes, Helm, OpenAPI, Wrangler, and Ansible without compatibility fallbacks. |
| CI expression and script scalars | GitHub Actions, CircleCI, and Azure Pipelines synthetic scalar shapes | Treat CI expressions as plain or quoted strings under the default schema. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept PyYAML 6.0.2: accept yaml-cpp 0.8.0: accept | CI users can migrate expression-heavy config without enabling YAML 1.1 compatibility or expression-specific parsing. |
| Anchors, aliases, and merge keys | GitLab CI-style defaults and merge expansion fixture | Semantic loaders expand acyclic merge keys; raw/lossless surfaces preserve anchor and merge syntax. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept PyYAML 6.0.2: accept yaml-cpp 0.8.0: accept | tests/fixtures/divergences/records/merge-keys.toml; Graph-sensitive callers should use lossless graph APIs; semantic config callers get effective merged mappings. |
| Application custom tags | CloudFormation/SAM and Symfony short-form tags | Retain application tags in this crate’s Value/event/lossless surfaces while allowing common loader acceptance. | accept | accept | accept | accept | n/a | tests/fixtures/divergences/records/custom-tags.toml; Tagged config users should assert tag-retention behavior directly because some reference trees accept syntax while dropping or reshaping tag metadata. |
| Multi-document streams | Kubernetes-style explicit document stream | Explicit stream boundaries are accepted and document counts stay stable. | accept | accept | accept | accept | go-yaml gopkg.in/yaml.v3 v3.0.1: accept PyYAML 6.0.2: accept yaml-cpp 0.8.0: accept | Stream-processing callers should keep asserting document counts when migrating Kubernetes-style manifests. |
Behavior by area
| Area | saneyaml policy | libyaml / YAML 1.1 | yaml-rust2 / saphyr | serde_yaml |
|---|---|---|---|---|
| on, off, yes, no | Strings by default; booleans only under explicit YAML 1.1 | Often booleans | Per schema | Data-model dependent |
| Duplicate keys | Rejected after alias expansion (1 and "1" are distinct keys) | Often last-wins | yaml-rust2 rejects some; saphyr accepts X38W | Rejects duplicate scalar keys |
| Merge key << | Expanded by default in loaded trees and Serde reads; raw events and LosslessStream keep it literal; Value::apply_merge() is an explicit helper | Expanded, earlier merges win | Preserved literally | Literal in Value until apply_merge() |
| Anchors and aliases | Semantic trees clone acyclic aliases (no graph identity); LosslessStream keeps alias-to-anchor identity | Sometimes graph identity | Clone-on-alias | Accepted in read paths |
| Custom tags | Retained in Value/events/lossless; transparent for typed reads; %TAG handles resolved; undeclared handles rejected | Supported | Supported | Partial/lossy |
| Comments / formatting | Discarded by semantic loaders; retained by LosslessStream for byte-stable replay and edits | Not semantic | Not semantic | Discarded |
| Emission | structural() is deterministic default; byte_compatible() matches serde_yaml bytes for the documented corpus; document-marker policy matches serde_yaml | Manual comparison only | Manual comparison only | Marker policy matched; bytes for the supported corpus |
| Numbers / timestamps / binary | Decimals + underscores + special floats + 0123 (decimal) by default; octal/hex/binary/sexagesimal, !!timestamp, and !!binary under explicit YAML 1.1 or tags | Broad YAML 1.1 typing | Varies by crate | Data-model dependent |
| Directives | %YAML / %TAG accepted as syntax; yaml_version_directive() lets %YAML 1.1 pick legacy construction | May affect schema | Exposed by parser | Usually not a value |
| Explicit core tags | !!int, !!float, !!bool, !!null, !!str, !!timestamp, !!binary preserved and coerced for typed reads (verbatim, canonical URI, or %TAG handle) | Common | Varies | Partial/lossy |
| YAML 1.1 collection/structural tags | !!set, !!omap, !!pairs, !!seq, !!map, !!value retained and mapped to typed targets; malformed payloads rejected with spans | Lossy recovery | Tag info available; contracts differ | Not retained |
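The duplicate-key row treats 1 and "1" as distinct keys. A minimal sketch of why that works if keys are compared by resolved type rather than by source text follows; the Key enum and first_duplicate helper are hypothetical illustrations, not saneyaml types.

```rust
use std::collections::HashSet;

// Hypothetical resolved key: the integer 1 and the string "1" are
// different variants, so they never compare equal.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Key {
    Int(i64),
    Str(String),
}

/// Returns the index of the first key that repeats an earlier key.
fn first_duplicate(keys: &[Key]) -> Option<usize> {
    let mut seen = HashSet::new();
    // `insert` returns false when the key was already present.
    keys.iter().position(|key| !seen.insert(key))
}
```

With this shape, a mapping holding both Key::Int(1) and Key::Str("1") passes, while two Key::Int(1) entries are reported at the index of the second one.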
Scalar Resolution Modes
Schema::Yaml12 is the retained spelling for Schema::Core (the default);
Schema::Yaml11 is the retained spelling for Schema::LegacySerdeYaml.
Schema::Json resolves only JSON lowercase booleans/null and JSON numbers, then
keeps other scalar text as strings. Schema::Failsafe keeps every scalar a
string. Missing mapping values and empty documents are null in every mode.
| Plain scalar | Core / Yaml12 | Json | Failsafe | LegacySerdeYaml / Yaml11 |
|---|---|---|---|---|
| missing value | null | null | null | null |
| ~ | null | string | string | null |
| null, Null, NULL | null | null only is null; other spellings string | string | null |
| true, false | bool | bool | string | bool |
| True, TRUE, False, FALSE | bool | string | string | bool |
| yes, no, on, off, y, n, NO | string | string | string | bool |
| 123, +12, 0123, 1_000 | decimal number | JSON number only; +12, 0123, and underscores string | string | number; 0123 is octal |
| 0x7B, 0b1010, 0o77 | string | string | string | hex and binary numbers; 0o77 string |
| 1:20, 1:20:30.5 | string | string | string | sexagesimal number |
| 1.5 | float | JSON float | string | float |
| .inf, -.Inf, .NAN | float | string | string | float |
| 2026-05-24, timestamp datetimes | string | string | string | retained !!timestamp string with Timestamp typed reads |
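To make a few rows of the table concrete, here is a toy resolver covering only nulls, booleans, and unsigned decimal/octal integers. It is a sketch of the documented outcomes, not the crate's resolver: Mode, Resolved, and resolve are hypothetical names, and floats, signs, underscores, hex, and timestamps are deliberately left out.

```rust
#[derive(Debug, PartialEq)]
enum Resolved {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
}

// Hypothetical stand-ins for the four schema modes in the table.
#[derive(Clone, Copy)]
enum Mode {
    Core,
    Json,
    Failsafe,
    Legacy,
}

fn is_digits(text: &str) -> bool {
    !text.is_empty() && text.bytes().all(|b| b.is_ascii_digit())
}

fn resolve(text: &str, mode: Mode) -> Resolved {
    let typed = match mode {
        // Failsafe keeps every scalar a string.
        Mode::Failsafe => None,
        Mode::Json => match text {
            "null" => Some(Resolved::Null),
            "true" => Some(Resolved::Bool(true)),
            "false" => Some(Resolved::Bool(false)),
            // JSON numbers have no leading zeros, so 0123 stays a string.
            _ if is_digits(text) && (text == "0" || !text.starts_with('0')) => {
                text.parse::<i64>().ok().map(Resolved::Int)
            }
            _ => None,
        },
        Mode::Core => match text {
            "~" | "null" | "Null" | "NULL" => Some(Resolved::Null),
            "true" | "True" | "TRUE" => Some(Resolved::Bool(true)),
            "false" | "False" | "FALSE" => Some(Resolved::Bool(false)),
            // 0123 is read as decimal 123 in this mode.
            _ if is_digits(text) => text.parse::<i64>().ok().map(Resolved::Int),
            _ => None,
        },
        Mode::Legacy => match text {
            "~" | "null" | "Null" | "NULL" => Some(Resolved::Null),
            "true" | "True" | "TRUE" | "yes" | "Yes" | "YES" | "on" | "On" | "ON" | "y"
            | "Y" => Some(Resolved::Bool(true)),
            "false" | "False" | "FALSE" | "no" | "No" | "NO" | "off" | "Off" | "OFF"
            | "n" | "N" => Some(Resolved::Bool(false)),
            // A leading zero selects octal under YAML 1.1 typing.
            _ if is_digits(text) && text.len() > 1 && text.starts_with('0') => {
                i64::from_str_radix(&text[1..], 8).ok().map(Resolved::Int)
            }
            _ if is_digits(text) => text.parse::<i64>().ok().map(Resolved::Int),
            _ => None,
        },
    };
    // Anything not claimed by the schema stays a string.
    typed.unwrap_or_else(|| Resolved::Str(text.to_string()))
}
```

In this sketch, resolve("NO", Mode::Core) stays the string "NO" while Mode::Legacy yields a boolean, and "0123" is 123 under Core but 83 under Legacy.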
Tree-shape divergences
A few YAML 1.2 inputs parse fine as events, but the reference loaders disagree on the shape of the loaded tree. saneyaml keeps these in event parity and shared-reference acceptance, excludes them from loaded-tree value-shape parity, and pins a divergence record for each:
- PW8X and 6KGN — anchors on empty scalar nodes.
- S4JQ — an explicit non-specific tag shape on an empty node.
- C4HZ — a custom tag plus a schema scalar divergence.
- FH7J — tags on empty scalar nodes.
tests/parity_manifest.rs gates these terms and the event/tree/shared-reference
ledgers; cargo test --test conformance_dashboard -- --nocapture prints the full
402-case selected-suite dashboard with divergence overlays.
Event and streaming contracts
- EventStream is the stable pull-based parser-event surface; parse_events is the all-or-error collector over the same events. Events carry scalar style, block/flow collection style, tags, anchors, alias events, and DocumentStart directives. Aliases are not expanded here.
- DocumentStream is the semantic pull stream: one merge-expanded Node per document, same schema/limits/spans as parse_documents.
- Reader constructors fully buffer bounded input before yielding, so streaming bounds the retained parsed representation, not the source bytes.
Raw scalar spelling and graph identity are not exposed by events; recover them
with LosslessStream::source_fragment(span) and the lossless graph.
Threat Model and Resource Guarantees
The defended input is untrusted YAML at every load entrypoint. With default
LoadOptions, the crate rejects:
- input above 64 MiB before parsing,
- alias-expansion bombs (input-derived budget) and recursive aliases,
- nesting beyond 128, scalars above 1 MiB, and collections above 16,384 entries,
with span-bearing diagnostics. Raw event and lossless streams validate alias
references but don’t expand them. Callers can tighten or relax each limit through
LoadOptions; a without_*_limit() opt-out makes the caller responsible for that bound.
These are structural construction limits, not wall-clock or resident-memory
guarantees: reader entrypoints fully buffer bounded input, and your own
Deserialize impls can allocate after the YAML layer hands them bounded values.
saneyaml validates YAML structure, not application schemas. See
Untrusted input for the how-to and SECURITY.md
for reporting.
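As an illustration of what a structural limit means in practice, the sketch below rejects oversized input up front and reports the byte offset where flow-collection nesting first exceeds a maximum. It is a simplified stand-in under stated assumptions: Limits, LimitError, and check_limits are hypothetical names, the scan ignores quoting and block indentation, and only the 64 MiB and 128 figures come from the defaults listed above.

```rust
/// Hypothetical limit set; the crate's real options live on LoadOptions.
struct Limits {
    max_input_bytes: usize,
    max_depth: usize,
}

#[derive(Debug, PartialEq)]
enum LimitError {
    InputTooLarge { len: usize, max: usize },
    TooDeep { offset: usize, max: usize },
}

fn check_limits(input: &str, limits: &Limits) -> Result<(), LimitError> {
    // Size is checked before any scanning happens.
    if input.len() > limits.max_input_bytes {
        return Err(LimitError::InputTooLarge {
            len: input.len(),
            max: limits.max_input_bytes,
        });
    }
    let mut depth = 0usize;
    for (offset, byte) in input.bytes().enumerate() {
        match byte {
            b'[' | b'{' => {
                depth += 1;
                if depth > limits.max_depth {
                    // Carry the offset so the error can point at the source.
                    return Err(LimitError::TooDeep { offset, max: limits.max_depth });
                }
            }
            b']' | b'}' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

The check bounds structure only; as the paragraph above notes, it says nothing about wall-clock time or about what a downstream Deserialize impl allocates afterwards.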
Public API stability
The crate is pre-1.0 (MSRV Rust 1.88), but the preview surface is SemVer-visible:
public exports, enum variants, struct fields, constants, and the package and
library names are commitments. docs/PUBLIC_API.txt
is the committed snapshot checked for drift; intentional changes must update it
along with this file and MIGRATION.md.
saneyaml::Error keeps a flat Display compatible with the preview contract and
exposes additive category(), path(), document_index(), and
render_source(...) diagnostics. saneyaml::Index and saneyaml::mapping::Index
are sealed.
Architecture
Package and Library Names
The crates.io package name and the Rust library target are both saneyaml, so
downstream code imports this crate as saneyaml::...:
[dependencies]
saneyaml = "0.3.0"
For drop-in serde_yaml migration, Cargo dependency renaming keeps existing
source imports intact:
[dependencies]
serde_yaml = { package = "saneyaml", version = "0.3.0" }
or a local source alias:
#![allow(unused)]
fn main() {
use saneyaml as serde_yaml;
}
Monolith Decision
saneyaml is one crate for the first public release. It is not a crate family
and does not publish separate yaml-core, yaml-value, yaml-serde, or
yaml-edit packages.
The parser, tree model, deserializer, emitter, and lossless source model are
tightly coupled today. In particular, parse.rs and de.rs share reader
ingestion, limits, schema construction, merge behavior, and span diagnostics.
Splitting those seams before real downstream adoption would create compatibility
and versioning surfaces without reducing meaningful user complexity.
The monolith keeps one SemVer contract, one feature map, one diagnostics model,
and one package alias story for serde_yaml migration.
Feature Facade
The only optional crate feature currently exposed is:
default = ["lossless"]
lossless = []
lossless controls source-backed graph inspection and format-preserving edit
helpers. Serde integration, Value, structural writers, and emitter controls
are always part of the package contract because serde is a runtime dependency
and the migration surface is not currently split into optional subfeatures.
Future feature narrowing should add real cfg(feature = "...") boundaries,
tests, and documentation instead of naming facade features that do not alter
compiled API.
Stability Boundary
The crate is pre-1.0. Public exports, public enum variants, public struct
fields, feature names, package metadata, MSRV, and the package and library
names are still treated as SemVer-visible so adopters can rely on them.
Intentional changes to those surfaces must update docs/PUBLIC_API.txt,
docs/MIGRATION.md, docs/COMPATIBILITY.md, and the baseline evidence rather than
relying on silent drift.
Real-World Config Benchmarks And Large Inputs
The benchmark examples parse checked-in or generated YAML without timing file I/O. They report aggregate cost so small files do not dominate the signal. These benchmark and conformance commands are source-checkout-only: the published crate package ships this document, but it intentionally excludes the dev-dependency examples and fixture corpora used to regenerate the tables.
cargo run --locked --release --example real_world_benchmark
YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example large_input_benchmark
YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark
Environment for the latest captured run:
- Reference crates: serde-saphyr 0.0.27 with deserialize only, yaml-rust2 0.11.0, saphyr 0.0.6
- Small fixture set: 33 files / 39 YAML documents / 25,362 bytes
- Large fixture set: pinned downstream fixtures plus generated 1 MiB inputs
- Captured: 2026-06-06 with Cargo’s release profile and --locked
The linked serde-saphyr repository was ahead of crates.io at the time of this
capture (0.0.28 in Git, latest published 0.0.27). The benchmark pins the
published crate so the checked-in Cargo.lock and package checks remain
registry-reproducible.
The serde-saphyr rows use benchmark options rather than the crate defaults:
strict_booleans: true plus relaxed event, alias, document, node, scalar, and
merge budgets so the generated corpora are comparable throughput inputs. Because
serde-saphyr does not expose a native YAML value tree, the matched generic
Serde lane deserializes both libraries into serde_yaml::Value. The preflight
normalizes two public-contract differences before asserting equality:
serde_saphyr::from_multiple_with_options skips empty/null-like documents, and
serde-saphyr treats YAML tags as transparent for this target while saneyaml
preserves them.
The README overview graphic is a static summary of selected benchmark and
feature rows. Its source notes and update checklist live at
docs/assets/saneyaml-overview.md; update that
note with this file whenever the graphic changes.
The large benchmark’s peak retained bytes and peak retained heap objects
columns are safe retained-output estimates from parsed tree container and
string capacities after a single parse. They are not allocator instrumentation
and do not include transient parser scratch. For multi-fixture corpora, they
report the peak retained output for one fixture because each fixture is parsed
and dropped independently.
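A rough sketch of a capacity-based estimate like the one described above: walk the retained tree and add up container and string capacities, counting one heap object per non-empty allocation. The Node type and retained function are hypothetical; the benchmark's real accounting may differ in detail.

```rust
use std::mem::size_of;

// Hypothetical retained tree with just two node kinds.
enum Node {
    Scalar(String),
    Sequence(Vec<Node>),
}

/// Returns (estimated retained bytes, estimated retained heap objects).
fn retained(node: &Node) -> (usize, usize) {
    match node {
        Node::Scalar(text) => {
            // An empty String owns no heap allocation.
            (text.capacity(), usize::from(text.capacity() > 0))
        }
        Node::Sequence(items) => {
            // Capacity, not length, is what the allocation actually holds.
            let mut bytes = items.capacity() * size_of::<Node>();
            let mut objects = usize::from(items.capacity() > 0);
            for item in items {
                let (child_bytes, child_objects) = retained(item);
                bytes += child_bytes;
                objects += child_objects;
            }
            (bytes, objects)
        }
    }
}
```

Because it reads capacities through safe accessors, an estimate of this kind sees only what the returned tree still owns, which is why transient parser scratch never shows up in it.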
The 2026-06-01 zero-copy line slice removed transient per-line raw/content text allocations from this crate’s parser by storing one resident source buffer and per-line byte ranges. That allocation drop is visible in the parser code path rather than in the retained-output columns, because preprocessed lines are dropped before the parsed tree is returned.
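The idea of one resident buffer plus per-line byte ranges can be sketched as follows. This is an illustration only: LineTable is a hypothetical name, the sketch borrows the source rather than owning it, and it splits on "\n" without handling "\r\n".

```rust
use std::ops::Range;

/// One shared source buffer plus a byte range per line,
/// instead of one allocated String per line.
struct LineTable<'a> {
    source: &'a str,
    lines: Vec<Range<usize>>,
}

impl<'a> LineTable<'a> {
    fn new(source: &'a str) -> Self {
        let mut lines = Vec::new();
        let mut start = 0;
        for (index, byte) in source.bytes().enumerate() {
            if byte == b'\n' {
                lines.push(start..index);
                start = index + 1;
            }
        }
        // Keep a final line that has no trailing newline.
        if start < source.len() {
            lines.push(start..source.len());
        }
        LineTable { source, lines }
    }

    /// Slices the line out of the shared buffer without copying.
    fn line(&self, index: usize) -> Option<&'a str> {
        self.lines.get(index).map(|range| &self.source[range.clone()])
    }
}
```

Each line costs one Range<usize> in the table, and reading a line is a slice of the existing buffer rather than a fresh allocation.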
The 2026-06-01 no-merge fast path records whether each parsed document contains
a semantic merge key and skips the post-parse merge traversal when none was
seen. In the same-session target capture, generated_multi_doc_stream_1mib
saneyaml::parse_documents moved from 25.87 to 23.98 ns/byte while saphyr moved
from 24.86 to 24.89 ns/byte. Retained output estimates are unchanged because
the returned tree shape is unchanged; the removed work is transient
per-document merge scanning and its scratch stack.
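The fast path above amounts to recording a flag during parsing and consulting it before the post-parse pass. A toy version, assuming a hypothetical ParsedDoc with a saw_merge_key flag and a simplified expand_merges in which explicit keys win over merged ones:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Scalar(String),
    Mapping(Vec<(String, Node)>),
}

// Hypothetical parse result: the parser sets the flag when it sees `<<`.
struct ParsedDoc {
    root: Node,
    saw_merge_key: bool,
}

/// Simplified merge expansion: inline `<<` mappings, explicit keys win.
fn expand_merges(node: &mut Node) {
    if let Node::Mapping(entries) = node {
        let mut own = Vec::new();
        let mut inherited = Vec::new();
        for (key, mut value) in entries.drain(..) {
            expand_merges(&mut value);
            match value {
                Node::Mapping(base) if key == "<<" => inherited.extend(base),
                other => own.push((key, other)),
            }
        }
        for (key, value) in inherited {
            if !own.iter().any(|(existing, _)| *existing == key) {
                own.push((key, value));
            }
        }
        *entries = own;
    }
}

/// Skips the whole traversal when no merge key was recorded.
fn finish(mut doc: ParsedDoc) -> Node {
    if doc.saw_merge_key {
        expand_merges(&mut doc.root);
    }
    doc.root
}
```

The returned tree is the same either way for merge-free documents, which matches the note that retained output is unchanged and only transient work is removed.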
The 2026-06-01 plain-scalar continuation slice delays String allocation until
a plain scalar is proven to span multiple lines. In the next same-session target
capture, generated_multi_doc_stream_1mib saneyaml::parse_documents moved from
23.98 to 21.87 ns/byte while saphyr measured 24.42 ns/byte. Retained output
estimates are unchanged because single-line scalar output is identical; the
removed work is transient short String allocation before the scalar falls back
to the inline parse path.
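One common way to delay allocation until a scalar is known to span several lines is std::borrow::Cow. The sketch below assumes the caller has already collected the scalar's lines and folds continuation lines with single spaces; fold_plain_scalar is a hypothetical helper, and blank-line folding is omitted.

```rust
use std::borrow::Cow;

/// Borrows a single-line scalar; allocates only for multi-line scalars.
fn fold_plain_scalar<'a>(lines: &[&'a str]) -> Cow<'a, str> {
    if lines.len() == 1 {
        // Single line: hand back a slice of the source, no String needed.
        let only: &'a str = lines[0];
        return Cow::Borrowed(only.trim());
    }
    // Multiple lines: join the trimmed pieces into one owned String.
    let folded = lines
        .iter()
        .map(|line| line.trim())
        .collect::<Vec<_>>()
        .join(" ");
    Cow::Owned(folded)
}
```

The single-line result is identical whether it is borrowed or owned, which is consistent with the retained-output estimates staying unchanged.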
The 2026-06-01 retained-capacity slice trims completed document, sequence, and
mapping vectors before returning parsed trees. In the large-input capture below,
saneyaml::parse_documents retained bytes moved from 703,340 to 486,188 on the
Stackable peak, from 23,031,972 to 13,006,500 on the generated multi-document
stream, and from 13,040,211 to 9,893,619 on the generated 1 MiB wide mapping.
The same-run speed lead over saphyr remains intact for the default spanful
parser on every large-input row.
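Trimming a completed vector before it is retained can be as small as a shrink_to_fit call. The helper below is a hypothetical illustration of that step, not the crate's code.

```rust
/// Releases spare capacity from a finished collection before it is kept.
fn finish_collection<T>(mut items: Vec<T>) -> Vec<T> {
    // Growth during parsing usually leaves capacity above the final length.
    items.shrink_to_fit();
    items
}
```

The standard library only promises that capacity drops as close to the length as the allocator allows, so retained bytes shrink toward the live data but may not hit it exactly.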
The same milestone adds saneyaml::parse_borrowed_documents, an explicit
spanless retained tree that can borrow scalar strings from the caller’s input
buffer. This is an additive load path, not a silent change to
saneyaml::parse_documents; the retained-output estimate counts the borrowed tree
heap and, like the saphyr row, does not count the caller-owned source buffer.
That row retains less heap than saphyr 0.0.6 on every large-input corpus, while
the owning parser keeps its spans and scalar-source behavior unchanged.
Real-World Config Corpus
Latest same-run capture after adding the matched serde-saphyr lane, using
YAML_BENCH_ITERS=1000:
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 1,000 | 25,362 | 39 | 431.314 | 17.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 25,362 | 39 | 572.477 | 22.57 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 578.399 | 22.81 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,165.027 | 45.94 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 1,000 | 25,362 | 39 | 1,055.332 | 41.61 |
| serde_yaml::Value stream | 1,000 | 25,362 | 39 | 644.638 | 25.42 |
| yaml_rust2::YamlLoader | 1,000 | 25,362 | 39 | 539.894 | 21.29 |
| saphyr::Yaml::load_from_str | 1,000 | 25,362 | 39 | 502.288 | 19.80 |
On this corpus, the matched generic Serde value lane measured saneyaml at 22.81 ns/byte versus serde-saphyr at 41.61 ns/byte. The private event-backed prototype measured 45.94 ns/byte on the same target, so it is not a replacement for the tree-backed Serde path yet. The raw tree-load rows are shown for context but are a different contract from serde-saphyr’s Serde-only API.
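The ns/byte column is consistent with total elapsed time divided by total bytes processed (iterations times bytes per iteration). A small sketch of that arithmetic, with ns_per_byte as a hypothetical helper name:

```rust
/// Aggregate cost in nanoseconds per input byte for a whole run.
fn ns_per_byte(elapsed_ms: f64, iterations: u64, bytes_per_iteration: u64) -> f64 {
    let total_ns = elapsed_ms * 1_000_000.0;
    let total_bytes = (iterations * bytes_per_iteration) as f64;
    total_ns / total_bytes
}
```

For the saneyaml::parse_documents row, 431.314 ms over 1,000 iterations of 25,362 bytes works out to roughly 17.01 ns/byte, matching the table.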
Same-session pre-optimization baseline, captured before this milestone with the default 200 iterations:
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 200 | 19,727 | 33 | 126.377 | 32.03 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 141.211 | 35.79 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 153.465 | 38.90 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 100.959 | 25.59 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 92.393 | 23.42 |
Post zero-copy line-slice re-capture with 1,000 iterations (independent run, 2026-06-01):
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 1,000 | 19,727 | 33 | 355.271 | 18.01 |
| saneyaml::from_documents_str::<Value> | 1,000 | 19,727 | 33 | 422.618 | 21.42 |
| serde_yaml::Value stream | 1,000 | 19,727 | 33 | 523.390 | 26.53 |
| yaml_rust2::YamlLoader | 1,000 | 19,727 | 33 | 434.222 | 22.01 |
| saphyr::Yaml::load_from_str | 1,000 | 19,727 | 33 | 402.909 | 20.42 |
Result: after the zero-copy line slice, saneyaml::parse_documents was faster than
the pinned reference loaders on this small corpus in that 2026-06-01
1,000-iteration same-run capture (18.01 ns/byte vs saphyr at 20.42 and yaml_rust2 at
22.01). The owning Value path also remains ahead of the serde_yaml Value
stream and roughly ties yaml_rust2 on this corpus.
Post no-merge and plain-scalar fast-path re-capture with the default 200 iterations (independent run, 2026-06-01):
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte |
|---|---|---|---|---|---|
| saneyaml::parse_documents | 200 | 19,727 | 33 | 72.598 | 18.40 |
| saneyaml::from_documents_str::<Value> | 200 | 19,727 | 33 | 84.585 | 21.44 |
| serde_yaml::Value stream | 200 | 19,727 | 33 | 116.031 | 29.41 |
| yaml_rust2::YamlLoader | 200 | 19,727 | 33 | 82.792 | 20.98 |
| saphyr::Yaml::load_from_str | 200 | 19,727 | 33 | 78.606 | 19.92 |
Methodology caveat: the pre-optimization baseline was captured at 200 iterations and the post zero-copy re-capture at 1,000, so part of the ns/byte drop between those two tables may reflect warm-up rather than optimization. The trustworthy signal is the same-run cross-loader comparison within each table, plus the larger, lower-noise inputs below, not the across-table delta.
Large Inputs
Command:
cargo run --locked --release --example large_input_benchmark
Default iterations: 20, controlled by YAML_LARGE_BENCH_ITERS.
external_downstream_all
20 pinned downstream files / 245,062 bytes / 20 YAML documents.
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 245,062 | 20 | 31.887 | 6.51 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 245,062 | 20 | 30.256 | 6.17 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 245,062 | 20 | 40.035 | 8.17 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 41.333 | 8.43 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 245,062 | 20 | 77.032 | 15.72 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 245,062 | 20 | 67.763 | 13.83 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 245,062 | 20 | 51.019 | 10.41 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 245,062 | 20 | 38.470 | 7.85 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 245,062 | 20 | 36.419 | 7.43 | 534,786 | 3,780 |
stackable_dummy_cluster
One pinned Stackable CRD / 177,556 bytes / 1 YAML document.
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 177,556 | 1 | 20.570 | 5.79 | 486,188 | 3,983 |
| saneyaml::parse_borrowed_documents | 20 | 177,556 | 1 | 19.159 | 5.40 | 173,556 | 904 |
| saneyaml::from_documents_str::<Value> | 20 | 177,556 | 1 | 24.524 | 6.91 | 217,483 | 3,780 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 25.203 | 7.10 | 378,843 | 3,780 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 177,556 | 1 | 47.657 | 13.42 | 396,987 | 3,780 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 177,556 | 1 | 41.445 | 11.67 | 396,987 | 3,780 |
| serde_yaml::Value stream | 20 | 177,556 | 1 | 32.343 | 9.11 | 396,987 | 3,780 |
| yaml_rust2::YamlLoader | 20 | 177,556 | 1 | 24.245 | 6.83 | 382,497 | 3,796 |
| saphyr::Yaml::load_from_str | 20 | 177,556 | 1 | 23.095 | 6.50 | 534,786 | 3,780 |
generated_multi_doc_stream_1mib
Generated multi-document service stream / 1,048,680 bytes / 8,020 YAML documents.
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 1,048,680 | 8,020 | 367.321 | 17.51 | 13,006,500 | 128,321 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,680 | 8,020 | 361.229 | 17.22 | 4,106,240 | 32,081 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,680 | 8,020 | 495.592 | 23.63 | 4,729,860 | 112,281 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 538.155 | 25.66 | 9,862,660 | 112,281 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,116.471 | 53.23 | 11,607,364 | 112,281 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,680 | 8,020 | 1,223.597 | 58.34 | 11,607,364 | 112,281 |
| serde_yaml::Value stream | 20 | 1,048,680 | 8,020 | 661.464 | 31.54 | 11,607,364 | 112,281 |
| yaml_rust2::YamlLoader | 20 | 1,048,680 | 8,020 | 545.253 | 26.00 | 10,386,948 | 112,281 |
| saphyr::Yaml::load_from_str | 20 | 1,048,680 | 8,020 | 534.700 | 25.49 | 14,770,560 | 112,281 |
generated_wide_mapping_256kib
Generated one-document wide service mapping / 262,176 bytes.
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 262,176 | 1 | 78.226 | 14.92 | 2,484,775 | 23,932 |
| saneyaml::parse_borrowed_documents | 20 | 262,176 | 1 | 73.898 | 14.09 | 765,792 | 2,994 |
| saneyaml::from_documents_str::<Value> | 20 | 262,176 | 1 | 107.359 | 20.47 | 938,236 | 17,950 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 106.085 | 20.23 | 1,895,476 | 17,950 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 262,176 | 1 | 223.815 | 42.68 | 1,895,692 | 17,950 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 262,176 | 1 | 212.831 | 40.59 | 1,895,692 | 17,950 |
| serde_yaml::Value stream | 20 | 262,176 | 1 | 114.758 | 21.89 | 1,895,692 | 17,950 |
| yaml_rust2::YamlLoader | 20 | 262,176 | 1 | 102.762 | 19.60 | 1,704,220 | 17,950 |
| saphyr::Yaml::load_from_str | 20 | 262,176 | 1 | 97.245 | 18.55 | 2,393,312 | 17,950 |
generated_wide_mapping_1mib
Generated one-document wide service mapping / 1,048,661 bytes.
| parser/load path | iterations | bytes per iteration | docs per iteration | elapsed ms | ns/byte | peak retained bytes | peak retained heap objects |
|---|---|---|---|---|---|---|---|
| saneyaml::parse_documents | 20 | 1,048,661 | 1 | 309.172 | 14.74 | 9,893,619 | 95,236 |
| saneyaml::parse_borrowed_documents | 20 | 1,048,661 | 1 | 282.013 | 13.45 | 3,047,520 | 11,907 |
| saneyaml::from_documents_str::<Value> | 20 | 1,048,661 | 1 | 419.435 | 20.00 | 3,739,059 | 71,428 |
| saneyaml::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 417.003 | 19.88 | 7,548,459 | 71,428 |
| saneyaml::__unstable_event_serde::from_documents_str::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 878.512 | 41.89 | 7,548,675 | 71,428 |
| serde_saphyr::from_multiple_with_options::<serde_yaml::Value> | 20 | 1,048,661 | 1 | 843.638 | 40.22 | 7,548,675 | 71,428 |
| serde_yaml::Value stream | 20 | 1,048,661 | 1 | 477.817 | 22.78 | 7,548,675 | 71,428 |
| yaml_rust2::YamlLoader | 20 | 1,048,661 | 1 | 420.142 | 20.03 | 6,786,771 | 71,428 |
| saphyr::Yaml::load_from_str | 20 | 1,048,661 | 1 | 401.299 | 19.13 | 9,523,712 | 71,428 |
Large-input story: after zero-copy line storage, the no-merge fast path,
delayed plain-scalar continuation allocation, and retained vector
right-sizing, saneyaml::parse_documents beats yaml_rust2 and saphyr on every
large parser path in the latest capture on an unloaded machine. In the matched
Serde value lane, saneyaml::from_documents_str::<serde_yaml::Value> is faster
than serde_saphyr::from_multiple_with_options::<serde_yaml::Value> on every
large-input row. The hidden event-backed Serde prototype only wins against
serde-saphyr on the generated multi-document stream in this capture and remains
slower on the other large rows; it also retains the same serde_yaml::Value
output shape, so it does not improve retained output memory yet. The smallest
corpus (external_downstream_all) is the most contention-sensitive, so its
ordering is the first to wobble under load; the larger corpora hold a clearer
margin. The retained-memory story is now split
by output contract: the default spanful tree keeps spans and scalar-source
spellings and is faster than saphyr, while the additive
saneyaml::parse_borrowed_documents tree drops spans/source spellings and borrows
sliceable scalars, retaining less heap than saphyr on every large-input row
(for example, 3,047,520 vs 9,523,712 bytes on the 1 MiB wide mapping).
Streaming And Compact Line-Table Milestone
This milestone adds a compact per-line table, a fused line-preprocessing scan,
source-backed borrowed scalars, and a lazy streaming line buffer that reclaims
consumed lines as DocumentStream/EventStream advance. Batch loaders keep an
eager line table for speed; only the streaming entrypoints reclaim. The input
string itself stays fully resident, so these are bounded-retention streaming
paths, not constant-memory readers.
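The reclaiming behaviour can be pictured as a window of buffered lines that is drained each time a document is yielded. The sketch below is an assumption-laden toy: DocLines is a hypothetical type, documents are split only on bare "---" lines, and empty documents are skipped.

```rust
use std::collections::VecDeque;

/// Yields one document's lines at a time and drops them once consumed.
struct DocLines<'a> {
    remaining: std::str::Lines<'a>,
    window: VecDeque<&'a str>,
    /// Largest number of lines buffered at once, for inspection.
    peak_window: usize,
}

impl<'a> DocLines<'a> {
    fn new(source: &'a str) -> Self {
        DocLines {
            remaining: source.lines(),
            window: VecDeque::new(),
            peak_window: 0,
        }
    }

    fn next_document(&mut self) -> Option<Vec<&'a str>> {
        while let Some(line) = self.remaining.next() {
            if line == "---" {
                if self.window.is_empty() {
                    continue; // leading marker, nothing buffered yet
                }
                break; // document boundary: stop and yield
            }
            self.window.push_back(line);
            self.peak_window = self.peak_window.max(self.window.len());
        }
        if self.window.is_empty() {
            return None;
        }
        // Draining reclaims the buffered lines for the next document.
        Some(self.window.drain(..).collect())
    }
}
```

The window never grows past the longest single document, while the borrowed source string stays resident the whole time, which is the bounded-retention (not constant-memory) behaviour described above.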
Captured in a single release session against the in-repo harnesses:
YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark
cargo run --locked --release --example dhat_memory -- --all
cargo run --locked --release --example conformance_compare
Real-world config corpus (1,000 iterations)
33 files / 39 YAML documents / 25,362 bytes. This table comes from a separate,
later run captured during this milestone, not from the measurement behind the
“Real-World Config Corpus” table above. The corpus is identical, but the
per-loader ns/byte figures differ between runs (for example, saphyr reads 21.42
here versus 19.80 there); this is the run-to-run noise that the methodology
caveat describes.
| parser/load path | ns/byte |
|---|---|
| saneyaml::parse_documents | 15.03 |
| saneyaml::from_documents_str::<Value> | 21.19 |
| saphyr::Yaml::load_from_str | 21.42 |
| yaml_rust2::YamlLoader | 23.11 |
| serde_yaml::Value stream | 24.98 |
On this corpus saneyaml::parse_documents is the fastest load path; the owning
Value path ties saphyr and stays ahead of yaml_rust2 and serde_yaml.
Allocator-backed memory (dhat), 1 MiB multi-document stream
8,020 documents. retained blocks is the count of live allocations still held
while the parsed output is retained.
| path | allocs | bytes allocated | peak | retained blocks |
|---|---|---|---|---|
| saneyaml stream docs | 184,466 | 13.64 MB | 2.10 MB | 4 |
| saneyaml stream events | 232,594 | 49.28 MB | 2.11 MB | 6 |
| saneyaml borrowed | 80,219 | 17.29 MB | 6.21 MB | 32,081 |
| saneyaml owned | 200,519 | 16.05 MB | 15.12 MB | 128,321 |
| saneyaml Value | 449,140 | 25.07 MB | 15.12 MB | 112,281 |
| yaml-rust2 | 585,478 | 29.29 MB | 17.15 MB | 192,481 |
| saneyaml as serde_yaml::Value | 465,180 | 39.73 MB | 20.79 MB | 136,341 |
| saneyaml event-backed as serde_yaml::Value | 1,114,806 | 175.38 MB | 22.83 MB | 136,341 |
| serde-saphyr as serde_yaml::Value | 577,465 | 59.71 MB | 21.79 MB | 136,344 |
| serde_yaml | 721,821 | 84.73 MB | 21.84 MB | 136,341 |
| saphyr | 216,559 | 22.77 MB | 22.30 MB | 192,481 |
On a multi-document stream the streaming loaders hold a bounded working set (retained blocks stay at 4–6 regardless of stream length) and post the lowest peak; the borrowed batch tree has the lowest peak among the non-streaming loaders. The event-backed Serde prototype is allocation-heavy here because it still consumes parser-recorded event frames rather than a direct parser-to-Serde stream.
Allocator-backed memory (dhat), 1 MiB wide single document
| path | peak | retained blocks |
|---|---|---|
| serde-saphyr as serde_yaml::Value | 10.73 MB | 83,337 |
| yaml-rust2 | 10.98 MB | 130,951 |
| saphyr | 14.10 MB | 130,951 |
| saneyaml borrowed | 15.32 MB | 11,907 |
| saneyaml owned | 16.16 MB | 95,236 |
| saneyaml stream docs | 16.16 MB | 4 |
| saneyaml Value | 16.39 MB | 71,428 |
| saneyaml as serde_yaml::Value | 19.91 MB | 83,334 |
| serde_yaml | 23.42 MB | 83,334 |
| saneyaml stream events | 62.22 MB | 6 |
| saneyaml event-backed as serde_yaml::Value | 78.54 MB | 83,334 |
Streaming only helps when there are document boundaries to reclaim at. On a
single wide document there is nothing to reclaim mid-parse, so yaml-rust2 and
saphyr post lower peaks than saneyaml on this shape, and the event-streaming
path is expensive here because it buffers per-event output for one large
document. The matched serde-saphyr value row posts a low wide-document peak,
while the event-backed Serde prototype is the highest peak in this capture.
Streaming is a multi-document memory win, not a universal one.
Conformance (402 curated cases)
| library | spec accept/reject (400) | tree policy (2) |
|---|---|---|
| saneyaml | 400/400 | 2/2 |
| yaml-rust2 | 400/400 | 2/2 |
| saphyr | 400/400 | 0/2 |
| serde_yaml | 333/400 | 2/2 |
saneyaml ties yaml-rust2 and saphyr at 400/400 on the neutral spec set; it
is not a sole leader there. Its differentiation is the combination of full spec
conformance with tree-policy rejection of the duplicate-key/tree-error cases
that saphyr accepts, while serde_yaml trails the spec set at 333/400.
Reproduction & Tooling
Every number in this document comes from an in-repo example, run under Cargo’s
release profile. The commands below regenerate each captured table from a
source checkout of this repository; absolute values vary by machine, but the
same-run cross-loader ordering is the trustworthy signal on an otherwise-idle
machine. The harness is a hand-rolled Instant::now() loop with no warm-up or
statistics, so under heavy machine load even that ordering can invert; treat any
single capture as indicative rather than authoritative.
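For readers unfamiliar with this style of harness, a hand-rolled timing loop looks roughly like the sketch below. It is not the in-repo example: time_loop is a hypothetical helper, and std::hint::black_box is added here only to keep the optimizer from discarding the work.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `parse` over `input` a fixed number of times and returns the
/// total elapsed time. No warm-up and no statistics, by design.
fn time_loop<T>(input: &str, iterations: u32, mut parse: impl FnMut(&str) -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(parse(black_box(input)));
    }
    start.elapsed()
}
```

Because the loop produces a single total per run, anything else competing for the machine lands directly in the result, which is why same-run comparisons on an idle machine are the only ones worth reading closely.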
| captured section | checkout-only command |
|---|---|
| Real-World Config Corpus | cargo run --locked --release --example real_world_benchmark |
| Real-world corpus (1,000 iterations) | YAML_BENCH_ITERS=1000 cargo run --locked --release --example real_world_benchmark |
| Large Inputs (all corpora) | cargo run --locked --release --example large_input_benchmark |
| Large Inputs (custom iteration count) | YAML_LARGE_BENCH_ITERS=20 cargo run --locked --release --example large_input_benchmark |
| Allocator-backed memory (dhat) | cargo run --locked --release --example dhat_memory -- --all |
| dhat single (library, corpus) pair | cargo run --locked --release --example dhat_memory -- saneyaml-borrowed multidoc |
| Conformance (402 curated cases) | cargo run --locked --release --example conformance_compare |
Iteration counts default to 200 for real_world_benchmark (YAML_BENCH_ITERS)
and 20 for large_input_benchmark (YAML_LARGE_BENCH_ITERS). The
dhat_memory example installs a global allocator and must measure one library
per process; -- --all sweeps every (library, corpus) pair for you, and
-- <library> <corpus> profiles a single pair.
Reference-crate versions
The captured comparison numbers were produced against these pinned
dev-dependency versions (see Cargo.toml):
| crate | version |
|---|---|
| serde-saphyr | 0.0.27 (default-features = false, deserialize) |
| serde_yaml | 0.9.34 |
| saphyr | 0.0.6 |
| saphyr-parser | 0.0.6 |
| yaml-rust2 | 0.11.0 |
| dhat | 0.3.3 |
To reproduce against the exact pinned set, build with the checked-in
Cargo.lock and pass --locked, as the commands above do, so Cargo fails instead
of updating the lockfile. Bumping any reference crate can shift its numbers, so
re-capture the whole comparison table when upgrading.