Overview
YAML is a human-friendly data serialization language designed to be more readable than JSON and less verbose than XML for configuration files, data exchange, and structured document authoring. Its design prioritizes readability: YAML uses indentation to denote structure (similar to Python), supports comments (beginning with #), and offers multi-line strings, anchors for deduplication, and custom tags for type annotation, all without requiring brackets, braces, or quotation marks for most values.
Since version 1.2, YAML has been designed as a superset of JSON, so in practice any valid JSON document is also a valid YAML document (the main edge case is duplicate mapping keys, which YAML treats as an error). YAML's native syntax goes well beyond JSON's: block-style mappings and sequences use indentation instead of braces and brackets, scalars can be unquoted or use literal block (|) and folded block (>) indicators for multi-line strings, and anchors (&) and aliases (*) enable DRY (Don't Repeat Yourself) references within a document. These features make YAML a popular format for configuration files that humans read and edit frequently.
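As a small illustration of the two notations, the sketch below writes the same mapping once in block style and once in JSON-compatible flow style. The key names and values (block_style, flow_style, example.local) are invented for this example.

```yaml
# Block style: structure comes from indentation.
block_style:
  host: example.local
  ports:
    - 80
    - 443

# Flow style: the same mapping written in JSON syntax.
flow_style: {"host": "example.local", "ports": [80, 443]}
```

A conforming parser should load both keys to identical data, which is why JSON snippets can usually be pasted into a YAML file unchanged.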
YAML has become the dominant configuration language in the DevOps and cloud-native ecosystem. Docker Compose, Kubernetes, Ansible, GitHub Actions, GitLab CI, CircleCI, Azure Pipelines, and Helm charts all use YAML as their primary configuration format, and OpenAPI (formerly Swagger) specifications are commonly written in it. Its readability advantage over JSON is most apparent in deeply nested structures and lists, where YAML's indentation-based syntax eliminates the visual noise of punctuation.
History
YAML was first proposed in 2001 by Clark Evans, with Ingy döt Net and Oren Ben-Kiki joining the specification effort. The name originally stood for 'Yet Another Markup Language,' but it was soon changed to the recursive acronym 'YAML Ain't Markup Language' to emphasize that the format is for data serialization, not document markup. YAML 1.0 was released in January 2004, YAML 1.1 followed in 2005, and YAML 1.2 was published in 2009; its latest revision, 1.2.2, appeared in October 2021.
YAML 1.2 brought the language into line with JSON; previous versions had subtle incompatibilities with JSON's string quoting and number representation. The specification also tightened type resolution: the 1.2 core schema recognizes only true and false as booleans, so unquoted strings like 'yes', 'no', 'on', and 'off' are booleans only when the schema in use says so. This addressed a notorious source of bugs (the Norway problem, where the country code 'NO' was interpreted as a boolean false). Parsers that still follow YAML 1.1 rules retain the older behavior.
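The Norway problem can be sketched in a few lines. The key names here are made up for illustration, and the exact result depends on which YAML version and schema the parser implements.

```yaml
# Under YAML 1.1 rules, the unquoted NO below may resolve to boolean false.
countries_unquoted: [DE, FR, NO]

# Quoting forces every entry to be a string under any schema.
countries_quoted: ["DE", "FR", "NO"]
```

Quoting values that could be mistaken for booleans is the defensive choice when the parser's schema is not known.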
Technical Details
A YAML document may begin with optional directive lines (e.g., %YAML 1.2) followed by a document start marker (---); the marker is required only when directives are present or when several documents share one stream. Multiple documents can exist in a single file, separated by --- markers, with ... indicating document end. The data model consists of three node kinds: scalars (strings, numbers, booleans, null, and, in some schemas, timestamps), sequences (ordered lists), and mappings (unordered key-value pairs). Block style uses newlines and indentation; flow style uses JSON-like brackets and braces for compact inline notation.
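A minimal sketch of a two-document stream, using the directive and markers described above. The contents are placeholders, and some parsers that implement only YAML 1.1 may warn about the %YAML 1.2 directive.

```yaml
%YAML 1.2
---
# First document: a mapping with a flow-style sequence.
name: first
items: [1, 2]
...
---
# Second document: a block-style sequence.
- alpha
- beta
...
```

The directive applies only to the document that immediately follows it; the second document here carries no directive of its own.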
YAML's type system relies on tags. Core schema tags include !!str, !!int, !!float, !!bool, !!null, !!seq, and !!map. Without explicit tags, YAML parsers apply implicit type resolution: unquoted 42 becomes an integer, 3.14 a float, true a boolean, and ~ or null a null value. Anchors (&name) mark a node for reuse, and aliases (*name) reference it elsewhere, enabling data deduplication. Merge keys (<<), a YAML 1.1 extension that many parsers still support, let one mapping inherit the keys of another. Multi-line scalars use literal style (|, which preserves newlines) or folded style (>, which joins single line breaks into spaces while keeping blank lines as line breaks). Indentation is significant for structure and must use spaces; tabs are not allowed for indentation.
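The features in this paragraph can be sketched in a single document. All key names and values are invented for illustration, and the merge-key portion assumes a parser that supports <<.

```yaml
# Implicit resolution under the core schema.
count: 42          # integer
ratio: 3.14        # float
enabled: true      # boolean
missing: ~         # null

# An explicit tag overrides implicit resolution.
zip_code: !!str 90210   # string, not integer

# Anchor (&) marks a node; alias (*) refers back to it.
defaults: &defaults
  timeout: 30
  retries: 3

# Merge key (<<) pulls in the anchored mapping's keys;
# the local timeout takes precedence over the merged one.
production:
  <<: *defaults
  timeout: 60

# Literal style keeps the line break between the two lines.
literal: |
  line one
  line two

# Folded style joins the two lines with a space.
folded: >
  this text is
  folded into one line
```

With a merge-aware parser, production should end up with timeout 60 and retries 3, while defaults is left unchanged.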
Pros & Cons
Pros
- Highly readable, clean syntax with indentation-based structure and comment support
- JSON superset as of YAML 1.2: valid JSON is also valid YAML
- Multi-line string support with literal and folded block indicators
- Anchor/alias mechanism enables DRY references and data deduplication
- De facto standard for DevOps tooling (Kubernetes, Docker Compose, CI/CD pipelines)
Cons
- Indentation sensitivity means invisible whitespace errors cause parsing failures
- Implicit type coercion can cause subtle bugs (the 'Norway problem' with boolean values)
- More complex specification than JSON, leading to parser implementation inconsistencies
- Security risks from arbitrary object deserialization in some language bindings
- Slower parsing than JSON due to the more complex grammar
Common Use Cases
- Defining Kubernetes manifests, Helm charts, and container orchestration configurations
- Writing CI/CD pipeline definitions for GitHub Actions, GitLab CI, and Azure Pipelines
- Configuring Docker Compose services and multi-container application stacks
- Authoring OpenAPI/Swagger API specifications with nested schema definitions
- Managing Ansible playbooks and infrastructure automation runbooks
- Storing application configuration that developers read and edit frequently
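To give a sense of what the first use case looks like in practice, here is a minimal sketch of a Kubernetes Deployment manifest. The names (example-web, web), the replica count, and the image tag are placeholders chosen for illustration, not taken from any real project.

```yaml
# Hypothetical Deployment; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web
  labels:
    app: example-web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-web
  template:
    metadata:
      labels:
        app: example-web
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
```

The nesting of mappings and sequences is carried by indentation alone, which is the readability advantage over the equivalent JSON.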