Reading and Writing YAML in Python
YAML is a human-friendly data serialization format commonly used for configuration files. It stands for “YAML Ain’t Markup Language” and is widely adopted in tools like Docker, Kubernetes, Ansible, and many Python projects. The third-party PyYAML library makes it easy to read and write YAML files from Python.
Installing PyYAML
Before reading or writing YAML, install the PyYAML package:
pip install pyyaml
Once installed, import it in your Python code:
import yaml
Reading YAML Files
The yaml.safe_load() function parses a YAML document from an open file (or a string) and returns the corresponding Python object. It maps YAML mappings, sequences, strings, numbers, booleans, and nulls to dicts, lists, and built-in Python types automatically.
Basic Reading
# config.yaml
# database:
#   host: localhost
#   port: 5432
#   name: myapp
with open('config.yaml') as f:
    config = yaml.safe_load(f)
print(config['database']['host'])  # localhost
print(config['database']['port'])  # 5432
The safe_load() function only loads basic Python objects. This prevents code execution attacks from untrusted YAML files. Never use yaml.load() with untrusted input.
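To see which Python types safe_load() produces, here is a small sketch that parses an inline document. The keys are illustrative and not taken from a real config. It also shows a common surprise: an empty document loads as None rather than an empty dict, so code that indexes the result may want a fallback.

```python
import yaml

# Illustrative document covering the basic scalar and collection types
text = """
port: 5432
ratio: 0.75
debug: false
password: null
tags: [web, api]
"""

data = yaml.safe_load(text)
for key, value in data.items():
    # Prints each key with the Python type safe_load() chose for it
    print(key, type(value).__name__, value)

# An empty document yields None, not {}
empty = yaml.safe_load("")
config = empty or {}  # fall back to an empty dict
print(empty, config)  # None {}
```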
Reading from a String
If your YAML data is in a string variable, pass the string to yaml.safe_load() directly:
yaml_data = """
app:
  debug: true
  max_connections: 100
"""
config = yaml.safe_load(yaml_data)
print(config['app']['debug'])  # True
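A string or file can also hold several YAML documents separated by ---. safe_load() expects exactly one document and raises an error otherwise, while yaml.safe_load_all() yields each document in turn. A minimal sketch, with made-up document contents:

```python
import yaml

# Two documents in one stream, separated by ---
stream = """
name: web
replicas: 3
---
name: worker
replicas: 1
"""

# safe_load_all() returns a generator of documents
docs = list(yaml.safe_load_all(stream))
print([d['name'] for d in docs])  # ['web', 'worker']

# safe_load() refuses a multi-document stream
failed = False
try:
    yaml.safe_load(stream)
except yaml.YAMLError as exc:
    failed = True
    print("safe_load failed:", type(exc).__name__)
```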
Writing YAML Files
The yaml.dump() function serializes a Python object as YAML and writes it to the file object you pass in. It converts dictionaries, lists, and other Python types into YAML syntax.
Basic Writing
config = {
    'database': {
        'host': 'localhost',
        'port': 5432,
        'name': 'myapp'
    },
    'logging': {
        'level': 'INFO',
        'file': 'app.log'
    }
}
with open('config.yaml', 'w') as f:
    yaml.dump(config, f, default_flow_style=False)
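The default_flow_style=False argument requests block style, which puts one key per line and shows nesting through indentation. Since PyYAML 5.1 this is already the default, so the argument is redundant there but harmless. Older releases wrote innermost collections in compact flow style (for example {host: localhost, name: myapp, port: 5432}) unless it was set.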
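One way to check that a write worked is to read the file back and compare. The sketch below writes to a temporary directory so it doesn't overwrite a real config.yaml. It then prints the same small structure in flow style for contrast.

```python
import os
import tempfile

import yaml

config = {
    'database': {'host': 'localhost', 'port': 5432, 'name': 'myapp'},
    'logging': {'level': 'INFO', 'file': 'app.log'},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'config.yaml')

    # Write the dict as block-style YAML
    with open(path, 'w') as f:
        yaml.dump(config, f, default_flow_style=False)

    # Read it back with the safe loader before the directory is removed
    with open(path) as f:
        loaded = yaml.safe_load(f)

print(loaded == config)  # True

# For comparison, flow style puts each collection on one line
flow = yaml.dump({'ports': [8080, 8081]}, default_flow_style=True)
print(flow)  # {ports: [8080, 8081]}
```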
Output to a String
To get YAML as a string instead of writing to a file:
yaml_string = yaml.dump(config, default_flow_style=False)
print(yaml_string)
This outputs:
database:
  host: localhost
  name: myapp
  port: 5432
logging:
  file: app.log
  level: INFO
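Note that the keys come out sorted alphabetically, because yaml.dump() sorts keys by default. PyYAML also provides yaml.safe_dump(), which mirrors safe_load(). For plain data it produces the same text as yaml.dump(). For arbitrary objects it raises an error instead of emitting Python-specific tags that safe_load() could not read back. In this sketch, the Point class is hypothetical and exists only for illustration.

```python
import yaml

class Point:
    """Hypothetical class, used only to show the difference."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

data = {'origin': Point(0, 0)}

# yaml.dump() falls back to a !!python/object tag for arbitrary objects
unsafe_text = yaml.dump(data)
print(unsafe_text)

# yaml.safe_dump() refuses instead, so its output stays loadable by safe_load()
refused = False
try:
    yaml.safe_dump(data)
except yaml.YAMLError as exc:  # a RepresenterError
    refused = True
    print("safe_dump refused:", exc)

# For plain data the two produce identical text
plain = {'name': 'myapp', 'port': 5432}
print(yaml.safe_dump(plain) == yaml.dump(plain))  # True
```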
Working with Nested Structures
YAML excels at representing nested data. Here are common patterns.
Lists of Dictionaries
users = [
    {'name': 'Alice', 'role': 'admin', 'active': True},
    {'name': 'Bob', 'role': 'developer', 'active': False},
    {'name': 'Charlie', 'role': 'developer', 'active': True}
]
with open('users.yaml', 'w') as f:
    yaml.dump(users, f, default_flow_style=False)
This produces:
- active: true
  name: Alice
  role: admin
- active: false
  name: Bob
  role: developer
- active: true
  name: Charlie
  role: developer
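Reading this file back gives a list of dictionaries, which plain Python can filter. The sketch below parses the YAML shown above from a string, so it does not depend on users.yaml existing.

```python
import yaml

users_yaml = """
- active: true
  name: Alice
  role: admin
- active: false
  name: Bob
  role: developer
- active: true
  name: Charlie
  role: developer
"""

users = yaml.safe_load(users_yaml)  # a list of dicts
active_names = [u['name'] for u in users if u['active']]
print(active_names)  # ['Alice', 'Charlie']
```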
Mixed Nested Structures
deployment = {
    'service': 'web-app',
    'replicas': 3,
    'ports': [8080, 8081, 8082],
    'environment': {
        'DEBUG': 'false',
        'DATABASE_URL': 'postgresql://localhost/db'
    },
    'resources': {
        'limits': {'cpu': '500m', 'memory': '256Mi'},
        'requests': {'cpu': '200m', 'memory': '128Mi'}
    }
}
with open('deployment.yaml', 'w') as f:
    yaml.dump(deployment, f, default_flow_style=False)
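In this structure, DEBUG holds the string 'false', not the boolean False. An unquoted false would load back as a boolean, so PyYAML quotes the string when dumping and the type survives a round trip. The following sketch dumps to a string instead of a file so the result can be inspected.

```python
import yaml

deployment = {
    'service': 'web-app',
    'replicas': 3,
    'ports': [8080, 8081, 8082],
    'environment': {'DEBUG': 'false', 'DATABASE_URL': 'postgresql://localhost/db'},
    'resources': {
        'limits': {'cpu': '500m', 'memory': '256Mi'},
        'requests': {'cpu': '200m', 'memory': '128Mi'},
    },
}

text = yaml.dump(deployment, default_flow_style=False)
print(text)

# The string 'false' is written quoted, so it is not reloaded as a boolean
loaded = yaml.safe_load(text)
print(repr(loaded['environment']['DEBUG']))     # 'false'
print(loaded['resources']['limits']['memory'])  # 256Mi
print(loaded == deployment)                     # True
```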
Safe vs Unsafe Loading
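Always use yaml.safe_load() for untrusted input. By contrast, yaml.load() with an unsafe loader such as yaml.UnsafeLoader can construct arbitrary Python objects and call functions named in the YAML, which amounts to arbitrary code execution.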
# untrusted_yaml: a string or file that came from outside your control
# UNSAFE - never use with untrusted input
# data = yaml.load(untrusted_yaml, Loader=yaml.UnsafeLoader)  # DANGER!

# SAFE - use this instead
data = yaml.safe_load(untrusted_yaml)
For your own configuration files that you control, yaml.safe_load() handles all standard use cases. You only need the unsafe loader if you’re parsing YAML that intentionally contains Python objects.
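To make the difference concrete, the sketch below feeds both loaders a document that asks for a Python function call. It uses the harmless builtins.len; a real attack would name something like os.system. It assumes PyYAML 5.1 or later, where yaml.UnsafeLoader exists.

```python
import yaml

# A document asking the loader to call builtins.len([1, 2, 3])
payload = "!!python/object/apply:builtins.len [[1, 2, 3]]"

# safe_load() has no constructor for python/* tags, so it raises
rejected = False
try:
    yaml.safe_load(payload)
except yaml.YAMLError as exc:  # a ConstructorError
    rejected = True
    print("safe_load rejected it:", type(exc).__name__)

# The unsafe loader actually calls the function
result = yaml.load(payload, Loader=yaml.UnsafeLoader)
print(result)  # 3
```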
Loading with Custom Classes
Sometimes you need to load YAML into custom Python classes. To keep using yaml.safe_load(), register a constructor for a custom tag on yaml.SafeLoader:
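import yaml

class Config:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

# Define how to construct your class from a YAML mapping
def construct_config(loader, node):
    return Config(**loader.construct_mapping(node, deep=True))

# Register the constructor for the !config tag on the safe loader
yaml.add_constructor('!config', construct_config, Loader=yaml.SafeLoader)

# Now load YAML that uses the tag, e.g. a config.yaml containing:
# app: !config
#   debug: true
#   max_connections: 100
with open('config.yaml') as f:
    config = yaml.safe_load(f)
print(config['app'].debug)  # True, for the file above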
Any mapping tagged !config is now built as a Config instance instead of a plain dictionary, so the tag carries the type information through the load.
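Calling yaml.add_constructor() with Loader=yaml.SafeLoader changes the shared SafeLoader class, so every yaml.safe_load() call in the process will accept the tag. A more contained variant, sketched here as an option rather than a requirement, registers the constructor on a subclass. The name ConfigLoader is hypothetical.

```python
import yaml

class Config:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

class ConfigLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; its constructors don't leak into SafeLoader."""

def construct_config(loader, node):
    # deep=True builds nested values fully before Config receives them
    return Config(**loader.construct_mapping(node, deep=True))

ConfigLoader.add_constructor('!config', construct_config)

text = """
app: !config
  debug: true
  max_connections: 100
"""

data = yaml.load(text, Loader=ConfigLoader)
print(type(data['app']).__name__, data['app'].debug)  # Config True

# The plain SafeLoader is untouched
print('!config' in yaml.SafeLoader.yaml_constructors)  # False
```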
Pretty Printing
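To control the layout of the output, use the sort_keys and indent parameters. By default yaml.dump() sorts mapping keys alphabetically. Passing sort_keys=False keeps the dictionary's insertion order, and indent sets the number of spaces per nesting level: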
data = {'z': 1, 'a': 2, 'm': 3}
print(yaml.dump(data, sort_keys=False, indent=2))
Output:
z: 1
a: 2
m: 3
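The flat example above does not show indent having any effect, because there is no nesting. A nested mapping makes both options visible. This is a minimal sketch, and the data is illustrative.

```python
import yaml

# Keys deliberately not in alphabetical order
data = {'server': {'port': 8080, 'host': 'localhost'}}

# Defaults: keys sorted, two spaces per nesting level
default_text = yaml.dump(data)
print(default_text)

# Insertion order kept, four spaces per nesting level
custom_text = yaml.dump(data, sort_keys=False, indent=4)
print(custom_text)
```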
When to Use YAML
YAML works well for:
- Configuration files (Docker, Kubernetes, Ansible)
- Data exchange between services
- Human-readable data storage
- Complex nested data structures
Avoid YAML when:
- You need strict type validation (use JSON Schema or Pydantic)
- Machine-to-machine communication where parsing speed matters (use JSON or MessagePack)
- You need to embed executable code (use a different format)
Common Pitfalls
Watch for these issues when working with YAML:
Tabs vs Spaces: YAML requires spaces for indentation. Never use tabs.
# Wrong - a tab before "port" causes a parsing error
# database:
# <TAB>port: 5432
# Right - indent "port" with spaces instead, e.g. two spaces
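A tab that is hard to see in an editor becomes easy to spot when you compare how PyYAML handles the two indentation styles. This sketch writes the tab as \t inside a Python string so it is visible.

```python
import yaml

good = "database:\n  port: 5432\n"  # indented with two spaces
bad = "database:\n\tport: 5432\n"   # indented with a tab

parsed = yaml.safe_load(good)
print(parsed)  # {'database': {'port': 5432}}

failed = False
try:
    yaml.safe_load(bad)
except yaml.YAMLError as exc:  # PyYAML raises a ScannerError here
    failed = True
    print("parse error:", type(exc).__name__)
```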
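Booleans: PyYAML follows YAML 1.1, which recognizes several spellings of true and false (YAML 1.2 keeps only true/false and their capitalized forms):
# All of these load as True
is_active: true
is_enabled: yes
is_visible: on
# Not a boolean: this loads as the integer 1
is_debug: 1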
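Parsing these values shows what PyYAML actually returns. The extra country key, which is illustrative, shows a related trap: because no is a YAML 1.1 boolean, the country code NO loads as False unless it is quoted.

```python
import yaml

snippet = """
is_active: true
is_enabled: yes
is_visible: on
is_debug: 1
country: NO
"""

values = yaml.safe_load(snippet)
for key, value in values.items():
    # Shows each value with its Python type
    print(key, type(value).__name__, repr(value))

# Quoting keeps the value a string
quoted = yaml.safe_load('country: "NO"')
print(quoted)  # {'country': 'NO'}
```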
Multiline Strings: Use | for preserved newlines and > for folded text:
description: |
  This preserves
  newlines exactly.
summary: >
  This folds newlines
  into spaces.
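Loading the two styles shows the exact strings PyYAML produces. By default both keep a single trailing newline. The extra note key, which is illustrative, uses the |- variant to strip that final newline.

```python
import yaml

text = """
description: |
  This preserves
  newlines exactly.
summary: >
  This folds newlines
  into spaces.
note: |-
  No trailing newline.
"""

doc = yaml.safe_load(text)
print(repr(doc['description']))  # 'This preserves\nnewlines exactly.\n'
print(repr(doc['summary']))      # 'This folds newlines into spaces.\n'
print(repr(doc['note']))         # 'No trailing newline.'
```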
See Also
- The official PyYAML documentation at https://pyyaml.org/
- YAML 1.2 specification at https://yaml.org/spec/1.2.2/