Fast Serialization with msgspec
msgspec is a fast serialization and validation library for Python, with support for JSON, MessagePack, YAML, and TOML. It pairs the speed of compact formats like MessagePack with the schema validation of tools like Pydantic, at a fraction of the overhead.
Why msgspec?
If you work with JSON, MessagePack, or other serialization formats in Python, you have likely encountered performance bottlenecks or runtime validation errors. msgspec addresses both:
- Direct decoding: Messages are decoded straight into typed Python objects, with no intermediate dict representation
- Schema validation: Define expected structures once and get automatic validation
- Native binary support: Use MessagePack for compact, fast serialization
- Minimal dependencies: Single package with no required runtime dependencies
Installation
pip install msgspec
Basic Usage
Structs
Define your data structures using msgspec.Struct:
import msgspec
class User(msgspec.Struct):
    name: str
    email: str
    age: int
# Create an instance
user = User(name="Alice", email="alice@example.com", age=30)
# Encode to JSON
json_data = msgspec.json.encode(user)
# b'{"name":"Alice","email":"alice@example.com","age":30}'
# Decode from JSON
decoded = msgspec.json.decode(json_data, type=User)
# User(name='Alice', email='alice@example.com', age=30)
MessagePack
For more compact storage, use MessagePack:
import msgspec
# Encode to MessagePack
msgpack_data = msgspec.msgpack.encode(user)
# b'\x83\xa4name\xa5Alice\xa5email\xb1alice@example.com\xa3age\x1e'
# Decode from MessagePack
decoded = msgspec.msgpack.decode(msgpack_data, type=User)
MessagePack typically produces smaller payloads than JSON, which is useful for network transmission or storage-constrained environments.
Schema Validation
One of msgspec’s strongest features is built-in validation at decode time:
import msgspec
class User(msgspec.Struct):
    name: str
    email: str
    age: int
# Valid data - succeeds
valid_json = b'{"name":"Alice","email":"alice@example.com","age":30}'
user = msgspec.json.decode(valid_json, type=User)
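# Invalid data - raises ValidationError because "age" is a string, not an int.
# Note that "not-an-email" passes: any str is accepted, the format is not checked.
invalid_json = b'{"name":"Alice","email":"not-an-email","age":"thirty"}'
try:
    user = msgspec.json.decode(invalid_json, type=User)
except msgspec.ValidationError as e:
    print(f"Validation failed: {e}")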
This means malformed data is caught immediately, not later in your application when it causes harder-to-diagnose bugs.
Nested Structures
msgspec handles nested structs naturally:
import msgspec
class Address(msgspec.Struct):
    street: str
    city: str
    country: str
class Person(msgspec.Struct):
    name: str
    address: Address
    emails: list[str]
person = Person(
    name="Bob",
    address=Address(street="123 Main St", city="London", country="UK"),
    emails=["bob@example.com", "bob.work@example.com"]
)
)
# Encode and decode
json_data = msgspec.json.encode(person)
decoded = msgspec.json.decode(json_data, type=Person)
Field Options
Control how fields are handled with options:
import msgspec
class Config(msgspec.Struct, kw_only=True):  # kw_only is set on the class
    api_key: str  # Required
    timeout: int = 30  # Optional with default
    debug: bool = msgspec.field(default=False)  # Same effect as `= False`
# With kw_only=True, all fields must be passed as keyword arguments
config = Config(api_key="secret", timeout=60)
Available options:
- default / default_factory: Set per field, either directly or through msgspec.field
- kw_only: Set on the class; fields must then be passed as keyword arguments
- omit_defaults: Set on the class; fields that still hold their default value are skipped when encoding
Type Annotations
msgspec supports most common type annotations:
from typing import Optional
import msgspec
class Event(msgspec.Struct):
    id: str
    name: str
    tags: list[str]
    metadata: dict[str, str]
    priority: Optional[int] = None
    participants: list[str] = msgspec.field(default_factory=list)
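Supported types:
- Primitives: str, int, float, bool, bytes
- Collections: list[T], dict[K, V], set[T]
- Optional: Optional[T] or T | None
- Union: Union[A, B] (the member is chosen by the type of the incoming value, so members must be distinguishable on the wire; unions of structs need tags)
- Nested structs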
Performance Comparison
msgspec is generally faster than the standard library's json module. To compare them, run the same payload through each:
import json
import msgspec
# Test data
data = {"users": [{"name": f"User{i}", "age": i % 100} for i in range(1000)]}
# JSON (stdlib)
json_encoded = json.dumps(data)
json_decoded = json.loads(json_encoded)
# msgspec JSON
msgspec_encoded = msgspec.json.encode(data)
msgspec_decoded = msgspec.json.decode(msgspec_encoded)
# msgspec MessagePack (compact binary format)
msgpack_encoded = msgspec.msgpack.encode(data)
msgpack_decoded = msgspec.msgpack.decode(msgpack_encoded)
Typical results show msgspec is 2-5x faster than stdlib json for both encoding and decoding, though exact figures depend on payload shape and hardware. MessagePack additionally produces a more compact encoding.
Common Patterns
Date and Time
from datetime import datetime, date
import msgspec
class Event(msgspec.Struct):
    name: str
    start_date: datetime
    created_date: date
event = Event(
    name="Conference",
    start_date=datetime(2026, 3, 20, 9, 0),
    created_date=date.today()
)
Enum Support
from enum import Enum
import msgspec
class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    COMPLETED = "completed"
class Task(msgspec.Struct):
    name: str
    status: Status
task = Task(name="Build feature", status=Status.ACTIVE)
encoded = msgspec.json.encode(task)
When to Use msgspec
Choose msgspec when you need:
- High-throughput serialization (APIs, data pipelines)
- Schema validation without the overhead of full validation frameworks
- Compact binary representation with MessagePack
- Direct decoding of large messages into typed objects
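Stick with the standard json module or Pydantic when:
- You need extensive validation beyond type checking and simple constraints
- Schema evolution is complex
- You rely on tooling built around Pydantic models (msgspec can generate JSON Schema through msgspec.json.schema, so schema generation alone is not a reason to avoid it)
- Your team is already invested in other tools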
See Also
- msgspec documentation: Official library documentation
- MessagePack format: Binary serialization format
- json module: Python’s built-in JSON handling
- dataclasses: Built-in Python class definitions