lzma module
Overview
lzma is the Python standard library module for LZMA compression, the algorithm behind .xz files. It supports file-based compression through lzma.open() and LZMAFile, and in-memory compression of bytes via compress() and decompress(). LZMA typically produces smaller output than gzip or bzip2 at the cost of slower compression, making it a good choice for archival or for data where size matters more than compression speed.
Basic Compression and Decompression
import lzma
# Compress bytes
original = b"This is a test message that will be compressed."
compressed = lzma.compress(original)
decompressed = lzma.decompress(compressed)
print(len(original)) # 47 bytes
print(len(compressed)) # larger than 47: container overhead dominates tiny inputs
print(decompressed == original) # True
For small inputs, compressed output can exceed original size. LZMA shines on larger data.
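As a rough illustration of that crossover, the sketch below compresses one tiny input and one larger, repetitive input and prints the sizes side by side. The sample payloads are invented for the example.

```python
import lzma

# Hypothetical sample payloads, chosen only for illustration
tiny = b"hello"
large = b"timestamp=2024-01-01 level=INFO msg=request served\n" * 2000

tiny_xz = lzma.compress(tiny)
large_xz = lzma.compress(large)

# The .xz container has a fixed overhead, so the tiny input grows
print(len(tiny), "->", len(tiny_xz))
# Repetitive data of a useful size shrinks substantially
print(len(large), "->", len(large_xz))
```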
Compressing Files with LZMAFile
Use LZMAFile as a context manager, similar to gzip:
import lzma
# Write compressed
with lzma.LZMAFile("data.txt.xz", "w") as f:
    f.write(b"Hello, world!\n" * 1000)
# Read compressed
with lzma.LZMAFile("data.txt.xz", "r") as f:
    content = f.read()
# Read as text (LZMAFile is binary-only, so use lzma.open for text mode)
with lzma.open("data.txt.xz", "rt", encoding="utf-8") as f:
    text = f.read()
LZMAFile accepts only binary modes: r, w, x, and a, with or without a trailing b. For text modes (rt, wt, xt, at), use lzma.open(), which wraps the LZMAFile in an io.TextIOWrapper.
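Text mode goes through lzma.open() rather than LZMAFile directly. A minimal round-trip sketch, using a temporary directory and a made-up file name so nothing is left on disk:

```python
import lzma
import os
import tempfile

# Temporary location so the example leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.xz")  # hypothetical file name

    # "wt" encodes str to bytes before compressing
    with lzma.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # "rt" decompresses and decodes back to str
    with lzma.open(path, "rt", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

Passing encoding explicitly avoids depending on the platform's default text encoding.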
Reading Existing .xz Files
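lzma.open() opens a compressed file in binary or text mode and returns a file object. When reading, the container format (.xz or legacy .lzma) is detected from the header bytes of the stream, not from the file name:
import lzma
# The format is auto-detected from the file's magic bytes
with lzma.open("archive.xz", "r") as f:
    data = f.read()
When writing, the file extension is not consulted: lzma.open() writes the .xz format unless you pass a different format argument.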
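Format detection is based on the stream contents. The sketch below compresses the same made-up payload in the .xz and legacy .lzma containers, checks for the .xz magic bytes, and decompresses both without naming a format (decompress() defaults to FORMAT_AUTO).

```python
import lzma

payload = b"format detection demo " * 50  # illustrative data

xz_bytes = lzma.compress(payload)  # FORMAT_XZ is the default
alone_bytes = lzma.compress(payload, format=lzma.FORMAT_ALONE)

# Every .xz stream starts with these six bytes
XZ_MAGIC = b"\xfd7zXZ\x00"
print(xz_bytes.startswith(XZ_MAGIC))
print(alone_bytes.startswith(XZ_MAGIC))

# No format argument: the decoder inspects the header itself
from_xz = lzma.decompress(xz_bytes)
from_alone = lzma.decompress(alone_bytes)
print(from_xz == from_alone == payload)
```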
Preset Compression Levels
lzma.compress() and lzma.LZMAFile() accept a preset parameter from 0 (fastest, least compression) to 9 (slowest, most compression):
import lzma
data = b"A" * 100000
fast = lzma.compress(data, preset=0)
medium = lzma.compress(data, preset=6)
best = lzma.compress(data, preset=9)
print(len(fast)) # preset 0: fastest, usually the largest output
print(len(medium)) # preset 6: the default
print(len(best)) # preset 9: slowest, usually the smallest output
Default is preset=6. For most production use, preset 6 or 7 strikes a reasonable balance.
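The trade-off depends heavily on the data, so it is worth measuring on a representative sample. A small sketch that records size and wall-clock time per preset; the synthetic sample is an assumption, and the numbers will differ on real data and hardware:

```python
import lzma
import time

# Hypothetical sample: semi-repetitive, log-like records
sample = b"".join(
    b"record %d: status=ok value=%d\n" % (i, i * 7 % 1000) for i in range(20000)
)

results = {}
for preset in (0, 6, 9):
    start = time.perf_counter()
    packed = lzma.compress(sample, preset=preset)
    elapsed = time.perf_counter() - start
    results[preset] = (len(packed), elapsed)
    print(f"preset={preset}: {len(packed)} bytes in {elapsed:.3f}s")
```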
Check and Extreme Presets
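Preset 9 is the highest level. To trade much slower compression for a slightly better ratio, OR the preset with lzma.PRESET_EXTREME. The separate check parameter selects the integrity check stored in an .xz container (CHECK_NONE, CHECK_CRC32, CHECK_CRC64, or CHECK_SHA256); it does not change how hard the compressor works:
import lzma
data = b"A" * 100000
# Preset 9 with the extreme flag: slower, potentially smaller
extreme = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
# Preset 9 without an integrity check (saves a few bytes, loses corruption detection)
unchecked = lzma.compress(data, format=lzma.FORMAT_XZ, preset=9, check=lzma.CHECK_NONE)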
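The check parameter controls the integrity check written into the .xz container, not the compression level. The sketch below compares output sizes for three check types and reads the check ID back with LZMADecompressor; the payload is made up.

```python
import lzma

data = b"integrity check demo " * 100  # illustrative data

# Same payload, three different integrity checks
sizes = {
    "none": len(lzma.compress(data, check=lzma.CHECK_NONE)),
    "crc32": len(lzma.compress(data, check=lzma.CHECK_CRC32)),
    "crc64": len(lzma.compress(data, check=lzma.CHECK_CRC64)),
}
print(sizes)

# The decompressor reports which check the stream used
packed = lzma.compress(data, check=lzma.CHECK_CRC32)
decomp = lzma.LZMADecompressor()
restored = decomp.decompress(packed)
print(decomp.check == lzma.CHECK_CRC32)
```

The size differences come only from the stored checksum, so they stay at a few bytes regardless of input size.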
LZMA Format Variants
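lzma can write three container formats. A fourth constant, FORMAT_AUTO, is the default when decompressing and detects .xz or .lzma input automatically.
| Constant | Typical use | Description |
|---|---|---|
| FORMAT_XZ | Default for files | Standard .xz format |
| FORMAT_ALONE | .lzma (legacy) | Old LZMA Utils format |
| FORMAT_RAW | Raw stream | No container header; for custom codec use |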
import lzma
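# Write in legacy .lzma format
with lzma.LZMAFile("data.txt.lzma", "w", format=lzma.FORMAT_ALONE) as f:
    f.write(b"some data")
# Write a raw LZMA2 stream (no container header); FORMAT_RAW requires an explicit filter chain
raw_filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]
raw = lzma.compress(b"raw bytes", format=lzma.FORMAT_RAW, filters=raw_filters)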
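A raw stream carries no header, so the filter chain has to be supplied on both sides. A minimal round-trip sketch, with an illustrative payload and the constant name RAW_FILTERS chosen for this example:

```python
import lzma

# Filter chain shared by writer and reader; a raw stream does not store it
RAW_FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

message = b"raw stream payload " * 20  # illustrative data

raw = lzma.compress(message, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
restored = lzma.decompress(raw, format=lzma.FORMAT_RAW, filters=RAW_FILTERS)
print(restored == message)

# Without container headers, the raw stream is smaller than the .xz one
print(len(raw), len(lzma.compress(message)))
```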
Tuning with Filters
Filters give fine-grained control over compression behavior. They’re passed to compress() or LZMAFile as a list:
import lzma
# LZMA2 filter (default for .xz)
filtered = lzma.compress(
    b"A" * 100000,
    filters=[{"id": lzma.FILTER_LZMA2, "preset": 9}]
)
# Delta filter (useful for images/audio: stores differences)
with open("raw.img", "rb") as img:
    image_data = img.read()
delta_compressed = lzma.compress(
    image_data,
    filters=[
        {"id": lzma.FILTER_DELTA, "dist": 4},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ]
)
The delta filter replaces each byte with its difference from the byte dist positions earlier (for example, 4 for 4-byte pixels), which compresses well for data that changes smoothly, such as images with gradients. Pass either preset or filters to compress(), not both.
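To see the delta filter's effect without an image file, the sketch below builds a synthetic smooth signal (a seeded random walk of 8-bit samples, an assumption made purely for illustration) and compresses it with and without a delta stage.

```python
import lzma
import random

# Synthetic "smooth" signal: each sample differs from the previous by -1, 0, or 1
rng = random.Random(0)
value = 128
samples = bytearray()
for _ in range(50000):
    value = (value + rng.choice((-1, 0, 1))) % 256
    samples.append(value)
signal = bytes(samples)

# LZMA2 alone
plain = lzma.compress(signal, filters=[{"id": lzma.FILTER_LZMA2, "preset": 6}])

# Delta (distance 1, because samples are one byte wide) followed by LZMA2
with_delta = lzma.compress(
    signal,
    filters=[
        {"id": lzma.FILTER_DELTA, "dist": 1},
        {"id": lzma.FILTER_LZMA2, "preset": 6},
    ],
)

print(len(signal), len(plain), len(with_delta))
```

The filter chain is stored in the .xz container, so lzma.decompress(with_delta) reverses the delta step without extra arguments. On data that is not smooth, the delta stage can make compression worse, so it is worth comparing both.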
Memory Usage
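Decoder memory is determined by the dictionary size recorded in the stream, which is set by the preset or filters used at compression time; it cannot be lowered when reading. Higher presets use larger dictionaries, so for systems with limited RAM, compress with a lower preset. When decompressing untrusted or unknown data, pass memlimit to cap decoder memory:
import lzma
# Decompress with a 64 MiB cap; LZMAError is raised if the stream needs more
with open("data.xz", "rb") as f:
    compressed = f.read()
data = lzma.decompress(compressed, memlimit=64 * 1024 * 1024)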
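Decoder memory can be capped with the memlimit argument of lzma.decompress() (also accepted by LZMADecompressor). The sketch below compresses an illustrative payload at preset 9, which uses a 64 MiB dictionary, and shows a 1 MiB cap being refused while a generous cap succeeds.

```python
import lzma

payload = b"memory limit demo " * 1000  # illustrative data

# The dictionary size chosen here is what the decoder must later allocate
packed = lzma.compress(payload, preset=9)

try:
    lzma.decompress(packed, memlimit=1024 * 1024)  # 1 MiB cap: too small
    limit_hit = False
except lzma.LZMAError as exc:
    limit_hit = True
    print(f"refused: {exc}")

# A cap above the stream's requirement decompresses normally
restored = lzma.decompress(packed, memlimit=256 * 1024 * 1024)
print(restored == payload)
```

The requirement comes from the stream header, not from the size of the payload, so even a small file compressed at preset 9 needs the full dictionary allocation.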
Combining with tarfile
For multi-file archives, use tarfile, which supports xz compression directly through its :xz modes:
import tarfile
# Create a .tar.xz archive
with tarfile.open("logs.tar.xz", "w:xz") as tar:
    tar.add("error.log")
    tar.add("access.log")
# Extract
with tarfile.open("logs.tar.xz", "r:xz") as tar:
    tar.extractall(path="./extracted")
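The same modes work with in-memory buffers, and tarfile.open() accepts a preset keyword for the w:xz mode. A self-contained sketch with made-up file contents:

```python
import io
import tarfile

# Hypothetical in-memory "files" standing in for real logs
members = {"error.log": b"disk full\n", "access.log": b"GET / 200\n" * 100}

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:xz", preset=9) as tar:
    for name, content in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(content)  # size must be set before addfile()
        tar.addfile(info, io.BytesIO(content))

# Rewind and read the archive back from the same buffer
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:xz") as tar:
    names = tar.getnames()
    error_log = tar.extractfile("error.log").read()

print(names)
```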
Error Handling
import lzma
try:
    with lzma.LZMAFile("corrupt.xz", "r") as f:
        data = f.read()
except lzma.LZMAError as e:
    print(f"Corrupt or invalid LZMA data: {e}")
except EOFError:
    print("File is truncated")
except FileNotFoundError:
    print("File not found")
Common exceptions: LZMAError for corrupt data, EOFError for a file that ends before the end-of-stream marker, and FileNotFoundError for missing files.
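For in-memory data, lzma.decompress() reports invalid and truncated input alike as LZMAError. A small sketch; safe_decompress is a hypothetical helper name, not part of the module:

```python
import lzma
from typing import Optional


def safe_decompress(blob: bytes) -> Optional[bytes]:
    """Hypothetical helper: return the decompressed bytes, or None if blob is invalid."""
    try:
        return lzma.decompress(blob)
    except lzma.LZMAError:
        return None


good = lzma.compress(b"payload")

print(safe_decompress(good))                                 # valid stream
print(safe_decompress(b"this is not a valid lzma stream"))   # unrecognized format
print(safe_decompress(good[:-10]))                           # truncated stream
```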
Gotchas
Tiny inputs get bigger. LZMA headers and block structure overhead exceed savings on small inputs. For data under a few hundred bytes, compression may increase size.
Slow compression, fast decompression. LZMA decompression is relatively fast. If you’re compressing once and decompressing many times (like distribution archives), the upfront cost is worth it.
Decompression memory scales with the compression preset. Higher presets use larger dictionaries, and the decoder must allocate a dictionary of the same size. If the data will be decompressed on memory-constrained systems (embedded, small containers), compress it with preset 0-3.
.xz and .lzma are different formats. Python’s lzma module writes .xz by default. To write the legacy .lzma format, explicitly pass format=lzma.FORMAT_ALONE.
See Also
- /reference/modules/gzip-module/ — faster but less effective compression
- /guides/python-zlib-compression/ — zlib and brotli compression alternatives
- /tutorials/file-io/ — file handling alongside compression workflows