Compressing Data with zlib

· 4 min read · Updated March 14, 2026 · beginner
python stdlib compression data-processing

The zlib module is Python’s interface to the zlib compression library, which implements the DEFLATE algorithm. It’s the foundation for many other compression formats you’ll encounter in Python, including gzip and zip files. Understanding zlib gives you low-level control over compression for data storage, network transmission, or memory optimization.

Why Use zlib?

The zlib library is everywhere. It’s fast, portable, and produces good compression ratios for text data. When you need to compress data in memory—before saving to a file, sending over a network, or storing in a database—zlib is often the right tool.

Common use cases include:

  • Compressing data before saving to disk or database
  • Reducing network payload sizes for API requests
  • Storing cached data more efficiently
  • Working with formats that use zlib internally (gzip, zip, PNG, PDF)
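Because those formats share the same DEFLATE core, zlib can even produce their framing directly. As a sketch (assuming Python 3), the wbits argument of compressobj() selects the container: 15 for the default zlib wrapper, 31 for a gzip wrapper, and -15 for a raw DEFLATE stream with no wrapper at all:

```python
import gzip
import zlib

data = b"example payload " * 20

# wbits selects the framing: +15 = zlib wrapper (default),
# +31 = gzip wrapper, -15 = raw DEFLATE stream
gz = zlib.compressobj(wbits=31)
gzip_bytes = gz.compress(data) + gz.flush()

# The result is a regular gzip stream, readable by the gzip module
print(gzip.decompress(gzip_bytes) == data)  # True
```

This is why the gzip and zipfile modules are thin layers on top of zlib: they add framing and metadata around the same compressed bytes.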

Basic Compression and Decompression

The zlib module provides the core compress() and decompress() functions for in-memory operations:

import zlib

# Compress some data
original_data = b"This is a string of text that we want to compress."
compressed = zlib.compress(original_data)

print(f"Original size: {len(original_data)} bytes")
print(f"Compressed size: {len(compressed)} bytes")
print(f"Compression ratio: {len(compressed) / len(original_data):.2%}")
# Original size: 50 bytes
# Compressed size: ~56 bytes (exact size varies between zlib versions)
# Compression ratio: ~112%

Notice that this short string actually grows slightly: zlib adds a 2-byte header and a 4-byte checksum, and 50 bytes of mostly unique text offers little redundancy to exploit. Compression pays off on larger or more repetitive inputs.

Text with repetition compresses better than random data:

import zlib

# Repetitive text compresses well
repetitive = b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD" * 10
compressed = zlib.compress(repetitive)

print(f"Original: {len(repetitive)} bytes")
print(f"Compressed: {len(compressed)} bytes")
# Original: 320 bytes
# Compressed: 28 bytes
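By contrast, data with no redundancy cannot shrink. A quick sketch with os.urandom shows the overhead:

```python
import os
import zlib

random_data = os.urandom(320)  # same size as the repetitive example
compressed = zlib.compress(random_data)

# Random bytes have no patterns to exploit, so the output is slightly
# larger than the input (zlib's header and checksum add a few bytes)
print(f"Random: {len(random_data)} -> {len(compressed)} bytes")
```

Already-compressed data (JPEG images, video, other zlib output) behaves the same way, which is why compressing it twice rarely helps.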

Decompressing is straightforward:

import zlib

data = b"Some data to decompress"
compressed = zlib.compress(data)
decompressed = zlib.decompress(compressed)

print(decompressed)  # b'Some data to decompress'
print(decompressed == data)  # True

Compression Levels

The optional level argument to compress() controls the trade-off between speed and compression ratio. Valid values are 0 (no compression) through 9 (best compression), plus -1, the default, which selects zlib's default compromise (currently equivalent to level 6):

import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 50

for level in [0, 1, 5, 9]:
    compressed = zlib.compress(data, level)
    print(f"Level {level}: {len(compressed)} bytes ({len(compressed)/len(data):.1%})")

Typical output (exact sizes vary between zlib builds):

Level 0: 2261 bytes (100.5%)
Level 1: ~170 bytes (~7.6%)
Level 5: ~155 bytes (~6.9%)
Level 9: ~150 bytes (~6.7%)

Level 0 stores the data uncompressed, and its framing overhead makes the output slightly larger than the input.

Level 1 is fast and already provides most of the compression. Level 9 is slower but only marginally better. For most applications, level 6 (the default) or level 1 offers the best balance.
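To see the speed side of the trade-off, you can time each level yourself. A rough sketch using time.perf_counter (the exact timings and sizes depend on your machine and zlib build):

```python
import time
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 20000  # ~900 KB

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    print(f"Level {level}: {len(out):>6} bytes in {elapsed * 1000:.2f} ms")
```

On repetitive input like this, the higher levels take noticeably longer while saving only a few bytes.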

Working with Streams

For large data or streaming scenarios, use the incremental compression objects returned by compressobj() and decompressobj():

import zlib

# Compress data in chunks
compressor = zlib.compressobj(level=6)

chunk1 = b"First part of the data..."
chunk2 = b"Second part with more content."
chunk3 = b"Third and final part."

compressed_chunks = []
compressed_chunks.append(compressor.compress(chunk1))
compressed_chunks.append(compressor.compress(chunk2))
compressed_chunks.append(compressor.compress(chunk3))
compressed_chunks.append(compressor.flush())  # flush() emits any buffered data

compressed_data = b"".join(compressed_chunks)
print(f"Compressed: {len(compressed_data)} bytes")

Decompressing streams works similarly:

import zlib

# compressed_data is the output of the previous example
decompressor = zlib.decompressobj()

# Decompress in chunks
result_chunks = []
result_chunks.append(decompressor.decompress(compressed_data))
result_chunks.append(decompressor.flush())

decompressed = b"".join(result_chunks)
print(decompressed)

This pattern is essential when working with network streams or files that don’t fit in memory.
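For example, here is a sketch of a compress_file helper (a hypothetical name, not part of zlib) that compresses a file of any size in fixed-size chunks:

```python
import zlib

def compress_file(src_path, dst_path, chunk_size=64 * 1024):
    """Compress src_path into dst_path without loading either into memory."""
    compressor = zlib.compressobj()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Read, compress, and write one chunk at a time
        while chunk := src.read(chunk_size):
            dst.write(compressor.compress(chunk))
        dst.write(compressor.flush())  # emit any data still buffered
```

Memory usage stays roughly at chunk_size regardless of how large the input file is.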

Practical Examples

Caching Compressed Data

import zlib
import pickle

def cache_compressed(cache, key, value, level=6):
    """Store compressed data in cache."""
    serialized = pickle.dumps(value)
    compressed = zlib.compress(serialized, level)
    cache[key] = compressed

def load_compressed(cache, key):
    """Retrieve and decompress data from cache."""
    compressed = cache.get(key)
    if compressed is None:
        return None
    serialized = zlib.decompress(compressed)
    return pickle.loads(serialized)

# Example usage (with a dict as cache)
my_cache = {}
cache_compressed(my_cache, "results", {"data": [1, 2, 3, 4, 5]})
result = load_compressed(my_cache, "results")
print(result)  # {'data': [1, 2, 3, 4, 5]}

Compressing Before Network Transmission

import zlib
import json

def compress_payload(data):
    """Compress JSON data for network transmission."""
    json_data = json.dumps(data).encode("utf-8")
    compressed = zlib.compress(json_data, level=6)
    return compressed

def decompress_payload(compressed_data):
    """Decompress received data."""
    json_data = zlib.decompress(compressed_data)
    return json.loads(json_data)

# Example
payload = {"messages": ["hello", "world"] * 100}
compressed = compress_payload(payload)
print(f"Sent {len(compressed)} bytes instead of {len(json.dumps(payload).encode('utf-8'))} bytes")

received = decompress_payload(compressed)
print(received == payload)  # True

Creating zlib-Wrapped Protocol Messages

import zlib
import struct

def create_message(payload):
    """Create a zlib-compressed message with length prefix."""
    compressed = zlib.compress(payload, level=6)
    length = struct.pack(">I", len(compressed))
    return length + compressed

def read_message(data):
    """Read a length-prefixed zlib-compressed message."""
    length = struct.unpack(">I", data[:4])[0]
    compressed = data[4:4+length]
    return zlib.decompress(compressed)

# Example
message = create_message(b"Important data that needs compression")
print(f"Message length: {len(message)}")
payload = read_message(message)
print(payload)  # b'Important data that needs compression'
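Because every message carries its own length prefix, several of them can be concatenated and parsed back one at a time. A sketch of a reader for such a stream (read_messages is a hypothetical helper in the same spirit as the functions above):

```python
import struct
import zlib

def read_messages(stream):
    """Yield each payload from concatenated length-prefixed messages."""
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack(">I", stream[offset:offset + 4])
        yield zlib.decompress(stream[offset + 4:offset + 4 + length])
        offset += 4 + length

# Build a stream of two messages by hand, then parse it back
parts = []
for payload in (b"first message", b"second message"):
    compressed = zlib.compress(payload)
    parts.append(struct.pack(">I", len(compressed)) + compressed)

for payload in read_messages(b"".join(parts)):
    print(payload)
# b'first message'
# b'second message'
```

This framing is what lets a receiver know where one compressed message ends and the next begins.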

Handling Errors

Decompression can fail if the data is corrupted or was never compressed:

import zlib

# Try to decompress invalid data
try:
    zlib.decompress(b"not compressed data")
except zlib.error as e:
    print(f"Decompression failed: {e}")

# Check if data looks compressed before decompressing
def safe_decompress(data):
    """Decompress with error handling."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None

result = safe_decompress(b"invalid data")
print(result)  # None

You can also use decompressobj() to handle partial or streaming data more gracefully.
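For instance, a decompression object can be fed arbitrary slices of the stream and returns whatever output is ready so far:

```python
import zlib

original = b"stream of data " * 100
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
output = b""
# Feed the compressed stream 16 bytes at a time, as a network read might
for i in range(0, len(compressed), 16):
    output += decompressor.decompress(compressed[i:i + 16])
output += decompressor.flush()

print(output == original)  # True
print(decompressor.eof)    # True once the end of the stream was seen
```

A truncated or corrupted stream shows up either as a zlib.error or as eof remaining False after all input has been fed in.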

Computing Checksums

zlib also provides two fast checksums for data integrity, CRC-32 and Adler-32:

import zlib

data = b"Hello, World!"

# Calculate CRC-32 checksum
crc = zlib.crc32(data)
print(f"CRC-32: {crc}")

# Using adler32 (faster, but weaker error detection)
adler = zlib.adler32(data)
print(f"Adler-32: {adler}")

The Adler-32 checksum is faster to compute but weaker at detecting errors, particularly in short inputs. Use CRC-32 when reliable error detection matters more than speed. Neither is a cryptographic hash; for tamper detection, use hashlib instead.
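Both functions also accept the previous checksum as a second argument, so you can compute a checksum incrementally over chunks, for example while streaming a file:

```python
import zlib

chunks = (b"Hello, ", b"World!")

# Pass the running value back in to checksum data chunk by chunk
crc = 0
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)

print(crc == zlib.crc32(b"Hello, World!"))  # True
```

The same pattern works with zlib.adler32, whose starting value defaults to 1.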

See Also