gzip
The gzip module provides a simple interface for compressing and decompressing files, modeled after the GNU programs gzip and gunzip. It uses the zlib module internally to handle the actual data compression.
The module provides the GzipFile class for reading and writing gzip-format files, along with convenience functions open(), compress(), and decompress() for simpler use cases. Gzip-compressed files are widely used across Unix systems, web servers (for HTTP compression), and data pipelines for reducing storage and transfer costs.
Syntax
import gzip
Functions
open()
Opens a gzip-compressed file in binary or text mode, returning a file object.
Signature: gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| filename | str, bytes, path-like, or file object | required | Path to the file, or an existing file object to read from or write to |
| mode | str | 'rb' | Mode: 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x', 'xb' (binary) or 'rt', 'at', 'wt', 'xt' (text) |
| compresslevel | int | 9 | Compression level from 0-9 (0=no compression, 1=fastest, 9=maximum) |
| encoding | str | None | Text encoding (only for text mode) |
| errors | str | None | Error handling (only for text mode) |
| newline | str | None | Line ending handling (only for text mode) |
Returns: A file object (GzipFile or TextIOWrapper).
Example:
```python
import gzip

# Writing to a compressed file
with gzip.open('data.txt.gz', 'wb') as f:
    f.write(b'Hello, World!')

# Reading from a compressed file
with gzip.open('data.txt.gz', 'rb') as f:
    content = f.read()

print(content)
# b'Hello, World!'
```
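The text-mode parameters in the table above can be exercised with a small round trip. The following is a minimal sketch, assuming a scratch file in a temporary directory; the `notes.txt.gz` name is made up for illustration.

```python
import gzip
import os
import tempfile

# Hypothetical scratch path, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt.gz')

# 'wt' wraps the GzipFile in a TextIOWrapper, so str is written, not bytes.
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('naïve café\n')
    f.write('second line\n')

# 'rt' decodes the decompressed bytes back to str using the same encoding.
with gzip.open(path, 'rt', encoding='utf-8') as f:
    lines = f.read().splitlines()

print(lines)
# ['naïve café', 'second line']
```

Passing encoding explicitly avoids depending on the platform's default text encoding.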
compress()
Compresses data in memory and returns a bytes object containing the compressed data.
Signature: gzip.compress(data, compresslevel=9, *, mtime=0)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | bytes-like | required | The data to compress |
| compresslevel | int | 9 | Compression level 0-9 |
| mtime | int or None | 0 | Modification time for the gzip header. Use 0 for reproducible output, None for the current time. The default is 0 as of Python 3.14; earlier versions default to None |
Returns: bytes — The compressed data.
Example:
```python
import gzip

data = b'This is a much longer piece of text that we want to compress for storage efficiency.'
compressed = gzip.compress(data)

print(f'Original size: {len(data)} bytes')
print(f'Compressed size: {len(compressed)} bytes')
print(f'Decompressed: {gzip.decompress(compressed)}')
# Original size: 84 bytes
# Compressed size: depends on the zlib build; short inputs may not shrink
# Decompressed: b'This is a much longer piece of text...'
```
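To see what the compresslevel parameter does, it may help to compress the same input at several levels. This is a rough sketch on deliberately repetitive sample data (an assumption chosen so the effect is visible); exact sizes depend on the zlib build, so none are shown.

```python
import gzip

# Highly repetitive input, chosen so that compression has something to work with.
data = b'status=ok;' * 5000

stored = gzip.compress(data, compresslevel=0)  # no compression, gzip framing only
fast = gzip.compress(data, compresslevel=1)    # fastest
best = gzip.compress(data, compresslevel=9)    # slowest, usually smallest

for label, blob in [('level 0', stored), ('level 1', fast), ('level 9', best)]:
    print(f'{label}: {len(blob)} bytes (input: {len(data)} bytes)')

# Every level round-trips to the same bytes.
round_trips = all(gzip.decompress(blob) == data for blob in (stored, fast, best))
print(round_trips)
```

Level 0 output is slightly larger than the input because the gzip header, trailer, and block framing are still added.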
decompress()
Decompresses gzip-compressed data and returns the original uncompressed bytes.
Signature: gzip.decompress(data)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | bytes-like | required | The compressed data to decompress |
Returns: bytes — The uncompressed data.
Example:
```python
import gzip

# Compress then decompress
original = b'Binary data here'
compressed = gzip.compress(original)
decompressed = gzip.decompress(compressed)
print(decompressed == original)
# True

# Can also decompress data read from a file. Use the built-in open() here:
# gzip.open() would already return decompressed bytes.
with open('data.txt.gz', 'rb') as f:
    raw_data = f.read()
decompressed = gzip.decompress(raw_data)
```
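gzip data may consist of several members concatenated together, and decompress() reads all of them. A short sketch with two made-up chunks:

```python
import gzip

# Two independently compressed members, concatenated byte for byte.
part1 = gzip.compress(b'first chunk\n')
part2 = gzip.compress(b'second chunk\n')
combined = part1 + part2

# decompress() walks every member and joins the results.
result = gzip.decompress(combined)
print(result)
# b'first chunk\nsecond chunk\n'
```

This is why appending to a .gz file (mode 'ab') still yields a file that decompresses as one stream.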
Classes
GzipFile
The GzipFile class provides a file-like interface for reading and writing gzip-compressed files. It simulates most file object methods, with the exception of truncate().
Signature: gzip.GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| filename | str or bytes | None | Path of the file to open when fileobj is not given; also recorded as the original filename in the gzip header |
| mode | str | None | Mode: 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x', or 'xb'. GzipFile always works in binary mode |
| compresslevel | int | 9 | Compression level 0-9 |
| fileobj | file-like | None | Existing file object to wrap |
| mtime | int | None | Timestamp for the gzip header (Unix epoch seconds). None uses the current time |
Attributes:
| Attribute | Type | Description |
|---|---|---|
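| mtime | int or None | Timestamp from the most recently read gzip header when decompressing; None before any header has been read |
| name | str or bytes | Path to the gzip file on disk |
| mode | str | 'rb' for reading, 'wb' for writing (an integer 1 or 2 before Python 3.13) |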
Example:
```python
import gzip
from io import BytesIO

# Using GzipFile with BytesIO for in-memory compression
buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as f:
    f.write(b'In-memory compressed data')

# Read back
buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode='rb') as f:
    print(f.read())
# b'In-memory compressed data'
```
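The mtime parameter and the mtime attribute are two sides of the same header field. The sketch below writes an arbitrary example timestamp (the STAMP value is an assumption for illustration) and reads it back; the attribute stays None until a header has actually been read.

```python
import gzip
from io import BytesIO

STAMP = 1_700_000_000  # arbitrary example timestamp (Unix epoch seconds)

buffer = BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb', mtime=STAMP) as f:
    f.write(b'payload')

buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode='rb') as f:
    before = f.mtime  # no header has been read yet
    data = f.read()
    after = f.mtime   # timestamp taken from the header just read

print(before, after)
# None 1700000000
```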
GzipFile.peek()
Reads uncompressed bytes without advancing the file position. The number of bytes returned may be more or less than requested.
Signature: GzipFile.peek(n)
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n | int | required | Number of bytes to peek at |
Returns: bytes — The peeked data.
Example:
```python
import gzip

with gzip.open('data.txt.gz', 'rb') as f:
    # Peek at the beginning of the file
    header = f.peek(10)
    print(f'First 10 bytes: {header[:10]}')
    # Read normally after peeking
    content = f.read()
```
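A self-contained variant of the same idea, using an in-memory stream so it runs without an existing file. The `MAGIC:v1` payload is a made-up format marker, used here only to show sniffing a prefix before reading the rest.

```python
import gzip
from io import BytesIO

payload = b'MAGIC:v1\nrest of the stream'
buffer = BytesIO(gzip.compress(payload))

with gzip.GzipFile(fileobj=buffer, mode='rb') as f:
    # peek() may return more or fewer bytes than asked for, so slice it.
    head = f.peek(6)[:6]
    # The read position has not moved, so read() still starts at byte 0.
    body = f.read()

print(head)
print(body == payload)
```

Slicing the result keeps the check correct even when peek() hands back more than the requested number of bytes.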
Common Patterns
Compressing an existing file
```python
import gzip
import shutil

# Compress a file using copyfileobj
with open('large_file.txt', 'rb') as f_in:
    with gzip.open('large_file.txt.gz', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
```
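The reverse direction works the same way: open the .gz file with gzip.open() and copy to a plain file. A minimal sketch, assuming scratch files in a temporary directory (the `report.txt` names are hypothetical):

```python
import gzip
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'report.txt.gz')  # hypothetical names
dst = os.path.join(workdir, 'report.txt')

# Create a small compressed file to work with.
with gzip.open(src, 'wb') as f:
    f.write(b'line\n' * 1000)

# Stream the decompressed bytes to a plain file in fixed-size chunks.
with gzip.open(src, 'rb') as f_in, open(dst, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

print(os.path.getsize(dst))
# 5000
```

Because copyfileobj() works in chunks, neither direction needs the whole file in memory.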
Reading a large compressed file line by line
```python
import gzip

# Process a large gzipped log file line by line
with gzip.open('access.log.gz', 'rt') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.strip())
```
Creating a reproducible compressed snapshot
```python
import gzip

data = b'Same data produces same output'

# Using mtime=0 writes a zero timestamp into the header, so the output
# is reproducible for a given zlib version and compression level
compressed1 = gzip.compress(data, mtime=0)
compressed2 = gzip.compress(data, mtime=0)
print(compressed1 == compressed2)
# True

# Using the current time produces output that changes from one second to the next
compressed_now = gzip.compress(data, mtime=None)
```
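The timestamp lives in a fixed place in the gzip header: bytes 4 to 7, stored as a little-endian unsigned integer (per the gzip file format specification, RFC 1952). The sketch below reads it back directly; `header_mtime` is a hypothetical helper written for this illustration, not part of the gzip module.

```python
import gzip
import struct

data = b'Same data produces same output'

def header_mtime(blob):
    """Return the MTIME field (bytes 4-7, little-endian) of a gzip header."""
    return struct.unpack('<I', blob[4:8])[0]

zero_stamp = header_mtime(gzip.compress(data, mtime=0))
fixed_stamp = header_mtime(gzip.compress(data, mtime=1_700_000_000))

print(zero_stamp)
# 0
print(fixed_stamp)
# 1700000000
```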
Working with web API responses
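```python
import gzip
import urllib.request

# Many APIs return gzip-compressed responses. urllib does not decode them
# itself; requests does, so its response.content is already decompressed.
req = urllib.request.Request('https://api.example.com/data',
                             headers={'Accept-Encoding': 'gzip'})
with urllib.request.urlopen(req) as response:
    body = response.read()
    if response.headers.get('Content-Encoding') == 'gzip':
        body = gzip.decompress(body)  # decompress handles the gzip wrapper
```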
Errors
- gzip.BadGzipFile: Raised for invalid gzip files (inherits from OSError). Added in Python 3.8.
- EOFError: Raised when the file ends unexpectedly.
- zlib.error: Raised for compression/decompression errors.
- TypeError: Raised when the input is not bytes-like.
- FileNotFoundError: Raised when the specified file doesn't exist.
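One way to handle the decompression errors listed above is to catch them together. This is a sketch rather than a recommended pattern; `try_decompress` is a hypothetical helper, and it assumes Python 3.8 or later for gzip.BadGzipFile.

```python
import gzip
import zlib

def try_decompress(blob):
    """Return (data, None) on success or (None, error_name) on failure."""
    try:
        return gzip.decompress(blob), None
    except (gzip.BadGzipFile, EOFError, zlib.error) as exc:
        return None, type(exc).__name__

good = gzip.compress(b'x' * 1000)

print(try_decompress(good)[1])                   # None
print(try_decompress(b'not gzip at all')[1])     # BadGzipFile
print(try_decompress(good[:len(good) // 2])[1])  # EOFError
```

Input without the gzip magic bytes fails immediately with BadGzipFile, while a stream cut off partway through surfaces as EOFError.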