Hashing with hashlib
When you need to verify that data has not been tampered with, store passwords securely, or create unique identifiers, cryptographic hash functions are the tool for the job. Python’s hashlib module provides a straightforward interface to a wide range of hashing algorithms, from the widely used SHA-2 family (such as SHA-256) to newer designs like SHA-3 and BLAKE2.
What Is Hashing?
A hash function takes input of any size and produces a fixed-length output called a digest or hash. Good cryptographic hashes have three key properties:
- Deterministic: The same input always produces the same output
- One-way: It is computationally infeasible to recover the original data from its digest
- Collision-resistant: It is computationally infeasible to find two different inputs with the same output
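You can see these properties in practice by hashing a few inputs with SHA-256. Repeating an input repeats the digest. Changing a single character produces an unrelated digest. Inputs of very different sizes all yield 64 hex characters. The inputs below are arbitrary examples.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Deterministic: hashing the same input twice gives the same digest
first = sha256_hex(b"hello")
second = sha256_hex(b"hello")
print(first == second)  # True

# A one-character change produces a completely different digest
changed = sha256_hex(b"hellp")
print(first == changed)  # False

# Fixed length: a 1-byte input and a 1 MB input both give 64 hex chars
print(len(sha256_hex(b"x")), len(sha256_hex(b"x" * 1_000_000)))  # 64 64
```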
Common uses include:
- Verifying file integrity after downloads
- Storing passwords, through a deliberately slow, salted key derivation function rather than a plain hash
- Creating checksums for data validation
- Generating unique IDs for caching
Getting Started with hashlib
The basic workflow involves creating a hash object, feeding data into it, and getting the digest:
import hashlib
# Create a hash object using SHA-256
h = hashlib.sha256()
# Feed data into the hash; update() can be called any number of times
h.update(b"Hello, ")
h.update(b"World!")
# Get the hexadecimal digest of everything fed in so far
print(h.hexdigest())
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
For simple cases, you can pass the data straight to the constructor and hash it in a single call:
import hashlib
# One-shot: hash data in a single call
result = hashlib.sha256(b"Hello, World!").hexdigest()
print(result)
# dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
Common Hash Algorithms
SHA-256
SHA-256 produces a 256-bit (32-byte) hash and is one of the most widely used algorithms:
import hashlib
# SHA-256 - recommended for most purposes
sha256_hash = hashlib.sha256(b"your data here").hexdigest()
print(f"SHA-256: {sha256_hash}")
# Get raw bytes instead of hex
sha256_bytes = hashlib.sha256(b"your data here").digest()
print(f"Raw bytes: {sha256_bytes}")
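The two output methods return the same value in different encodings. digest() gives the raw bytes, and hexdigest() gives those bytes as lowercase hex text, which is twice as long. Hash objects also expose name, digest_size (in bytes), and block_size attributes. This short sketch prints them for SHA-256.

```python
import hashlib

h = hashlib.sha256(b"your data here")
raw = h.digest()
text = h.hexdigest()

# hexdigest() is just the hex encoding of digest()
print(raw.hex() == text)  # True
print(len(raw), len(text))  # 32 64

# Metadata attributes available on every hash object
print(h.name, h.digest_size, h.block_size)  # sha256 32 64
```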
SHA-512
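SHA-512 produces a 512-bit (64-byte) hash. Because it operates on 64-bit words, it is often faster than SHA-256 on 64-bit processors that lack dedicated SHA-256 instructions: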
import hashlib
sha512_hash = hashlib.sha512(b"your data here").hexdigest()
print(f"SHA-512: {sha512_hash}")
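SHA-3 is also built in. hashlib exposes it as sha3_224(), sha3_256(), sha3_384(), and sha3_512(), plus the extendable-output functions shake_128() and shake_256(). For the SHAKE functions, digest() and hexdigest() take the desired output length in bytes. A minimal sketch:

```python
import hashlib

# SHA3-256: same output size as SHA-256, but a different (Keccak-based) design
sha3_hash = hashlib.sha3_256(b"your data here").hexdigest()
print(f"SHA3-256: {sha3_hash}")

# SHAKE128: you choose the output length when you read the digest
shake_16 = hashlib.shake_128(b"your data here").hexdigest(16)  # 16 bytes
shake_32 = hashlib.shake_128(b"your data here").hexdigest(32)  # 32 bytes
print(f"SHAKE128 (16 bytes): {shake_16}")

# For the same input, a shorter SHAKE output is a prefix of a longer one
print(shake_32.startswith(shake_16))  # True
```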
MD5 and SHA-1
MD5 and SHA-1 are considered cryptographically broken but still useful for non-security purposes like quick checksums:
import hashlib
# MD5 - only use for checksums, not security
md5_hash = hashlib.md5(b"your data here").hexdigest()
print(f"MD5: {md5_hash}")
# SHA-1 - similarly deprecated for security
sha1_hash = hashlib.sha1(b"your data here").hexdigest()
print(f"SHA-1: {sha1_hash}")
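Since Python 3.9, hashlib constructors accept a usedforsecurity keyword. Passing usedforsecurity=False documents that a weak algorithm like MD5 is being used only as a checksum. On some restricted builds, such as FIPS-enabled systems, it can also allow the call to succeed where it would otherwise be blocked. This sketch assumes Python 3.9 or newer, and quick_checksum is a hypothetical helper name.

```python
import hashlib

def quick_checksum(data: bytes) -> str:
    """Non-security checksum using MD5, flagged as such for restricted builds."""
    # usedforsecurity=False requires Python 3.9+
    return hashlib.md5(data, usedforsecurity=False).hexdigest()

print(quick_checksum(b"your data here"))
```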
BLAKE2
BLAKE2 is typically faster than SHA-256 and SHA-3 in software while offering comparable security:
import hashlib
# BLAKE2b - optimized for 64-bit platforms; digests up to 64 bytes
blake2b_hash = hashlib.blake2b(b"your data here").hexdigest()
print(f"BLAKE2b: {blake2b_hash}")
# BLAKE2s - optimized for 8- to 32-bit platforms; digests up to 32 bytes
blake2s_hash = hashlib.blake2s(b"your data here").hexdigest()
print(f"BLAKE2s: {blake2s_hash}")
Hashing Files
For large files, stream the data in chunks to avoid loading the entire file into memory:
import hashlib
def hash_file(filename, algorithm="sha256"):
    """Compute hash of a file using chunked reading."""
    hash_obj = hashlib.new(algorithm)
    with open(filename, "rb") as f:
        # Read in 64KB chunks
        for chunk in iter(lambda: f.read(65536), b""):
            hash_obj.update(chunk)
    return hash_obj.hexdigest()
# Example usage
file_hash = hash_file("myfile.zip", "sha256")
print(f"File SHA-256: {file_hash}")
Because only one 64 KB chunk is held in memory at a time, this approach works for files of any size.
Incremental Hashing
When you cannot or do not want to read all data at once, use incremental hashing:
import hashlib
def hash_streaming(data_chunks):
    """Hash data that comes in chunks."""
    h = hashlib.sha256()
    for chunk in data_chunks:
        h.update(chunk)
    return h.hexdigest()
# Simulate streaming data
chunks = [b"Hello, ", b"World!", b" More data."]
result = hash_streaming(chunks)
print(f"Streaming hash: {result}")
This pattern is useful when processing network streams or large files.
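Hash objects also support copy(), which snapshots the state accumulated so far. That lets you hash a shared prefix once and then branch, for example to compute digests of several messages that start with the same header. The prefix, suffixes, and helper name below are made up for illustration.

```python
import hashlib

# Hash the common prefix once
prefix = hashlib.sha256(b"HEADER v1|")

def digest_with_suffix(base, suffix: bytes) -> str:
    """Branch from a prefix hash without modifying it."""
    h = base.copy()  # independent copy of the internal state
    h.update(suffix)
    return h.hexdigest()

digest_a = digest_with_suffix(prefix, b"message A")
digest_b = digest_with_suffix(prefix, b"message B")

# Same result as hashing the full message in one shot
print(digest_a == hashlib.sha256(b"HEADER v1|message A").hexdigest())  # True
print(digest_a == digest_b)  # False
```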
Key Derivation
For password storage, you need a salted key derivation function (KDF) that is intentionally slow, so that brute-force guessing becomes expensive:
import hashlib
import hmac
import os
# PBKDF2 work factor. OWASP's Password Storage Cheat Sheet suggested 600,000
# for PBKDF2-HMAC-SHA256 as of 2023; benchmark on your own hardware.
# Changing it invalidates existing hashes unless you store it alongside them.
ITERATIONS = 600_000
def hash_password(password, salt=None):
    """Hash a password with a random salt using PBKDF2."""
    if salt is None:
        salt = os.urandom(32)
    key = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt,
        ITERATIONS,
    )
    return salt.hex() + ":" + key.hex()
def verify_password(password, stored_hash):
    """Verify a password against a stored hash."""
    salt_hex, key_hex = stored_hash.split(":")
    salt = bytes.fromhex(salt_hex)
    expected_key = bytes.fromhex(key_hex)
    new_key = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        salt,
        ITERATIONS,
    )
    # Constant-time comparison avoids leaking timing information
    return hmac.compare_digest(new_key, expected_key)
# Example usage
password = "my_secure_password"
hashed = hash_password(password)
print(f"Stored hash: {hashed}")
# Verify
print(f"Valid password: {verify_password(password, hashed)}")  # True
print(f"Wrong password: {verify_password('wrong', hashed)}")  # False
For new projects, consider Argon2 (for example via the third-party argon2-cffi package) or bcrypt (the third-party bcrypt package) instead of PBKDF2.
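If you want to stay within the standard library, hashlib also provides scrypt(), a memory-hard KDF. It is available when Python is built against OpenSSL 1.1 or newer. This is a minimal sketch mirroring the PBKDF2 helpers above. The helper names are hypothetical. The cost parameters n=2**14, r=8, p=1 are illustrative values that need about 16 MiB per hash, not a tuned recommendation.

```python
import hashlib
import hmac
import os

# Illustrative scrypt cost parameters; memory use is about 128 * r * n bytes (16 MiB here)
N, R, P = 2**14, 8, 1

def scrypt_hash_password(password: str) -> str:
    """Derive a key with scrypt and return 'salt:key' in hex."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=N, r=R, p=P, dklen=32)
    return salt.hex() + ":" + key.hex()

def scrypt_verify_password(password: str, stored: str) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    salt_hex, key_hex = stored.split(":")
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex), n=N, r=R, p=P, dklen=32
    )
    return hmac.compare_digest(key, bytes.fromhex(key_hex))

stored = scrypt_hash_password("my_secure_password")
print(scrypt_verify_password("my_secure_password", stored))  # True
print(scrypt_verify_password("wrong", stored))  # False
```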
Using Different Output Lengths
BLAKE2 lets you choose the digest size: 1 to 64 bytes for BLAKE2b and 1 to 32 bytes for BLAKE2s:
import hashlib
# Shorter hash (16 bytes = 32 hex chars)
short_hash = hashlib.blake2b(b"data", digest_size=16).hexdigest()
print(f"Short BLAKE2b: {short_hash}")
# Longest hash (64 bytes = 128 hex chars), which is also BLAKE2b's default
long_hash = hashlib.blake2b(b"data", digest_size=64).hexdigest()
print(f"Long BLAKE2b: {long_hash}")
This is useful when you need hashes of specific lengths for legacy systems or custom protocols.
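BLAKE2 mixes the requested digest size into its computation. As a result, a 16-byte BLAKE2b digest is not simply the first 16 bytes of the default 64-byte digest. This sketch checks that behavior. It means you cannot slice stored full-length BLAKE2 hashes and expect them to match newly computed short ones.

```python
import hashlib

full = hashlib.blake2b(b"data").hexdigest()                   # default: 64 bytes
short = hashlib.blake2b(b"data", digest_size=16).hexdigest()  # 16 bytes

print(len(full), len(short))  # 128 32
# The short digest is a different value, not a prefix of the full one
print(short == full[:32])  # False
```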
Practical Examples
Verifying File Integrity
import hashlib
# Reuses hash_file() from "Hashing Files" above
def verify_file_integrity(filename, expected_hash):
    """Verify a file matches the expected hash."""
    actual_hash = hash_file(filename)
    # Published checksums are sometimes uppercase; hexdigest() is lowercase
    return actual_hash == expected_hash.strip().lower()
# After downloading a file, verify it
expected = "abc123..."  # The hash published by the source
is_valid = verify_file_integrity("download.zip", expected)
print(f"File valid: {is_valid}")
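The example above depends on a real download and a published hash. Here is a self-contained round trip that uses a temporary file instead. It defines its own small chunked helper, so it does not rely on earlier code. The helper names sha256_of_file and matches are hypothetical. It also uses hmac.compare_digest for the comparison as a defensive habit.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path: str) -> str:
    """Chunked SHA-256 of a file, returned as lowercase hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 to a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())

# Simulate a publisher computing (and printing in uppercase) a file's hash
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents")
    path = tmp.name
published = hashlib.sha256(b"release contents").hexdigest().upper()

try:
    before = matches(path, published)
    # Simulate corruption by appending one byte
    with open(path, "ab") as f:
        f.write(b"!")
    after = matches(path, published)
finally:
    os.remove(path)
print(before, after)  # True False
```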
Creating Unique Cache Keys
import hashlib
def cache_key(*args, **kwargs):
    """Generate a cache key from function arguments."""
    key_data = str(args) + str(sorted(kwargs.items()))
    return hashlib.sha256(key_data.encode()).hexdigest()[:16]
# Generate cache key for a function call
key = cache_key("fetch_data", user_id=123, page=1)
print(f"Cache key: {key}")
Checksumming Data Structures
import hashlib
import json
def checksum(data):
    """Create a checksum of a dictionary or list."""
    # Sort keys for consistent ordering
    json_str = json.dumps(data, sort_keys=True)
    return hashlib.sha256(json_str.encode()).hexdigest()
# Example: verify data has not changed
original = {"name": "Alice", "age": 30}
original_check = checksum(original)
print(f"Original checksum: {original_check}")
# Data modified
original["age"] = 31
modified_check = checksum(original)
print(f"Modified checksum: {modified_check}")
print(f"Data changed: {original_check != modified_check}")
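Because sort_keys=True fixes the key order, two dictionaries with the same contents produce the same checksum even if they were built in a different order. Here is a short check using made-up data:

```python
import hashlib
import json

def checksum(data):
    """SHA-256 of a JSON encoding with sorted keys."""
    json_str = json.dumps(data, sort_keys=True)
    return hashlib.sha256(json_str.encode()).hexdigest()

a = {"name": "Alice", "age": 30}
b = {"age": 30, "name": "Alice"}  # same contents, different insertion order
print(checksum(a) == checksum(b))  # True

# Without sort_keys, the serialized strings (and so the hashes) differ
print(json.dumps(a) == json.dumps(b))  # False
```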
Algorithm Availability
Some algorithms may not be available on all systems. Check what’s available:
import hashlib
# List all available algorithms
available = hashlib.algorithms_available
print(f"Available algorithms: {sorted(available)}")
# List algorithms guaranteed to be available
guaranteed = hashlib.algorithms_guaranteed
print(f"Guaranteed algorithms: {sorted(guaranteed)}")
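The guaranteed algorithms are md5, sha1, sha224, sha256, sha384, sha512, sha3_224, sha3_256, sha3_384, sha3_512, shake_128, shake_256, blake2b, and blake2s. Anything else in algorithms_available depends on the OpenSSL library Python was built against.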
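An algorithm name may come from configuration or user input. In that case, you can check it against algorithms_available before calling hashlib.new() and fall back to a guaranteed choice otherwise. The helper name make_hasher and the SHA-256 fallback are assumptions for illustration.

```python
import hashlib

def make_hasher(name: str, fallback: str = "sha256"):
    """Return a hash object for `name`, or for `fallback` if it is unavailable."""
    if name.lower() in hashlib.algorithms_available:
        return hashlib.new(name.lower())
    return hashlib.new(fallback)

h = make_hasher("sha3_256")
h.update(b"your data here")
print(h.name, h.hexdigest())

# An unknown name falls back to SHA-256
print(make_hasher("not-a-real-algorithm").name)  # sha256
```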
See Also
- hashlib-module — The reference documentation for the hashlib module
- json-module — JSON serialization for data structures
- os-module — Operating system randomness via os.urandom for salt generation