Threading in Python
Python’s threading module lets you run multiple tasks at the same time. Threads are useful when your program needs to wait for external things—like network responses, file reads, or user input—while doing other work. This guide shows you how to create threads, synchronize them safely, and avoid common pitfalls.
When to Use Threads
Threads shine for I/O-bound tasks: downloading files, calling APIs, reading from disks, or waiting for user input. While one thread waits for data, others can keep your program busy.
For CPU-bound work—number crunching, image processing, machine learning—threads are less effective because of Python’s Global Interpreter Lock (GIL). The GIL prevents multiple threads from running Python bytecode simultaneously. If you’re doing heavy computation, consider multiprocessing instead.
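For comparison, here is a minimal multiprocessing sketch for CPU-bound work. The sum-of-squares task and the pool size of 4 are illustrative, not prescriptive:

```python
from multiprocessing import Pool

def sum_of_squares(n):
    # CPU-bound: pure Python arithmetic, no I/O
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each input runs in a separate process, sidestepping the GIL
        results = pool.map(sum_of_squares, [10_000, 20_000, 30_000])
    print(results)
```

Each worker is a separate process with its own interpreter and its own GIL, so the computations genuinely run in parallel on multiple cores.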
Creating Threads
The simplest way to create a thread is with the Thread class:
import threading
import time
def download_file(filename):
    print(f"Starting download: {filename}")
    time.sleep(2)  # Simulate I/O work
    print(f"Finished: {filename}")
# Create threads
thread1 = threading.Thread(target=download_file, args=("data.csv",))
thread2 = threading.Thread(target=download_file, args=("image.png",))
# Start them
thread1.start()
thread2.start()
# Wait for both to finish
thread1.join()
thread2.join()
print("All downloads complete")
Output (the exact interleaving may vary between runs):
Starting download: data.csv
Starting download: image.png
Finished: data.csv
Finished: image.png
All downloads complete
Both downloads run concurrently: the two 2-second sleeps overlap, so the whole script takes about 2 seconds instead of 4.
Subclassing Thread
For more control, subclass Thread and override the run() method:
import threading
import time
class DownloadTask(threading.Thread):
    def __init__(self, filename):
        super().__init__()
        self.filename = filename

    def run(self):
        print(f"Downloading {self.filename}")
        time.sleep(2)
        print(f"Done: {self.filename}")

# Use it
task = DownloadTask("report.pdf")
task.start()
task.join()
This pattern works well when each thread needs its own state or behavior.
Sharing Data Between Threads
When multiple threads access the same data, you need synchronization. Without it, race conditions can corrupt data:
import threading
counter = 0
def increment():
    global counter
    for _ in range(1000000):
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # Likely less than 4000000!
The counter ends up wrong because counter += 1 isn’t atomic—it involves reading, adding, and writing. Threads can interleave between these steps.
Locks: Protecting Shared Data
Use a Lock to ensure only one thread accesses a resource at a time:
import threading
counter = 0
lock = threading.Lock()
def increment():
    global counter
    for _ in range(1000000):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # Exactly 4000000
The with lock: statement acquires the lock before the critical section and releases it afterward—even if an exception occurs.
Other Synchronization Primitives
The threading module provides several synchronization tools:
- RLock: A reentrant lock that the same thread can acquire multiple times
- Condition: Wait for specific conditions with wait() and notify()
- Semaphore: Limits access to a fixed number of resources
- Event: One thread signals, others wait
- Barrier: Threads wait for each other at a synchronization point
# Example: Barrier for phased processing
import threading
def process_phase(barrier, phase):
    print(f"Phase {phase} starting")
    barrier.wait()  # Block until all threads reach this point
    print(f"Phase {phase} complete")

barrier = threading.Barrier(3)
threads = [
    threading.Thread(target=process_phase, args=(barrier, i))
    for i in range(1, 4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
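A Semaphore works similarly for capping concurrency, such as limiting how many downloads run at once. A minimal sketch (the download_file task and the limit of 2 are illustrative):

```python
import threading
import time

# Allow at most 2 threads inside the guarded section at a time
semaphore = threading.Semaphore(2)

def download_file(filename):
    with semaphore:  # Blocks if 2 downloads are already in progress
        print(f"Downloading {filename}")
        time.sleep(0.1)  # Simulate I/O

threads = [
    threading.Thread(target=download_file, args=(f"file{i}.dat",))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("All done")
```

Like Lock, a Semaphore supports the with statement, so it is released even if the guarded code raises.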
Thread-Safe Queues
The queue module provides thread-safe queues for passing data between threads:
import threading
import queue
import time
def producer(q):
    for i in range(5):
        time.sleep(0.5)
        q.put(i)
        print(f"Produced: {i}")

def consumer(q):
    while True:
        item = q.get()
        if item is None:  # Poison pill
            break
        print(f"Consumed: {item}")
        q.task_done()
q = queue.Queue()
producer_thread = threading.Thread(target=producer, args=(q,))
consumer_thread = threading.Thread(target=consumer, args=(q,))
producer_thread.start()
consumer_thread.start()
producer_thread.join()
q.put(None) # Poison pill to stop consumer
consumer_thread.join()
The queue handles all synchronization internally—you don’t need locks for basic put/get operations.
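The task_done() call pairs with q.join(), which blocks until every item ever put on the queue has been marked done. A minimal sketch (the doubling worker is illustrative):

```python
import threading
import queue

q = queue.Queue()
results = []

def worker():
    while True:
        item = q.get()
        results.append(item * 2)
        q.task_done()  # Mark this item as fully processed

# Daemon worker: it can loop forever without blocking shutdown
threading.Thread(target=worker, daemon=True).start()

for i in range(5):
    q.put(i)
q.join()  # Blocks until task_done() has been called for all 5 items
print(results)
```

This gives you a clean "wait until all submitted work is finished" point without joining the worker thread itself.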
Daemon Threads
Set a thread as a daemon to let the program exit without waiting for it:
# background_task is any long-running callable
thread = threading.Thread(target=background_task, daemon=True)
thread.start()
# Program exits even if the thread is still running
Daemon threads are useful for monitoring or cleanup tasks that shouldn’t block shutdown.
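A minimal runnable sketch (the heartbeat task is illustrative):

```python
import threading
import time

def heartbeat():
    # Runs forever; the daemon flag lets the interpreter exit anyway
    while True:
        print("still alive")
        time.sleep(1)

thread = threading.Thread(target=heartbeat, daemon=True)
thread.start()

print("Main thread exiting")  # The program ends here; heartbeat is killed mid-loop
```

Because daemon threads are terminated abruptly at interpreter shutdown, avoid using them for work that must finish cleanly, such as flushing files or committing transactions.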
Best Practices
- Keep threads focused: Each thread should have a clear purpose
- Avoid excessive threads: Too many threads add overhead; a few dozen is usually plenty
- Always join or detach: Don’t leave threads dangling
- Use queues for communication: They’re safer than shared state
- Handle exceptions in threads: Uncaught exceptions can silently kill threads
- Consider higher-level APIs: concurrent.futures.ThreadPoolExecutor manages a pool of workers for you
from concurrent.futures import ThreadPoolExecutor

def fetch_url(url):
    # Simulate work
    return f"Result from {url}"

with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(fetch_url, ["url1", "url2", "url3"])
    for result in results:
        print(result)
The executor handles thread creation, reuse, and cleanup automatically.
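The executor also makes exception handling straightforward: submit() returns a Future, and calling result() re-raises anything the worker threw instead of losing it silently. A sketch (the divide task is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def divide(a, b):
    return a / b  # Raises ZeroDivisionError when b == 0

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(divide, 10, d) for d in (2, 0, 5)]
    for future in futures:
        try:
            print(future.result())  # Re-raises the worker's exception here
        except ZeroDivisionError as exc:
            print(f"Task failed: {exc}")
```

With bare threading.Thread, an uncaught exception only prints a traceback (or calls threading.excepthook) and the thread dies; futures let the main thread see and handle the failure.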
Common Pitfalls
- Forgetting to join: Threads won’t clean up properly
- Deadlocks: Two threads waiting on each other’s locks
- Race conditions: Unsynchronized access to shared data
- GIL misconceptions: Threads won’t speed up CPU-bound work
- Modifying globals carelessly: Use locks or thread-local storage
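Deadlocks most often arise when two threads acquire the same pair of locks in opposite orders; the standard fix is to acquire locks in one global order everywhere. A sketch with two illustrative locks:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer():
    # Every thread acquires in the same order (a before b), so no
    # thread can ever hold lock_b while waiting for lock_a
    with lock_a:
        with lock_b:
            pass  # ...critical section touching both resources...

threads = [threading.Thread(target=transfer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If one thread instead took lock_b first, two threads could each hold one lock and wait forever for the other.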
See Also
- threading module — Full module reference
- queue module — Thread-safe queues
- concurrent.futures — High-level thread pool API
- multiprocessing — Process-based parallelism for CPU-bound tasks