Python's Memory Model and Reference Counting
If you have been programming in Python for a while, you might have wondered: how does Python actually store data in memory? Unlike languages like C where you manually manage memory, Python handles this automatically. But understanding what happens under the hood makes you a better programmer—and helps you avoid subtle bugs.
This guide walks you through Python’s memory model, reference counting, and the garbage collector. You will learn why some patterns create memory leaks and how to write code that plays nicely with Python’s memory management.
Everything Is an Object
First, a fundamental truth: in Python, everything is an object. Integers, strings, lists, functions, classes—all of them are objects stored somewhere in memory. Each object has:
- A type that defines what the object can do
- A value representing its data
- A reference count tracking how many places reference it
- A memory address where it lives
When you write:
x = 42
Python creates an integer object with value 42, stores it in memory, and binds the name x to that object. The name is not the object itself—it is just a reference pointing to it.
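You can observe this binding directly: assigning one name to another does not copy the object, it just adds a second reference to the same object.

```python
x = 42
y = x                  # y is bound to the same object, not a copy
print(x is y)          # True - both names point to one int object
print(id(x) == id(y))  # True - same identity
```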
Reference Counting: The Basics
Python uses reference counting as its primary memory management technique. Every object has a counter that tracks how many references point to it:
- When you create a new reference to an object, its reference count increases
- When you delete a reference, the count decreases
- When the count reaches zero, the object is immediately deallocated
import sys
# Create an object - reference count is at least 1
x = [1, 2, 3]
print(sys.getrefcount(x)) # 2 (one from x, one from the getrefcount call itself)
What Increases Reference Count?
Several operations increase an object’s reference count:
a = [1, 2, 3] # Original reference
b = a # New reference - count increases
c = [a, a] # List containing a twice - count increases by 2
d = a.copy() # New object, separate reference count
When you pass an object to a function, that also creates a temporary reference:
def check_ref(obj):
    print(sys.getrefcount(obj)) # Higher than expected due to this call
x = "hello"
check_ref(x)
What Decreases Reference Count?
A reference count decreases when a name is deleted, reassigned, or goes out of scope:
a = [1, 2, 3]
b = a # b also references the list
del a # Removes reference from a (count decreases)
# b still references the list
b = None # Removes reference from b (count goes to 0)
# The list is now deallocated
The Garbage Collector
Reference counting handles most cases, but there is a problem: circular references.
a = []
b = []
a.append(b) # a references b
b.append(a) # b references a
del a
del b
Now neither object can be reached, but neither has a reference count of zero—they reference each other! This is where Python’s garbage collector comes in.
The garbage collector periodically scans for objects that are unreachable due to circular references:
import gc
# Create circular reference
a = []
b = []
a.append(b)
b.append(a)
del a, b # Names are gone, but the cycle keeps both objects alive
gc.collect() # Forces garbage collection
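Usefully, `gc.collect()` returns the number of unreachable objects it found, so you can verify that the cycle was actually reclaimed. A small sketch:

```python
import gc

gc.collect()  # clear any pre-existing garbage first

# Build a cycle, then drop the external references
a = []
b = []
a.append(b)
b.append(a)
del a, b

unreachable = gc.collect()
print(unreachable)  # at least 2 - the two lists in the cycle
```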
Python’s gc module provides control over this process:
import gc
# Disable automatic collection (rarely needed)
gc.disable()
# Enable it back
gc.enable()
# Check if automatic collection is enabled
print(gc.isenabled())
# Get statistics
stats = gc.get_stats()
print(stats)
How Objects Are Stored
Python objects are allocated on the heap—a region of memory used for dynamic allocation. The Python interpreter manages this heap internally, not the operating system.
You can see the memory address of an object:
x = 42
print(id(x)) # Object identity - in CPython, the memory address as an integer
print(hex(id(x))) # Same address in hex
y = 42
print(id(y)) # Same as id(x) - Python caches small integers
Python caches certain objects automatically:
# Small integers (-5 to 256) are cached
a = 257
b = 257
print(a is b) # False in a fresh REPL session - outside the cache range (a script may still share them via compile-time constant folding)
a = 256
b = 256
print(a is b) # True - within cache range
# Short strings may also be interned
a = "hello"
b = "hello"
print(a is b) # Usually True for simple strings
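When you want interning guaranteed rather than "usually", the standard library provides `sys.intern`, which returns the canonical copy of a string:

```python
import sys

# Strings with spaces or punctuation are not always auto-interned,
# but sys.intern forces both names to share one object
a = sys.intern("hello world!")
b = sys.intern("hello world!")
print(a is b)  # True - both names point to the single interned string
```

Interning is occasionally used to speed up dictionary lookups on repeated string keys, since identity comparison short-circuits equality checks.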
Common Memory Pitfalls
1. Unintentional Object Retention
Sometimes objects stay in memory because you are holding references to them unintentionally:
# BAD: Module-level list that grows forever
cache = []
def add_to_cache(item):
    cache.append(item) # Never cleared!
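One simple fix is to bound the cache so old entries are evicted automatically. A sketch using `collections.deque` with `maxlen` (the limit of 100 is an arbitrary choice for illustration):

```python
from collections import deque

# GOOD: bounded cache - oldest entries are evicted automatically
cache = deque(maxlen=100)

def add_to_cache(item):
    cache.append(item)

for i in range(500):
    add_to_cache(i)
print(len(cache))  # 100 - never grows past maxlen
```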
2. Closures Capturing Variables
Closures can capture variables in unexpected ways:
# Creates functions that remember the list
def create_funcs():
    funcs = []
    for i in range(3):
        funcs.append(lambda: i) # Captures reference to i
    return funcs
f = create_funcs()
print([func() for func in f]) # [2, 2, 2] - all reference final i value!
Fix this with default arguments:
def create_funcs():
    funcs = []
    for i in range(3):
        funcs.append(lambda i=i: i) # Captures value, not reference
    return funcs
3. Class Attributes vs Instance Attributes
Class attributes are shared across instances:
class Counter:
    count = 0 # Class attribute - shared!

    def __init__(self):
        self.count += 1 # Creates instance attribute, hides class attribute
a = Counter()
b = Counter()
print(a.count, b.count, Counter.count) # 1 1 0
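If you actually want a shared running total, one way is to update the class attribute explicitly through the class, keeping per-instance state separate. A sketch:

```python
class Counter:
    total = 0  # class attribute, deliberately shared

    def __init__(self):
        self.count = 1       # per-instance attribute
        Counter.total += 1   # updated on the class, so all instances see it

a = Counter()
b = Counter()
print(a.count, b.count, Counter.total)  # 1 1 2
```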
4. Mutable Default Arguments
This classic gotcha is really a memory issue:
def add_item(item, items=[]): # List created once at definition!
    items.append(item)
    return items
add_item("first")
add_item("second")
print(add_item("third")) # ['first', 'second', 'third']
Use None sentinel instead:
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
Memory Profiling
To write efficient code, you need to measure memory usage:
Using sys.getsizeof
import sys
print(sys.getsizeof(42)) # 28 bytes for an integer (on 64-bit CPython)
print(sys.getsizeof([])) # 56 bytes for an empty list (on 64-bit CPython)
print(sys.getsizeof([1, 2, 3])) # More as elements are added
Note that getsizeof only returns the object size, not objects it references:
import sys
# List object is 56 bytes
lst = [1, 2, 3]
print(sys.getsizeof(lst)) # 56 - does not count the integers!
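When you need the total footprint, you can recurse into containers yourself. A rough sketch (the `deep_sizeof` helper is illustrative, not a standard function; it tracks visited ids to avoid double-counting shared objects and cycles):

```python
import sys

def deep_sizeof(obj, seen=None):
    """Rough recursive size estimate for common container types."""
    if seen is None:
        seen = set()
    if id(obj) in seen:      # already counted - avoid cycles and double-counting
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

lst = [1, 2, 3]
print(deep_sizeof(lst) > sys.getsizeof(lst))  # True - referenced ints counted too
```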
Using memory_profiler
For more detailed analysis:
# pip install memory_profiler
from memory_profiler import profile
@profile
def my_function():
    data = [i ** 2 for i in range(10000)]
    return data
Using tracemalloc
Python 3.4+ includes tracemalloc for tracking memory allocations:
import tracemalloc
tracemalloc.start()
# Your code here
result = [x ** 2 for x in range(10000)]
# Get statistics
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)
Writing Memory-Efficient Code
Use __slots__
When you have many instances of a class, __slots__ saves memory by preventing the creation of __dict__ for each instance:
class Point:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y
# Without __slots__: ~300+ bytes per instance
# With __slots__: ~100 bytes per instance
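You can verify the mechanism directly: a slotted instance has no `__dict__`, and attempting to set an attribute not listed in `__slots__` fails. A sketch:

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlotPoint:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PlainPoint(1, 2)
s = SlotPoint(1, 2)

print(hasattr(p, '__dict__'))  # True - per-instance dict exists
print(hasattr(s, '__dict__'))  # False - no per-instance dict

try:
    s.z = 3  # slots also block attributes not declared in __slots__
except AttributeError:
    print("AttributeError: cannot add new attribute")
```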
Use Generators Instead of Lists
Generators produce values on-demand rather than storing everything in memory:
# BAD: Creates entire list in memory
def get_squares(n):
    return [x ** 2 for x in range(n)]

# GOOD: Yields one value at a time
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2
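The difference is easy to measure: a generator's size is a small constant regardless of how many values it will produce, while a list grows with its contents.

```python
import sys

squares_list = [x ** 2 for x in range(100_000)]
squares_gen = (x ** 2 for x in range(100_000))

print(sys.getsizeof(squares_list))  # hundreds of kilobytes
print(sys.getsizeof(squares_gen))   # a couple hundred bytes, regardless of range
print(sum(squares_gen) == sum(squares_list))  # True - same values either way
```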
Delete Unused References
Help the garbage collector by explicitly removing references:
large_data = load_data()
process(large_data)
del large_data # Explicitly remove reference
# Or use context managers to release resources promptly
with open('large_file.txt') as f:
    data = f.read()
# File handle automatically closed when the block exits
Weak References
Sometimes you want references that do not prevent garbage collection:
import weakref
class Cache:
    def __init__(self):
        self._cache = weakref.WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value
Weak references are useful for caches, observers, and callbacks where you do not want to prevent collection.
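A minimal demonstration of the core behavior: a weak reference does not keep its target alive. (The immediate `None` result relies on CPython's reference-counting deallocation; other implementations may collect later.)

```python
import weakref

class Data:
    pass

obj = Data()
ref = weakref.ref(obj)  # weak reference - does not increase the refcount
print(ref() is obj)     # True - target is still alive

del obj                 # drop the only strong reference
print(ref())            # None - target was deallocated (in CPython, immediately)
```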
Conclusion
Understanding Python’s memory model makes you a more effective programmer:
- Everything is an object with a type, value, and reference count
- Reference counting deallocates objects immediately when count reaches zero
- Garbage collection handles circular references that reference counting cannot
- Common pitfalls include circular references, closures capturing variables, and mutable defaults
- Profiling tools like tracemalloc help identify memory issues
You do not need to think about memory management every day in Python. But when you are building performance-sensitive applications or debugging memory issues, this knowledge is invaluable.
See Also
- id() — Get the memory address of an object
- sys.getrefcount() — Check how many references exist to an object
- __slots__ — Using slots for memory efficiency in Python classes
- gc module — Control Python's garbage collector
- weakref module — Create references that do not prevent garbage collection