Memory Management in Python
Understanding memory allocation and garbage collection.
What You'll Learn
- How Python manages memory
- Reference counting and garbage collection
- Memory profiling and optimization techniques
- Weak references for caching
- Common memory leaks and how to avoid them
Reference Counting
Python's primary memory management mechanism is reference counting. Each object tracks how many references point to it.
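Immediate cleanup is the key property of reference counting: in CPython, the moment the last reference disappears, the object's finalizer runs synchronously. A minimal sketch (the `Tracked` class is invented for illustration):

```python
# A minimal check that CPython frees objects the moment their refcount hits zero
events = []

class Tracked:
    def __del__(self):
        events.append("freed")

t = Tracked()
t = None                      # last reference dropped -> __del__ runs immediately (CPython)
events.append("after rebind")
print(events)                 # ['freed', 'after rebind']
```

Note this ordering is a CPython implementation detail; interpreters without reference counting (e.g. PyPy) may finalize later.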
import sys
# Create object with 1 reference
a = [1, 2, 3]
print(sys.getrefcount(a)) # 2 (a + function argument)
# Add another reference
b = a
print(sys.getrefcount(a)) # 3
# Store in a container
c = [a]
print(sys.getrefcount(a)) # 4
# Remove references
del b
print(sys.getrefcount(a)) # 3
c.pop()
print(sys.getrefcount(a)) # 2
# When refcount hits 0, the object is freed immediately
Garbage Collection
Reference counting can't handle circular references. Python's garbage collector (GC) detects and collects these cycles.
import gc
class Node:
    def __init__(self, name):
        self.name = name
        self.other = None
# Create circular reference
a = Node("A")
b = Node("B")
a.other = b
b.other = a  # Cycle: a → b → a
# Delete external references
del a, b
# Objects still exist! Each has refcount 1 (from the other)
# GC will detect and collect the cycle
collected = gc.collect()
print(f"Collected {collected} objects")
# GC control
gc.disable() # Disable automatic GC
gc.enable() # Enable automatic GC
gc.set_threshold(700) # Tune collection frequency
gc.get_count()  # Objects in each generation
Generational Garbage Collection
Python uses a generational GC with three generations:
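The effect of the generations can be observed directly with `gc.get_count()`: allocating tracked objects raises the generation-0 count, and a collection drains it (a CPython-specific sketch; exact counts vary by interpreter version):

```python
import gc

gc.collect()                            # start from a clean slate
tracked = [dict() for _ in range(500)]  # each dict is a GC-tracked allocation
gen0_before = gc.get_count()[0]
gc.collect(0)                           # collect only generation 0; survivors are promoted
gen0_after = gc.get_count()[0]
print(gen0_before, gen0_after)          # gen-0 count falls back near zero after the collection
```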
import gc
# Generation 0: New objects (collected most frequently)
# Generation 1: Survived one collection
# Generation 2: Long-lived objects (collected rarely)
print(gc.get_threshold()) # (700, 10, 10)
# 700 allocations triggers gen 0 collection
# 10 gen 0 collections triggers gen 1 collection
# 10 gen 1 collections triggers gen 2 collection
# Check generations
print(gc.get_count())  # (gen0_count, gen1_count, gen2_count)
Memory Profiling
import tracemalloc
import sys
# Start tracing
tracemalloc.start()
# Your code here
data = [i ** 2 for i in range(100000)]
more_data = {i: i * 2 for i in range(50000)}
# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
# Get top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("Top 5 memory allocations:")
for stat in top_stats[:5]:
    print(stat)
tracemalloc.stop()
# Check shallow object size (the container itself, not its contents)
print(f"List size: {sys.getsizeof(data)} bytes")
print(f"Dict size: {sys.getsizeof(more_data)} bytes")
Memory Optimization Techniques
Use __slots__
import sys
class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y
class PointSlots:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y
# Compare memory usage
regular = PointRegular(1, 2)
slotted = PointSlots(1, 2)
print(f"Regular: {sys.getsizeof(regular.__dict__)} bytes (plus object)")
# slotted has no __dict__
# ~40-50% memory savings per object
points = [PointSlots(i, i) for i in range(100000)]  # Much less memory
Use Generators for Large Data
# Bad: Loads everything into memory (and never explicitly closes the file)
def get_all_lines_list(filename):
    return [process(line) for line in open(filename)]
# Good: Processes one line at a time
def get_all_lines_generator(filename):
    with open(filename) as f:
        for line in f:
            yield process(line)
# Memory-efficient iteration
for processed_line in get_all_lines_generator("huge_file.txt"):
    handle(processed_line)
Use Efficient Data Types
from array import array
import sys
# List of ints
list_nums = list(range(1000000))
print(f"List: {sys.getsizeof(list_nums) / 1024 / 1024:.2f} MB")
# Array of ints (much smaller)
array_nums = array('i', range(1000000))
print(f"Array: {sys.getsizeof(array_nums) / 1024 / 1024:.2f} MB")
# For numerical work, use NumPy
import numpy as np
np_nums = np.arange(1000000, dtype=np.int32)
print(f"NumPy: {np_nums.nbytes / 1024 / 1024:.2f} MB")
Weak References
Weak references don't prevent garbage collection, which makes them useful for caches.
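A quick way to see this in action is `weakref.finalize`, which registers a callback that runs once the referent is collected, without keeping it alive (the `Resource` class is invented for illustration):

```python
import weakref

collected = []

class Resource:
    pass

r = Resource()
fin = weakref.finalize(r, collected.append, "resource gone")
print(fin.alive)   # True while r is still referenced
del r              # refcount hits zero -> finalizer runs immediately (CPython)
print(fin.alive)   # False
print(collected)   # ['resource gone']
```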
import weakref
class ExpensiveObject:
    def __init__(self, value):
        self.value = value
# Create a weak reference
obj = ExpensiveObject(42)
weak_ref = weakref.ref(obj)
print(weak_ref()) # <ExpensiveObject object>
print(weak_ref().value) # 42
del obj
print(weak_ref()) # None (object was garbage collected)
# WeakValueDictionary for caches
cache = weakref.WeakValueDictionary()
def expensive_computation(key):
    result = ExpensiveObject(key * 1000)
    cache[key] = result
    return result
# Objects in cache don't prevent collection when no other refs exist
Common Memory Leaks
# 1. Circular references with __del__
class Leaky:
    def __init__(self, other=None):
        self.other = other
    def __del__(self):
        print("Cleaning up")
a = Leaky()
b = Leaky(a)
a.other = b  # Cycle with __del__: uncollectable before Python 3.4 (PEP 442); still risky, as finalizer order in a cycle is undefined
# 2. Global lists that grow forever
results = [] # Global!
def process(data):
    results.append(compute(data))  # Never cleaned up
# 3. Forgotten closures
def create_processor():
    huge_data = load_huge_dataset()  # Captured by closure!
    def process(x):
        return x in huge_data
    return process
Interview Tip
When asked about memory management:
- Reference counting for immediate cleanup, GC for cycles
- __slots__ saves roughly 40% memory per object
- Use generators for large datasets
- tracemalloc for memory profiling
- WeakValueDictionary for caches without preventing GC