
Memory Management in Python

Understanding memory allocation and garbage collection.

What You'll Learn

  • How Python manages memory
  • Reference counting and garbage collection
  • Memory profiling and optimization techniques
  • Weak references for caching
  • Common memory leaks and how to avoid them

Reference Counting

Python's primary memory management mechanism is reference counting. Each object tracks how many references point to it.

code.pyPython
import sys

# Create object with 1 reference
a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a + function argument)

# Add another reference
b = a
print(sys.getrefcount(a))  # 3

# Store in a container
c = [a]
print(sys.getrefcount(a))  # 4

# Remove references
del b
print(sys.getrefcount(a))  # 3

c.pop()
print(sys.getrefcount(a))  # 2

# When refcount hits 0, object is freed immediately
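The deterministic side of reference counting is easy to observe. A minimal sketch (CPython-specific; other interpreters such as PyPy may defer finalization, and `Tracked`/`freed` are illustrative names):

```python
# Demonstrates immediate cleanup: in CPython, __del__ runs the moment
# the last reference disappears, without waiting for the GC.
freed = []

class Tracked:
    def __del__(self):
        freed.append("freed")

t = Tracked()
print(freed)   # [] : object is still referenced
del t          # refcount drops to 0; the finalizer runs right here
print(freed)   # ['freed']
```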

Garbage Collection

Reference counting can't handle circular references. Python's garbage collector (GC) detects and collects these cycles.

code.pyPython
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

# Create circular reference
a = Node("A")
b = Node("B")
a.other = b
b.other = a  # Cycle: a → b → a

# Delete external references
del a, b
# Objects still exist! Each has refcount 1 (from the other)

# GC will detect and collect the cycle
collected = gc.collect()
print(f"Collected {collected} objects")

# GC control
gc.disable()              # Disable automatic GC
gc.enable()               # Enable automatic GC
gc.set_threshold(700)     # Tune collection frequency
gc.get_count()            # Objects in each generation
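One practical use of these controls is pausing the collector during a large burst of allocations and restoring it afterwards. A hedged sketch (any speedup depends on the workload; `build` is an illustrative helper):

```python
import gc

def build(n):
    # Allocates many small container objects that the GC would track
    return [{"i": i} for i in range(n)]

gc.disable()            # pause automatic cycle detection
try:
    data = build(200_000)
finally:
    gc.enable()         # always restore the GC, even if build() raises

print(len(data))
```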

Generational Garbage Collection

Python uses a generational GC with three generations:

code.pyPython
import gc

# Generation 0: New objects (collected most frequently)
# Generation 1: Survived one collection
# Generation 2: Long-lived objects (collected rarely)

print(gc.get_threshold())  # (700, 10, 10) by default in CPython (exact values vary by version)
# 700 allocations triggers gen 0 collection
# 10 gen 0 collections triggers gen 1 collection
# 10 gen 1 collections triggers gen 2 collection

# Check generations
print(gc.get_count())  # (current_gen0, current_gen1, current_gen2)
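The counters can be watched moving as tracked objects are allocated. A small sketch (exact numbers vary by interpreter version and incidental allocations):

```python
import gc

gc.collect()                     # empty all generations first
before = gc.get_count()[0]       # gen-0 allocation counter

junk = [[] for _ in range(100)]  # 100 tracked lists (plus the outer list)
after = gc.get_count()[0]

print(after - before)            # at least ~100: each tracked allocation bumps gen 0
```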

Memory Profiling

code.pyPython
import tracemalloc
import sys

# Start tracing
tracemalloc.start()

# Your code here
data = [i ** 2 for i in range(100000)]
more_data = {i: i * 2 for i in range(50000)}

# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")

# Get top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("Top 5 memory allocations:")
for stat in top_stats[:5]:
    print(stat)

tracemalloc.stop()

# Check object size
print(f"List size: {sys.getsizeof(data)} bytes")
print(f"Dict size: {sys.getsizeof(more_data)} bytes")
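tracemalloc can also diff two snapshots, which is often more useful than a single snapshot when hunting a leak. A sketch; the `bytes(1000)` allocations merely stand in for a leaky workload:

```python
import tracemalloc

tracemalloc.start()

snap_before = tracemalloc.take_snapshot()
leak = [bytes(1000) for _ in range(1000)]   # ~1 MB of new allocations
snap_after = tracemalloc.take_snapshot()

# compare_to() shows allocation growth between the two snapshots
for stat in snap_after.compare_to(snap_before, "lineno")[:3]:
    print(stat)

tracemalloc.stop()
```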

Memory Optimization Techniques

Use __slots__

code.pyPython
import sys

class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class PointSlots:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Compare memory usage
regular = PointRegular(1, 2)
slotted = PointSlots(1, 2)

print(f"Regular: {sys.getsizeof(regular.__dict__)} bytes (plus object)")
# slotted has no __dict__

# ~40-50% memory savings per object
points = [PointSlots(i, i) for i in range(100000)]  # Much less memory
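A side effect worth knowing: slotted instances also reject attributes not listed in `__slots__`, which catches typos early. A quick self-contained sketch with the same `PointSlots` shape:

```python
class PointSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = PointSlots(1, 2)
try:
    p.z = 3                # 'z' is not declared in __slots__
except AttributeError as exc:
    print(exc)             # slotted classes have no per-instance __dict__ to fall back on
```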

Use Generators for Large Data

code.pyPython
# Bad: loads everything into memory (and never closes the file)
def get_all_lines_list(filename):
    return [process(line) for line in open(filename)]

# Good: Processes one line at a time
def get_all_lines_generator(filename):
    with open(filename) as f:
        for line in f:
            yield process(line)

# Memory-efficient iteration
for processed_line in get_all_lines_generator("huge_file.txt"):
    handle(processed_line)
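The size difference is easy to measure with sys.getsizeof: a generator expression stays a small fixed-size object no matter how many items it will yield. A minimal sketch:

```python
import sys

# The list materializes every element; the generator is a fixed-size object.
squares_list = [i * i for i in range(1_000_000)]
squares_gen = (i * i for i in range(1_000_000))

print(sys.getsizeof(squares_list))  # megabytes of pointer storage
print(sys.getsizeof(squares_gen))   # a couple hundred bytes, regardless of range

# Both yield identical results when consumed
print(sum(squares_gen) == sum(squares_list))  # True
```

Note that `getsizeof` only measures the container itself, not the integers it points to; the real gap is even larger.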

Use Efficient Data Types

code.pyPython
from array import array
import sys

# List of ints
list_nums = list(range(1000000))
print(f"List: {sys.getsizeof(list_nums) / 1024 / 1024:.2f} MB")

# Array of ints (much smaller)
array_nums = array('i', range(1000000))
print(f"Array: {sys.getsizeof(array_nums) / 1024 / 1024:.2f} MB")

# For numerical work, use NumPy
import numpy as np
np_nums = np.arange(1000000, dtype=np.int32)
print(f"NumPy: {np_nums.nbytes / 1024 / 1024:.2f} MB")

Weak References

Weak references don't prevent garbage collection — useful for caches.

code.pyPython
import weakref

class ExpensiveObject:
    def __init__(self, value):
        self.value = value

# Create a weak reference
obj = ExpensiveObject(42)
weak_ref = weakref.ref(obj)

print(weak_ref())       # <ExpensiveObject object>
print(weak_ref().value)  # 42

del obj
print(weak_ref())  # None (object was garbage collected)

# WeakValueDictionary for caches
cache = weakref.WeakValueDictionary()

def expensive_computation(key):
    result = ExpensiveObject(key * 1000)
    cache[key] = result
    return result

# Objects in cache don't prevent collection when no other refs exist
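The eviction behavior can be seen directly: once the last strong reference goes away, the cache entry disappears. A CPython-specific sketch (other interpreters may free the entry later):

```python
import weakref

class ExpensiveObject:
    def __init__(self, value):
        self.value = value

cache = weakref.WeakValueDictionary()

obj = ExpensiveObject(42)
cache["answer"] = obj
print("answer" in cache)    # True: the strong reference keeps it alive

del obj                     # drop the last strong reference
print("answer" in cache)    # False: the entry vanished with the object
```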

Common Memory Leaks

code.pyPython
# 1. Circular references with __del__ (uncollectable before Python 3.4; see PEP 442)
class Leaky:
    def __init__(self, other=None):
        self.other = other

    def __del__(self):
        print("Cleaning up")

a = Leaky()
b = Leaky(a)
a.other = b  # Cycle with __del__: collectable since 3.4, but finalizer order in cycles is undefined

# 2. Global lists that grow forever
results = []  # Global!
def process(data):
    results.append(compute(data))  # Never cleaned up

# 3. Forgotten closures
def create_processor():
    huge_data = load_huge_dataset()  # Captured by closure!

    def process(x):
        return x in huge_data

    return process

Interview Tip

When asked about memory management:

  1. Reference counting for immediate cleanup, GC for cycles
  2. __slots__ saves ~40% memory per object
  3. Use generators for large datasets
  4. tracemalloc for memory profiling
  5. WeakValueDictionary for caches without preventing GC