Memory Management in Python
Understanding memory allocation and garbage collection.
What You'll Learn
- How Python manages memory
- Reference counting and garbage collection
- Memory profiling and optimization techniques
- Weak references for caching
- Common memory leaks and how to avoid them
Reference Counting
Python's primary memory management mechanism is reference counting. Each object tracks how many references point to it.
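Immediate cleanup is the key property of reference counting: in CPython, the moment the last reference disappears, the object's finalizer runs synchronously. A minimal sketch (the `Tracked` class is invented for illustration):

```python
# A minimal check that CPython frees objects the moment their refcount hits zero
events = []

class Tracked:
    def __del__(self):
        events.append("freed")

t = Tracked()
t = None                      # last reference dropped -> __del__ runs immediately (CPython)
events.append("after rebind")
print(events)                 # ['freed', 'after rebind']
```

Note this ordering is a CPython implementation detail; interpreters without reference counting (e.g. PyPy) may finalize later.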
import sys
# Create object with 1 reference
a = [1, 2, 3]
print(sys.getrefcount(a)) # 2 (a + function argument)
# Add another reference
b = a
print(sys.getrefcount(a)) # 3
# Store in a container
c = [a]
print(sys.getrefcount(a)) # 4
# Remove references
del b
print(sys.getrefcount(a)) # 3
c.pop()
print(sys.getrefcount(a)) # 2
# When refcount hits 0, the object is freed immediately
Garbage Collection
Reference counting can't handle circular references. Python's garbage collector (GC) detects and collects these cycles.
import gc
class Node:
    def __init__(self, name):
        self.name = name
        self.other = None
# Create circular reference
a = Node("A")
b = Node("B")
a.other = b
b.other = a  # Cycle: a → b → a
# Delete external references
del a, b
# Objects still exist! Each has refcount 1 (from the other)
# GC will detect and collect the cycle
collected = gc.collect()
print(f"Collected {collected} objects")
# GC control
gc.disable() # Disable automatic GC
gc.enable() # Enable automatic GC
gc.set_threshold(700) # Tune collection frequency
gc.get_count()  # Objects in each generation
Generational Garbage Collection
Python uses a generational GC with three generations:
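The effect of the generations can be observed directly with `gc.get_count()`: allocating tracked objects raises the generation-0 count, and a collection drains it (a CPython-specific sketch; exact counts vary by interpreter version):

```python
import gc

gc.collect()                            # start from a clean slate
tracked = [dict() for _ in range(500)]  # each dict is a GC-tracked allocation
gen0_before = gc.get_count()[0]
gc.collect(0)                           # collect only generation 0; survivors are promoted
gen0_after = gc.get_count()[0]
print(gen0_before, gen0_after)          # gen-0 count falls back near zero after the collection
```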
import gc
# Generation 0: New objects (collected most frequently)
# Generation 1: Survived one collection
# Generation 2: Long-lived objects (collected rarely)
print(gc.get_threshold()) # (700, 10, 10)
# 700 allocations triggers gen 0 collection
# 10 gen 0 collections triggers gen 1 collection
# 10 gen 1 collections triggers gen 2 collection
# Check generations
print(gc.get_count())  # (gen0_count, gen1_count, gen2_count)
Memory Profiling
import tracemalloc
import sys
# Start tracing
tracemalloc.start()
# Your code here
data = [i ** 2 for i in range(100000)]
more_data = {i: i * 2 for i in range(50000)}
# Get memory usage
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
# Get top allocations
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("Top 5 memory allocations:")
for stat in top_stats[:5]:
    print(stat)
tracemalloc.stop()
# Check shallow object size (the container itself, not its contents)
print(f"List size: {sys.getsizeof(data)} bytes")
print(f"Dict size: {sys.getsizeof(more_data)} bytes")
Memory Optimization Techniques
Use __slots__
import sys
class PointRegular:
    def __init__(self, x, y):
        self.x = x
        self.y = y
class PointSlots:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y
# Compare memory usage
regular = PointRegular(1, 2)
slotted = PointSlots(1, 2)
print(f"Regular: {sys.getsizeof(regular.__dict__)} bytes (plus object)")
# slotted has no __dict__
# ~40-50% memory savings per object
points = [PointSlots(i, i) for i in range(100000)]  # Much less memory
Use Generators for Large Data
# Bad: Loads everything into memory (and never explicitly closes the file)
def get_all_lines_list(filename):
    return [process(line) for line in open(filename)]
# Good: Processes one line at a time
def get_all_lines_generator(filename):
    with open(filename) as f:
        for line in f:
            yield process(line)
# Memory-efficient iteration
for processed_line in get_all_lines_generator("huge_file.txt"):
    handle(processed_line)
Use Efficient Data Types
from array import array
import sys
# List of ints
list_nums = list(range(1000000))
print(f"List: {sys.getsizeof(list_nums) / 1024 / 1024:.2f} MB")
# Array of ints (much smaller)
array_nums = array('i', range(1000000))
print(f"Array: {sys.getsizeof(array_nums) / 1024 / 1024:.2f} MB")
# For numerical work, use NumPy
import numpy as np
np_nums = np.arange(1000000, dtype=np.int32)
print(f"NumPy: {np_nums.nbytes / 1024 / 1024:.2f} MB")
Weak References
Weak references don't prevent garbage collection, which makes them useful for caches.
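A quick way to see this in action is `weakref.finalize`, which registers a callback that runs once the referent is collected, without keeping it alive (the `Resource` class is invented for illustration):

```python
import weakref

collected = []

class Resource:
    pass

r = Resource()
fin = weakref.finalize(r, collected.append, "resource gone")
print(fin.alive)   # True while r is still referenced
del r              # refcount hits zero -> finalizer runs immediately (CPython)
print(fin.alive)   # False
print(collected)   # ['resource gone']
```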
import weakref
class ExpensiveObject:
    def __init__(self, value):
        self.value = value
# Create a weak reference
obj = ExpensiveObject(42)
weak_ref = weakref.ref(obj)
print(weak_ref()) # <ExpensiveObject object>
print(weak_ref().value) # 42
del obj
print(weak_ref()) # None (object was garbage collected)
# WeakValueDictionary for caches
cache = weakref.WeakValueDictionary()
def expensive_computation(key):
    result = ExpensiveObject(key * 1000)
    cache[key] = result
    return result
# Objects in cache don't prevent collection when no other refs exist
Common Memory Leaks
# 1. Circular references with __del__
class Leaky:
    def __init__(self, other=None):
        self.other = other
    def __del__(self):
        print("Cleaning up")
a = Leaky()
b = Leaky(a)
a.other = b  # Cycle with __del__: uncollectable before Python 3.4 (PEP 442); still risky, as finalizer order in a cycle is undefined
# 2. Global lists that grow forever
results = [] # Global!
def process(data):
    results.append(compute(data))  # Never cleaned up
# 3. Forgotten closures
def create_processor():
    huge_data = load_huge_dataset()  # Captured by closure!
    def process(x):
        return x in huge_data
    return process
Interview Tip
When asked about memory management:
- Reference counting for immediate cleanup, GC for cycles
- __slots__ saves roughly 40% memory per object
- Use generators for large datasets
- tracemalloc for memory profiling
- WeakValueDictionary for caches without preventing GC