Introduction

You already know lists and dicts. This article is about when NOT to use them -- and what to reach for instead.

Python's four built-in data structures are tuned for different access patterns. Pick wrong and a millisecond operation becomes minutes at scale. The collections module -- defaultdict, Counter, namedtuple, deque -- fills the gaps where built-ins fall short. Most people underuse it badly.

I'm organizing this by the problem you're solving, not by data structure. Because nobody thinks "I need a deque today." They think "why is this queue so slow."

Lists and When They Shine

You know this. Ordered, mutable, dynamic arrays under the hood.

Python
# Basic list operations
fruits = ["apple", "banana", "cherry", "date"]
# Append adds to the end -- O(1) amortized
fruits.append("elderberry")
# Insert at a specific position -- O(n) because elements shift
fruits.insert(0, "avocado")
# Remove by value -- O(n) linear scan
fruits.remove("banana")
# Pop from end -- O(1), pop from index -- O(n)
last = fruits.pop()
first = fruits.pop(0)
# Check membership -- O(n) linear scan
if "cherry" in fruits:
    print("Found it!")

The performance differences between these operations matter more than most people expect. Appending to the end is cheap -- Python over-allocates memory so it doesn't resize on every call. Inserting at the beginning? Every element shifts right. O(n). If you're building a queue with insert(0, item) and pop(), stop. Use deque.
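To see the gap concretely, here's a micro-benchmark sketch (function names and the size n are illustrative) running the same FIFO workload both ways:

```python
from collections import deque
from timeit import timeit

def list_queue(n):
    q = []
    for i in range(n):
        q.insert(0, i)   # O(n): every existing element shifts right
    while q:
        q.pop()          # O(1), but the inserts already cost O(n^2) total

def deque_queue(n):
    q = deque()
    for i in range(n):
        q.appendleft(i)  # O(1)
    while q:
        q.pop()          # O(1)

n = 20_000
t_list = timeit(lambda: list_queue(n), number=1)
t_deque = timeit(lambda: deque_queue(n), number=1)
print(f"list:  {t_list:.4f}s")
print(f"deque: {t_deque:.4f}s")
```

Exact numbers depend on your machine, but the list version loses by a widening margin as n grows.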

Slicing: More Powerful Than You Think

Full syntax: list[start:stop:step]. All three optional. Most people stop at list[start:stop] and miss the good stuff.

Python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Get every other element
evens = numbers[::2] # [0, 2, 4, 6, 8]
odds = numbers[1::2] # [1, 3, 5, 7, 9]

# Reverse a list (creates a new copy)
reversed_nums = numbers[::-1] # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# Get the last three elements
tail = numbers[-3:] # [7, 8, 9]

# Slice assignment -- replace a chunk in place
numbers[2:5] = [20, 30, 40, 50]
print(numbers) # [0, 1, 20, 30, 40, 50, 5, 6, 7, 8, 9]

# Delete a slice
del numbers[2:6]
print(numbers) # [0, 1, 5, 6, 7, 8, 9]

Stop writing loops to copy portions of a list. A slice returns a shallow copy in one expression.
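One caveat worth knowing: the copy is shallow, so nested objects are shared between the original and the copy. A minimal sketch:

```python
# A slice copies the outer list, but inner objects are shared (shallow copy).
grid = [[1, 2], [3, 4], [5, 6]]
copy = grid[:]        # new outer list, same inner lists
copy.append([7, 8])   # outer change: original unaffected
copy[0].append(99)    # inner change: visible through both names
print(grid)  # [[1, 2, 99], [3, 4], [5, 6]]
print(copy)  # [[1, 2, 99], [3, 4], [5, 6], [7, 8]]
```

For fully independent nested copies, reach for copy.deepcopy.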

List Comprehensions

Optimized at the bytecode level. Faster than equivalent for-loops. And once the syntax clicks, more readable too.

Python
# Transform and filter in one expression
raw_prices = ["$12.50", "$8.99", "invalid", "$45.00", "", "$3.25"]
clean_prices = [
 float(p.replace("$", ""))
 for p in raw_prices
 if p.startswith("$")
]
# Result: [12.5, 8.99, 45.0, 3.25]

# Nested comprehension -- flattening a matrix
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
# Result: [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Dictionary comprehension (yes, those exist too)
word_lengths = {word: len(word) for word in ["python", "data", "structures"]}
# Result: {'python': 6, 'data': 4, 'structures': 10}

A word of warning: don't nest three levels deep. If yours spans more than two lines, use a loop. Readability at 2 AM matters more than saving two lines.

Tuples and Immutability

Same as lists but frozen. No appending, no removing, no modifying in place.

Why would you want that constraint? Safety -- a tuple holding your database config can't be accidentally modified at runtime. Hashability -- tuples work as dictionary keys and set members, lists don't. Performance -- slightly faster to create and iterate because Python can optimize the memory layout knowing the size won't change. Three reasons, all good.
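The hashability point in practice -- a sketch of a lookup table keyed by coordinates (the tile_cache name is illustrative):

```python
# Tuples can be dict keys because they're hashable.
tile_cache = {}
tile_cache[(0, 0)] = "grass"
tile_cache[(4, 2)] = "water"
print(tile_cache[(4, 2)])  # water

# The same key as a list fails immediately:
try:
    tile_cache[[4, 2]] = "water"
except TypeError as e:
    print(e)  # unhashable type: 'list'
```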

Tuple Unpacking

Function returns, loop variables, swap operations. Once you internalize unpacking it shows up everywhere.

Python
# Basic unpacking
coordinates = (40.7128, -74.0060)
lat, lon = coordinates
# Swap without a temp variable
a, b = 1, 2
a, b = b, a # a=2, b=1

# Star unpacking -- grab the rest
first, *middle, last = [1, 2, 3, 4, 5]
# first=1, middle=[2, 3, 4], last=5

# Unpacking in loops
users = [
 ("alice", "[email protected]", "admin"),
 ("bob", "[email protected]", "editor"),
 ("carol", "[email protected]", "viewer"),
]
for name, email, role in users:
 print(f"{name} ({role}): {email}")
# Returning multiple values from a function
def get_stats(data):
    return min(data), max(data), sum(data) / len(data)
low, high, avg = get_stats([4, 8, 15, 16, 23, 42])

Named Tuples: The Best of Both Worlds

What does point[0] mean? x or y? Nobody remembers. Named tuples fix this -- each position gets a name, all the performance stays.

Python
from collections import namedtuple
from typing import NamedTuple
# Classic approach
Point = namedtuple("Point", ["x", "y", "z"])
origin = Point(0, 0, 0)
# Modern approach with type hints (Python 3.6+)
class ServerConfig(NamedTuple):
    host: str
    port: int
    debug: bool = False
config = ServerConfig(host="localhost", port=8080)
print(config.host)  # "localhost"
print(config.port)  # 8080
print(config.debug) # False (default value)

# Still works like a regular tuple
host, port, debug = config
print(len(config)) # 3

Good for API response wrappers, config objects, anywhere you'd write a small class with no methods. Lighter than full classes. Type checkers handle them fine. And on a batch pipeline instantiating millions of small objects, switching from dataclasses to named tuples cut memory noticeably -- not because dataclasses are slow, but because named tuples carry less per-instance overhead.

Need named fields and mutability? dataclasses (Python 3.7+). Default values, auto-generated __init__, __repr__, __eq__.
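A minimal sketch of the dataclass equivalent of the ServerConfig named tuple above, assuming Python 3.7+:

```python
from dataclasses import dataclass

# Same fields as the NamedTuple version, but instances are mutable.
@dataclass
class ServerConfig:
    host: str
    port: int
    debug: bool = False

config = ServerConfig(host="localhost", port=8080)
config.port = 9090  # fine: dataclass instances can be modified in place
print(config)       # auto-generated __repr__: ServerConfig(host='localhost', ...)
print(config == ServerConfig("localhost", 9090))  # True: auto-generated __eq__
```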

Dictionaries for Fast Lookups

Module namespaces? Dictionaries. Class attributes? Dictionaries. JSON parsing output? Dictionaries. Python runs on dictionaries. Hash tables underneath -- key lookups, insertions, deletions all O(1) average.

Since Python 3.7, insertion order is guaranteed by the language spec. You almost never need OrderedDict anymore. Some patterns beyond the basics:

Python
# The .get() method prevents KeyError
user_prefs = {"theme": "dark", "language": "en"}
font_size = user_prefs.get("font_size", 14) # Returns 14 (the default)

# .setdefault() -- get or initialize in one call
categories = {}
for product in [("laptop", "electronics"), ("shirt", "clothing"), ("phone", "electronics")]:
    name, cat = product
    categories.setdefault(cat, []).append(name)
# {'electronics': ['laptop', 'phone'], 'clothing': ['shirt']}

# Merge dictionaries (Python 3.9+)
defaults = {"timeout": 30, "retries": 3, "verbose": False}
overrides = {"timeout": 60, "verbose": True}
final_config = defaults | overrides
# {'timeout': 60, 'retries': 3, 'verbose': True}

# Dict comprehension with conditional logic
scores = {"Alice": 92, "Bob": 67, "Carol": 85, "Dave": 58, "Eve": 91}
passing = {name: score for name, score in scores.items() if score >= 70}
# {'Alice': 92, 'Carol': 85, 'Eve': 91}

Writing loops with setdefault or manual key-checking to group items? defaultdict is what you want.

The dict | other_dict merge syntax from 3.9 looks clean in two-dict examples. Three or four dicts with conflicting keys? The precedence gets confusing. I still prefer {**defaults, **overrides}. Older style, but the merge order is visually obvious.
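A quick sketch of that precedence with three hypothetical config sources -- both styles resolve conflicts left to right, with later dicts winning:

```python
# Three illustrative layers of configuration; later sources override earlier ones.
defaults = {"timeout": 30, "retries": 3}
file_config = {"timeout": 60}
cli_args = {"retries": 5}

merged_unpack = {**defaults, **file_config, **cli_args}
merged_pipe = defaults | file_config | cli_args  # Python 3.9+
print(merged_unpack)                 # {'timeout': 60, 'retries': 5}
print(merged_unpack == merged_pipe)  # True -- same precedence, different syntax
```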

Keys must be hashable. Strings, numbers, tuples of hashable items, frozensets -- fine. Lists, dicts, sets -- TypeError. Hash tables need a stable hash value and mutable objects can't give one.
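One subtlety worth a sketch: a tuple is only a valid key if everything inside it is itself hashable (names here are illustrative):

```python
# A tuple of hashable items works as a key...
ok_key = (1, "a", frozenset({2, 3}))
cache = {ok_key: "cached"}
print(cache[ok_key])  # cached

# ...but a tuple containing a list is unhashable.
bad_key = (1, [2, 3])
try:
    cache[bad_key] = "nope"
except TypeError as e:
    print(e)  # unhashable type: 'list'
```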

Sets for Unique Collections

Underused. Criminally so.

Codebases everywhere maintaining uniqueness with lists and if item not in my_list checks. O(n) per check. A set does the same thing in O(1). Internally it's a dictionary without values -- just keys. But uniqueness is only half the story. Set operations -- intersection, union, difference, symmetric difference -- are where sets actually earn their keep.

Python
# Real-world example: permission management
user_permissions = {"read", "write", "delete"}
required_permissions = {"read", "write", "admin"}
# What permissions does the user have that are required?
granted = user_permissions & required_permissions # {'read', 'write'}

# What required permissions is the user missing?
missing = required_permissions - user_permissions # {'admin'}

# All permissions combined
all_perms = user_permissions | required_permissions # {'read', 'write', 'delete', 'admin'}

# Permissions in one set but not both (symmetric difference)
exclusive = user_permissions ^ required_permissions # {'delete', 'admin'}

# Check if one set is a subset/superset of another
print(user_permissions >= required_permissions) # False
print({"read"} <= user_permissions) # True (subset check)

# Deduplication while preserving order (Python 3.7+)
items = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
unique_ordered = list(dict.fromkeys(items))
# [3, 1, 4, 5, 9, 2, 6]

That permission pattern? Everywhere in real applications. Clean, expressive, constant time regardless of how many permissions you're checking. And sets are mutable, so they can't be dictionary keys. For an immutable version, frozenset. Same operations, read-only after creation.
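A sketch of frozenset earning its keep as a dict key -- the role table here is hypothetical:

```python
# Map permission sets to role names. Only frozenset can serve as the key.
ROLE_BY_PERMS = {
    frozenset({"read"}): "viewer",
    frozenset({"read", "write"}): "editor",
    frozenset({"read", "write", "delete"}): "admin",
}

user_permissions = {"write", "read"}  # element order doesn't affect the lookup
print(ROLE_BY_PERMS[frozenset(user_permissions)])  # editor
```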

The Collections Module

Three things from collections come up constantly. The rest you can learn when you need them.

defaultdict: No More KeyError

How many times have you written this?

# The manual way -- checking before inserting
word_count = {}
for word in words:
    if word not in word_count:
        word_count[word] = 0
    word_count[word] += 1

defaultdict takes a factory function that creates a default value for missing keys. Done.

Python
from collections import defaultdict, Counter, deque
# defaultdict with int -- perfect for counting
word_count = defaultdict(int)
for word in "the quick brown fox jumps over the lazy dog the fox".split():
    word_count[word] += 1
# defaultdict(int, {'the': 3, 'quick': 1, 'brown': 1, ...})

# defaultdict with list -- perfect for grouping
students_by_grade = defaultdict(list)
roster = [("Alice", "A"), ("Bob", "B"), ("Carol", "A"), ("Dave", "C"), ("Eve", "B")]
for name, grade in roster:
    students_by_grade[grade].append(name)
# {'A': ['Alice', 'Carol'], 'B': ['Bob', 'Eve'], 'C': ['Dave']}

Counter: Counting Made Trivial

defaultdict(int) handles counting. Counter goes further -- purpose-built for tallying, ships with methods you'd otherwise write by hand.

Python
from collections import Counter
# Count occurrences in any iterable
log_levels = ["INFO", "ERROR", "INFO", "WARNING", "ERROR", "INFO",
 "DEBUG", "ERROR", "INFO", "WARNING"]
counts = Counter(log_levels)
print(counts)
# Counter({'INFO': 4, 'ERROR': 3, 'WARNING': 2, 'DEBUG': 1})

# Get the N most common elements
print(counts.most_common(2))
# [('INFO', 4), ('ERROR', 3)]

# Arithmetic with Counters
morning_errors = Counter({"404": 12, "500": 3, "403": 7})
afternoon_errors = Counter({"404": 8, "500": 15, "502": 4})
total = morning_errors + afternoon_errors
# Counter({'404': 20, '500': 18, '403': 7, '502': 4})

# Subtraction keeps only the counts that stay positive
morning_only = morning_errors - afternoon_errors
# Counter({'403': 7, '404': 4})

Log analysis, data profiling, frequency distributions. The most_common() method alone saves a lot of code compared to sorting a dict by values manually.
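For comparison, here's roughly what most_common() replaces -- sorting the items by count yourself (a trimmed version of the data above):

```python
from collections import Counter

counts = Counter(["INFO", "ERROR", "INFO", "WARNING", "ERROR", "INFO"])

# The manual equivalent of counts.most_common(2):
manual = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]
print(manual)                 # [('INFO', 3), ('ERROR', 2)]
print(counts.most_common(2))  # [('INFO', 3), ('ERROR', 2)]
```

Same result, but most_common states the intent in one call.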

deque: The Double-Ended Queue

Need a queue? This is it. O(1) append and pop on both ends. Lists are O(n) at the front -- and that's the whole reason deque exists.

Also great for sliding windows, recent-history buffers, BFS. Set maxlen for a fixed-size buffer that drops old items automatically.

from collections import deque
# Fixed-size buffer for recent log entries
recent_logs = deque(maxlen=5)
for i in range(10):
    recent_logs.append(f"Log entry {i}")
print(list(recent_logs))
# ['Log entry 5', 'Log entry 6', 'Log entry 7', 'Log entry 8', 'Log entry 9']

# Use as a queue (FIFO)
task_queue = deque(["task_a", "task_b", "task_c"])
task_queue.append("task_d") # Add to right end
next_task = task_queue.popleft() # Remove from left end -- O(1)!
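The BFS mention above deserves a sketch -- a minimal breadth-first traversal over a hypothetical adjacency-list graph:

```python
from collections import deque

# Illustrative graph: each key maps to its neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def bfs_order(start):
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # O(1) -- the reason this is a deque, not a list
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs_order("A"))  # ['A', 'B', 'C', 'D']
```

Swap popleft() for pop() and the same code becomes depth-first.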

Choosing the Right Data Structure

Ask in order:

  1. Key-value pairs? Dict. (defaultdict for grouping/counting.)
  2. Uniqueness or membership checks? Set. (frozenset if it needs to be hashable.)
  3. Ordered and immutable? Tuple. (namedtuple if readability matters.)
  4. Queue or double-ended operations? deque.
  5. Everything else? List. It's fine.

Time complexities at a glance:

Operation            | List       | Tuple | Dict | Set  | deque
---------------------|------------|-------|------|------|-------
Index lookup         | O(1)       | O(1)  | N/A  | N/A  | O(n)
Key/value lookup     | N/A        | N/A   | O(1) | N/A  | N/A
Membership test (in) | O(n)       | O(n)  | O(1) | O(1) | O(n)
Append (end)         | O(1)*      | N/A   | O(1) | O(1) | O(1)
Append (front)       | O(n)       | N/A   | N/A  | N/A  | O(1)
Pop (end)            | O(1)       | N/A   | N/A  | O(1) | O(1)
Pop (front)          | O(n)       | N/A   | N/A  | N/A  | O(1)
Insert (middle)      | O(n)       | N/A   | O(1) | O(1) | O(n)
Delete (by value)    | O(n)       | N/A   | O(1) | O(1) | O(n)
Sort                 | O(n log n) | N/A   | N/A  | N/A  | N/A
Memory usage         | Medium     | Low   | High | High | Medium

* Amortized O(1). Occasionally O(n) when the underlying array resizes, but the cost spreads across many operations.

A data pipeline checked each event's user ID against a "seen" list. Tens of thousands of unique users. That if user_id in seen_list scanned the entire list on every event. Swapping the list for a set: processing time from over half an hour to under two minutes. Same logic, same result. This has bitten me more than once.
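A sketch of that fix in miniature -- same logic, only the container changes (the event data here is synthetic):

```python
# Synthetic event stream: 5,000 events across 500 distinct user IDs.
events = [{"user_id": i % 500} for i in range(5_000)]

# The slow version: a list scanned on every event.
seen_list = []
for event in events:
    if event["user_id"] not in seen_list:   # O(n) scan each time
        seen_list.append(event["user_id"])

# The fix: a set with O(1) membership checks.
seen_set = set()
for event in events:
    if event["user_id"] not in seen_set:    # O(1) hash lookup
        seen_set.add(event["user_id"])

print(len(seen_list), len(seen_set))  # 500 500 -- identical results
```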

Validating a JSON payload has the required fields:

required_fields = {"name", "email", "age"}
provided_fields = set(payload.keys())
missing = required_fields - provided_fields
if missing:
    raise ValueError(f"Missing required fields: {missing}")

Cleaner than a loop. Looks identical whether you have 3 required fields or 300.

Think about what you'll do most often. Building a collection and iterating once? A list is fine, even with duplicates. Checking membership repeatedly? Set or dictionary. Always. And stop returning dictionaries just to pass back two values from a function. A tuple or named tuple is clearer, lighter, and what the standard library itself uses -- divmod() returns a tuple for exactly this reason.
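To make the tuple-return point concrete -- a sketch contrasting the two styles (get_dimensions is a hypothetical helper; divmod is the real built-in):

```python
# Returning a dict forces callers to remember key names:
def get_dimensions_dict(area, ratio):
    return {"width": (area * ratio) ** 0.5, "height": (area / ratio) ** 0.5}

# Returning a tuple unpacks directly at the call site, like divmod does:
def get_dimensions(area, ratio):
    return (area * ratio) ** 0.5, (area / ratio) ** 0.5

quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2

width, height = get_dimensions(area=12.0, ratio=3.0)
print(width, height)  # 6.0 2.0
```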

What I'd Do Differently

Python's built-in data structures handle 95% of cases. For the other 5%, there's sortedcontainers on PyPI and numpy arrays. You'll know when you need them.

Anurag Sinha

Full Stack Developer & Technical Writer

Anurag is a full stack developer and technical writer. He covers web technologies, backend systems, and developer tools for the Codertronix community.