Python Loops Mastered: for, while, range, enumerate, and zip

If conditionals are the nervous system of your program, loops are its heartbeat. They let you repeat operations over collections of data, automate tedious tasks, and write code once that works on datasets of any size. Without loops, you'd have to write out every single operation manually, with no way to scale and no way to handle real-world data.
Think about what loops actually enable. Imagine you have a list of ten thousand customer records and you need to apply a discount to every one of them. Without a loop, you would literally need ten thousand individual lines of code. With a loop, you write three. That compression of intent is what makes loops so fundamental: they let you express "do this for every item" in a way that mirrors how humans actually think about repetitive work. And in AI and machine learning specifically, loops are everywhere: iterating over training batches, running gradient descent steps, evaluating model predictions, preprocessing datasets. Mastering loops now means you'll be comfortable with the patterns you encounter constantly in NumPy, PyTorch, TensorFlow, and scikit-learn later in this series.
In this article, we're covering Python's looping arsenal: the versatile for loop, the condition-based while loop, the range() function that powers iteration, enumerate() for getting indexes alongside values, and zip() for parallel iteration. We'll explore nested loops, comprehensions as loop alternatives, and advanced patterns with itertools. We'll also dig into the iterator protocol that makes all of this work under the hood, common mistakes that trip up even experienced developers, and performance considerations that matter when your datasets grow from hundreds to millions of rows. By the end, you'll understand not just how to loop, but when to use each tool and why for loops are almost always your first choice over while.
Table of Contents
- The for Loop: Your Workhorse
- Basic Iteration Over a List
- Iterating Over Strings
- Iterating Over Dictionaries
- Understanding range()
- Basic range() Usage
- range() with Start and Stop
- range() with Step
- Negative Step for Counting Down
- Why range() Matters
- enumerate(): Pairing Indexes with Values
- zip(): Parallel Iteration
- Zipping Multiple Sequences
- The while Loop: Condition-Based Iteration
- A Practical while Example: User Input
- break: Exit the Loop Early
- continue: Skip to the Next Iteration
- The else Clause on Loops
- Nested Loops
- Real-World Example: Matrix Operations
- List Comprehensions: The Loop Alternative
- Dict and Set Comprehensions
- Understanding Loop Scope and Variables
- Looping with Unpacking
- Unpacking Tuples and Lists
- Extended Unpacking with *
- Loop Internals and Iterator Protocol
- Advanced: itertools Patterns
- itertools.product()
- itertools.chain()
- itertools.islice()
- Other Useful itertools: count() and repeat()
- for vs while: A Performance Comparison
- Generator Expressions: Lazy Evaluation
- Performance Considerations
- Filtering and Validation Loops
- Filtering Valid Items
- Separating Into Categories
- Counting with a Default
- Finding Duplicates
- Common Loop Mistakes
- Real-World Loop Patterns
- Processing Pairs of Consecutive Elements
- Filtering and Transforming in One Pass
- Early Exit with a Sentinel
- Accumulation Patterns
- Choosing the Right Loop Construct
- When to Use for
- When to Use while
- When to Use List Comprehensions
- When to Use Generators
- Summary
The for Loop: Your Workhorse
The for loop is your primary tool for iterating over sequences. It's clean, readable, and works on anything iterable: lists, strings, tuples, dictionaries, sets, or any object with an __iter__ method.
Basic Iteration Over a List
Here's the simplest form. Notice how the loop variable fruit takes on each value from the list one at a time:
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
Expected output:
apple
banana
cherry
The syntax is straightforward: for item in iterable: followed by an indented block. Python assigns each element to fruit one at a time, executes the block, then moves to the next. What makes this elegant is that you express your intent directly ("for each fruit in fruits, print it") rather than managing indexes and length checks yourself.
Iterating Over Strings
Strings are iterables too. When you loop over a string, you get individual characters. This is useful in text processing tasks and forms the basis of many natural language processing operations:
word = "Python"
for letter in word:
    print(letter)
Expected output:
P
y
t
h
o
n
This is handy for character-by-character processing. Want to count vowels?
word = "Python"
vowel_count = 0
for letter in word:
    if letter.lower() in "aeiou":
        vowel_count += 1
print(f"Vowels in '{word}': {vowel_count}")
Expected output:
Vowels in 'Python': 1
Notice that we used letter.lower() to handle both uppercase and lowercase vowels in a single check. This kind of defensive coding, anticipating variations in the data, is a habit worth building from the start, because real-world text is rarely clean.
Iterating Over Dictionaries
With dictionaries, a bare for loop gives you the keys. This is a common source of confusion for newcomers, because dictionaries hold key-value pairs, but by default Python exposes only the keys during iteration:
person = {"name": "Alice", "age": 30, "city": "Portland"}
for key in person:
    print(key)
Expected output:
name
age
city
To get both keys and values, use .items():
person = {"name": "Alice", "age": 30, "city": "Portland"}
for key, value in person.items():
    print(f"{key}: {value}")
Expected output:
name: Alice
age: 30
city: Portland
This is the pythonic way. Using .keys() and then indexing back into the dict is redundant. The .items() method returns each key-value pair as a tuple, and Python's unpacking syntax lets you grab both in one clean step. You'll use this pattern constantly when working with JSON data, configuration dictionaries, and structured records of any kind.
Understanding range()
The range() function is one of Python's most useful built-ins. It generates a sequence of numbers, and it's the bridge between traditional index-based loops and Pythonic iteration.
Basic range() Usage
The most common use is simple counting. range() is often the first thing you reach for when you need to do something a fixed number of times:
for i in range(5):
    print(i)
Expected output:
0
1
2
3
4
range(5) generates numbers from 0 up to (but not including) 5. This follows Python's zero-indexing convention.
range() with Start and Stop
When you need to count from a specific starting point rather than zero, use the two-argument form. This is particularly useful when working with data that has natural non-zero indexing, like chapter numbers or years:
for i in range(2, 7):
    print(i)
Expected output:
2
3
4
5
6
range(start, stop) begins at start and counts up to (but not including) stop. The "exclusive stop" behavior is consistent with Python's slice notation and makes it easy to reason about how many iterations you'll get: range(2, 7) gives you exactly 7 - 2 = 5 values.
range() with Step
Add a third argument for the step size:
for i in range(0, 10, 2):
    print(i)
Expected output:
0
2
4
6
8
range(0, 10, 2) counts by twos. This is useful for processing every nth element or iterating with a stride. In machine learning, you'll encounter step-based iteration when you're processing data in mini-batches or downsampling a time series.
Negative Step for Counting Down
for i in range(5, 0, -1):
    print(i)
Expected output:
5
4
3
2
1
Negative steps count backward, which is helpful when you need descending iteration. Common use cases are a countdown timer or reversing the order of an operation, though for simply iterating over a list in reverse, reversed() is usually the cleaner choice.
Why range() Matters
range() lets you loop by index when you need position information. It's memory-efficient: it doesn't create a full list in memory but generates numbers on the fly. For a million iterations, range(1000000) is cheap; a list of a million elements is not. This lazy evaluation, deferring computation until you actually need the result, is an idea you'll encounter repeatedly in generators and itertools, which we'll cover shortly.
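To make the memory claim concrete, here is a minimal sketch that compares the size of a range object with the size of an equivalent list. It uses sys.getsizeof, which reports only the container itself (not the integer objects a list points to), so the real gap is even larger than the numbers suggest. The variable names are illustrative.

```python
import sys

# A range object stores only start, stop, and step, so its size does not
# depend on how many numbers it represents.
lazy_numbers = range(1_000_000)
eager_numbers = list(range(1_000_000))

range_bytes = sys.getsizeof(lazy_numbers)
list_bytes = sys.getsizeof(eager_numbers)  # container only, not the int objects

print(f"range object: {range_bytes} bytes")
print(f"list object:  {list_bytes} bytes")

# range still supports len(), indexing, and membership tests without
# materializing the numbers.
print(len(lazy_numbers), lazy_numbers[-1], 999_999 in lazy_numbers)
```

The exact byte counts depend on your Python build, which is why no expected output is shown.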
enumerate(): Pairing Indexes with Values
Often you need both the index and the value. You could use range(len()), but that's clunky. Beyond being verbose, it also obscures your intent: when someone reads range(len(fruits)), they have to mentally decode what you're trying to do:
fruits = ["apple", "banana", "cherry"]
# Clunky way
for i in range(len(fruits)):
    print(f"{i}: {fruits[i]}")
Expected output:
0: apple
1: banana
2: cherry
Instead, use enumerate():
fruits = ["apple", "banana", "cherry"]
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")
Expected output:
0: apple
1: banana
2: cherry
Much cleaner. enumerate() pairs each element with its index. The syntax is natural: unpack the tuple into index and fruit. The intent is immediately clear to anyone reading your code: you want both the position and the value, and enumerate() says that directly.
You can also start at a different index:
fruits = ["apple", "banana", "cherry"]
for index, fruit in enumerate(fruits, start=1):
    print(f"{index}: {fruit}")
Expected output:
1: apple
2: banana
3: cherry
The start=1 argument tells enumerate() to begin counting from 1 instead of 0, which is useful for human-readable numbering. You'll reach for this constantly when generating output that users will read; nobody wants to see "Item 0" in a numbered list. This also comes up when building progress indicators or labeling model outputs in a report.
zip(): Parallel Iteration
When you have multiple sequences and need to process them in parallel, zip() is your solution. It combines iterables element-wise. Think of a zipper on a jacket: it brings together two separate sides, tooth by tooth, in lockstep:
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25, 35]
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")
Expected output:
Alice is 30 years old
Bob is 25 years old
Charlie is 35 years old
zip() pairs corresponding elements from names and ages. If sequences are different lengths, zip() stops at the shortest:
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25] # Shorter list
for name, age in zip(names, ages):
    print(f"{name}: {age}")
Expected output:
Alice: 30
Bob: 25
Charlie is left out because there's no matching age. If you want to keep all elements, use itertools.zip_longest():
from itertools import zip_longest
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25]
for name, age in zip_longest(names, ages, fillvalue="N/A"):
    print(f"{name}: {age}")
Expected output:
Alice: 30
Bob: 25
Charlie: N/A
The fillvalue argument provides a default for missing elements, which is handy for processing incomplete data. In data science work, mismatched sequence lengths are a common source of bugs. Knowing whether you want to silently drop unmatched records or fill them with a sentinel value is an important design decision, and zip_longest() makes the fill-value approach explicit.
Zipping Multiple Sequences
You're not limited to two:
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25, 35]
cities = ["NYC", "LA", "Chicago"]
for name, age, city in zip(names, ages, cities):
    print(f"{name}, {age}, from {city}")
Expected output:
Alice, 30, from NYC
Bob, 25, from LA
Charlie, 35, from Chicago
This three-way zip is equivalent to iterating over rows in a table where each column is a separate list. It's a pattern you'll use frequently when you have parallel arrays representing different attributes of the same set of objects, a common structure in scientific computing before you reach for pandas DataFrames.
The while Loop: Condition-Based Iteration
While for loops iterate over sequences, while loops run as long as a condition is true. They're useful when you don't know in advance how many iterations you'll need. The key mental model is: for loops are controlled by a collection, while while loops are controlled by a condition that changes over time:
count = 0
while count < 5:
    print(count)
    count += 1
Expected output:
0
1
2
3
4
The loop keeps running until count < 5 becomes false. This is straightforward but verbose compared to for. Notice that we have to manage the counter variable ourselves, initializing it before the loop, incrementing it inside the loop. This manual state management is exactly the kind of bookkeeping that for loops spare you from when you're iterating over a known collection.
A Practical while Example: User Input
password = ""
attempts = 0
while password != "secret" and attempts < 3:
    password = input("Enter password: ")
    attempts += 1
if password == "secret":
    print("Access granted!")
else:
    print("Too many attempts. Access denied.")
Expected output (simulating input "wrong", "wrong", "secret"):
Access granted!
This uses a while loop because we don't know in advance how many attempts the user will make (up to the limit). This is the canonical use case for while: event-driven logic where external input determines when the loop ends. You'll see similar patterns in network clients waiting for responses, game loops running until a player quits, and training loops that run until a model converges or a timeout is reached.
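The "run until convergence or until a budget runs out" shape can be sketched without any input() calls. The example below is not from the original article: it uses Newton's method for square roots as a stand-in for a training loop, and the function name, tolerance, and max_steps values are illustrative assumptions.

```python
def newton_sqrt(value, tolerance=1e-10, max_steps=50):
    """Approximate sqrt(value) for value > 0 using Newton's method."""
    guess = value if value >= 1 else 1.0
    steps = 0
    # Keep refining while the guess is not yet good enough AND the step
    # budget has not been used up. Either condition can end the loop.
    while abs(guess * guess - value) > tolerance and steps < max_steps:
        guess = (guess + value / guess) / 2
        steps += 1
    return guess, steps


root, steps_taken = newton_sqrt(2.0)
print(f"sqrt(2) is approximately {root:.6f} after {steps_taken} steps")
```

As in the password example, the loop condition combines a goal ("close enough") with a limit ("not too many steps"), so the loop is guaranteed to end.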
break: Exit the Loop Early
The break statement exits a loop immediately, regardless of the condition. It's a way of saying "I found what I was looking for, no need to continue":
for i in range(10):
    if i == 5:
        break
    print(i)
Expected output:
0
1
2
3
4
When i reaches 5, break terminates the loop. This is especially valuable in search scenarios where continuing after finding a match would be wasteful. If you're scanning a million records for one specific entry, you don't want to keep scanning after you've found it.
continue: Skip to the Next Iteration
The continue statement skips the rest of the current iteration and jumps to the next. Think of it as a filter that says "this item doesn't qualify, move on":
for i in range(5):
    if i == 2:
        continue
    print(i)
Expected output:
0
1
3
4
The number 2 doesn't print because continue skips the print() when i == 2. A common use of continue is skipping invalid or irrelevant data in a larger processing loop, for example, skipping empty lines when reading a file, or skipping records that fail validation in a data pipeline.
The else Clause on Loops
Python allows an else clause on loops. It executes if the loop completes normally (without a break). This is one of Python's lesser-known features, but it's surprisingly useful for search patterns:
for i in range(5):
    print(i)
else:
    print("Loop completed successfully!")
Expected output:
0
1
2
3
4
Loop completed successfully!
If we break, the else doesn't run:
for i in range(5):
    if i == 3:
        break
    print(i)
else:
    print("Loop completed successfully!")
Expected output:
0
1
2
The else doesn't execute because we broke out of the loop. This is useful for detecting whether a search loop found what it was looking for:
items = [10, 20, 30, 40, 50]
target = 35
for item in items:
    if item == target:
        print(f"Found {target}")
        break
else:
    print(f"{target} not found in the list")
Expected output:
35 not found in the list
The loop-else pattern eliminates the need for a flag variable. Without it, you'd need to set found = False before the loop and check it after. The else clause makes the "not found" case structurally explicit rather than relying on a boolean variable.
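For comparison, here is a small sketch of the same search written both ways. The variable names flag_message and else_message are illustrative; the point is only that both versions reach the same answer, with the flag version carrying extra state.

```python
items = [10, 20, 30, 40, 50]
target = 35

# Flag-based version: a boolean must be initialized before the loop,
# updated inside it, and checked after it.
found = False
for item in items:
    if item == target:
        found = True
        break
flag_message = f"Found {target}" if found else f"{target} not found in the list"

# Loop-else version: the else block runs only if the loop never hit break.
for item in items:
    if item == target:
        else_message = f"Found {target}"
        break
else:
    else_message = f"{target} not found in the list"

print(flag_message)
print(else_message)
```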
Nested Loops
You can nest loops inside loops:
for i in range(3):
    for j in range(3):
        print(f"({i}, {j})", end=" ")
    print()  # New line after inner loop completes
Expected output:
(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)
This creates a 3x3 grid. The outer loop runs 3 times, and for each iteration, the inner loop runs 3 times. Nested loops have a runtime cost: an outer loop of n iterations with an inner loop of m iterations gives you n × m total iterations. For large datasets, this compounds quickly.
Real-World Example: Matrix Operations
Matrix iteration is the bread and butter of linear algebra in machine learning. Every time you process a 2D image, a weight matrix in a neural network, or a correlation table, you're working with structures like this:
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
for row in matrix:
    for element in row:
        print(element, end=" ")
    print()
Expected output:
1 2 3
4 5 6
7 8 9
This prints each element of a 2D list. Notice how we iterate over rows, then over elements within each row. In practice, once you move into NumPy, you'll often replace explicit nested loops with vectorized operations that are orders of magnitude faster, but understanding the nested loop structure first gives you the conceptual foundation for what those vectorized operations are doing underneath.
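As one more illustration of index-based nested loops, the sketch below builds the transpose of the same matrix by hand. It is a plain-Python illustration of what a vectorized transpose does conceptually, not how NumPy implements it.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

row_count = len(matrix)
col_count = len(matrix[0])

# Pre-allocate a col_count x row_count grid of zeros.
transposed = [[0] * row_count for _ in range(col_count)]

# Element (r, c) of the input becomes element (c, r) of the output.
for r in range(row_count):
    for c in range(col_count):
        transposed[c][r] = matrix[r][c]

for row in transposed:
    print(row)
# [1, 4, 7]
# [2, 5, 8]
# [3, 6, 9]
```

The same result can be written as [list(col) for col in zip(*matrix)], which combines zip() with unpacking.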
List Comprehensions: The Loop Alternative
List comprehensions provide a concise way to create lists by transforming or filtering existing ones. They're often faster than loops and more readable:
# Traditional loop
squares = []
for i in range(5):
    squares.append(i ** 2)
print(squares)
# List comprehension
squares = [i ** 2 for i in range(5)]
print(squares)
Expected output:
[0, 1, 4, 9, 16]
[0, 1, 4, 9, 16]
Both produce the same result, but the comprehension is cleaner. The syntax is: [expression for item in iterable]. The speed advantage comes from how CPython compiles comprehensions: they use a dedicated bytecode instruction (LIST_APPEND) rather than looking up and calling the append() method on each iteration.
With a filter:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = [n for n in numbers if n % 2 == 0]
print(evens)
Expected output:
[2, 4, 6, 8, 10]
The if clause filters out odd numbers. You can also transform and filter:
numbers = [1, 2, 3, 4, 5]
doubled_evens = [n * 2 for n in numbers if n % 2 == 0]
print(doubled_evens)
Expected output:
[4, 8]
When you have both a filter and a transformation in a comprehension, the order reads naturally left to right: "give me n * 2 for each n in numbers where n is even." That clarity of expression is why comprehensions are preferred: they communicate intent in a single line.
Dict and Set Comprehensions
The same pattern works for dictionaries and sets:
# Dict comprehension
squares_dict = {i: i ** 2 for i in range(5)}
print(squares_dict)
# Set comprehension
unique_lengths = {len(word) for word in ["apple", "pie", "python", "code"]}
print(unique_lengths)
Expected output:
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
{3, 4, 5, 6}
Comprehensions are one of Python's most elegant features. Use them when you're creating a new collection based on an existing one. Dict comprehensions are particularly powerful when you need to build lookup tables, transforming a list of objects into a dictionary keyed by some attribute, for fast O(1) access later.
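A minimal sketch of the lookup-table idea, using made-up user records (the field names id and name are assumptions for illustration):

```python
users = [
    {"id": 101, "name": "Alice"},
    {"id": 102, "name": "Bob"},
    {"id": 103, "name": "Charlie"},
]

# Build the lookup table once: key each record by its "id" field.
users_by_id = {user["id"]: user for user in users}

# Later lookups are a single dictionary access instead of a scan of the list.
print(users_by_id[102]["name"])  # Bob
```

If two records shared the same id, the later one would overwrite the earlier one, so this pattern fits best when the key is known to be unique.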
Understanding Loop Scope and Variables
One thing that trips up beginners is what happens to loop variables after the loop ends. Unlike some languages, Python loop variables persist after the loop completes:
for i in range(5):
    print(i)
print(f"After loop, i = {i}")
Expected output:
0
1
2
3
4
After loop, i = 4
The variable i still exists with the value of the last iteration. This can be useful (after a break, it tells you which value ended the loop), but it can also be surprising. If you don't want the variable lingering, remove it with del i; otherwise, just remember that it stays in the enclosing namespace.
Comprehensions behave differently, because they create their own scope:
numbers = [n for n in range(5)]
print(numbers)
# n from the comprehension is not accessible here
try:
    print(n)
except NameError:
    print("n is not defined outside the comprehension")
Expected output:
[0, 1, 2, 3, 4]
n is not defined outside the comprehension
List comprehensions create a local scope for their loop variable, which can prevent accidental variable pollution. This scoping difference between regular loops and comprehensions is worth remembering: it's one of the subtle ways Python's syntax shapes how you think about variable lifetimes and code organization.
Looping with Unpacking
Python's tuple unpacking works wonderfully in loops. You've already seen examples with zip() and dictionaries, but let's explore this more deeply.
Unpacking Tuples and Lists
pairs = [(1, "a"), (2, "b"), (3, "c")]
for number, letter in pairs:
    print(f"{number}: {letter}")
Expected output:
1: a
2: b
3: c
Each tuple is unpacked into number and letter. Tuples with more elements unpack the same way:
coordinates = [(0, 0, "origin"), (1, 1, "diagonal"), (1, 2, "off-diagonal")]
for x, y, description in coordinates:
    print(f"({x}, {y}): {description}")
Expected output:
(0, 0): origin
(1, 1): diagonal
(1, 2): off-diagonal
If you don't need all values, use an underscore:
pairs = [(1, "a"), (2, "b"), (3, "c")]
for number, _ in pairs:
    print(number)
Expected output:
1
2
3
The underscore is a convention for "I don't care about this value." It makes your intent clear to readers. Using _ to discard unwanted values is a widely recognized Python idiom: any experienced Python developer who sees it knows immediately that you've intentionally ignored that part of the tuple, rather than accidentally forgotten to use it.
Extended Unpacking with *
For sequences of varying lengths, use the * operator:
data = [1, 2, 3, 4, 5]
first, *rest = data
print(f"First: {first}")
print(f"Rest: {rest}")
Expected output:
First: 1
Rest: [2, 3, 4, 5]
Or grab the last item:
data = [1, 2, 3, 4, 5]
*initial, last = data
print(f"Initial: {initial}")
print(f"Last: {last}")
Expected output:
Initial: [1, 2, 3, 4]
Last: 5
In loops, this is less common but useful when parsing structured data:
transactions = [
    ["Alice", 100, "deposit"],
    ["Bob", 50, "withdrawal"],
    ["Charlie", 200, "deposit"]
]
for name, *details in transactions:
    print(f"{name}: {details}")
Expected output:
Alice: [100, 'deposit']
Bob: [50, 'withdrawal']
Charlie: [200, 'deposit']
This extended unpacking pattern is particularly handy when you're processing log files or CSV rows where the first column is always an identifier but the remaining columns vary. It lets you capture the variable-length tail without knowing its size in advance.
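Here is a small sketch of that scenario with rows of different lengths. The log lines are invented for illustration, and a plain split(",") stands in for real CSV parsing (which would normally use the csv module to handle quoting).

```python
log_lines = [
    "alice,login",
    "bob,upload,report.pdf,2048",
    "charlie,logout",
]

parsed = {}
for line in log_lines:
    # The first column is always the user; *fields captures whatever is left,
    # whether that is one column or several.
    user, *fields = line.split(",")
    parsed[user] = fields
    print(f"{user}: {len(fields)} field(s) -> {fields}")
```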
Loop Internals and Iterator Protocol
Understanding what actually happens when Python executes a for loop gives you a much stronger mental model, and explains why so many Python features work the way they do.
When you write for item in collection:, Python doesn't just index into the collection. Instead, it calls iter(collection) to get an iterator object, then repeatedly calls next() on that iterator until a StopIteration exception is raised. You can see this yourself:
fruits = ["apple", "banana", "cherry"]
iterator = iter(fruits)
print(next(iterator)) # apple
print(next(iterator)) # banana
print(next(iterator)) # cherry
# next(iterator) here would raise StopIteration
This is the iterator protocol. An iterable implements __iter__(), which returns an iterator; an iterator implements __next__(), which returns the next value or raises StopIteration when nothing is left. That's why loops work on strings, lists, tuples, dictionaries, files, generators, and even custom objects you define yourself: as long as they implement these methods, Python's for loop machinery knows how to consume them.
This design has a profound implication: you can create lazy iterators that produce values on demand without materializing a full collection. When you call range(1000000), Python doesn't create a list of a million integers. It creates a range object that knows its start, stop, and step, and its iterator produces each integer only when next() is called. The objects returned by zip(), enumerate(), map(), and filter(), along with generators, work the same way: they produce values lazily. This is why Python can iterate over a ten-gigabyte file without running out of memory: it reads one line at a time, and the file object's iterator produces each line only when requested. Every time you use a for loop in Python, the iterator protocol is what's doing the work underneath.
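To see the protocol from the other side, here is a minimal sketch of a custom iterable. The class names Countdown and CountdownIterator are hypothetical; the split into two classes is one common design, chosen here so that each for loop gets a fresh iterator.

```python
class Countdown:
    """Iterable that counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Return a new iterator each time, so the object can be looped twice.
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator that holds the current position of one countdown."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves from __iter__ by convention.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the for loop to stop
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
for number in countdown:
    print(number)
# 3
# 2
# 1
```

In everyday code a generator function usually replaces both classes, but writing them out once shows what the for loop is calling.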
Advanced: itertools Patterns
The itertools module provides advanced iteration tools. Here are five you should know:
itertools.product()
Creates the Cartesian product of iterables (all combinations). This is conceptually equivalent to nested for loops, but expressed in a single clean call:
from itertools import product
sizes = ["S", "M", "L"]
colors = ["red", "blue"]
for size, color in product(sizes, colors):
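    print(f"{color} {size}")
Expected output:
red S
blue S
red M
blue M
red L
blue L
This generates all 6 combinations without nested loops. The rightmost iterable (colors) varies fastest, just as the innermost loop of a nested pair would. The beauty here is that product() is often cleaner, and sometimes faster, than nested loops.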
For three sequences, the combinatorial explosion grows quickly:
from itertools import product
sizes = ["S", "M", "L"]
colors = ["red", "blue", "green"]
materials = ["cotton", "polyester"]
for size, color, material in product(sizes, colors, materials):
    print(f"{color} {size} {material}", end=" | ")
Expected output:
red S cotton | red S polyester | blue S cotton | blue S polyester | green S cotton | green S polyester | red M cotton | red M polyester | blue M cotton | blue M polyester | green M cotton | green M polyester | red L cotton | red L polyester | blue L cotton | blue L polyester | green L cotton | green L polyester |
That's 3 × 3 × 2 = 18 combinations generated automatically. The product() function shines in hyperparameter search for machine learning: when you want to try every combination of learning rates, batch sizes, and layer counts, product() generates the grid systematically without requiring you to write three levels of nested loops.
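A minimal sketch of such a grid follows. The hyperparameter names and values are invented for illustration, and no actual training happens; the code only builds the list of configurations a search would iterate over.

```python
from itertools import product

# Hypothetical search space: two candidate values for each hyperparameter.
learning_rates = [0.1, 0.01]
batch_sizes = [32, 64]
layer_counts = [2, 3]

# One dictionary per combination; the rightmost list varies fastest.
grid = [
    {"learning_rate": lr, "batch_size": bs, "layers": layers}
    for lr, bs, layers in product(learning_rates, batch_sizes, layer_counts)
]

print(len(grid))  # 8
print(grid[0])    # {'learning_rate': 0.1, 'batch_size': 32, 'layers': 2}
print(grid[1])    # {'learning_rate': 0.1, 'batch_size': 32, 'layers': 3}
```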
itertools.chain()
Chains multiple iterables into a single sequence:
from itertools import chain
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [7, 8, 9]
for item in chain(list1, list2, list3):
    print(item, end=" ")
Expected output:
1 2 3 4 5 6 7 8 9
chain() is more memory-efficient than concatenating lists with +. For large sequences, it doesn't create intermediate lists; it streams through each one in turn.
Why does this matter? Consider:
# Inefficient: creates a new list
all_items = list1 + list2 + list3
for item in all_items:
    process(item)  # process() is a placeholder for your own per-item function
# Efficient: no intermediate list
for item in chain(list1, list2, list3):
    process(item)
If list1, list2, and list3 are huge, the second version saves memory. A practical scenario: you have data split across multiple files, each loaded as a list. Rather than concatenating them into one giant list, chain() lets you process them sequentially without the memory overhead of the combined structure.
itertools.islice()
Takes a slice of an iterable:
from itertools import islice
numbers = range(100)
for i in islice(numbers, 5, 15):
    print(i, end=" ")
Expected output:
5 6 7 8 9 10 11 12 13 14
islice(iterable, start, stop) works like list slicing but on any iterable without materializing the whole sequence. Incredibly useful for pagination or sampling:
from itertools import islice
def paginate(iterable, page_size=10):
    it = iter(iterable)
    while True:
        page = list(islice(it, page_size))
        if not page:
            break
        yield page
for page_num, page in enumerate(paginate(range(25), page_size=7), 1):
    print(f"Page {page_num}: {page}")
Expected output:
Page 1: [0, 1, 2, 3, 4, 5, 6]
Page 2: [7, 8, 9, 10, 11, 12, 13]
Page 3: [14, 15, 16, 17, 18, 19, 20]
Page 4: [21, 22, 23, 24]
This is a generator pattern: it yields pages on demand without loading the entire dataset. In machine learning, this exact pattern is the conceptual foundation of mini-batch gradient descent: instead of training on all data at once, you slice your dataset into pages (batches) and process each one in turn.
Other Useful itertools: count() and repeat()
itertools.count() generates infinite numbers:
from itertools import count, islice
# Generate first 5 numbers starting from 10, counting by 2
for num in islice(count(10, 2), 5):
    print(num, end=" ")
Expected output:
10 12 14 16 18
itertools.repeat() repeats a value indefinitely (or n times):
from itertools import repeat, islice
# Repeat "hello" 3 times
for word in islice(repeat("hello"), 3):
    print(word)
Expected output:
hello
hello
hello
Useful for pairing with zip():
from itertools import repeat
values = [1, 2, 3]
separator = repeat("|")
for val, sep in zip(values, separator):
    print(f"{val}{sep}", end=" ")
Expected output:
1| 2| 3|
for vs while: A Performance Comparison
Here's a key insight: for loops are faster than while loops for iterating over collections. Let's see why:
# for loop
numbers = list(range(1000000))
total = 0
for num in numbers:
    total += num
# while loop with manual indexing
numbers = list(range(1000000))
total = 0
i = 0
while i < len(numbers):
    total += numbers[i]
    i += 1
The for loop is cleaner and faster because:
- It directly iterates over the collection
- No manual indexing overhead
- Python's interpreter optimizes it
The while loop requires indexing (numbers[i]) on each iteration, which has a small cost per iteration. Over a million iterations, it adds up.
Rule of thumb: Use for for collections, while for condition-based logic. If you're tempted to write a while loop with an index and length check, use for instead.
Generator Expressions: Lazy Evaluation
Generator expressions are like list comprehensions, but they generate values on-the-fly instead of creating the entire list in memory. They use parentheses instead of square brackets:
# List comprehension: creates entire list
squares_list = [i ** 2 for i in range(1000000)]
# Generator expression: generates on demand
squares_gen = (i ** 2 for i in range(1000000))
print(type(squares_list))
print(type(squares_gen))
Expected output:
<class 'list'>
<class 'generator'>
With a million elements, the list comprehension consumes significant memory. The generator uses almost nothing until you iterate:
# This doesn't compute anything yet
squares_gen = (i ** 2 for i in range(10))
# Now we iterate and compute
for square in squares_gen:
    print(square, end=" ")
Expected output:
0 1 4 9 16 25 36 49 64 81
Generators are perfect for processing large files line-by-line or streaming data:
def read_large_file(filepath):
    with open(filepath) as f:
        for line in f:
            yield line.strip()
# Process without loading entire file into memory
for line in read_large_file("huge_file.txt"):
    process(line)
Generator expressions work with functions expecting iterables:
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Sum only even numbers without creating intermediate list
total = sum(n for n in numbers if n % 2 == 0)
print(f"Sum of evens: {total}")
Expected output:
Sum of evens: 30
Notice that the extra parentheses are optional when the generator expression is the sole argument to a function. This kind of pipeline (filter, transform, aggregate), all in one expression with no intermediate collections, is a pattern you'll use constantly in data processing code.
Performance Considerations
Beyond the for vs while speed difference, a handful of specific loop patterns have significant performance implications you should internalize before your datasets grow large.
The single biggest win is avoiding repeated membership tests on a list. Checking if x in some_list is O(n): in the worst case, Python has to scan the entire list. If you run that check inside a loop, you've turned a linear operation into a quadratic one. Converting the list to a set first, some_set = set(some_list), makes every subsequent in check O(1) on average. For a thousand iterations over a thousand-element list, that's the difference between roughly a million operations and a thousand.
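A minimal sketch of the conversion, with invented ID lists standing in for real data:

```python
allowed_ids = list(range(1000))           # the collection we test against
incoming_ids = [5, 1500, 42, 999, 2000]   # values to check

# One-time O(n) conversion; every later membership test is O(1) on average.
allowed_lookup = set(allowed_ids)

accepted = [i for i in incoming_ids if i in allowed_lookup]
print(accepted)  # [5, 42, 999]
```

The conversion itself costs one pass over the list, so it pays off when you run more than a handful of membership checks against the same collection.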
The second major consideration is list building inside loops. Every time you do result = result + [item], Python allocates a brand-new list and copies all previous elements into it. That's O(n) per append, giving O(n²) overall. Using result.append(item) instead is O(1) amortized. Better still, if you know in advance what you're building, use a list comprehension, which skips the per-iteration method lookup. Finally, watch for repeated lookups in a loop: if you're calling the same function or accessing the same attribute on every iteration, consider caching the reference in a local variable outside the loop to avoid the lookup cost. For tight inner loops processing millions of elements, these micro-optimizations compound into real gains.
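The three list-building styles and the caching trick can be sketched side by side. The data is tiny and purely illustrative, so it demonstrates the shape of each pattern rather than any measurable speed difference.

```python
import math

values = [1.0, 4.0, 9.0, 16.0]

# Quadratic pattern: each + allocates a new list and copies the old one.
roots_concat = []
for value in values:
    roots_concat = roots_concat + [math.sqrt(value)]

# Linear pattern with cached lookups: bind the method and the function to
# local names once, instead of resolving them on every iteration.
roots_append = []
append_root = roots_append.append
sqrt = math.sqrt
for value in values:
    append_root(sqrt(value))

# Comprehension: usually the clearest option when building a new list.
roots_comprehension = [math.sqrt(value) for value in values]

print(roots_concat == roots_append == roots_comprehension)  # True
```

Caching lookups trades a little readability for speed, so it is worth reserving for loops that profiling has shown to be hot.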
Filtering and Validation Loops
Often you're processing data and need to validate, filter, or categorize items. Here are patterns for these common tasks:
Filtering Valid Items
items = [10, -5, 20, 0, 15, -3, 8]
valid_items = [item for item in items if item > 0]
print(f"Valid items: {valid_items}")
Expected output:
Valid items: [10, 20, 15, 8]
Separating Into Categories
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = []
odds = []
for num in numbers:
    if num % 2 == 0:
        evens.append(num)
    else:
        odds.append(num)
print(f"Evens: {evens}")
print(f"Odds: {odds}")
Expected output:
Evens: [2, 4, 6, 8, 10]
Odds: [1, 3, 5, 7, 9]
Counting with a Default
from collections import defaultdict
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_count = defaultdict(int)
for word in words:
    word_count[word] += 1
print(dict(word_count))
Expected output:
{'apple': 3, 'banana': 2, 'cherry': 1}
Without defaultdict, you'd need to check if word in word_count first. This is cleaner.
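For comparison, here is a sketch of two other standard-library ways to do the same count: a plain dict with get(), and collections.Counter, which is built for exactly this job. The variable names are illustrative:

```python
from collections import Counter

words = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# Plain dict: get() supplies 0 the first time a word is seen.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Counter does the counting loop for you.
counter = Counter(words)

print(counts)
print(counter.most_common(1))
```

Counter is the natural choice when all you need is tallies; defaultdict remains handy when the default is something else, such as a list or a set.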
Finding Duplicates
items = [1, 2, 3, 2, 4, 3, 5, 1]
seen = set()
duplicates = set()
for item in items:
    if item in seen:
        duplicates.add(item)
    seen.add(item)
print(f"Duplicates: {sorted(duplicates)}")
Expected output:
Duplicates: [1, 2, 3]
Sets make this efficient: checking membership in a set is O(1) on average, whereas checking in a list is O(n).
Common Loop Mistakes
Even experienced developers fall into certain loop traps. Knowing them ahead of time will save you hours of debugging.
The most dangerous mistake is modifying a collection while iterating over it. When you remove or add elements to a list mid-loop, the iterator's internal index gets out of sync with the list's actual contents. With a plain for loop over a list, the usual result is silently skipped elements (or a loop that never ends if you keep appending); with an index-based loop over range(len(items)) you can get an IndexError; and with a dict or set, Python raises a RuntimeError as soon as the size changes. Always build a new collection instead of mutating the one you're looping over, or iterate over a copy. A list comprehension with a filter condition is usually the cleanest solution.
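A small demonstration makes the skipping behavior concrete. This sketch uses made-up data (numbers, scores) and shows the bug, then a safe alternative, for both a list and a dict:

```python
numbers = [1, 2, 2, 3, 4, 4, 5]  # hypothetical sample data

# Buggy: each removal shifts later items left, so the loop steps
# over whichever element slid into the current position.
buggy = numbers.copy()
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)
print(buggy)  # some even numbers survive

# Safe: build a new list instead of mutating the one being iterated.
odds = [n for n in numbers if n % 2 != 0]
print(odds)

# Dicts refuse outright: changing the size mid-iteration raises RuntimeError.
scores = {"a": 1, "b": 0, "c": 3}
error_message = None
try:
    for key in scores:
        if scores[key] == 0:
            del scores[key]
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)

# Safe for dicts: iterate over a snapshot of the keys.
scores = {"a": 1, "b": 0, "c": 3}
for key in list(scores):
    if scores[key] == 0:
        del scores[key]
print(scores)
```

The list version is the more dangerous of the two precisely because it fails silently; the dict version at least tells you something went wrong.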
A close second is the unbounded while loop: a loop whose termination condition can never become false. This happens most often when you forget to update the variable the condition depends on, or when you update it in the wrong branch of an if statement. Python has no built-in timeout for loops, so an infinite loop will spin forever, consuming 100% of one CPU core. Always verify that every path through your while loop moves the state toward termination. If there's any doubt, add a hard iteration limit as a safety valve.
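One way to add that safety valve is a step counter checked alongside the real condition. The sketch below uses the Collatz sequence purely as an example of a loop whose length is hard to predict; MAX_STEPS is an arbitrary cap chosen for illustration:

```python
MAX_STEPS = 1_000  # hypothetical safety limit; tune for your problem

value = 27
steps = 0

# The real condition is `value != 1`; the step cap guarantees
# the loop ends even if that condition is never met.
while value != 1 and steps < MAX_STEPS:
    if value % 2 == 0:
        value //= 2
    else:
        value = 3 * value + 1
    steps += 1

# Distinguish "finished normally" from "gave up".
if value != 1:
    raise RuntimeError(f"no convergence after {MAX_STEPS} steps")

print(f"Reached 1 after {steps} steps")
```

Checking why the loop ended matters as much as the cap itself; otherwise a timeout is indistinguishable from success.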
The third common mistake is using the wrong loop variable inside a nested loop. When you have loops named i and j, it's easy to accidentally write matrix[i][i] instead of matrix[i][j], especially when the code spans multiple lines. Give your loop variables meaningful names whenever possible: for row_idx in range(rows) is much harder to confuse than for i in range(rows). Finally, watch out for accidentally reusing a loop variable name from an outer loop inside an inner loop. A for loop does not create its own scope, so both loops assign to the same variable: the inner loop overwrites the outer loop's value, and any code that runs after the inner loop sees the inner loop's last value instead of the outer one.
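The name-collision bug is easy to reproduce. In this sketch (grid and the report lists are invented for illustration), the totals come out right but the row labels are wrong, which is exactly the kind of subtle error described above:

```python
grid = [[1, 2, 3], [4, 5, 6]]  # hypothetical 2x3 data

# Buggy: the inner loop reuses the name `i`.
report_buggy = []
for i in range(len(grid)):
    row = grid[i]
    total = 0
    for i in range(len(row)):        # overwrites the outer `i`
        total += row[i]
    report_buggy.append((i, total))  # `i` is now the last column index
print(report_buggy)

# Fixed: distinct, descriptive names, with enumerate() supplying indexes.
report = []
for row_idx, row in enumerate(grid):
    total = 0
    for cell in row:
        total += cell
    report.append((row_idx, total))
print(report)
```

Note that the outer loop still visits every row in the buggy version, because the range iterator is unaffected by the reassignment; only the code after the inner loop sees the wrong value.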
Real-World Loop Patterns
Processing Pairs of Consecutive Elements
data = [1, 2, 3, 4, 5]
for current, following in zip(data, data[1:]):
    print(f"{current} -> {following}")
Expected output:
1 -> 2
2 -> 3
3 -> 4
4 -> 5
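The zip(data, data[1:]) idiom copies most of the list through the slice and only works on sequences that support slicing. For generators or very large inputs, a small helper avoids both limits. The function name consecutive_pairs is made up for this sketch; on Python 3.10 and later, itertools.pairwise provides the same behavior out of the box:

```python
def consecutive_pairs(iterable):
    """Yield (a, b) for each adjacent pair without copying the input."""
    it = iter(iterable)
    try:
        prev = next(it)          # prime the first element
    except StopIteration:
        return                   # empty input: no pairs
    for item in it:
        yield prev, item
        prev = item

# Works on a list...
print(list(consecutive_pairs([1, 2, 3, 4, 5])))

# ...and on a generator, which cannot be sliced.
squares = (n * n for n in range(4))
print(list(consecutive_pairs(squares)))
```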
Filtering and Transforming in One Pass
prices = [10, 25, 5, 30, 15]
discounted = []
for price in prices:
    if price > 10:
        discounted.append(price * 0.9)
print(discounted)
Expected output:
[22.5, 27.0, 13.5]
Or as a comprehension:
prices = [10, 25, 5, 30, 15]
discounted = [p * 0.9 for p in prices if p > 10]
print(discounted)
Early Exit with a Sentinel
lines = ["hello", "world", "", "end", "more"]
for line in lines:
    if line == "":
        break
    print(line)
Expected output:
hello
world
Accumulation Patterns
words = ["apple", "banana", "cherry", "date"]
word_lengths = {}
for word in words:
    word_lengths[word] = len(word)
print(word_lengths)
Expected output:
{'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}
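The same accumulation can be written as a dict comprehension, and a closely related pattern, grouping items under a shared key, works well with setdefault(). A brief sketch of both; by_length is an illustrative name:

```python
words = ["apple", "banana", "cherry", "date"]

# One entry per word: a dict comprehension replaces the loop.
word_lengths = {word: len(word) for word in words}
print(word_lengths)

# Several words per key: setdefault() creates the list on first use.
by_length = {}
for word in words:
    by_length.setdefault(len(word), []).append(word)
print(by_length)
```

The comprehension fits when each key gets exactly one value; the explicit loop is still the clearer choice once values accumulate across iterations.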
Choosing the Right Loop Construct
You now have many tools: for, while, comprehensions, generators, zip(), enumerate(), and itertools. Here's a quick guide for choosing the right one.
When to Use for
Use a for loop when:
- You're iterating over a sequence (list, tuple, string, dict)
- You know (or don't care) how many iterations you'll make
- You want readable, Pythonic code
# Best for collections
for item in items:
    print(item)
When to Use while
Use a while loop when:
- The number of iterations is unknown
- You're checking a condition that changes each iteration
- You need to handle user input or external events
# Best for condition-based logic
password = ""
while password != "secret":
    password = input("Enter password: ")
When to Use List Comprehensions
Use comprehensions when:
- You're creating a new list based on an existing one
- The logic is simple (one or two operations)
- You're filtering or transforming
# Best for creating derived collections
squared = [x ** 2 for x in numbers if x > 0]
When to Use Generators
Use generators when:
- Processing large datasets or infinite sequences
- Memory is a concern
- You only need to iterate once
# Best for lazy evaluation
large_squares = (x ** 2 for x in huge_dataset)
Summary
Loops are the heartbeat of data processing. Master these patterns and you'll find yourself reaching for the right tool intuitively, without having to consciously weigh the tradeoffs each time.
- for loops are your primary tool. Use them for iterating over sequences.
- range() generates number sequences with precise control over start, stop, and step.
- enumerate() pairs indexes with values, so there's no more manual indexing.
- zip() synchronizes parallel iteration over multiple sequences.
- while loops are for condition-based logic when you don't know iteration count in advance.
- break and continue provide fine-grained control; else on loops adds elegance.
- List comprehensions are usually faster and cleaner than explicit loops for creating filtered or transformed lists.
- itertools provides advanced patterns like product, chain, and islice.
- for is faster than while for iterating over collections, so prefer for-loop iteration whenever possible.
The iterator protocol is what binds all of these together. Understanding that Python's for loop is really just a standardized way of calling iter() once and then __next__() until StopIteration is raised explains why so many things are iterable, and why you can write your own. Performance matters most at scale: set membership over list membership, append() over concatenation, generators over lists for large data. And the common mistakes (modifying a collection while iterating, unbounded while loops, variable name collisions in nested loops) are worth memorizing now so you don't debug them at 2am later.
The best loops are often invisible to the reader. They're so natural that the intent jumps off the page. That comes from practice and choosing the right tool for each job. As you move through this series and start working with NumPy arrays, pandas DataFrames, and machine learning pipelines, the loop intuitions you build here will transfer directly: you'll recognize mini-batch iteration as islice, vectorized operations as comprehensions at the C level, and training loops as while loops with convergence conditions. Loops are not just a beginner topic; they're a fundamental lens through which all of data processing becomes legible.
In the next article, we'll tackle error handling with try/except blocks. Your programs will crash sometimes, and that's okay. We'll learn to handle failures gracefully and recover from unexpected situations.