April 4, 2025
Python, Beginner, Loops, Iteration

Python Loops Mastered: for, while, range, enumerate, and zip

If conditionals are the nervous system of your program, loops are its heartbeat. They let you repeat operations over collections of data, automate tedious tasks, and write code once that works on datasets of any size. Without loops, you'd have to write out every single operation manually: no way to scale, no way to handle real-world data.

Think about what loops actually enable. Imagine you have a list of ten thousand customer records and you need to apply a discount to every one of them. Without a loop, you would literally need ten thousand individual lines of code. With a loop, you write three. That compression of intent is what makes loops so fundamental: they let you express "do this for every item" in a way that mirrors how humans actually think about repetitive work. And in AI and machine learning specifically, loops are everywhere: iterating over training batches, running gradient descent steps, evaluating model predictions, preprocessing datasets. Mastering loops now means you'll be comfortable with the patterns you encounter constantly in NumPy, PyTorch, TensorFlow, and scikit-learn later in this series.
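
To make that concrete, here's a minimal sketch of the discount scenario. The customer records and the 10% rate are invented for illustration; the point is that the loop body stays the same whether the list holds two records or ten thousand:

```python
# Hypothetical customer records; a real system would load these from a database
customers = [
    {"name": "Ada", "price": 100.0},
    {"name": "Ben", "price": 80.0},
]

# The "three lines": apply a 10% discount to every record
for customer in customers:
    customer["price"] = customer["price"] * 0.9

print(customers)
```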

In this article, we're covering Python's looping arsenal: the versatile for loop, the condition-based while loop, the range() function that powers iteration, enumerate() for getting indexes alongside values, and zip() for parallel iteration. We'll explore nested loops, comprehensions as loop alternatives, and advanced patterns with itertools. We'll also dig into the iterator protocol that makes all of this work under the hood, common mistakes that trip up even experienced developers, and performance considerations that matter when your datasets grow from hundreds to millions of rows. By the end, you'll understand not just how to loop, but when to use each tool and why for loops are almost always your first choice over while.

Table of Contents
  1. The for Loop: Your Workhorse
  2. Basic Iteration Over a List
  3. Iterating Over Strings
  4. Iterating Over Dictionaries
  5. Understanding range()
  6. Basic range() Usage
  7. range() with Start and Stop
  8. range() with Step
  9. Negative Step for Counting Down
  10. Why range() Matters
  11. enumerate(): Pairing Indexes with Values
  12. zip(): Parallel Iteration
  13. Zipping Multiple Sequences
  14. The while Loop: Condition-Based Iteration
  15. A Practical while Example: User Input
  16. break: Exit the Loop Early
  17. continue: Skip to the Next Iteration
  18. The else Clause on Loops
  19. Nested Loops
  20. Real-World Example: Matrix Operations
  21. List Comprehensions: The Loop Alternative
  22. Dict and Set Comprehensions
  23. Understanding Loop Scope and Variables
  24. Looping with Unpacking
  25. Unpacking Tuples and Lists
  26. Extended Unpacking with *
  27. Loop Internals and Iterator Protocol
  28. Advanced: itertools Patterns
  29. itertools.product()
  30. itertools.chain()
  31. itertools.islice()
  32. Other Useful itertools: count() and repeat()
  33. for vs while: A Performance Comparison
  34. Generator Expressions: Lazy Evaluation
  35. Performance Considerations
  36. Filtering and Validation Loops
  37. Filtering Valid Items
  38. Separating Into Categories
  39. Counting with a Default
  40. Finding Duplicates
  41. Common Loop Mistakes
  42. Real-World Loop Patterns
  43. Processing Pairs of Consecutive Elements
  44. Filtering and Transforming in One Pass
  45. Early Exit with a Sentinel
  46. Accumulation Patterns
  47. Choosing the Right Loop Construct
  48. When to Use for
  49. When to Use while
  50. When to Use List Comprehensions
  51. When to Use Generators
  52. Summary

The for Loop: Your Workhorse

The for loop is your primary tool for iterating over sequences. It's clean, readable, and works on anything iterable: lists, strings, tuples, dictionaries, sets, or any object with an __iter__ method.

Basic Iteration Over a List

Here's the simplest form. Notice how the loop variable fruit takes on each value from the list one at a time:

python
fruits = ["apple", "banana", "cherry"]
 
for fruit in fruits:
    print(fruit)

Expected output:

apple
banana
cherry

The syntax is straightforward: for item in iterable: followed by an indented block. Python assigns each element to fruit one at a time, executes the block, then moves to the next. What makes this elegant is that you're expressing your intent directly ("for each fruit in fruits, print it") rather than managing indexes and length checks yourself.

Iterating Over Strings

Strings are iterables too. When you loop over a string, you get individual characters. This is useful in text processing tasks and forms the basis of many natural language processing operations:

python
word = "Python"
 
for letter in word:
    print(letter)

Expected output:

P
y
t
h
o
n

This is handy for character-by-character processing. Want to count vowels?

python
word = "Python"
vowel_count = 0
 
for letter in word:
    if letter.lower() in "aeiou":
        vowel_count += 1
 
print(f"Vowels in '{word}': {vowel_count}")

Expected output:

Vowels in 'Python': 1

Notice that we used letter.lower() to handle both uppercase and lowercase vowels in a single check. This kind of defensive coding, anticipating variations in the data, is a habit worth building from the start, because real-world text is rarely clean.

Iterating Over Dictionaries

With dictionaries, a bare for loop gives you the keys. This is a common source of confusion for newcomers, because dictionaries hold key-value pairs, but by default Python exposes only the keys during iteration:

python
person = {"name": "Alice", "age": 30, "city": "Portland"}
 
for key in person:
    print(key)

Expected output:

name
age
city

To get both keys and values, use .items():

python
person = {"name": "Alice", "age": 30, "city": "Portland"}
 
for key, value in person.items():
    print(f"{key}: {value}")

Expected output:

name: Alice
age: 30
city: Portland

This is the Pythonic way. Using .keys() and then indexing back into the dict is redundant. The .items() method returns each key-value pair as a tuple, and Python's unpacking syntax lets you grab both in one clean step. You'll use this pattern constantly when working with JSON data, configuration dictionaries, and structured records of any kind.

Understanding range()

The range() function is one of Python's most useful built-ins. It generates a sequence of numbers, and it's the bridge between traditional index-based loops and Pythonic iteration.

Basic range() Usage

The most common use is simple counting; range() is often the first thing you reach for when you need to do something a fixed number of times:

python
for i in range(5):
    print(i)

Expected output:

0
1
2
3
4

range(5) generates numbers from 0 up to (but not including) 5. This follows Python's zero-indexing convention.

range() with Start and Stop

When you need to count from a specific starting point rather than zero, use the two-argument form. This is particularly useful when working with data that has natural non-zero indexing, like chapter numbers or years:

python
for i in range(2, 7):
    print(i)

Expected output:

2
3
4
5
6

range(start, stop) begins at start and counts up to (but not including) stop. The "exclusive stop" behavior is consistent with Python's slice notation and makes it easy to reason about how many iterations you'll get: range(2, 7) gives you exactly 7 - 2 = 5 values.

range() with Step

Add a third argument for the step size:

python
for i in range(0, 10, 2):
    print(i)

Expected output:

0
2
4
6
8

range(0, 10, 2) counts by twos. This is useful for processing every nth element or iterating with a stride. In machine learning, you'll encounter step-based iteration when you're processing data in mini-batches or downsampling a time series.
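
As a taste of that, here's a small sketch of stride-based batching with range(). The data and batch size are arbitrary stand-ins, but the slicing pattern is the same one you'd use on a real dataset:

```python
data = list(range(10))  # stand-in for a dataset
batch_size = 4

# Step through the data in strides of batch_size, slicing out one chunk per step
for start in range(0, len(data), batch_size):
    batch = data[start:start + batch_size]
    print(batch)
```

This prints [0, 1, 2, 3], then [4, 5, 6, 7], then the leftover [8, 9]. Note how slicing past the end of the list safely returns the shorter final batch.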

Negative Step for Counting Down

python
for i in range(5, 0, -1):
    print(i)

Expected output:

5
4
3
2
1

Negative steps count backward, which is helpful when you need descending iteration: a countdown timer, say, or reversing an operation order. For simply reversing a list, though, reversed() is usually the cleaner choice.
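
For comparison, here's the reversed() approach on a list, no index arithmetic required:

```python
fruits = ["apple", "banana", "cherry"]

# reversed() walks the list back to front without touching indexes
for fruit in reversed(fruits):
    print(fruit)
```

This prints cherry, banana, apple. Unlike slicing with [::-1], reversed() doesn't copy the list; it returns a lazy iterator over the original.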

Why range() Matters

range() lets you loop by index when you need position information. It's also memory-efficient: it doesn't create a full list in memory; it generates numbers on the fly. For a million iterations, range(1000000) is cheap. A list of a million elements, not so much. This lazy evaluation is a core Python design philosophy: defer computation until you actually need the result. You'll encounter this idea repeatedly in generators and itertools, which we'll cover shortly.

enumerate(): Pairing Indexes with Values

Often you need both the index and the value. You could use range(len()), but that's clunky. Beyond being verbose, it also obscures your intent: when someone reads range(len(fruits)), they have to mentally decode what you're trying to do:

python
fruits = ["apple", "banana", "cherry"]
 
# Clunky way
for i in range(len(fruits)):
    print(f"{i}: {fruits[i]}")

Expected output:

0: apple
1: banana
2: cherry

Instead, use enumerate():

python
fruits = ["apple", "banana", "cherry"]
 
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")

Expected output:

0: apple
1: banana
2: cherry

Much cleaner. enumerate() pairs each element with its index. The syntax is natural: unpack the tuple into index and fruit. The intent is immediately clear to anyone reading your code: you want both the position and the value, and enumerate() says that directly.

You can also start at a different index:

python
fruits = ["apple", "banana", "cherry"]
 
for index, fruit in enumerate(fruits, start=1):
    print(f"{index}: {fruit}")

Expected output:

1: apple
2: banana
3: cherry

The start=1 argument tells enumerate() to begin counting from 1 instead of 0. Useful for human-readable numbering. You'll reach for this constantly when generating output that users will read; nobody wants to see "Item 0" in a numbered list. This also comes up when building progress indicators or labeling model outputs in a report.

zip(): Parallel Iteration

When you have multiple sequences and need to process them in parallel, zip() is your solution. It combines iterables element-wise. Think of a zipper on a jacket: it brings together two separate sides, tooth by tooth, in lockstep:

python
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25, 35]
 
for name, age in zip(names, ages):
    print(f"{name} is {age} years old")

Expected output:

Alice is 30 years old
Bob is 25 years old
Charlie is 35 years old

zip() pairs corresponding elements from names and ages. If sequences are different lengths, zip() stops at the shortest:

python
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25]  # Shorter list
 
for name, age in zip(names, ages):
    print(f"{name}: {age}")

Expected output:

Alice: 30
Bob: 25

Charlie is left out because there's no matching age. If you want to keep all elements, use itertools.zip_longest():

python
from itertools import zip_longest
 
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25]
 
for name, age in zip_longest(names, ages, fillvalue="N/A"):
    print(f"{name}: {age}")

Expected output:

Alice: 30
Bob: 25
Charlie: N/A

The fillvalue argument provides a default for missing elements, which is handy for processing incomplete data. In data science work, mismatched sequence lengths are a common source of bugs. Knowing whether you want to silently drop unmatched records or fill them with a sentinel value is an important design decision, and zip_longest() makes the fill-value approach explicit.

Zipping Multiple Sequences

You're not limited to two:

python
names = ["Alice", "Bob", "Charlie"]
ages = [30, 25, 35]
cities = ["NYC", "LA", "Chicago"]
 
for name, age, city in zip(names, ages, cities):
    print(f"{name}, {age}, from {city}")

Expected output:

Alice, 30, from NYC
Bob, 25, from LA
Charlie, 35, from Chicago

This three-way zip is equivalent to iterating over rows in a table where each column is a separate list. It's a pattern you'll use frequently when you have parallel arrays representing different attributes of the same set of objects, a common structure in scientific computing before you reach for pandas DataFrames.

The while Loop: Condition-Based Iteration

Where for loops iterate over sequences, while loops run as long as a condition is true. They're useful when you don't know in advance how many iterations you'll need. The key mental model: for loops are controlled by a collection, while loops by a condition that changes over time:

python
count = 0
 
while count < 5:
    print(count)
    count += 1

Expected output:

0
1
2
3
4

The loop keeps running until count < 5 becomes false. This is straightforward but verbose compared to for. Notice that we have to manage the counter variable ourselves: initializing it before the loop, incrementing it inside the loop. This manual state management is exactly the kind of bookkeeping that for loops spare you from when you're iterating over a known collection.

A Practical while Example: User Input

python
password = ""
attempts = 0
 
while password != "secret" and attempts < 3:
    password = input("Enter password: ")
    attempts += 1
 
if password == "secret":
    print("Access granted!")
else:
    print("Too many attempts. Access denied.")

Expected output (simulating input "wrong", "wrong", "secret"):

Access granted!

This uses a while loop because we don't know in advance how many attempts the user will make (up to the limit). This is the canonical use case for while: event-driven logic where external input determines when the loop ends. You'll see similar patterns in network clients waiting for responses, game loops running until a player quits, and training loops that run until a model converges or a timeout is reached.
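
A sketch of that last idea: this is not a real training loop, just an illustrative condition-driven loop where a made-up "error" value shrinks until it crosses a tolerance or a step budget runs out:

```python
error = 1.0        # illustrative starting error, not a real metric
tolerance = 0.01   # stop when error drops below this
max_steps = 100    # safety cap so the loop can't run forever

steps = 0
while error > tolerance and steps < max_steps:
    error /= 2     # stand-in for one optimization step
    steps += 1

print(f"Stopped at error={error} after {steps} steps")
```

The safety cap is the habit worth copying: any while loop whose exit depends on a computed value should also have a hard bound, so a bug in the computation can't hang your program.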

break: Exit the Loop Early

The break statement exits a loop immediately, regardless of the condition. It's a way of saying "I found what I was looking for, no need to continue":

python
for i in range(10):
    if i == 5:
        break
    print(i)

Expected output:

0
1
2
3
4

When i reaches 5, break terminates the loop. This is especially valuable in search scenarios where continuing after finding a match would be wasteful. If you're scanning a million records for one specific entry, you don't want to keep scanning after you've found it.

continue: Skip to the Next Iteration

The continue statement skips the rest of the current iteration and jumps to the next. Think of it as a filter that says "this item doesn't qualify, move on":

python
for i in range(5):
    if i == 2:
        continue
    print(i)

Expected output:

0
1
3
4

The number 2 doesn't print because continue skips the print() when i == 2. A common use of continue is skipping invalid or irrelevant data in a larger processing loop: for example, skipping empty lines when reading a file, or skipping records that fail validation in a data pipeline.
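
Here's what that looks like in miniature. For a self-contained example, a list of strings stands in for lines read from a file; real code would iterate over an open file object the same way:

```python
# Stand-in for lines read from a file
lines = ["alpha", "", "   ", "beta", "", "gamma"]

cleaned = []
for line in lines:
    if not line.strip():   # blank or whitespace-only line: skip it
        continue
    cleaned.append(line.strip())

print(cleaned)
```

This prints ['alpha', 'beta', 'gamma']. The continue keeps the loop body flat instead of nesting the real work inside an if block.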

The else Clause on Loops

Python allows an else clause on loops. It executes if the loop completes normally (without a break). This is one of Python's lesser-known features, but it's surprisingly useful for search patterns:

python
for i in range(5):
    print(i)
else:
    print("Loop completed successfully!")

Expected output:

0
1
2
3
4
Loop completed successfully!

If we break, the else doesn't run:

python
for i in range(5):
    if i == 3:
        break
    print(i)
else:
    print("Loop completed successfully!")

Expected output:

0
1
2

The else doesn't execute because we broke out of the loop. This is useful for detecting whether a search loop found what it was looking for:

python
items = [10, 20, 30, 40, 50]
target = 35
 
for item in items:
    if item == target:
        print(f"Found {target}")
        break
else:
    print(f"{target} not found in the list")

Expected output:

35 not found in the list

The loop-else pattern eliminates the need for a flag variable. Without it, you'd need to set found = False before the loop and check it after; the else clause makes the "not found" case structurally explicit rather than relying on a boolean variable.

Nested Loops

You can nest loops inside loops:

python
for i in range(3):
    for j in range(3):
        print(f"({i}, {j})", end=" ")
    print()  # New line after inner loop completes

Expected output:

(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)

This creates a 3x3 grid. The outer loop runs 3 times, and for each of its iterations, the inner loop runs 3 times. Nested loops have a runtime cost: an outer loop of n iterations with an inner loop of m iterations gives you n × m total iterations. For large datasets, this compounds quickly.

Real-World Example: Matrix Operations

Matrix iteration is the bread and butter of linear algebra in machine learning. Every time you process a 2D image, a weight matrix in a neural network, or a correlation table, you're working with structures like this:

python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
 
for row in matrix:
    for element in row:
        print(element, end=" ")
    print()

Expected output:

1 2 3
4 5 6
7 8 9

This prints each element of a 2D list. Notice how we iterate over rows, then over elements within each row. In practice, once you move into NumPy, you'll often replace explicit nested loops with vectorized operations that are orders of magnitude faster, but understanding the nested loop structure first gives you the conceptual foundation for what those vectorized operations are doing underneath.

List Comprehensions: The Loop Alternative

List comprehensions provide a concise way to create lists by transforming or filtering existing ones. They're often faster than loops and more readable:

python
# Traditional loop
squares = []
for i in range(5):
    squares.append(i ** 2)
print(squares)
 
# List comprehension
squares = [i ** 2 for i in range(5)]
print(squares)

Expected output:

[0, 1, 4, 9, 16]
[0, 1, 4, 9, 16]

Both produce the same result, but the comprehension is cleaner. The syntax is: [expression for item in iterable]. The speed advantage comes from Python's internal optimization of comprehensions: they use a dedicated bytecode instruction rather than calling append() on each iteration, which avoids the overhead of method lookups inside the loop.

With a filter:

python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
evens = [n for n in numbers if n % 2 == 0]
print(evens)

Expected output:

[2, 4, 6, 8, 10]

The if clause filters out odd numbers. You can also transform and filter:

python
numbers = [1, 2, 3, 4, 5]
doubled_evens = [n * 2 for n in numbers if n % 2 == 0]
print(doubled_evens)

Expected output:

[4, 8]

When you have both a filter and a transformation in a comprehension, the order reads naturally left to right: "give me n * 2 for each n in numbers where n is even." That clarity of expression is why comprehensions are preferred: they communicate intent in a single line.

Dict and Set Comprehensions

The same pattern works for dictionaries and sets:

python
# Dict comprehension
squares_dict = {i: i ** 2 for i in range(5)}
print(squares_dict)
 
# Set comprehension
unique_lengths = {len(word) for word in ["apple", "pie", "python", "code"]}
print(unique_lengths)

Expected output:

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
{2, 3, 4, 6}

Comprehensions are one of Python's most elegant features. Use them when you're creating a new collection based on an existing one. Dict comprehensions are particularly powerful when you need to build lookup tables, transforming a list of objects into a dictionary keyed by some attribute, for fast O(1) access later.
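
For instance, here's a small sketch of that lookup-table pattern, with invented user records:

```python
# Hypothetical records: a list of dicts, as you might get from parsed JSON
users = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"},
]

# Build a dict keyed by id for O(1) lookups
name_by_id = {user["id"]: user["name"] for user in users}
print(name_by_id[2])
```

This prints Bob. Scanning the list for id 2 would cost O(n) on every lookup; the dict pays the build cost once and answers every later query in constant time.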

Understanding Loop Scope and Variables

One thing that trips up beginners is what happens to loop variables after the loop ends. Unlike in some languages, Python's loop variables persist after the loop completes:

python
for i in range(5):
    print(i)
 
print(f"After loop, i = {i}")

Expected output:

0
1
2
3
4
After loop, i = 4

The variable i still exists with the value from the last iteration. This can be useful (you know what value caused the loop to exit), but it can also be surprising. If you don't want the variable lingering, you can del it explicitly; otherwise, just be aware that it remains part of your namespace.

This is different in comprehensions, which create their own scope:

python
numbers = [n for n in range(5)]
print(numbers)
 
# n from the comprehension is not accessible here
try:
    print(n)
except NameError:
    print("n is not defined outside the comprehension")

Expected output:

[0, 1, 2, 3, 4]
n is not defined outside the comprehension

List comprehensions create a local scope for their loop variable, which prevents accidental variable pollution. This scoping difference between regular loops and comprehensions is worth remembering; it's one of the subtle ways Python's syntax shapes how you think about variable lifetimes and code organization.

Looping with Unpacking

Python's tuple unpacking works wonderfully in loops. You've already seen examples with zip() and dictionaries, but let's explore this more deeply.

Unpacking Tuples and Lists

python
pairs = [(1, "a"), (2, "b"), (3, "c")]
 
for number, letter in pairs:
    print(f"{number}: {letter}")

Expected output:

1: a
2: b
3: c

Each tuple is unpacked into number and letter. You can unpack tuples of any length:

python
coordinates = [(0, 0, "origin"), (1, 1, "diagonal"), (1, 2, "off-diagonal")]
 
for x, y, description in coordinates:
    print(f"({x}, {y}): {description}")

Expected output:

(0, 0): origin
(1, 1): diagonal
(1, 2): off-diagonal

If you don't need all values, use an underscore:

python
pairs = [(1, "a"), (2, "b"), (3, "c")]
 
for number, _ in pairs:
    print(number)

Expected output:

1
2
3

The underscore is a convention for "I don't care about this value," and it makes your intent clear to readers. Using _ to discard unwanted values is a widely recognized Python idiom: any experienced Python developer who sees it knows immediately that you've intentionally ignored that part of the tuple, rather than accidentally forgotten to use it.

Extended Unpacking with *

For sequences of varying lengths, use the * operator:

python
data = [1, 2, 3, 4, 5]
 
first, *rest = data
print(f"First: {first}")
print(f"Rest: {rest}")

Expected output:

First: 1
Rest: [2, 3, 4, 5]

Or grab the last item:

python
data = [1, 2, 3, 4, 5]
 
*initial, last = data
print(f"Initial: {initial}")
print(f"Last: {last}")

Expected output:

Initial: [1, 2, 3, 4]
Last: 5

In loops, this is less common but useful when parsing structured data:

python
transactions = [
    ["Alice", 100, "deposit"],
    ["Bob", 50, "withdrawal"],
    ["Charlie", 200, "deposit"]
]
 
for name, *details in transactions:
    print(f"{name}: {details}")

Expected output:

Alice: [100, 'deposit']
Bob: [50, 'withdrawal']
Charlie: [200, 'deposit']

This extended unpacking pattern is particularly handy when you're processing log files or CSV rows where the first column is always an identifier but the remaining columns vary. It lets you capture the variable-length tail without knowing its size in advance.

Loop Internals and Iterator Protocol

Understanding what actually happens when Python executes a for loop gives you a much stronger mental model, and explains why so many Python features work the way they do.

When you write for item in collection:, Python doesn't just index into the collection. Instead, it calls iter(collection) to get an iterator object, then repeatedly calls next() on that iterator until a StopIteration exception is raised. You can see this yourself:

python
fruits = ["apple", "banana", "cherry"]
 
iterator = iter(fruits)
print(next(iterator))  # apple
print(next(iterator))  # banana
print(next(iterator))  # cherry
# next(iterator) here would raise StopIteration

This is the iterator protocol: any object that implements __iter__() (to return an iterator) and __next__() (to return the next value) can be used in a for loop. That's why loops work on strings, lists, tuples, dictionaries, files, generators, and even custom objects you define yourself: as long as they implement these two methods, Python's for loop machinery knows how to consume them.

This design has a profound implication: you can create lazy iterators that produce values on demand without materializing a full collection. When you call range(1000000), Python doesn't create a list of a million integers. It creates a range object that knows its start, stop, and step, and produces each integer only when next() is called. The same is true for zip(), enumerate(), map(), filter(), and generators: they're all iterators that produce values lazily. This is why Python can iterate over a ten-gigabyte file without running out of memory: it reads one line at a time, and the file object's iterator produces each line only when requested. Every time you use a for loop in Python, the iterator protocol is doing the work underneath.
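
To see the protocol from the producer's side, here's a minimal custom iterable (an invented Countdown class) that a for loop can consume, because it supplies the two required methods:

```python
class Countdown:
    """Counts down from n to 1; the object serves as its own iterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self  # this object is its own iterator

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # signals the for loop to stop
        value = self.n
        self.n -= 1
        return value

for x in Countdown(3):
    print(x)
```

This prints 3, 2, 1. One caveat of making the object its own iterator: it's exhausted after a single pass, just like a generator, so a second for loop over the same instance produces nothing.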

Advanced: itertools Patterns

The itertools module provides advanced iteration tools. Here are three you should know:

itertools.product()

Creates the Cartesian product of iterables (all combinations). This is conceptually equivalent to nested for loops, but expressed in a single clean call:

python
from itertools import product
 
sizes = ["S", "M", "L"]
colors = ["red", "blue"]
 
for size, color in product(sizes, colors):
    print(f"{color} {size}")

Expected output:

red S
blue S
red M
blue M
red L
blue L

This generates all 6 combinations without nested loops. Note the iteration order: the first argument to product() varies slowest, exactly as if it were the outermost loop. The payoff is flatter, cleaner code than explicit nesting.

For three sequences, the combinatorial explosion grows quickly:

python
from itertools import product
 
sizes = ["S", "M", "L"]
colors = ["red", "blue", "green"]
materials = ["cotton", "polyester"]
 
for size, color, material in product(sizes, colors, materials):
    print(f"{color} {size} {material}", end=" | ")

Expected output:

red S cotton | red S polyester | blue S cotton | blue S polyester | green S cotton | green S polyester | red M cotton | red M polyester | blue M cotton | blue M polyester | green M cotton | green M polyester | red L cotton | red L polyester | blue L cotton | blue L polyester | green L cotton | green L polyester |

That's 3 × 3 × 2 = 18 combinations generated automatically. The product() function shines in hyperparameter search for machine learning: when you want to try every combination of learning rates, batch sizes, and layer counts, product() generates the grid systematically without requiring you to write three levels of nested loops.

itertools.chain()

Chains multiple iterables into a single sequence:

python
from itertools import chain
 
list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [7, 8, 9]
 
for item in chain(list1, list2, list3):
    print(item, end=" ")

Expected output:

1 2 3 4 5 6 7 8 9

More efficient than concatenating lists with +. For large sequences, chain() doesn't create intermediate lists; it streams through each one in turn.

Why does this matter? Consider:

python
# Inefficient: creates a new list
all_items = list1 + list2 + list3
for item in all_items:
    process(item)
 
# Efficient: no intermediate list
for item in chain(list1, list2, list3):
    process(item)

If list1, list2, list3 are huge, the second version saves memory. A practical scenario: you have data split across multiple files, each loaded as a list. Rather than concatenating them into one giant list, chain() lets you process them sequentially without the memory overhead of the combined structure.

itertools.islice()

Takes a slice of an iterable:

python
from itertools import islice
 
numbers = range(100)
 
for i in islice(numbers, 5, 15):
    print(i, end=" ")

Expected output:

5 6 7 8 9 10 11 12 13 14

islice(iterable, start, stop) works like list slicing but on any iterable without materializing the whole sequence. Incredibly useful for pagination or sampling:

python
from itertools import islice
 
def paginate(iterable, page_size=10):
    it = iter(iterable)
    while True:
        page = list(islice(it, page_size))
        if not page:
            break
        yield page
 
for page_num, page in enumerate(paginate(range(25), page_size=7), 1):
    print(f"Page {page_num}: {page}")

Expected output:

Page 1: [0, 1, 2, 3, 4, 5, 6]
Page 2: [7, 8, 9, 10, 11, 12, 13]
Page 3: [14, 15, 16, 17, 18, 19, 20]
Page 4: [21, 22, 23, 24]

This is a generator pattern: it yields pages on demand without loading the entire dataset. In machine learning, this exact pattern is the conceptual foundation of mini-batch gradient descent: instead of training on all data at once, you slice your dataset into pages (batches) and process each one in turn.

Other Useful itertools: count() and repeat()

itertools.count() generates infinite numbers:

python
from itertools import count, islice
 
# Generate first 5 numbers starting from 10, counting by 2
for num in islice(count(10, 2), 5):
    print(num, end=" ")

Expected output:

10 12 14 16 18

itertools.repeat() repeats a value indefinitely (or n times):

python
from itertools import repeat, islice
 
# Repeat "hello" 3 times
for word in islice(repeat("hello"), 3):
    print(word)

Expected output:

hello
hello
hello

Useful for pairing with zip():

python
from itertools import repeat
 
values = [1, 2, 3]
separator = repeat("|")
 
for val, sep in zip(values, separator):
    print(f"{val}{sep}", end=" ")

Expected output:

1| 2| 3|

for vs while: A Performance Comparison

Here's a key insight: for loops are faster than while loops for iterating over collections. Let's see why:

python
# for loop
numbers = list(range(1000000))
total = 0
 
for num in numbers:
    total += num
 
# while loop with manual indexing
numbers = list(range(1000000))
total = 0
i = 0
 
while i < len(numbers):
    total += numbers[i]
    i += 1

The for loop is cleaner and faster because:

  1. It directly iterates over the collection
  2. No manual indexing overhead
  3. Python's interpreter optimizes it

The while loop performs an index lookup (numbers[i]) and a len() call on every iteration, each with a small cost. Over a million iterations, it adds up.

Rule of thumb: Use for for collections, while for condition-based logic. If you're tempted to write a while loop with an index and length check, use for instead.
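
If you want to measure the difference yourself, the standard library's timeit module is the tool. Here's a sketch; the sizes and repeat counts are arbitrary, and absolute numbers will vary by machine:

```python
import timeit

setup = "numbers = list(range(100_000))"

for_stmt = """
total = 0
for n in numbers:
    total += n
"""

while_stmt = """
total = 0
i = 0
while i < len(numbers):
    total += numbers[i]
    i += 1
"""

# Run each statement 20 times and report the total elapsed seconds
for_time = timeit.timeit(for_stmt, setup=setup, number=20)
while_time = timeit.timeit(while_stmt, setup=setup, number=20)

print(f"for:   {for_time:.4f}s")
print(f"while: {while_time:.4f}s")
```

On a typical CPython build, the while version takes noticeably longer; the exact ratio depends on your interpreter version and hardware.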

Generator Expressions: Lazy Evaluation

Generator expressions are like list comprehensions, but they generate values on-the-fly instead of creating the entire list in memory. They use parentheses instead of square brackets:

python
# List comprehension: creates entire list
squares_list = [i ** 2 for i in range(1000000)]
 
# Generator expression: generates on demand
squares_gen = (i ** 2 for i in range(1000000))
 
print(type(squares_list))
print(type(squares_gen))

Expected output:

<class 'list'>
<class 'generator'>

With a million elements, the list comprehension consumes significant memory. The generator uses almost nothing until you iterate:

python
# This doesn't compute anything yet
squares_gen = (i ** 2 for i in range(10))
 
# Now we iterate and compute
for square in squares_gen:
    print(square, end=" ")

Expected output:

0 1 4 9 16 25 36 49 64 81

Generators are perfect for processing large files line-by-line or streaming data:

python
def read_large_file(filepath):
    with open(filepath) as f:
        for line in f:
            yield line.strip()
 
# Process without loading entire file into memory
for line in read_large_file("huge_file.txt"):
    process(line)

Generator expressions work with functions expecting iterables:

python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 
# Sum only even numbers without creating intermediate list
total = sum(n for n in numbers if n % 2 == 0)
print(f"Sum of evens: {total}")

Expected output:

Sum of evens: 30

Notice that the parentheses are optional when the generator is the sole argument to a function. This kind of pipeline (filter, transform, aggregate, all in one expression with no intermediate collections) is a pattern you'll use constantly in data processing code.

Performance Considerations

Beyond the for vs while speed difference, a handful of specific loop patterns have significant performance implications you should internalize before your datasets grow large.

The single biggest win is avoiding repeated membership tests on a list. Checking if x in some_list is O(n): Python has to scan through the entire list every time. If you're calling that check inside a loop, you've turned a linear operation into a quadratic one. Converting the list to a set first (some_set = set(some_list)) makes every subsequent in check O(1). For a thousand iterations over a thousand-element list, that's the difference between a million operations and a thousand.
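A minimal sketch of that conversion (the sizes and names here are illustrative, and actual timings are left as an exercise):

```python
import random

haystack_list = list(range(10_000))
haystack_set = set(haystack_list)  # one-time O(n) conversion

targets = [random.randrange(20_000) for _ in range(1_000)]

# Slow: each `in` scans the list, up to 10,000 comparisons per check
hits_slow = sum(1 for t in targets if t in haystack_list)

# Fast: each `in` is a single O(1) hash lookup
hits_fast = sum(1 for t in targets if t in haystack_set)

print(hits_slow == hits_fast)  # True: same answer, far less work
```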

The second major consideration is list building inside loops. Every time you do result = result + [item], Python allocates a brand-new list and copies all previous elements into it. That's O(n) per addition, giving O(n²) overall. Using result.append(item) instead is amortized O(1). Better still, if you know in advance what you're building, use a list comprehension: Python's bytecode for comprehensions skips the overhead of method dispatch on each iteration. Finally, watch for repeated lookups inside a loop. If you call the same function or access the same attribute on every iteration, consider caching the reference in a local variable outside the loop. For tight inner loops processing millions of elements, these micro-optimizations compound into real gains.
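A short sketch of all three points (the numbers are illustrative):

```python
# Quadratic: each concatenation builds a brand-new list and copies everything
slow = []
for i in range(5):
    slow = slow + [i]

# Linear: append mutates in place, amortized O(1) per element
fast = []
for i in range(5):
    fast.append(i)

# Cached lookup: bind the method once instead of resolving it on every pass
cached = []
cached_append = cached.append
for i in range(5):
    cached_append(i)

print(slow == fast == cached)  # True
```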

Filtering and Validation Loops

Often you're processing data and need to validate, filter, or categorize items. Here are patterns for these common tasks:

Filtering Valid Items

python
items = [10, -5, 20, 0, 15, -3, 8]
 
valid_items = [item for item in items if item > 0]
print(f"Valid items: {valid_items}")

Expected output:

Valid items: [10, 20, 15, 8]

Separating Into Categories

python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 
evens = []
odds = []
 
for num in numbers:
    if num % 2 == 0:
        evens.append(num)
    else:
        odds.append(num)
 
print(f"Evens: {evens}")
print(f"Odds: {odds}")

Expected output:

Evens: [2, 4, 6, 8, 10]
Odds: [1, 3, 5, 7, 9]

Counting with a Default

python
from collections import defaultdict
 
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
 
word_count = defaultdict(int)
for word in words:
    word_count[word] += 1
 
print(dict(word_count))

Expected output:

{'apple': 3, 'banana': 2, 'cherry': 1}

Without defaultdict, you'd need to check if word in word_count first. This is cleaner.

Finding Duplicates

python
items = [1, 2, 3, 2, 4, 3, 5, 1]
 
seen = set()
duplicates = set()
 
for item in items:
    if item in seen:
        duplicates.add(item)
    seen.add(item)
 
print(f"Duplicates: {sorted(duplicates)}")

Expected output:

Duplicates: [1, 2, 3]

Sets make this efficient: checking membership in a set is O(1), whereas checking a list is O(n).

Common Loop Mistakes

Even experienced developers fall into certain loop traps. Knowing them ahead of time will save you hours of debugging.

The most dangerous mistake is modifying a collection while iterating over it. When you remove or add elements to a list mid-loop, the iterator's internal position gets out of sync with the list's actual contents. In the best case, you skip elements; in the worst case, you get an IndexError or silent data corruption. Always build a new collection instead of mutating the one you're looping over; a list comprehension with a filter condition is the cleanest solution.
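Here's the failure mode in miniature, alongside the safe alternative:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Buggy: each removal shifts later items left, so the iterator skips elements
broken = numbers.copy()
for n in broken:
    if n % 2 == 0:
        broken.remove(n)
# broken ends up as [1, 3, 5] here, but 3 and 5 were never actually checked

# Safe: build a new list instead of mutating the one being iterated
odds = [n for n in numbers if n % 2 != 0]
print(odds)  # [1, 3, 5]
```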

A close second is the unbounded while loop: a loop whose condition can never become false. This happens most often when you forget to update the variable the condition depends on, or when you update it in the wrong branch of an if statement. Python has no built-in timeout for loops, so an infinite loop will spin forever, consuming 100% of one CPU core. Always verify that every path through your while loop moves the state toward termination. If there's any doubt, add a hard iteration limit as a safety valve.
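One way to sketch that safety valve (the converge helper and its limit are hypothetical, just for illustration):

```python
def converge(value, limit=1000):
    """Halve value until it drops below 1, guarding against a stuck loop."""
    iterations = 0
    while value >= 1:
        value /= 2
        iterations += 1
        if iterations >= limit:  # safety valve: bail out instead of spinning
            raise RuntimeError(f"no convergence after {limit} iterations")
    return iterations

print(converge(100))  # 7 halvings: 100 / 2**7 is below 1
```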

The third common mistake is using the wrong loop variable inside a nested loop. When your loops are named i and j, it's easy to accidentally write matrix[i][i] instead of matrix[i][j], especially when the code spans multiple lines. Give your loop variables meaningful names whenever possible: for row_idx in range(rows) is much harder to confuse than for i in range(rows). Finally, avoid reusing a loop variable name from an outer loop inside an inner loop. Because Python's loop variables persist after a loop ends, the inner loop will clobber the outer loop's variable if they share a name, leading to subtle bugs in the code that runs after the inner loop.
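As a small illustration, a matrix transpose reads much more safely with named indexes (row_idx and col_idx are just illustrative name choices):

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# With descriptive names, a slip like matrix[i][i] becomes easy to spot
transposed = []
for col_idx in range(len(matrix[0])):
    new_row = [matrix[row_idx][col_idx] for row_idx in range(len(matrix))]
    transposed.append(new_row)

print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```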

Real-World Loop Patterns

Processing Pairs of Consecutive Elements

python
data = [1, 2, 3, 4, 5]
 
for current, next_item in zip(data, data[1:]):
    print(f"{current} -> {next_item}")

Expected output:

1 -> 2
2 -> 3
3 -> 4
4 -> 5

Filtering and Transforming in One Pass

python
prices = [10, 25, 5, 30, 15]
 
discounted = []
for price in prices:
    if price > 10:
        discounted.append(price * 0.9)
 
print(discounted)

Expected output:

[22.5, 27.0, 13.5]

Or as a comprehension:

python
prices = [10, 25, 5, 30, 15]
discounted = [p * 0.9 for p in prices if p > 10]
print(discounted)

Early Exit with a Sentinel

python
lines = ["hello", "world", "", "end", "more"]
 
for line in lines:
    if line == "":
        break
    print(line)

Expected output:

hello
world

Accumulation Patterns

python
words = ["apple", "banana", "cherry", "date"]
 
word_lengths = {}
for word in words:
    word_lengths[word] = len(word)
 
print(word_lengths)

Expected output:

{'apple': 5, 'banana': 6, 'cherry': 6, 'date': 4}

Choosing the Right Loop Construct

You now have many tools: for, while, comprehensions, generators, zip(), enumerate(), and itertools. Here's a decision tree for choosing the right one.

When to Use for

Use a for loop when:

  • You're iterating over a sequence (list, tuple, string, dict)
  • You know (or don't care) how many iterations you'll make
  • You want readable, Pythonic code
python
# Best for collections
for item in items:
    print(item)

When to Use while

Use a while loop when:

  • The number of iterations is unknown
  • You're checking a condition that changes each iteration
  • You need to handle user input or external events
python
# Best for condition-based logic
password = ""
while password != "secret":
    password = input("Enter password: ")

When to Use List Comprehensions

Use comprehensions when:

  • You're creating a new list based on an existing one
  • The logic is simple (one or two operations)
  • You're filtering or transforming
python
# Best for creating derived collections
squared = [x ** 2 for x in numbers if x > 0]

When to Use Generators

Use generators when:

  • Processing large datasets or infinite sequences
  • Memory is a concern
  • You only need to iterate once
python
# Best for lazy evaluation
large_squares = (x ** 2 for x in huge_dataset)

Summary

Loops are the heartbeat of data processing. Master these patterns and you'll find yourself reaching for the right tool intuitively, without having to consciously weigh the tradeoffs each time.

  • for loops are your primary tool. Use them for iterating over sequences.
  • range() generates number sequences with precise control over start, stop, and step.
  • enumerate() pairs indexes with values; no more manual indexing.
  • zip() synchronizes parallel iteration over multiple sequences.
  • while loops are for condition-based logic when you don't know iteration count in advance.
  • break and continue provide fine-grained control; else on loops adds elegance.
  • List comprehensions are faster and cleaner than loops for creating filtered or transformed lists.
  • itertools provides advanced patterns like product, chain, and islice.
  • for loops are faster than while loops for iterating collections; prefer for-loop iteration whenever possible.

The iterator protocol is what binds all of these together: understanding that Python's for loop is really just a standardized way of calling __next__() explains why so many things are iterable, and why you can write your own. Performance matters most at scale: set membership over list membership, append() over concatenation, generators over lists for large data. And the common mistakes (modifying a collection while iterating, unbounded while loops, variable name collisions in nested loops) are worth memorizing now so you don't debug them at 2am later.
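You can see that protocol directly by driving an iterator by hand:

```python
nums = [1, 2, 3]
it = iter(nums)   # calls nums.__iter__() to get an iterator

print(next(it))   # calls it.__next__() -> 1
print(next(it))   # 2
print(next(it))   # 3
# One more next(it) would raise StopIteration, which is exactly
# the signal a for loop uses to know when to stop
```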

The best loops are often invisible to the reader. They're so natural that the intent jumps off the page. That comes from practice and from choosing the right tool for each job. As you move through this series and start working with NumPy arrays, pandas DataFrames, and machine learning pipelines, the loop intuitions you build here will transfer directly: you'll recognize mini-batch iteration as a cousin of islice, vectorized operations as loops pushed down to the C level, and training loops as while loops with convergence conditions. Loops are not just a beginner topic; they're a fundamental lens through which all of data processing becomes legible.

In the next article, we'll tackle error handling with try/except blocks. Your programs will crash sometimes, and that's okay; we'll learn to handle failures gracefully and recover from unexpected situations.
