Build a Task Manager: Data Structures Capstone Project

You've learned about lists, dicts, sets, heaps, and deques. You understand Big O notation. But knowing a data structure and using it wisely are two different animals. Theory becomes engineering the moment you sit down with a blank file and a real problem. And that's exactly where we are right now.
This capstone project is the bridge between "I understand how a heap works" and "I know when to pull out a heap instead of sorting a list." We're going to build a fully functional task manager from scratch, not a toy example with five lines and a print statement, but a genuine piece of software with features you'd actually want. Priorities, filtering, undo/redo, persistent storage, search, and statistics. The kind of thing you'd ship if you needed it.
Every data structure we choose in this project is a deliberate decision. We're not reaching for a dict because dicts are cool. We're reaching for a dict because we can articulate, in plain English, why a dict is the right tool for that specific job. That distinction is what separates programmers who struggle with performance problems from engineers who prevent them. By the time you finish building this, you'll have internalized that mindset in a way that no amount of flashcard drilling can replicate.
Here's what we're going to cover: we'll start with the data model, then build out each feature one at a time, pausing after each one to talk through the trade-offs. We'll also dedicate sections to the art of choosing data structures, the design decisions that shaped this project, how to test what you've built, and where to take this next. Think of it as the whole engineering story, not just the code.
By the end, you'll see how data structures aren't abstract, they're the scaffolding of useful software.
Table of Contents
- The Task Manager: What We're Building
- Choosing Data Structures
- The Core Data Model: Dicts and Lists
- Adding Tasks: The INSERT Problem
- Viewing Tasks by Priority: Heaps and Sorting
- Filtering by Tags: Sets for O(1) Lookup
- Searching: Comprehensions and Sorted
- Undo/Redo: The Deque Pattern
- Statistics: Counter for Tag Frequency
- Persistence: JSON Serialization
- Design Decisions Explained
- Putting It Together: The Full Class
- Testing Your Task Manager
- Testing It All
- Extension Ideas
- Why Data Structures Matter
- Wrapping Up
The Task Manager: What We're Building
Here's what our task manager does:
- Add tasks with priority, description, and tags
- View tasks by priority order (highest first)
- Filter by tags instantly (O(1) lookup)
- Search tasks by keyword
- Undo/redo changes
- Get statistics (tasks by priority, tag frequency)
- Save/load from JSON
- Complete tasks and archive them
The catch? We're doing this efficiently. The right data structure choice will make the difference between "works on 100 tasks" and "works on 100,000 tasks."
Choosing Data Structures
Before we write a single line of code, let's think through the problem the way a software engineer would. What operations do we actually need to perform? How often will we perform them? What's acceptable latency for each one?
For tasks themselves, we need: add a new task (happens a lot), look up a task by ID (happens constantly when completing or editing), and iterate over all tasks (happens for search and stats). A dict keyed by task ID handles all three: O(1) insert, O(1) lookup, and O(n) iteration. A plain list would give us O(n) lookup by ID because we'd have to scan the entire thing every time.
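To make that concrete, here's a small standalone sketch (sample data invented for illustration, separate from the task manager code) contrasting the two lookup strategies:

```python
# Invented sample data: 10,000 tasks stored both ways
tasks_list = [{"id": i, "title": f"task {i}"} for i in range(1, 10_001)]
tasks_dict = {t["id"]: t for t in tasks_list}

def find_in_list(task_id):
    # O(n): scan until we hit a match (or reach the end)
    for t in tasks_list:
        if t["id"] == task_id:
            return t
    return None

def find_in_dict(task_id):
    # O(1): one hash lookup, no scanning
    return tasks_dict.get(task_id)

assert find_in_list(9_999) is find_in_dict(9_999)  # same task, very different cost
```

Both functions return the same task; the difference only shows up in how long they take as the collection grows.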
For tags, we have two levels to think about. First, does a specific task have a specific tag? That's a membership test. Sets give us O(1) for that, versus O(m) for a list where m is the number of tags per task. Second, which tasks have a given tag? That's a filtering query. If we want this to be fast at scale, we maintain an index, a dict that maps each tag string to a set of task IDs. Now filtering is O(k) where k is the number of results, not O(n) where n is every task in the system.
For undo/redo, we need a stack: last in, first out. Python's deque from the collections module is the right choice here because it gives us O(1) append and pop from either end, with the bonus of a maxlen parameter that automatically discards old entries and keeps our memory usage bounded.
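Here's the maxlen behavior in isolation:

```python
from collections import deque

history = deque(maxlen=3)  # bounded: keeps only the 3 most recent entries
for state in ["s1", "s2", "s3", "s4", "s5"]:
    history.append(state)

print(list(history))  # ['s3', 's4', 's5'] -- the oldest entries were discarded
```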
For statistics, Counter from collections is purpose-built for frequency counting. It's essentially a dict with extra methods, and it eliminates a whole category of "count things in a loop" boilerplate.
The pattern here is consistent: name the operation, determine its frequency, choose the structure whose worst-case complexity aligns with your constraints. Do this before you code and you'll rarely have to rewrite.
The Core Data Model: Dicts and Lists
Let's start simple. A task is just data:
```python
task = {
    "id": 1,
    "title": "Fix login bug",
    "description": "Users can't reset passwords",
    "priority": 3,  # 1=low, 5=high
    "tags": {"bug", "urgent", "backend"},
    "done": False,
    "created_at": "2025-02-25T10:30:00"
}
```

We're using a dict for a single task (key-value pairs make sense) and a set for tags (we need O(1) membership checking: is this task "urgent"?).
Notice that tags is a Python set literal using curly braces, not a list using square brackets. This is intentional and important. When we later need to answer "does this task have the 'urgent' tag?", checking membership in a set is O(1) regardless of how many tags are in it. That might sound like a small win for three tags, but when a task has dozens of tags or you're checking millions of tasks in a loop, it adds up fast.
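A tiny illustration of the membership check, using the tags from the example task:

```python
tags_set = {"bug", "urgent", "backend"}
tags_list = ["bug", "urgent", "backend"]

# Both checks return True, but the set check is a hash lookup (O(1) on
# average), while the list check compares elements one by one (O(m))
assert "urgent" in tags_set
assert "urgent" in tags_list
assert "frontend" not in tags_set
```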
Now, where do we store multiple tasks? We could use a list, but lists are O(n) for lookups. Instead, we'll use a dict of dicts, keyed by task ID:
```python
from collections import defaultdict
from datetime import datetime

class TaskManager:
    def __init__(self):
        self.tasks = {}  # task_id -> task dict
        self.next_id = 1
        self.undo_stack = []
        self.redo_stack = []
```

Why a dict instead of a list? Lookup by ID is O(1). Modifying a task by ID is O(1). In a list, you'd scan the whole list. On 10,000 tasks, that's a 10,000x difference.
We're also initializing the undo and redo stacks as plain lists here; we'll upgrade them to deques in a moment. The next_id counter is how we generate unique IDs without needing a database sequence. Simple and effective for a single-process application.
Adding Tasks: The INSERT Problem
Let's add a task:
```python
def add_task(self, title, description="", priority=1, tags=None):
    """Add a new task. Store state for undo."""
    if tags is None:
        tags = set()
    else:
        tags = set(tags)  # Ensure it's a set

    task = {
        "id": self.next_id,
        "title": title,
        "description": description,
        "priority": priority,
        "tags": tags,
        "done": False,
        "created_at": datetime.now().isoformat()
    }

    # Save state before modification (for undo)
    self._save_state()
    self.tasks[self.next_id] = task
    self.next_id += 1
    self.redo_stack.clear()  # Clear redo when we make a new change
    return task
```

What's happening here? We're using a dict to store tasks. Adding a task is O(1): just insert a key-value pair. No scanning, no shifting elements like in a list.
The tags parameter becomes a set. Why? Because later, when we filter "show me all 'urgent' tasks," checking if a task has the 'urgent' tag is O(1) with a set. With a list of tags, it's O(m) where m is the number of tags per task.
Notice _save_state(). We're preparing for undo/redo, which we'll implement with a deque later. Also notice that every time we add a task, we clear the redo stack. This is standard behavior: if you undo three actions and then make a new change, you lose the ability to redo those three undone actions. It prevents a confusing branching history.
Viewing Tasks by Priority: Heaps and Sorting
Here's where it gets interesting. Users want to see high-priority tasks first. We could sort the entire task list every time, but that's O(n log n).
If we insert frequently and view sometimes, a heap is smarter. Python's heapq module implements a min-heap, meaning the smallest element is always at the front. To get a max-heap behavior (highest priority first), we negate the priority values when pushing, then negate them again when popping:
```python
import heapq

def get_tasks_by_priority(self, descending=True):
    """Return tasks sorted by priority. High priority first."""
    task_list = list(self.tasks.values())
    if descending:
        # For max-heap behavior (Python's heapq is a min-heap), negate
        # the priority. The index i breaks ties between equal priorities
        # so dicts are never compared directly (which would raise).
        heap_items = [(-task["priority"], i, task)
                      for i, task in enumerate(task_list)]
    else:
        heap_items = [(task["priority"], i, task)
                      for i, task in enumerate(task_list)]
    heapq.heapify(heap_items)
    return [heapq.heappop(heap_items)[2]
            for _ in range(len(heap_items))]
```

Why heapify instead of sort()? For a one-shot sorted view, both end up O(n log n). But if you're repeatedly adding tasks and popping only the highest-priority one, a heap does each of those operations in O(log n) instead of re-sorting. For a one-time view, sorted() is simpler:
```python
def get_tasks_by_priority_simple(self):
    """Simple version: just sort."""
    return sorted(
        self.tasks.values(),
        key=lambda t: t["priority"],
        reverse=True
    )
```

We'll use the simple version: clarity beats premature optimization. But now you know the tradeoff.
The lambda function key=lambda t: t["priority"] tells Python's sort algorithm what value to use when comparing tasks. Combined with reverse=True, we get highest priority first. This is a pattern you'll use everywhere in Python, any time you need to sort objects by a specific attribute, reach for the key parameter.
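Here's the pattern in isolation, with invented sample tasks:

```python
tasks = [
    {"title": "Write docs", "priority": 2},
    {"title": "Deploy", "priority": 5},
    {"title": "Fix bug", "priority": 3},
]

# key= tells sorted() what value to compare; reverse=True puts the largest first
by_priority = sorted(tasks, key=lambda t: t["priority"], reverse=True)
print([t["title"] for t in by_priority])  # ['Deploy', 'Fix bug', 'Write docs']
```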
Filtering by Tags: Sets for O(1) Lookup
Here's the power move. Let's say we want all tasks with the "urgent" tag:
```python
def filter_by_tag(self, tag):
    """Return all tasks with a given tag."""
    return [task for task in self.tasks.values()
            if tag in task["tags"]]
```

This scans every task, so it's O(n) in the number of tasks. Not great at scale, but notice what keeps it from being worse: we're checking tag in task["tags"], and because task["tags"] is a set, each check is O(1). If tags were a list, each check would cost O(m), where m is the number of tags per task, and the whole filter would be O(n * m).
For something more sophisticated, imagine a tag index. Instead of scanning all tasks and checking each one's tag set, we precompute an inverted index that maps each tag to the task IDs that carry it. Here's what that looks like:
```python
from collections import defaultdict

class TaskManager:
    def __init__(self):
        self.tasks = {}
        self.tag_index = defaultdict(set)  # tag -> set of task IDs
        self.next_id = 1

    def add_task(self, title, description="", priority=1, tags=None):
        """Same as before, but now we also update the tag index."""
        if tags is None:
            tags = set()
        else:
            tags = set(tags)
        task = {
            "id": self.next_id,
            "title": title,
            "description": description,
            "priority": priority,
            "tags": tags,
            "done": False,
            "created_at": datetime.now().isoformat()
        }
        self._save_state()
        self.tasks[self.next_id] = task
        # Update tag index
        for tag in tags:
            self.tag_index[tag].add(self.next_id)
        self.next_id += 1
        self.redo_stack.clear()
        return task

    def filter_by_tag_fast(self, tag):
        """O(k) where k is tasks with this tag, not O(n)."""
        task_ids = self.tag_index.get(tag, set())
        return [self.tasks[task_id] for task_id in task_ids]
```

Now filtering is O(k) where k is the number of matching tasks. If you're filtering millions of tasks for one tag, this is a huge win.
The defaultdict(set) is doing something elegant here. A regular dict would raise a KeyError if we tried to access a key that doesn't exist yet. defaultdict(set) automatically creates an empty set whenever we access a missing key. That's why we can write self.tag_index[tag].add(task_id) without checking if the tag already has an entry, the defaultdict handles it. This is a Python pattern you'll use constantly once you know about it.
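A quick side-by-side of the two behaviors:

```python
from collections import defaultdict

plain = {}
index = defaultdict(set)

# A plain dict raises KeyError for a missing key
try:
    plain["urgent"].add(1)
except KeyError:
    print("plain dict: KeyError")

# defaultdict(set) silently creates an empty set on first access
index["urgent"].add(1)
index["urgent"].add(2)
print(index["urgent"])  # {1, 2}
```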
Searching: Comprehensions and Sorted
Finding tasks by keyword is straightforward:
def search(self, keyword, case_sensitive=False):
"""Search title and description."""
if not case_sensitive:
keyword = keyword.lower()
results = []
for task in self.tasks.values():
title = task["title"] if case_sensitive else task["title"].lower()
desc = task["description"] if case_sensitive else task["description"].lower()
if keyword in title or keyword in desc:
results.append(task)
# Sort by priority
return sorted(results, key=lambda t: t["priority"], reverse=True)This is O(n) for the search (we must scan every task) and O(n log n) for the sort. That's acceptable. Full-text search gets complex fast; for a task manager, this works.
The case-insensitive handling here is worth noting. We're converting both the keyword and the field values to lowercase before comparing, which means "bug" will match "Bug", "BUG", and "buG". We do the keyword conversion once, outside the loop, so we're not repeatedly calling .lower() on the keyword for every task. It's a small optimization, but it's the kind of habit that makes code faster without complicating it.
Undo/Redo: The Deque Pattern
This is where deque shines. We need a stack for undo and another for redo. The key insight is that undo/redo is fundamentally a snapshot mechanism, before we change anything, we take a picture of the current state and store it. Undoing means restoring the previous picture; redoing means restoring the picture we set aside when we undid:
```python
from collections import deque

class TaskManager:
    def __init__(self):
        self.tasks = {}
        self.tag_index = defaultdict(set)
        self.next_id = 1
        self.undo_stack = deque(maxlen=50)  # Keep last 50 states
        self.redo_stack = deque(maxlen=50)

    def _save_state(self):
        """Save current state for undo."""
        # Copy each task dict so later edits (like marking a task done)
        # don't leak into the snapshot. The tags sets stay shared, but we
        # never mutate them in place, so a per-task shallow copy is safe.
        state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.undo_stack.append(state)

    def undo(self):
        """Undo last action."""
        if not self.undo_stack:
            print("Nothing to undo")
            return False
        # Save current state to redo
        current_state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.redo_stack.append(current_state)
        # Restore previous state
        previous_state = self.undo_stack.pop()
        self.tasks = previous_state["tasks"]
        self.next_id = previous_state["next_id"]
        self._rebuild_tag_index()
        return True

    def redo(self):
        """Redo last undone action."""
        if not self.redo_stack:
            print("Nothing to redo")
            return False
        # Save current state to undo
        current_state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.undo_stack.append(current_state)
        # Restore next state
        next_state = self.redo_stack.pop()
        self.tasks = next_state["tasks"]
        self.next_id = next_state["next_id"]
        self._rebuild_tag_index()
        return True

    def _rebuild_tag_index(self):
        """Rebuild tag index after restore."""
        self.tag_index.clear()
        for task in self.tasks.values():
            for tag in task["tags"]:
                self.tag_index[tag].add(task["id"])
```

Why deque? It's optimized for appending and popping from both ends, exactly what a stack needs. It's O(1) for both operations. Plus, maxlen=50 means old states are automatically discarded. No memory leak.
The _rebuild_tag_index method is necessary because we restore the tasks dict from a snapshot, but the tag_index is derived data: it's not stored in the snapshot, it's computed from the tasks. Any time we replace the tasks dict (through undo or redo), we have to regenerate the index from scratch. This is a common pattern in systems that maintain derived caches: the cache is fast to query but must be invalidated and rebuilt whenever the source data changes.
Statistics: Counter for Tag Frequency
We want to know which tags are most popular. The Counter class from Python's collections module is purpose-built for exactly this use case, counting hashable objects and returning them in frequency order:
```python
from collections import Counter

def get_tag_stats(self):
    """Which tags appear most often?"""
    all_tags = []
    for task in self.tasks.values():
        all_tags.extend(task["tags"])
    return Counter(all_tags).most_common()
```

The Counter object tallies occurrences. Call .most_common() and you get a list of (tag, count) tuples, sorted by frequency. Done.
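A standalone example of what most_common() returns, with invented tags:

```python
from collections import Counter

tags = ["backend", "bug", "backend", "urgent", "backend", "bug"]
counts = Counter(tags)
print(counts.most_common())   # [('backend', 3), ('bug', 2), ('urgent', 1)]
print(counts.most_common(1))  # [('backend', 3)] -- just the top entry
```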
We could also get priority stats. Rather than using Counter here (since priorities are bounded 1-5), a plain dict with known keys gives us exactly what we need without any overhead:
```python
def get_priority_stats(self):
    """How many tasks at each priority level?"""
    stats = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
    for task in self.tasks.values():
        if not task["done"]:
            stats[task["priority"]] += 1
    return stats
```

Notice we're filtering out completed tasks with if not task["done"]. Priority stats are most useful for planning what to work on next, so completed tasks shouldn't factor into the picture. This is a good example of domain logic shaping implementation choices: the business requirement (show me active work distribution) tells us what to include and exclude.
Persistence: JSON Serialization
A task manager that loses data on restart is useless. JSON is the right persistence format here, it's human-readable, universally supported, and maps naturally to Python dicts. There's one catch: Python sets aren't JSON-serializable, so we need a conversion step in both directions:
```python
import json
from pathlib import Path

def save_to_file(self, filename="tasks.json"):
    """Persist tasks to JSON."""
    # Convert sets to lists for JSON serialization
    serializable = {}
    for task_id, task in self.tasks.items():
        task_copy = task.copy()
        task_copy["tags"] = list(task["tags"])  # set -> list
        serializable[str(task_id)] = task_copy
    with open(filename, "w") as f:
        json.dump(serializable, f, indent=2)
    print(f"Saved {len(self.tasks)} tasks to {filename}")

def load_from_file(self, filename="tasks.json"):
    """Load tasks from JSON."""
    if not Path(filename).exists():
        print(f"{filename} not found")
        return False
    with open(filename, "r") as f:
        data = json.load(f)
    self.tasks = {}
    self.tag_index.clear()
    for task_id_str, task_data in data.items():
        task_id = int(task_id_str)
        task_data["tags"] = set(task_data["tags"])  # list -> set
        task_data["id"] = task_id
        self.tasks[task_id] = task_data
        # Rebuild tag index
        for tag in task_data["tags"]:
            self.tag_index[tag].add(task_id)
        self.next_id = max(self.next_id, task_id + 1)
    print(f"Loaded {len(self.tasks)} tasks from {filename}")
    return True
```

Key detail: sets can't be serialized to JSON directly. We convert them to lists before saving, then back to sets when loading. It's a small dance but necessary.
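To see the catch in isolation, here's a standalone sketch with an invented task; it sorts the tags so the JSON output is deterministic:

```python
import json

task = {"title": "Fix bug", "tags": {"bug", "urgent"}}

# Dumping a set directly raises TypeError
try:
    json.dumps(task)
except TypeError as e:
    print(f"Direct dump fails: {e}")

# Convert the set to a list first (sorted, so the output is stable)
serializable = {**task, "tags": sorted(task["tags"])}
print(json.dumps(serializable))
# {"title": "Fix bug", "tags": ["bug", "urgent"]}
```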
Also notice how we reconstruct next_id during loading. We take the maximum task ID across all loaded tasks and add 1. This ensures that any tasks we add after loading will have IDs that don't collide with the ones we just loaded. It's a simple but important detail, without it, we'd start assigning IDs from 1 again and immediately create duplicate IDs.
Design Decisions Explained
Every piece of this system reflects a deliberate trade-off. Let's make those trade-offs explicit, because understanding them is more valuable than memorizing the code.
We chose state snapshots for undo rather than recording individual operations (the "command pattern"). Snapshots are simpler to implement and harder to get wrong, there's no complex logic for "undo a delete" versus "undo an edit." The downside is memory: each snapshot copies the entire tasks dict. With maxlen=50, we cap that at fifty copies. For a personal task manager with a few thousand tasks, this is completely reasonable. For a system managing millions of records, you'd switch to the command pattern.
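For contrast, here's a minimal sketch of what the command pattern looks like. MiniManager and AddTaskCommand are hypothetical names for this illustration only, not part of the project:

```python
class MiniManager:
    def __init__(self):
        self.tasks = {}

class AddTaskCommand:
    """Stores just enough information to reverse one operation."""
    def __init__(self, manager, task):
        self.manager = manager
        self.task = task

    def do(self):
        self.manager.tasks[self.task["id"]] = self.task

    def undo(self):
        # Reversing an add is a delete -- no full snapshot needed
        del self.manager.tasks[self.task["id"]]

m = MiniManager()
cmd = AddTaskCommand(m, {"id": 1, "title": "demo"})
cmd.do()
assert 1 in m.tasks
cmd.undo()
assert 1 not in m.tasks
```

The memory cost per history entry is one small command object instead of a full copy of the tasks dict, at the price of writing a do/undo pair for every kind of operation.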
We maintain a separate tag index rather than computing it on the fly. This is a classic space-time trade-off: we use more memory (the index dict) to get faster queries. Every time a task is added or modified, we pay a small cost to update the index. Every time we filter by tag, we get much faster results in return. This trade-off makes sense when reads outnumber writes, which is the typical pattern for a task manager.
We store tags as sets inside each task even though the index makes per-task tag lookups redundant. Why? Consistency and simplicity. The task dict remains a complete, self-contained record of all task data. If we ever serialize a single task, or pass it to a function that doesn't know about the index, it still has all the information it needs. Duplication with a clear purpose is fine.
We use integers as task IDs rather than UUIDs. Integers are simpler to type, easier to remember, and compact in storage. UUIDs make sense when multiple systems need to generate IDs without coordination, for a single-process desktop app, sequential integers are the right call.
Putting It Together: The Full Class
Time to assemble every piece into one class:
```python
from collections import defaultdict, deque, Counter
from datetime import datetime
import json
from pathlib import Path

class TaskManager:
    def __init__(self):
        self.tasks = {}
        self.tag_index = defaultdict(set)
        self.next_id = 1
        self.undo_stack = deque(maxlen=50)
        self.redo_stack = deque(maxlen=50)

    def _save_state(self):
        state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.undo_stack.append(state)
        self.redo_stack.clear()

    def add_task(self, title, description="", priority=1, tags=None):
        if tags is None:
            tags = set()
        else:
            tags = set(tags)
        task = {
            "id": self.next_id,
            "title": title,
            "description": description,
            "priority": priority,
            "tags": tags,
            "done": False,
            "created_at": datetime.now().isoformat()
        }
        self._save_state()
        self.tasks[self.next_id] = task
        for tag in tags:
            self.tag_index[tag].add(self.next_id)
        self.next_id += 1
        return task

    def complete_task(self, task_id):
        if task_id not in self.tasks:
            return False
        self._save_state()
        self.tasks[task_id]["done"] = True
        return True

    def get_tasks_by_priority(self):
        return sorted(
            self.tasks.values(),
            key=lambda t: t["priority"],
            reverse=True
        )

    def filter_by_tag(self, tag):
        task_ids = self.tag_index.get(tag, set())
        return [self.tasks[task_id] for task_id in task_ids]

    def search(self, keyword):
        keyword_lower = keyword.lower()
        results = [
            task for task in self.tasks.values()
            if keyword_lower in task["title"].lower()
            or keyword_lower in task["description"].lower()
        ]
        return sorted(results, key=lambda t: t["priority"], reverse=True)

    def get_tag_stats(self):
        all_tags = []
        for task in self.tasks.values():
            all_tags.extend(task["tags"])
        return Counter(all_tags).most_common()

    def get_priority_stats(self):
        stats = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
        for task in self.tasks.values():
            if not task["done"]:
                stats[task["priority"]] += 1
        return stats

    def undo(self):
        if not self.undo_stack:
            return False
        current_state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.redo_stack.append(current_state)
        previous_state = self.undo_stack.pop()
        self.tasks = previous_state["tasks"]
        self.next_id = previous_state["next_id"]
        self._rebuild_tag_index()
        return True

    def redo(self):
        if not self.redo_stack:
            return False
        current_state = {
            "tasks": {k: dict(v) for k, v in self.tasks.items()},
            "next_id": self.next_id
        }
        self.undo_stack.append(current_state)
        next_state = self.redo_stack.pop()
        self.tasks = next_state["tasks"]
        self.next_id = next_state["next_id"]
        self._rebuild_tag_index()
        return True

    def _rebuild_tag_index(self):
        self.tag_index.clear()
        for task in self.tasks.values():
            for tag in task["tags"]:
                self.tag_index[tag].add(task["id"])

    def save_to_file(self, filename="tasks.json"):
        serializable = {}
        for task_id, task in self.tasks.items():
            task_copy = task.copy()
            task_copy["tags"] = list(task["tags"])
            serializable[str(task_id)] = task_copy
        with open(filename, "w") as f:
            json.dump(serializable, f, indent=2)
        print(f"Saved {len(self.tasks)} tasks to {filename}")

    def load_from_file(self, filename="tasks.json"):
        if not Path(filename).exists():
            return False
        with open(filename, "r") as f:
            data = json.load(f)
        self.tasks = {}
        self.tag_index.clear()
        for task_id_str, task_data in data.items():
            task_id = int(task_id_str)
            task_data["tags"] = set(task_data["tags"])
            task_data["id"] = task_id
            self.tasks[task_id] = task_data
            for tag in task_data["tags"]:
                self.tag_index[tag].add(task_id)
            self.next_id = max(self.next_id, task_id + 1)
        print(f"Loaded {len(self.tasks)} tasks from {filename}")
        return True
```

This is the complete, runnable class: everything we've built so far assembled in one place. If you've been following along section by section, this should look familiar. If something feels new, go back and read the section that introduced it. The code only makes sense when you understand why each piece exists, not just what it does.
Testing Your Task Manager
You've written code. Now you need to verify it actually works, not just by running the demo script, but by systematically exercising each feature and its edge cases. Good tests are what separate code you understand from code you trust.
Start with the happy path. Add a task, retrieve it by priority, filter by its tag, search for a keyword in its title. Verify the outputs match what you expect. Then probe the edges: what happens when you call undo with an empty undo stack? What happens when you filter by a tag that no tasks have? What happens when you try to complete a task ID that doesn't exist? These edge cases are where bugs hide.
```python
# Test happy path
manager = TaskManager()
task = manager.add_task("Test task", priority=3, tags=["work"])
assert task["id"] == 1
assert task["priority"] == 3
assert "work" in task["tags"]

# Test undo
manager.add_task("Second task")
assert len(manager.tasks) == 2
manager.undo()
assert len(manager.tasks) == 1  # Second task gone

# Test edge cases
assert TaskManager().undo() == False  # Fresh manager: nothing to undo
assert manager.complete_task(999) == False  # Non-existent ID

# Test tag filter
manager.add_task("Backend work", tags=["work", "backend"])
work_tasks = manager.filter_by_tag("work")
assert len(work_tasks) == 2  # Both tasks have the "work" tag

print("All tests passed.")
```

For a more structured approach, look into Python's built-in unittest module or the popular pytest library. Both let you write test classes or functions, run them with a single command, and get clear output when something fails. When you add new features to the task manager, write the test first. That forces you to think about the expected behavior before you write any implementation code, a practice called test-driven development that's worth learning.
The most important tests here are around undo/redo correctness, tag index consistency after operations, and round-trip serialization (save then load, verify all data matches). These are the operations where subtle bugs are hardest to catch by hand but easiest to catch with a good assertion.
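Here's a minimal round-trip check at the JSON level, with invented sample data. A fuller test would call save_to_file and load_from_file on a real TaskManager and compare the two managers field by field:

```python
import json
import os
import tempfile

original = {
    "1": {"id": 1, "title": "Round trip", "priority": 4,
          "tags": ["test", "io"], "done": False}
}

# Write to a temporary file and read it back
path = os.path.join(tempfile.mkdtemp(), "tasks.json")
with open(path, "w") as f:
    json.dump(original, f)
with open(path) as f:
    restored = json.load(f)

assert restored == original  # every field survived the save/load cycle
assert set(restored["1"]["tags"]) == {"test", "io"}  # rebuild the set the app uses
```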
Testing It All
Let's see the task manager in action:
```python
# Create manager
manager = TaskManager()

# Add some tasks
manager.add_task(
    "Deploy payment system",
    "Move to production",
    priority=5,
    tags=["critical", "backend"]
)
manager.add_task(
    "Write docs",
    "API documentation",
    priority=2,
    tags=["docs", "backend"]
)
manager.add_task(
    "Fix CSS bug",
    "Mobile alignment issue",
    priority=3,
    tags=["bug", "frontend"]
)

# View by priority
print("Tasks by priority:")
for task in manager.get_tasks_by_priority():
    if not task["done"]:
        print(f"  [{task['priority']}] {task['title']}")

# Filter by tag
print("\nAll 'backend' tasks:")
for task in manager.filter_by_tag("backend"):
    if not task["done"]:
        print(f"  {task['title']}")

# Search
print("\nSearch for 'bug':")
for task in manager.search("bug"):
    if not task["done"]:
        print(f"  {task['title']}")

# Stats
print("\nTag frequency:")
for tag, count in manager.get_tag_stats():
    print(f"  {tag}: {count}")

print("\nPriority breakdown:")
stats = manager.get_priority_stats()
for priority, count in stats.items():
    print(f"  Priority {priority}: {count}")

# Complete a task
manager.complete_task(1)
print("\nCompleted task 1")

# Undo
manager.undo()
print("Undid completion")

# Save and load
manager.save_to_file()
manager2 = TaskManager()
manager2.load_from_file()
print(f"Reloaded: {len(manager2.tasks)} tasks")
```

Expected output:
```
Tasks by priority:
  [5] Deploy payment system
  [3] Fix CSS bug
  [2] Write docs

All 'backend' tasks:
  Deploy payment system
  Write docs

Search for 'bug':
  Fix CSS bug

Tag frequency:
  backend: 2
  bug: 1
  critical: 1
  docs: 1
  frontend: 1

Priority breakdown:
  Priority 1: 0
  Priority 2: 1
  Priority 3: 1
  Priority 4: 0
  Priority 5: 1

Completed task 1
Undid completion
Saved 3 tasks to tasks.json
Loaded 3 tasks from tasks.json
Reloaded: 3 tasks
```
Run this yourself. Match the output line by line. (One caveat: tags with equal counts may appear in a different order on your machine, since set iteration order isn't guaranteed.) If anything differs, trace back through the code and understand why. The expected output is documentation: it describes not just what the code does, but what you intend it to do. When they diverge, you have a bug or a misunderstanding, and either one is valuable to find.
Extension Ideas
Once you have the core working, there's a lot of interesting territory to explore. Each of these extensions requires a genuine data structure decision, they're not just features to bolt on, they're design problems in disguise.
Recurring tasks would let you mark a task as "repeat every Monday" or "repeat monthly on the 15th." The natural representation is a generator function or a rule string that you parse and expand into concrete task instances. The tricky part is deciding when to expand them, on creation, on load, or on demand. Each approach has different implications for storage size and query performance.
Subtasks transform your flat task list into a tree structure. Each task can have child tasks, and completing a parent might automatically complete its children. A nested dict or a dedicated tree class can represent this. The interesting algorithmic challenge is traversal: when you display tasks by priority, do subtasks sort independently or inherit their parent's priority?
Time estimates and due dates open up scheduling queries: "what can I finish today if I have four hours?" This requires sorting by estimated duration, filtering by due date, and potentially implementing a greedy or dynamic programming solution for task selection. It's a great excuse to revisit the algorithm complexity article from earlier in this series.
Full-text search using an inverted word index is a natural extension of the tag index pattern. Tokenize each task's title and description into words, then build a dict from each word to the set of task IDs that contain it. Searching becomes O(1) for exact word matches instead of O(n) for the substring scan we have now. It also introduces new problems, stemming, stop words, ranking by relevance, that will teach you why search engines are hard.
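A minimal sketch of such a word index, with invented sample tasks. A real implementation would add stemming, stop words, and relevance ranking:

```python
from collections import defaultdict

tasks = {
    1: {"title": "Fix login bug", "description": "password reset broken"},
    2: {"title": "Write docs", "description": "API reference"},
}

# word -> set of task IDs containing that word
word_index = defaultdict(set)
for task_id, task in tasks.items():
    text = f"{task['title']} {task['description']}".lower()
    for word in text.split():
        word_index[word].add(task_id)

print(word_index["bug"])   # {1}
print(word_index["docs"])  # {2}
```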
Why Data Structures Matter
Take a step back. We just built a working task manager, and every piece serves a purpose:
- Dict for tasks: O(1) lookup by ID. Look up 1 million tasks instantly.
- Set for tags per task: O(1) membership check. "Is this urgent?" in constant time.
- Dict of sets (tag index): O(k) filtering where k is matches, not O(n) where n is all tasks.
- Deque for undo/redo: O(1) append/pop. Undo is instant, even with 50 saved states.
- Counter for statistics: Built-in frequency counting. No loops needed.
- sorted() for priority: O(n log n) sorting when we need it. Fast enough for human-sized task lists.
- JSON for persistence: Standard format. Load/save without custom parsers.
Change one data structure and the whole system gets worse or better. Use a list for tasks instead of a dict, and lookups become O(n). Use a list for tags and filtering becomes O(n * m). That's not overthinking, that's engineering.
The real lesson here isn't any individual data structure. It's the habit of asking "what operations do I need, how often will I perform them, and what does my choice cost in time and memory?" That question, asked consistently, is what data structures are for.
Wrapping Up
You now know lists, dicts, sets, heaps, and deques. You understand Big O. And you've seen how they come together in actual software.
The task manager isn't a toy. It's what real applications look like under the hood, deliberately chosen data structures, trade-offs between speed and simplicity, and code that works because of, not in spite of, the foundations. Everything from search engines to databases to game engines makes the same kinds of decisions we made here, just at a larger scale and with more constraints.
Take this project further. Add a feature that genuinely interests you, maybe recurring tasks, maybe a command-line interface, maybe due dates with deadline alerts. The moment you start designing a feature and find yourself asking "wait, what data structure should I use for this?", you'll know the capstone did its job. That question is the whole game.
Next cluster, we move to classes and objects. You'll learn to package these data structures into reusable, extensible components. The task manager becomes a class you can inherit from, extend, and build larger systems on top of. We'll also talk about encapsulation, the idea that internal implementation details (like our tag index) should be hidden from users of the class, who should only interact through well-defined methods.
You've got the data structures. You've got the patterns. Now it's time to learn how to organize them into the real world.
Until next time, keep thinking about why a data structure exists, not just how to use it. That's what separates amateur code from engineering.