
You've shipped production-grade code. You've got tests. You've containerized everything. So why are you still manually running linters, bumping versions, and wrestling with PyPI uploads? Let's fix that.
This is where automation stops being nice-to-have and becomes essential. GitHub Actions is the glue that turns your carefully crafted testing, linting, and packaging practices into a reliable, repeatable pipeline. Every commit gets validated. Every tag gets released. Every PR gets quality-gated before merge. And the best part? You sleep while your code auto-publishes to PyPI using secure, modern authentication.
Think about what you're actually solving here. Without CI/CD, your team is the pipeline. Someone has to remember to run the linter before pushing. Someone has to manually verify that the release tag matches the version in pyproject.toml. Someone has to upload the wheel to PyPI, cross their fingers, and hope they used the right credentials. That someone is either you, or nobody, which means it doesn't happen. CI/CD eliminates human error from the most mechanical parts of software development. It makes your standards automatic rather than aspirational. The moment you stop relying on people to remember things and start relying on machines to enforce them, your codebase gets measurably more reliable.
GitHub Actions in particular is worth learning deeply because it's tightly integrated with where your code already lives. It's free for public repositories and has generous limits for private ones. It supports virtually every workflow pattern you might need, from simple lint-and-test pipelines to sophisticated multi-environment deployment orchestration. By the end of this article, you'll have a production-ready CI/CD pipeline that catches bugs before humans see them, tests across Python versions simultaneously, and publishes releases with cryptographic proof of origin. Let's build it.
Table of Contents
- CI/CD Philosophy: Why Automate in the First Place
- What We're Actually Doing Here: The CI/CD Mental Model
- Setting Up Your Repository for CI/CD
- The Core CI Pipeline: test.yml
- Triggers and Events
- Matrix Strategy: Test Across Python Versions
- Checking Out Code and Setting Up Tools
- Installing Dependencies and Running Checks
- Uploading Coverage Metrics
- The Release Pipeline: publish.yml
- Triggering on Tags
- Permissions and OIDC
- Version Verification
- Building and Publishing
- Configuring PyPI for Trusted Publishing
- Caching for Speed: The Secret to Fast Builds
- Real-World Example: A Complete Workflow
- Branch Protection Rules: Making CI Mandatory
- Common CI/CD Mistakes (And How to Avoid Them)
- Workflow Optimization: Squeezing Every Second
- Common CI/CD Mistakes
- Pitfall 1: Secrets in Logs
- Pitfall 2: Slow Installs
- Pitfall 3: Flaky Tests
- Pitfall 4: Different Behavior Across OS
- Pitfall 5: Forgotten uv.lock Commits
- Debugging Failed Workflows
- Environment-Specific Configuration: Testing Against Real Services
- Conditional Steps: Running Jobs Only When Needed
- Notifications and Reporting: Telling Your Team What Happened
- Dependency Management: Keeping Dependencies Up to Date
- Secret Management: Handling API Keys and Credentials
- Reusable Workflows: Don't Repeat Yourself
- Performance Tuning: Making Your Pipeline Faster
- 1. Parallelize Everything
- 2. Cache Aggressively
- 3. Skip Unnecessary Steps
- 4. Use Lighter Runners for Simple Jobs
- 5. Split Tests by Speed
- Documentation and Runbooks: Teaching Your Team
- Running Tests Locally
- CI Pipeline
- Releasing
- Monitoring and Insights: Understanding Your Pipeline Health
- Advanced: Matrix Strategy for Operating Systems
- Workflow Artifacts and Retention
- Scheduling Nightly Runs
- Wrapping Up: Your Code Now Has a Safety Net
CI/CD Philosophy: Why Automate in the First Place
Before we write a single line of YAML, it's worth asking the deeper question: what problem are we actually solving?
The answer is trust. Specifically, the ability to trust that the code in your main branch works, the code going into production has been reviewed, and the package on PyPI was built from what you think it was built from. Without automation, trust is maintained through discipline and memory, human qualities that degrade under deadline pressure. With automation, trust is enforced by machines that don't forget, don't get distracted, and apply the same standards at 3am on a Saturday as they do at 10am on a Tuesday.
Continuous Integration is a practice, not a technology. The technology is GitHub Actions; the practice is merging small changes frequently and validating each one. Teams that practice CI merge to main multiple times per day. Every merge triggers a full suite of checks. Bugs are caught within minutes of being introduced, not days or weeks later when the cause is obscure and the fix is expensive.
Continuous Deployment extends this to the release process. Instead of a dedicated "release engineer" who follows a checklist, you define that checklist as code. The checklist runs automatically. Every release follows the exact same procedure, every time, with an audit trail. You gain consistency, speed, and, paradoxically, safety, because automation removes the human error that checklists are supposed to prevent.
The philosophy, in short: treat your deployment process as software. Write it in version control. Test it. Review it. Improve it over time. When you do this, releasing software becomes boring in the best possible way: predictable, low-stress, and repeatable.
What We're Actually Doing Here: The CI/CD Mental Model
Before we touch a YAML file, let's be clear about what CI/CD is:
Continuous Integration (CI): Every commit runs through automated checks. Tests? Run them. Linter? Run it. Type checker? Run it. If anything fails, the commit is marked as failing and, with branch protection enabled, the PR is blocked from merging. You find out about problems in minutes, not weeks.
Continuous Deployment (CD): When you tag a release, the entire deployment chain (building, testing, versioning, publishing) runs automatically. No humans clicking buttons. No "I forgot to update the changelog" mistakes.
GitHub Actions is the orchestration layer. It watches your repository, gets triggered by events (pushes, PRs, tags, schedules), and executes workflows. A workflow is a YAML file describing what to run, when, and on which machines.
The components matter:
- Workflow: A YAML file in .github/workflows/ that defines the entire automation
- Job: A discrete task (e.g., "test on Python 3.13")
- Step: A single command or action within a job
- Runner: The machine that executes the job (GitHub-hosted or self-hosted)
- Action: A reusable task (e.g., "checkout code", "setup Python")
Think of it as: Workflow → (multiple) Jobs → (multiple) Steps, where each step runs either a shell command or a single action.
Setting Up Your Repository for CI/CD
You need a structure. Let's establish one that scales:
my-project/
├── .github/
│   └── workflows/
│       ├── test.yml        # Main CI pipeline
│       ├── publish.yml     # Release pipeline
│       └── nightly.yml     # Optional: nightly tests
├── src/
│   └── mypackage/
├── tests/
├── pyproject.toml
├── uv.lock
├── .gitignore
└── README.md
The directory structure above is not arbitrary; it reflects a deliberate separation of concerns. Your application code lives in src/, your tests live in tests/, and your automation lives in .github/workflows/. Keeping these distinct makes it easy to reason about what does what, and ensures that CI configuration is version-controlled alongside the code it validates. When a new engineer clones your repo, they can look at .github/workflows/ and understand the entire automation story without reading a single line of documentation.
The workflows go in .github/workflows/. GitHub will automatically discover and run them based on their triggers.
For this walkthrough, we're assuming:
- Your project uses pyproject.toml (configured for uv)
- Tests live in tests/ and run with pytest
- You've got ruff for linting and mypy for type checking
- You want to publish to PyPI on tagged releases
If you don't have these yet, go back to articles 40–49. This article assumes you've got the foundations.
The Core CI Pipeline: test.yml
Here's the workflow that runs every time you push code or open a PR. Read through it once before we break it down, notice how the structure mirrors the mental model we described: one workflow, one job with a matrix strategy, multiple steps per job, each step either running a command or invoking a pre-built action.
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
jobs:
  test:
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
        with:
          version: "latest"
      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}
      - name: Install dependencies
        run: uv sync --all-extras
      - name: Lint with Ruff
        run: uv run ruff check src/ tests/
      - name: Type check with mypy
        run: uv run mypy src/
      - name: Run tests
        run: uv run pytest --cov=src --cov-report=xml tests/
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          files: ./coverage.xml
          fail_ci_if_error: false

Let's break this down piece by piece.
Triggers and Events
The on: block is where you declare what events activate your workflow. The two triggers we're using, push and pull_request, cover the two most critical moments in a code's life: when a developer pushes directly to a shared branch, and when they propose a change through a PR. By targeting both main and develop, you get coverage at both the feature-integration and production-preparation stages.
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

This workflow runs:
- On every push to main or develop
- On every pull request that targets main or develop
You can add more triggers. Want to test nightly? Add schedule: - cron: '0 2 * * *' to catch regressions. Want to run tests manually? Add workflow_dispatch to enable a "Run workflow" button in the GitHub UI. The extended version below combines all three patterns, giving you automated validation, scheduled regression testing, and manual override capability in one block.
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
  schedule:
    - cron: "0 2 * * *" # Daily at 2am UTC
  workflow_dispatch: # Manual trigger button

Matrix Strategy: Test Across Python Versions
One of the most powerful features of GitHub Actions is the matrix strategy, and it's worth understanding why this matters beyond just "runs on multiple versions." When Python releases a new version, deprecated modules get removed (distutils disappeared in 3.12), deprecation warnings change, and standard-library behavior can shift. A library that works perfectly on 3.11 might emit warnings on 3.12 and fail outright on 3.13. The matrix strategy catches these issues the moment they're introduced, not when a user files a bug report six months later.
strategy:
  matrix:
    python-version: ["3.11", "3.12", "3.13"]
runs-on: ubuntu-latest

This is the magic. Instead of running the job once, GitHub runs it three times in parallel, once for each Python version. If any version fails, the whole job fails. This is how you catch version-specific bugs before your users do.
The matrix variable ${{ matrix.python-version }} gets substituted in each run. So you get three jobs:
- test (3.11)
- test (3.12)
- test (3.13)
All running simultaneously on GitHub's hosted runners (free tier: 20 concurrent jobs).
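To make the expansion concrete, here is a minimal Python sketch of how a matrix becomes a list of jobs. The `expand_matrix` helper is hypothetical and exists only for illustration: GitHub performs this expansion itself, and its real behavior also covers `include` and `exclude`, which this sketch ignores.

```python
from itertools import product


def expand_matrix(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per job: the Cartesian product of all matrix axes."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# One axis with three values yields three jobs.
jobs = expand_matrix({"python-version": ["3.11", "3.12", "3.13"]})
for job in jobs:
    print(f"test ({job['python-version']})")
```

Adding a second axis multiplies the job count, so three Python versions across three operating systems would produce nine combinations.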
Checking Out Code and Setting Up Tools
The first three steps in any Python workflow follow a consistent pattern: get the code, install a package manager, install the right Python version. GitHub's hosted runners don't come with your code or your preferred tools pre-installed. You're starting fresh every time, which is exactly what makes CI reproducible. What runs on the runner is exactly what your workflow installs, with no leftover state from previous runs.
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v2
  with:
    version: "latest"
- name: Set up Python ${{ matrix.python-version }}
  run: uv python install ${{ matrix.python-version }}

actions/checkout@v4 clones your repository into the runner's filesystem. Every step after this has access to your code.
astral-sh/setup-uv@v2 installs uv (the fast Python package manager we've been using throughout this series). The version: "latest" means always grab the newest version.
Then we tell uv to install the specific Python version. uv manages Python versions too: it downloads the requested interpreter on demand, which typically adds only a few seconds to the run.
Installing Dependencies and Running Checks
With the environment prepared, you're now ready to run the actual quality gates. The order here is deliberate: lint before type-check before test. Linting is the cheapest check, it catches style issues and obvious bugs in milliseconds. Type checking is more expensive but still fast. Tests are the most expensive, so you run them last. If linting fails, you fail fast without paying for a full test suite run.
- name: Install dependencies
  run: uv sync --all-extras
- name: Lint with Ruff
  run: uv run ruff check src/ tests/
- name: Type check with mypy
  run: uv run mypy src/
- name: Run tests
  run: uv run pytest --cov=src --cov-report=xml tests/

uv sync --all-extras installs your project and all its optional dependencies (defined in pyproject.toml). This assumes your pyproject.toml has a [tool.uv] or [project.optional-dependencies] section.
Then we run the quality gates in order:
- Linting: ruff check finds style violations, unused imports, and common bugs. Fast. Strict. Non-negotiable.
- Type checking: mypy validates that your type hints are correct. Catches a whole class of bugs that tests miss.
- Testing: pytest with coverage reporting. The --cov=src flag measures test coverage; --cov-report=xml generates an XML report for CI tools to ingest.
Each step uses uv run to execute tools via the project's virtual environment. This ensures version consistency.
Uploading Coverage Metrics
Once your tests pass, the coverage report is a byproduct worth capturing. The codecov-action integration does more than just upload numbers, it turns coverage data into actionable PR feedback. When a contributor opens a PR that drops coverage, Codecov comments directly on the PR with a breakdown of which new lines lack test coverage. That feedback loop accelerates code quality without requiring a human reviewer to manually check.
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage.xml
    fail_ci_if_error: false

This sends your coverage report to Codecov, which tracks coverage trends over time. You can integrate Codecov with GitHub to comment on PRs: "Coverage dropped 2%, here's the breakdown."
The fail_ci_if_error: false means the workflow continues even if coverage upload fails (Codecov might be temporarily down).
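If you are curious what a tool reads out of coverage.xml, the following Python sketch pulls the overall line coverage from a Cobertura-style report and compares it to a threshold. The sample report is a trimmed-down stand-in (a real file carries many more attributes and child elements), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for a real coverage.xml; only the attributes used below are kept.
SAMPLE_REPORT = '<coverage line-rate="0.87" lines-valid="200" lines-covered="174"></coverage>'


def line_coverage_percent(xml_text: str) -> float:
    """Read the overall line coverage (0-100) from the report's root element."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100


def meets_threshold(xml_text: str, minimum: float) -> bool:
    """True when overall line coverage is at least `minimum` percent."""
    return line_coverage_percent(xml_text) >= minimum


print(meets_threshold(SAMPLE_REPORT, 80.0))
print(meets_threshold(SAMPLE_REPORT, 90.0))
```

In a real pipeline you would more likely let pytest-cov enforce the floor with its `--cov-fail-under` option; the sketch is only meant to show that the XML report is plain data any tool can ingest.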
The Release Pipeline: publish.yml
Now for the fun part. When you tag a release, this workflow builds your package, runs final checks, and publishes to PyPI, all automatically. The key insight here is that the publish pipeline is deliberately separate from the CI pipeline. CI runs constantly; publishing happens rarely and deliberately. Keeping them separate means you can tune each one independently, and a publishing failure doesn't contaminate your CI status dashboard.
name: Publish
on:
  push:
    tags:
      - "v*"
permissions:
  contents: read
  id-token: write
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
        with:
          version: "latest"
      - name: Set up Python
        run: uv python install 3.13
      - name: Install dependencies
        run: uv sync
      - name: Verify version matches tag
        run: |
          TAG=${{ github.ref_name }}
          VERSION=$(uv run python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])")
          if [ "$TAG" != "v$VERSION" ]; then
            echo "Tag $TAG does not match version $VERSION"
            exit 1
          fi
      - name: Run tests
        run: uv run pytest tests/
      - name: Build distribution
        run: uv build
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          attestations: true

Let's understand what's happening here.
Triggering on Tags
Tags are the most intentional git event. Unlike branch pushes, which happen constantly during normal development, tags are deliberate markers, a developer saying "this specific commit is version 1.2.3." By triggering only on tags that match the v* pattern, you're ensuring that releases are always explicit acts, never accidental side effects of a normal push.
on:
  push:
    tags:
      - "v*"

This workflow only runs when you push a git tag that matches the pattern v* (e.g., v1.0.0, v2.1.3). No tag, no release. This prevents accidental releases.
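To get a feel for what v* does and does not match, here is a small Python sketch that uses the standard library's `fnmatchcase` as an approximation. This is an assumption for illustration only: GitHub's filter syntax is its own glob dialect, so edge cases (tags containing `/`, for example) can behave differently.

```python
from fnmatch import fnmatchcase

TAG_PATTERN = "v*"


def triggers_release(tag: str, pattern: str = TAG_PATTERN) -> bool:
    """Approximate whether a pushed tag would match the workflow's tag filter."""
    return fnmatchcase(tag, pattern)


for tag in ["v1.0.0", "v2.1.3", "1.0.0", "release-1.0", "vnext"]:
    print(tag, triggers_release(tag))
```

Note that a tag such as vnext also starts with "v" and would therefore match, which is one reason a tag-versus-version check inside the workflow is worth having.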
Permissions and OIDC
The permissions block might look like boilerplate, but it's actually the crux of the modern PyPI publishing story. Before OIDC-based trusted publishing, you had to store a PyPI API token as a GitHub secret, manage its rotation, and trust that it wasn't accidentally exposed in logs. OIDC eliminates all of that. GitHub vouches for the workflow's identity, PyPI trusts GitHub, and the whole exchange uses a short-lived cryptographic token that expires after the workflow completes.
permissions:
  contents: read
  id-token: write

This is the security model. id-token: write allows the workflow to request a short-lived OIDC (OpenID Connect) token from GitHub. We use this token to authenticate with PyPI, without storing a password or API token.
This is the modern, secure way to publish. You don't manage secrets; PyPI trusts GitHub's identity.
Version Verification
This short shell script prevents one of the most common release mistakes in Python: tagging a release while forgetting to bump the version in pyproject.toml. Without this check, you'd end up with a tag called v2.0.0 that publishes a package with version 1.9.0 in its metadata, a confusing mismatch that breaks downstream tooling and annoys users. The check is cheap to run and expensive not to have.
- name: Verify version matches tag
  run: |
    TAG=${{ github.ref_name }}
    VERSION=$(uv run python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])")
    if [ "$TAG" != "v$VERSION" ]; then
      echo "Tag $TAG does not match version $VERSION"
      exit 1
    fi

This is a safety check. If you tag v2.0.0 but forget to update pyproject.toml, the workflow fails. No mismatches. No confusion. The tag and the version must agree.
Building and Publishing
uv build produces two artifacts: a wheel for fast installation and a source distribution for environments that need to compile from source. Publishing both is a courtesy to users on unusual platforms or those who audit packages before installing. The attestations: true flag is the feature that makes modern PyPI publishing genuinely trustworthy, it cryptographically links the published package to the specific GitHub Actions run that built it.
- name: Build distribution
  run: uv build
- name: Publish to PyPI
  uses: pypa/gh-action-pypi-publish@release/v1
  with:
    attestations: true

uv build creates both a wheel (.whl) and a source distribution (.tar.gz) in the dist/ directory.
The pypa/gh-action-pypi-publish action then publishes them to PyPI. The magic is in attestations: true, this adds provenance attestations to your packages, cryptographically proving they were built by this GitHub Actions workflow. Users can verify that what they're installing came from your repository, not a compromised mirror or attacker.
Configuring PyPI for Trusted Publishing
For this to work, you need to configure PyPI to trust GitHub Actions. Here's how:
- Go to https://pypi.org/manage/account/
- In the left sidebar, click "Publishing"
- Click "Add a new pending publisher"
- Fill in:
  - PyPI Project Name: Exactly as it appears in pyproject.toml (e.g., my-awesome-package)
  - Owner: Your GitHub username or organization
  - Repository name: Your repo name
  - Workflow name: publish.yml
  - Environment name: Leave empty (or set to release if you want to require approval)
- Click "Add"
That's it. No API tokens. No secrets. From now on, when you push a tag, PyPI automatically trusts the GitHub workflow and publishes.
If you want an extra safety layer, set Environment name to release. Then add a GitHub environment called release in your repo settings, optionally require approval on it, and add environment: release to the publish job so the workflow actually runs in that environment. The workflow will pause and ask for human sign-off before publishing.
Caching for Speed: The Secret to Fast Builds
CI pipelines that take 10 minutes to install dependencies are CI pipelines that don't get used. Let's cache aggressively.
The key insight about caching is the cache key design. We're using hashFiles('uv.lock') as part of the key, which means the cache is invalidated whenever dependencies change, you always install the right versions, but is reused when they haven't, which is the common case. The restore-keys fallback allows partial cache hits: if the exact uv.lock hash isn't cached, it'll fall back to any cache from the same OS, giving you a warm start even after a dependency update.
- uses: astral-sh/setup-uv@v2
  with:
    version: "latest"
    enable-cache: true
- name: Set up Python ${{ matrix.python-version }}
  run: uv python install ${{ matrix.python-version }}
- name: Cache uv
  uses: actions/cache@v4
  with:
    path: ~/.cache/uv
    key: uv-cache-${{ runner.os }}-${{ hashFiles('uv.lock') }}
    restore-keys: |
      uv-cache-${{ runner.os }}-

The setup-uv action has built-in caching, switched on with enable-cache: true. Then we cache the ~/.cache/uv directory (where uv stores downloaded packages).
The cache key is uv-cache-<os>-<hash of uv.lock>. If uv.lock hasn't changed, we use the cached dependencies. If it has, we download fresh ones and update the cache.
Result? As a rough guide, a cold first run might take a couple of minutes, while subsequent runs with the same lock file can finish installing in seconds.
Real-World Example: A Complete Workflow
Let's put it together. Here's a production-grade workflow for a real project. Notice the additions compared to our basic CI pipeline: we've expanded the matrix to include three operating systems, added a format check alongside linting, added a dedicated lint-types job that runs once instead of nine times, and added a security scanning job. This is the structure that professional open-source projects use, comprehensive without being wasteful.
name: CI
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
  schedule:
    - cron: "0 2 * * *" # Nightly at 2am UTC
jobs:
  test:
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
        with:
          version: "latest"
          enable-cache: true
      - name: Set up Python ${{ matrix.python-version }}
        run: uv python install ${{ matrix.python-version }}
      - name: Install dependencies
        run: uv sync --all-extras
      - name: Lint with Ruff
        run: uv run ruff check src/ tests/
      - name: Format check
        run: uv run ruff format --check src/ tests/
      - name: Type check with mypy
        run: uv run mypy src/
      - name: Run tests
        run: uv run pytest --cov=src --cov-report=xml -v tests/
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13'
        with:
          files: ./coverage.xml
  lint-types:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
      - run: uv python install 3.13
      - run: uv sync
      - run: uv run ruff check src/
      - run: uv run mypy src/
  security:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # Needed to upload SARIF results
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
      - run: uv python install 3.13
      - run: uv sync
      - name: Run bandit
        # upload-sarif expects SARIF, not bandit's JSON; assumes bandit is installed with its sarif extra
        run: uv run bandit -r src/ -f sarif -o bandit-report.sarif || true
      - name: Upload security scan
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: bandit-report.sarif

Notice what's happening:
- Matrix testing: Runs on three OS × three Python versions = 9 jobs in parallel
- Upload coverage once: The if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13' condition prevents uploading coverage 9 times
- Separate lint-types job: Runs once (not repeated in matrix) so you get one clear lint report
- Security scanning: Uses bandit to check for common security issues, uploaded to GitHub's security tab
This workflow catches bugs, version incompatibilities, security issues, and coverage regressions, all in parallel, all automatically.
Branch Protection Rules: Making CI Mandatory
A workflow is worthless if developers can merge broken code. Let's make passing CI a requirement.
In your GitHub repo settings:
- Go to Settings → Branches
- Click "Add rule" under "Branch protection rules"
- Create a rule for main:
  - Require status checks to pass before merging: Enable
  - Require branches to be up to date before merging: Enable
  - Select the status checks: test (3.13), test (3.12), etc.
  - Require code reviews: At least 1 (optional but recommended)
  - Dismiss stale PR approvals: Enable
  - Require CODEOWNERS review: If you've set up a CODEOWNERS file
Now, a PR can't merge unless:
- All tests pass on all Python versions
- The branch is up to date with main
- At least one person has approved the code
This is your safety net. It prevents "I'll fix that in the next PR" incidents.
Common CI/CD Mistakes (And How to Avoid Them)
Learning CI/CD means learning what breaks it. Here are the mistakes that waste the most developer hours, and the patterns that prevent them.
The most common mistake is treating CI as a formality rather than a feedback loop. Teams configure a workflow, it goes green, and they stop paying attention to it, until it goes red at the worst possible moment. The right mindset is the opposite: your CI pipeline is a living document that should be refined over time. Monitor run durations. If your test suite starts taking 12 minutes, something changed. Investigate and fix it. Slow CI is a tax that every developer pays on every PR, and it compounds.
The second most common mistake is failing to cache dependencies properly. Every minute spent downloading packages that haven't changed is a minute your developers spend waiting. A properly cached pipeline with uv should install dependencies in under 30 seconds on a warm cache. If yours takes longer, check your cache key design and make sure you're actually getting cache hits in the GitHub Actions logs.
The third mistake is writing environment-specific code that works locally but breaks in CI. This usually manifests as hardcoded file paths, assumptions about the current working directory, or dependencies on tools that aren't installed on the runner. The fix is to run your CI workflow locally using act (a tool that runs GitHub Actions locally) before pushing, and to pay attention when CI fails on paths you've never seen before.
A fourth subtle mistake is not pinning action versions. Using actions/checkout@v4 is safe because it's a major version tag that receives non-breaking updates. But some community actions change behavior between minor versions. Pin critical actions to their full SHA for maximum reproducibility in security-sensitive pipelines: uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683.
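If you decide to pin by SHA, a small script can flag the references that are still floating. The sketch below is a deliberately naive line scanner, not a YAML parser, and `unpinned_actions` is a hypothetical helper; the sample workflow text is illustrative.

```python
import re

# Matches "uses: owner/repo@ref", with or without a leading list dash.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return every action reference that is not pinned to a full commit SHA."""
    found = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if match and not FULL_SHA_RE.match(match["ref"]):
            found.append(f"{match['action']}@{match['ref']}")
    return found


WORKFLOW = """\
steps:
  - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
  - uses: astral-sh/setup-uv@v2
  - name: Upload coverage
    uses: codecov/codecov-action@v4
"""

print(unpinned_actions(WORKFLOW))
```

Whether to fail the build on the result or merely report it is a policy choice; many teams pin only the actions that touch credentials or publishing.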
Workflow Optimization: Squeezing Every Second
Once your pipeline works, make it fast. A pipeline that developers trust is one that gives feedback quickly. Here are the optimizations that make the biggest practical difference.
Split your jobs by speed tier. Unit tests that run in 10 seconds should never be blocked behind integration tests that run in 5 minutes. Put fast checks in one job and slow checks in another, both run in parallel, but you get your quick feedback immediately while the slow tests are still running. Use timeout-minutes on slow jobs to prevent hung tests from burning your CI minutes quota.
Use fail-fast: false in your matrix strategy during initial development. By default, if one matrix job fails, GitHub cancels the rest. That's efficient for production pipelines, but during development you often want to see all the failures at once to understand whether you have a widespread issue or a Python-version-specific one.
Upload artifacts for failed jobs. When a test fails in CI, you want to see the full output, any generated reports, and any screenshots (for browser-based tests). Add an if: failure() upload step to preserve these artifacts for debugging. There's nothing more frustrating than a flaky CI failure that produced useful logs you can't access because the artifacts expired.
Separate your dependency installation from your tool installation. If you install mypy and ruff as development dependencies, they get installed on every matrix job including your integration test jobs that don't need them. Use dependency groups in pyproject.toml and uv sync --group lint in your lint job to keep things precise.
Common CI/CD Mistakes
Let's look at the specific code patterns that cause CI failures.
Pitfall 1: Secrets in Logs
If you have API keys or tokens, never log them. Use GitHub Secrets and reference them as environment variables. The ${{ secrets.MY_API_KEY }} syntax is safe because GitHub automatically masks any string that matches a registered secret value in your workflow logs.
- name: Some step requiring auth
  env:
    API_KEY: ${{ secrets.MY_API_KEY }}
  run: some-command

GitHub masks secret values in logs. But better: use trusted publishing and OIDC tokens instead of storing secrets at all.
Pitfall 2: Slow Installs
If every run takes 10+ minutes, developers won't trust the system. They'll merge anyway. Use caching. Use uv instead of pip. Test only what matters. The difference between a cold pip install and a cached uv sync is often 10x in wall-clock time.
# Bad: Slow
- run: pip install -r requirements.txt

# Good: Fast
- uses: astral-sh/setup-uv@v2
  with:
    enable-cache: true
- run: uv sync

Pitfall 3: Flaky Tests
If tests pass locally but fail in CI (or vice versa), you have a flaky test. CI will expose this mercilessly. The classic pattern is timing-dependent tests that assume operations complete within a fixed window, a window that's valid on a fast developer laptop but routinely exceeded on shared CI runners under load. The fix is always to mock or control time rather than sleeping.
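# Bad: Flaky (timing-dependent)
# `cache` is assumed to be an existing cache object or fixture
from time import sleep

def test_cache_expiry():
    cache.set("key", "value", ttl=1)
    sleep(1.1)  # Real waiting: slow, and unreliable on loaded runners
    assert cache.get("key") is None

# Good: Deterministic
def test_cache_expiry():
    cache.set("key", "value", ttl=1)
    cache._clock += 1.1  # Mock time (assumes the cache reads "now" from a settable _clock value)
    assert cache.get("key") is None

Pitfall 4: Different Behavior Across OS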
If tests pass on Linux but fail on Windows, you have an OS-specific bug. Matrix testing catches this. If it happens, don't ignore it. Cross-platform path handling is the most common culprit, hardcoded forward slashes, assumptions about directory separators, or use of /tmp instead of tempfile.
# Bad: Platform-specific
path = f"/tmp/{filename}"  # Fails on Windows

# Good: Cross-platform
import tempfile
from pathlib import Path

path = Path(tempfile.gettempdir()) / filename

Pitfall 5: Forgotten uv.lock Commits
If you update pyproject.toml but forget to commit uv.lock, CI sees different versions than you do locally. Always commit lock files. A good safeguard is to add a CI check that verifies the lock file is up to date: uv lock --check will fail if the lock file doesn't match pyproject.toml.
# Update dependencies
uv lock --upgrade

# Commit both files
git add pyproject.toml uv.lock
git commit -m "chore: update dependencies"

Debugging Failed Workflows
When a workflow fails, GitHub shows you the logs. Here's how to read them:
- Go to your repo → Actions
- Click the failed workflow
- Click the failed job
- Expand the step that failed
- Read the error. Google it if you don't understand it.
Common errors:
- ModuleNotFoundError: No module named 'pytest': You forgot to install dependencies. Add - run: uv sync.
- ruff: command not found: You're running ruff directly instead of uv run ruff.
- Error: Permission denied: On Windows, file permissions behave differently. Check for chmod commands that fail.
- Test failed: Connection refused: Service not running. Add a services section to your workflow.
For services (databases, caches, etc.), use Docker. The services section in GitHub Actions starts Docker containers before your job steps run. Service containers require a Linux runner, and GitHub-hosted Ubuntu runners ship with Docker pre-installed. This means you can run the exact same Postgres version in CI that you run in production.
services:
  postgres:
    image: postgres:16
    env:
      POSTGRES_PASSWORD: postgres
    options: >-
      --health-cmd pg_isready
      --health-interval 10s
      --health-timeout 5s
      --health-retries 5
    ports:
      - 5432:5432
Then your tests connect to localhost:5432.
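On the test side, a small readiness helper can turn a vague "connection refused" into a clear timeout message. The sketch below is an illustration, not something the workflow requires: wait_for_port is a hypothetical helper name, and a successful TCP connect only shows that something is listening on the port, not that Postgres is ready to answer queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the overall deadline expires.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test session could call wait_for_port("localhost", 5432) once in a fixture and fail fast with a readable error if it returns False.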
Environment-Specific Configuration: Testing Against Real Services
Real applications don't run in a vacuum. You need databases, caches, message queues. CI should test against realistic setups, not mocks. Mocks are useful for unit tests, but integration tests that exercise your actual database queries against an actual database instance catch a class of bugs that mock-based tests simply cannot.
Here's a complete workflow that spins up PostgreSQL and Redis. The health checks are important: they prevent your test steps from running before the services are ready to accept connections, which would cause confusing "connection refused" errors that have nothing to do with your code.
name: Integration Tests
on:
  push:
    branches: [main, develop]
jobs:
  integration:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: testuser
          POSTGRES_PASSWORD: testpass
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
        with:
          version: "latest"
          enable-cache: true
      - name: Set up Python
        run: uv python install 3.13
      - name: Install dependencies
        run: uv sync
      - name: Wait for services
        # Belt and braces: the health checks above already gate step start.
        # Assumes pg_isready and redis-cli are available on the runner.
        run: |
          timeout 60 bash -c 'until pg_isready -h localhost -p 5432; do sleep 1; done'
          timeout 10 bash -c 'until redis-cli -h localhost -p 6379 ping; do sleep 1; done'
      - name: Run migrations
        env:
          DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
        run: uv run alembic upgrade head
      - name: Run integration tests
        env:
          DATABASE_URL: postgresql://testuser:testpass@localhost:5432/testdb
          REDIS_URL: redis://localhost:6379/0
        run: uv run pytest tests/integration/ -v
The services section launches containers before any steps run. The --health-cmd checks ensure the service is ready before tests start.
Your code references services via localhost. In production, you'd use different endpoints, but for testing, this works perfectly.
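One way to keep that switch between CI and production endpoints in a single place is to read the URLs from the environment, the same DATABASE_URL and REDIS_URL variables the workflow sets. This is a sketch under assumptions: the Settings class and load_settings function are hypothetical names, the defaults simply mirror the CI values above, and db.internal is a made-up hostname.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read service endpoints from the environment, defaulting to the CI values."""
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgresql://testuser:testpass@localhost:5432/testdb"
        ),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
    )


# Hypothetical production-style override: only the environment changes.
settings = load_settings({"DATABASE_URL": "postgresql://app@db.internal:5432/prod"})
parts = urlsplit(settings.database_url)
print(parts.hostname, parts.port)  # db.internal 5432
```

Passing the mapping explicitly keeps the function easy to test. Whether test credentials belong in code as defaults is a judgment call, and some teams prefer to fail loudly when the variable is missing.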
Conditional Steps: Running Jobs Only When Needed
Not every check needs to run on every event. Use conditionals to skip work. The real power of conditional steps is that they let a single workflow file handle multiple scenarios: PRs get a different validation experience than direct pushes, and main branch pushes get different treatment than feature branches, without duplicating the workflow logic.
- name: Run full test suite
  if: github.event_name == 'pull_request'
  run: uv run pytest tests/
- name: Run only unit tests
  if: github.event_name == 'push'
  run: uv run pytest tests/unit/
- name: Deploy to staging
  if: github.ref == 'refs/heads/main' && github.event_name == 'push'
  run: ./deploy-to-staging.sh
Useful conditions:
- github.event_name == 'pull_request': Only on PRs
- github.ref == 'refs/heads/main': Only on main branch
- matrix.python-version == '3.13': Only on a specific matrix value
- always(): Even if previous steps failed
- failure(): Only if previous steps failed
- success(): Only if previous steps succeeded
This prevents running expensive integration tests on every commit while still ensuring they run before merges.
Notifications and Reporting: Telling Your Team What Happened
By default, GitHub notifies you via email. But you can send results to Slack, Discord, or custom webhooks. For teams that live in Slack, a direct notification on failure is much more actionable than an email that gets buried: it appears in the channel where the team is already discussing the work, with a direct link to the failed run.
- name: Notify Slack on failure
  if: failure()
  uses: slackapi/slack-github-action@v1.24.0
  with:
    payload: |
      {
        "text": "CI failed: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
      }
  env:
    # v1 of this action reads the webhook from the environment, not a with: input
    SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
    SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
Or generate a nice summary visible in the Actions UI:
- name: Test summary
  if: always()
  run: |
    echo "## Test Results" >> "$GITHUB_STEP_SUMMARY"
    echo "- Tests: ${{ job.status }}" >> "$GITHUB_STEP_SUMMARY"
    echo "- Coverage: $(cat coverage.txt)" >> "$GITHUB_STEP_SUMMARY"
The $GITHUB_STEP_SUMMARY file appears as a markdown summary at the top of the workflow run. No separate notifications needed.
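If your reporting logic already lives in Python, the same summary can be written from a script instead of shell echo lines. The sketch below assumes nothing beyond the GITHUB_STEP_SUMMARY environment variable that Actions provides. The append_step_summary function name is hypothetical, and it quietly does nothing when run outside Actions.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping


def append_step_summary(lines: Iterable[str], env: Mapping[str, str] = os.environ) -> bool:
    """Append Markdown lines to the job summary; return False outside Actions."""
    target = env.get("GITHUB_STEP_SUMMARY")
    if not target:
        return False  # Not running under GitHub Actions: nothing to write.
    # Append rather than overwrite, so earlier steps' summaries survive.
    with Path(target).open("a", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return True
```

A step could then run something like uv run python scripts/summary.py, where the script path is again just an example.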
Dependency Management: Keeping Dependencies Up to Date
CI is where you discover when dependencies break. But you can be proactive with Dependabot. Dependabot's real value is that it creates PRs, and PRs trigger CI. You don't have to manually verify that a dependency update is safe; you just look at whether CI passed on Dependabot's PR. If it did, merge with confidence. If it didn't, Dependabot has done you a favor by surfacing a compatibility issue before it reached production.
In your repo settings, enable Dependabot and add .github/dependabot.yml:
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    open-pull-requests-limit: 5
    reviewers:
      - "your-github-username"
Dependabot automatically opens PRs to update your dependencies. Each PR triggers your full CI pipeline. If tests pass, merge with confidence. If tests fail, you caught a breaking change before it hit production.
Combine this with uv lock --upgrade locally, and your dependencies stay fresh and tested.
Secret Management: Handling API Keys and Credentials
Never hardcode credentials. Use GitHub Secrets instead:
- Go to Settings → Secrets and Variables → Actions
- Click "New repository secret"
- Add MY_API_KEY with your actual key
Then reference it in your workflow. The secret value is injected at runtime as an environment variable and masked in log output. Masking is a log filter, not access control: any code running in that step can still read the value, so pass secrets only to the steps that need them.
- name: Deploy
  env:
    API_KEY: ${{ secrets.MY_API_KEY }}
  run: ./deploy.sh
GitHub masks secret values in logs. But better practice: use environment-based authentication (like OIDC for PyPI, or IAM roles for AWS). Secrets are a safety net, not the primary solution.
For organization-wide secrets, go to Settings → Secrets and Variables → Actions at the org level. Every repository covered by the secret's access policy can then use them.
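Inside the script that consumes the secret, it helps to fail fast when the variable is missing, because GitHub passes an empty string for secrets the run isn't allowed to see (for example, on pull requests from forks). Here's a minimal sketch. The require_secret and auth_headers helpers and the MissingSecretError class are hypothetical names, and the error message deliberately names the variable but never its value.

```python
import os
from typing import Dict, Mapping


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent or empty."""


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    value = env.get(name, "")
    if not value:
        # Name the variable, never its value.
        raise MissingSecretError(f"required environment variable {name} is not set")
    return value


def auth_headers(api_key: str) -> Dict[str, str]:
    # The key goes into the request header only; it is never logged or printed.
    return {"Authorization": f"Bearer {api_key}"}
```

The failure then shows up as one clear line in the job log instead of a confusing 401 several steps later.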
Never print secrets:
# Bad: Will be masked but still leaks intent
echo "API_KEY=$API_KEY"
# Good: No secret in output
curl -H "Authorization: Bearer $API_KEY" https://api.example.com/
Reusable Workflows: Don't Repeat Yourself
If you manage multiple Python projects, you probably have similar workflows. Extract them. Reusable workflows are the DRY principle applied to CI configuration: define the pattern once, reference it everywhere, and update it in one place when standards evolve. This becomes invaluable at the organizational level, where you might have dozens of Python services all needing the same testing standards.
Create .github/workflows/shared-ci.yml:
name: Shared CI
on:
  workflow_call:
    inputs:
      python-versions:
        type: string
        default: '["3.11", "3.12", "3.13"]'
jobs:
  test:
    strategy:
      matrix:
        python-version: ${{ fromJson(inputs.python-versions) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v2
      - run: uv python install ${{ matrix.python-version }}
      - run: uv sync
      - run: uv run ruff check src/
      - run: uv run mypy src/
      - run: uv run pytest tests/
Then in another repo, call it. The calling workflow is minimal: just a trigger, the name of the reusable workflow, and any input overrides. All the actual CI logic lives in one canonical place.
name: CI
on:
  push:
    branches: [main]
jobs:
  reuse:
    uses: your-org/shared-workflows/.github/workflows/shared-ci.yml@main
    with:
      python-versions: '["3.11", "3.12"]'
This is powerful for organizations with multiple projects. Update the shared workflow once, and all projects benefit.
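The python-versions input is a JSON array encoded as a string, and the versions inside it need to be strings too: as a bare number, 3.10 would parse as 3.1. A quick local check along these lines can catch that before you push. The parse_python_versions helper is a hypothetical name, and it only mimics the JSON parsing step, not the rest of the workflow.

```python
import json
from typing import List


def parse_python_versions(raw: str) -> List[str]:
    """Parse the JSON-encoded input and insist on a list of strings."""
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(v, str) for v in data):
        raise ValueError("python-versions must be a JSON array of strings")
    return data


print(parse_python_versions('["3.11", "3.12"]'))  # ['3.11', '3.12']
```

Passing '[3.10]' raises ValueError here, because the unquoted number has already been turned into the float 3.1 by the time it is checked.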
Performance Tuning: Making Your Pipeline Faster
Slow pipelines don't get run. Here are practical optimizations:
1. Parallelize Everything
Use matrix strategy for independent tests:
strategy:
  matrix:
    test-group: [unit, integration, e2e]
steps:
  - run: uv run pytest tests/${{ matrix.test-group }}/
2. Cache Aggressively
- uses: actions/cache@v4
  with:
    path: ~/.cache/uv
    key: uv-${{ runner.os }}-${{ hashFiles('uv.lock') }}
Cache pip packages, build artifacts, Docker layers, anything that doesn't change often.
3. Skip Unnecessary Steps
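jobs:
  test:
    # Skip the whole job when the head commit message contains the marker
    if: ${{ !contains(github.event.head_commit.message, '[skip ci]') }}
    runs-on: ubuntu-latest
    steps:
      - run: uv run pytest tests/
Adding [skip ci] to commit messages skips the entire workflow. GitHub honors this marker natively for push and pull_request events, so an explicit condition like the one above is only needed for custom markers or other triggers.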
4. Use Lighter Runners for Simple Jobs
Linting and docs builds are light enough for the standard ubuntu-latest runner:
lint:
  runs-on: ubuntu-latest  # Standard
docs:
  runs-on: ubuntu-latest  # Standard
Reserve larger or self-hosted runners for resource-heavy tests (if your org supports them).
5. Split Tests by Speed
jobs:
  quick:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - run: uv run pytest tests/unit/ -q
  slow:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - run: uv run pytest tests/integration/ tests/e2e/
Quick tests block merges. Slow tests run in parallel and report separately.
Documentation and Runbooks: Teaching Your Team
Your CI pipeline is useless if nobody understands it. Document it:
Create CONTRIBUTING.md:
## Running Tests Locally
```bash
uv sync
uv run pytest tests/
```
## CI Pipeline
Our CI runs:
- Tests: pytest across Python 3.11–3.13
- Linting: ruff for style and common errors
- Type checking: mypy for static type validation
- Coverage: We require >80% coverage
See .github/workflows/ for implementation.
## Releasing
- Update version in pyproject.toml
- Update CHANGELOG.md
- Tag: git tag v1.2.3
- Push: git push origin v1.2.3
- CI publishes to PyPI automatically
This teaches new contributors how things work and sets expectations.
Monitoring and Insights: Understanding Your Pipeline Health
GitHub provides insights into your workflow:
1. Go to your repo → Insights → Actions
2. See execution times, success rates, trends
3. Identify slow jobs and optimize them
4. Track which steps fail most often
For detailed metrics, export workflow runs as JSON. Analyzing this data over time reveals patterns that aren't visible in individual runs: a test that's been getting progressively slower for three weeks, or a security check that fails every other Sunday morning for no apparent reason.
```bash
gh run list --repo owner/repo --json conclusion,startedAt,updatedAt
```
Use this data to make decisions: "This integration test takes 15 minutes. Should we move it to nightly?"
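To turn an export like that into numbers, a few lines of Python are enough. This sketch assumes the export includes the conclusion, startedAt, and updatedAt fields that gh run list --json can emit, and it treats updatedAt minus startedAt as an approximation of run duration. The summarize_runs function is a hypothetical name, and the sample records in the usage lines are made up.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List


def parse_ts(value: str) -> datetime:
    # GitHub timestamps look like 2025-01-05T02:00:00Z; normalize the suffix
    # so datetime.fromisoformat also accepts it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_runs(runs: List[Dict[str, str]]) -> Dict[str, float]:
    """Compute run count, success rate, and mean duration in minutes."""
    if not runs:
        raise ValueError("no runs to summarize")
    durations = [
        (parse_ts(r["updatedAt"]) - parse_ts(r["startedAt"])).total_seconds() / 60
        for r in runs
    ]
    successes = sum(1 for r in runs if r["conclusion"] == "success")
    return {
        "runs": len(runs),
        "success_rate": successes / len(runs),
        "mean_minutes": mean(durations),
    }


# Made-up sample records in the shape of the JSON export.
sample = [
    {"conclusion": "success", "startedAt": "2025-01-05T02:00:00Z", "updatedAt": "2025-01-05T02:10:00Z"},
    {"conclusion": "failure", "startedAt": "2025-01-06T02:00:00Z", "updatedAt": "2025-01-06T02:20:00Z"},
]
print(summarize_runs(sample))
```

In practice you'd pipe the gh output to a file and load it with json.load before calling summarize_runs.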
Advanced: Matrix Strategy for Operating Systems
For libraries that run on multiple platforms, test them all. The 9-job matrix (3 Python versions × 3 operating systems) sounds expensive, but it runs in parallel and typically completes faster than a single-threaded comprehensive test suite would on a single machine. The value is asymmetric: a few extra CI minutes to catch a Windows-specific bug before it reaches users is almost always worth it.
strategy:
  matrix:
    python-version: ["3.11", "3.12", "3.13"]
    os: [ubuntu-latest, macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
This creates 9 jobs (3 Python × 3 OS). You'll find platform-specific bugs immediately.
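If the job count ever feels opaque, remember that a matrix is just the Cartesian product of its axes. This small illustration mirrors the two axes above. The variable names are arbitrary, and it models only the expansion, not include or exclude rules.

```python
from itertools import product

python_versions = ["3.11", "3.12", "3.13"]
operating_systems = ["ubuntu-latest", "macos-latest", "windows-latest"]

# One job per (Python version, OS) combination.
jobs = [
    {"python-version": py, "os": os_name}
    for py, os_name in product(python_versions, operating_systems)
]
print(len(jobs))  # 9
```

Adding a fourth Python version would take the total to 12 jobs, which is worth keeping in mind before growing either axis.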
Workflow Artifacts and Retention
If a test generates a report or screenshot, save it. The if: always() condition on the upload step is critical: it ensures you capture artifacts whether the tests passed or failed. Failing tests produce the most valuable artifacts, so uploading only on success would be backwards.
- name: Run tests
  # --html comes from the pytest-html plugin, which must be in your dev dependencies
  run: uv run pytest --html=report.html tests/
- name: Upload test report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: test-report-${{ matrix.python-version }}
    path: report.html
    retention-days: 30
The if: always() ensures artifacts upload even if tests fail. With retention-days: 30, GitHub stores them for 30 days. You can download and inspect them.
Scheduling Nightly Runs
For long-running tests (integration tests, load tests), run them nightly. GitHub Actions schedules use standard five-field cron syntax, with one gotcha: all times are UTC. If your team is distributed across time zones, pick a nightly time that minimizes overlap with working hours globally; 2am UTC is often a reasonable choice that lands in off-hours for both European and American teams.
on:
  schedule:
    - cron: '0 2 * * *'  # Every day at 2am UTC
  workflow_dispatch:  # Manual trigger button
Cron format: minute hour day-of-month month day-of-week.
- 0 2 * * * = 2am UTC every day
- 0 0 * * 0 = Midnight UTC every Sunday
- 0 */6 * * * = Every 6 hours
Scheduled workflows run only from the workflow file on the default branch.
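To sanity-check an expression against a concrete moment, including the UTC gotcha, a tiny matcher is enough. This is a deliberately simplified sketch, not a full cron implementation: cron_matches is a hypothetical helper that understands only *, plain numbers, and */n steps, it requires timezone-aware datetimes, and it ignores lists, ranges, and cron's special handling when both day fields are restricted.

```python
from datetime import datetime, timedelta, timezone


def _field_matches(field: str, value: int) -> bool:
    if field == "*":
        return True
    if field.startswith("*/"):
        # Step values; accurate for minute and hour fields, which start at 0.
        return value % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Check whether a timezone-aware datetime falls on a cron schedule (UTC)."""
    minute, hour, day, month, weekday = expr.split()
    utc = when.astimezone(timezone.utc)
    cron_weekday = utc.isoweekday() % 7  # cron counts Sunday as 0
    return (
        _field_matches(minute, utc.minute)
        and _field_matches(hour, utc.hour)
        and _field_matches(day, utc.day)
        and _field_matches(month, utc.month)
        and _field_matches(weekday, cron_weekday)
    )


# 9pm on Saturday at a fixed UTC-5 offset is 2am Sunday in UTC.
evening = datetime(2025, 1, 4, 21, 0, tzinfo=timezone(timedelta(hours=-5)))
print(cron_matches("0 2 * * *", evening))  # True
```

The usage line shows why the UTC rule matters: a "2am" schedule fires the previous evening for anyone five hours behind UTC.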
Wrapping Up: Your Code Now Has a Safety Net
You've built more than a CI pipeline; you've built a trust infrastructure. The workflows we've covered don't just catch bugs; they enforce standards automatically, create audit trails for every release, and make the implicit rules of your project explicit and machine-enforced. New contributors get immediate feedback. Experienced contributors get protected from the kind of mechanical mistakes that slip through even on a good day.
The progression from this article to the rest of your Python journey is deliberate. You now have automated testing, type checking, linting, dependency management, and secure publishing. These are the foundations that make every subsequent improvement safe to deploy. When you add generators, async code, or ML pipelines in the upcoming clusters, you'll be adding them to a codebase with a safety net, one that catches regressions immediately and validates across Python versions and operating systems without anyone having to remember to check.
The system you've built:
- Runs tests automatically on every commit
- Tests across Python versions and operating systems simultaneously
- Lints and type-checks your code on every PR
- Caches dependencies for fast feedback
- Prevents merges when tests fail
- Publishes releases securely with cryptographic attestations
- Uploads coverage metrics and security scans
This is what professional Python projects look like. From here, we move into Cluster 6: Concurrency and Performance. You've got reliable, automated code. Now let's make it fast.