Python Data Pipeline for HYROX Results → Supabase + GitHub Actions

I’m building a fitness analytics web app and need help creating a reliable data pipeline to ingest HYROX race results.

Scope:

-Fetch race data from a public source (CSV, Parquet, API, or CDN)

-Parse and transform into a structured format

-Normalize athlete names for search (e.g., consistent “First Last” ordering), including doubles and relay teams with multiple athletes

-Load into Supabase

-Prevent duplicate records with idempotent upserts (re-running the pipeline must be safe)

-Create a GitHub Actions workflow that runs weekly to ingest new data
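
To make the normalization and duplicate-prevention items concrete, here is a minimal, stdlib-only sketch. It assumes hypothetical fields `event_id`, `division` and `athlete`, treats `&`, `/`, `+` and the word "and" as team separators, and derives a deterministic `record_key` from the normalized values. With a unique constraint on that column, the load step can upsert on `record_key`. Deduplicating within each batch first matters because PostgreSQL rejects an `INSERT ... ON CONFLICT DO UPDATE` that would affect the same row twice in one statement.

```python
import hashlib
import re
import unicodedata

# Assumed team separators, e.g. "Anna Smith & Ben Jones" or "Anna Smith / Ben Jones".
# The word "and" only matches as a whole word, so names like "Sandra" are untouched.
_TEAM_SEPARATORS = re.compile(r"\s*(?:&|/|\+|\band\b)\s*", flags=re.IGNORECASE)


def normalize_name(raw: str) -> str:
    """Return a lowercase, accent-free "first last" string for search."""
    # Decompose accented letters (é -> e + accent) and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace("\u2019", "'").strip()  # curly apostrophe -> straight
    # Assumption: a single comma means "Last, First", which is flipped to "First Last".
    if text.count(",") == 1:
        last, first = (part.strip() for part in text.split(","))
        text = f"{first} {last}"
    # Keep letters, digits, spaces, hyphens and apostrophes; collapse whitespace.
    text = re.sub(r"[^\w\s'-]", " ", text)
    return " ".join(text.lower().split())


def split_team(raw: str) -> list[str]:
    """Split a doubles/relay entry into normalized member names, sorted for stability."""
    members = [m for m in _TEAM_SEPARATORS.split(raw) if m.strip()]
    return sorted(normalize_name(m) for m in members)


def record_key(event_id: str, division: str, athlete: str) -> str:
    """Deterministic ID: the same result always produces the same key."""
    members = "|".join(split_team(athlete))
    payload = f"{event_id.strip().lower()}|{division.strip().lower()}|{members}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(rows: list[dict]) -> list[dict]:
    """Keep the last row per key so one upsert batch never repeats a key."""
    by_key: dict[str, dict] = {}
    for row in rows:
        key = record_key(row["event_id"], row["division"], row["athlete"])
        by_key[key] = {**row, "record_key": key}
    return list(by_key.values())
```

Sorting team members means "Anna Smith & Ben Jones" and "Ben Jones / Anna Smith" map to the same key. With the `supabase` Python client, the load step could then look something like `client.table("results").upsert(rows, on_conflict="record_key").execute()`; the table name is hypothetical, and the call is worth checking against the installed client version.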

Requirements:

-Python (pandas, requests, pyarrow for Parquet)

-Experience with ETL/data pipelines

-PostgreSQL or Supabase experience

-GitHub Actions (CI/CD automation)

-Clean, readable, well-documented code

Deliverables:

-Python pipeline (fetch → transform → load)

-Requirements file (requirements.txt)

-.env.example for configuration

-GitHub Actions workflow file

-README with:

*setup instructions

*how to run locally

*how to configure secrets

*how to test
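
One way to keep the fetch → transform → load split simple and testable is to make each stage a plain function and pass the loader in. The sketch below assumes a CSV source with hypothetical columns `event_id`, `name` and `total_time`. It uses `urllib.request` only to stay dependency-free; swapping in `requests.get(url, timeout=...)` followed by `raise_for_status()` would match the listed stack.

```python
import csv
import io
import urllib.request
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download the source file and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def transform(csv_text: str) -> Iterator[Row]:
    """Parse CSV text and keep only the columns the database needs (names assumed)."""
    for raw in csv.DictReader(io.StringIO(csv_text)):
        yield {
            "event_id": raw["event_id"].strip(),
            "athlete": raw["name"].strip(),
            "total_time": raw["total_time"].strip(),
        }


def run(
    url: str,
    load: Callable[[List[Row]], None],
    fetcher: Callable[[str], str] = fetch,
) -> int:
    """fetch -> transform -> load; returns the number of rows handed to the loader."""
    rows = list(transform(fetcher(url)))
    load(rows)
    return len(rows)
```

Because `run` takes the fetcher and loader as arguments, tests can exercise the whole pipeline with canned CSV text and an in-memory list, without network access or a database.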

Important:

-I will add production credentials myself (do NOT include secrets)

-Code must not include any obfuscation or hidden external calls

-Prefer simple, maintainable architecture over overly complex solutions
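
Keeping credentials out of the code usually means reading them from environment variables: `.env.example` lists the names with empty values, a git-ignored local `.env` fills them in, and the GitHub Actions workflow supplies them from repository secrets. A minimal sketch, with hypothetical variable names (`SUPABASE_SERVICE_ROLE_KEY` assumes the server-side service role key is used for ingestion):

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

# Hypothetical variable names; .env.example would list these with empty values.
REQUIRED = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY", "SOURCE_URL")


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str = field(repr=False)  # left out of repr() so it is never logged
    source_url: str
    batch_size: int = 1000


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report names only, never values, so secrets don't end up in CI logs.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        source_url=env["SOURCE_URL"],
        batch_size=int(env.get("BATCH_SIZE", "1000")),
    )
```

Locally, `python-dotenv`'s `load_dotenv()` can populate `os.environ` from `.env` before `load_settings()` runs; in CI the same names come from the workflow's `env:` block mapped to secrets.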

Bonus if you have experience with:

-Large datasets (500k+ rows)

-Data validation / QA checks

-Search optimization (name normalization)
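
For the validation and large-dataset items, a simple pattern is to check each row, set aside rejects with a reason instead of failing the whole run, and upsert in fixed-size batches. This sketch reuses the hypothetical `event_id`, `athlete` and `total_time` fields and assumes times arrive as `H:MM:SS` or `MM:SS` strings; the batch size is left to the caller to tune against Supabase request limits.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def parse_duration(value: str) -> int:
    """Convert "H:MM:SS" or "MM:SS" into whole seconds; raise ValueError otherwise."""
    parts = [int(p) for p in value.strip().split(":")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unrecognized duration: {value!r}")
    if any(not 0 <= p < 60 for p in parts[1:]):
        raise ValueError(f"out-of-range minutes/seconds in {value!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part
    return seconds


def validate_row(row: dict) -> List[str]:
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    for name in ("event_id", "athlete", "total_time"):
        if not str(row.get(name) or "").strip():
            problems.append(f"missing {name}")
    time_text = str(row.get("total_time") or "").strip()
    if time_text:
        try:
            if parse_duration(time_text) <= 0:
                problems.append("non-positive total_time")
        except ValueError:
            problems.append(f"bad total_time {time_text!r}")
    return problems


def split_valid(rows: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate passing rows from rejects (with reasons) instead of failing the run."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append({**row, "problems": problems})
        else:
            valid.append(row)
    return valid, rejected


def batched(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows, so large loads go out in bounded requests."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch
```

For 500k+ rows the input side can stream as well: `pandas.read_csv(..., chunksize=...)` and `pyarrow.parquet.ParquetFile.iter_batches()` both yield chunks rather than loading the whole file at once. On Python 3.12+, `itertools.batched` does the same job as the helper here.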

Budget:

-Open to proposals based on experience
