Python Data Pipeline for Hyrox Results → Supabase + GitHub Actions
I’m building a fitness analytics web app and need help creating a reliable data pipeline to ingest HYROX race results.
Scope:
-Fetch race data from a public source (CSV, Parquet, API, or CDN)
-Parse and transform into a structured format
-Normalize athlete names for search (e.g., to a consistent “First Last” form), including doubles and relay team entries
-Load into Supabase
-Ensure re-running the pipeline creates no duplicate records (idempotent upserts)
-Create a GitHub Actions workflow that runs weekly to ingest new data
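As a rough illustration of the name normalization and idempotency items above (not a required design), a minimal sketch could look like the following. It assumes team members are separated by "/" or "&", that a single comma means "Last, First", and that an event ID and division are available for each row. The function names and the hashed `result_key` are hypothetical.

```python
import hashlib
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a lowercase, accent-free, single-spaced search form of one name."""
    # Assumption: exactly one comma means the source used "Last, First".
    if raw.count(",") == 1:
        last, first = (part.strip() for part in raw.split(","))
        raw = f"{first} {last}"
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse whitespace runs and lowercase for case-insensitive search.
    return re.sub(r"\s+", " ", stripped).strip().lower()


def normalize_team(raw: str) -> list[str]:
    """Split a doubles/relay entry into sorted, normalized member names."""
    # Assumption: members are separated by "/" or "&".
    members = re.split(r"\s*[/&]\s*", raw)
    return sorted(normalize_name(m) for m in members if m.strip())


def result_key(event_id: str, division: str, raw_names: str) -> str:
    """Build a deterministic key so the same result always maps to one row."""
    parts = [event_id.strip().lower(), division.strip().lower()]
    parts.extend(normalize_team(raw_names))
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

With a unique constraint on a column holding this key, a PostgreSQL `INSERT ... ON CONFLICT` upsert would update existing rows instead of inserting duplicates on re-runs. Two different athletes with identical names in the same event and division would collide under this scheme, so a source-provided result or athlete ID is preferable as the key if one exists.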
Requirements:
-Python (pandas, requests, pyarrow/parquet)
-Experience with ETL/data pipelines
-PostgreSQL or Supabase experience
-GitHub Actions (CI/CD automation)
-Clean, readable, well-documented code
Deliverables:
-Python pipeline (fetch → transform → load)
-Requirements file (requirements.txt)
-.env.example for configuration
-GitHub Actions workflow file
-README with:
*setup instructions
*how to run locally
*how to configure secrets
*how to test
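For the configuration and secrets items above, one simple approach is to read everything from environment variables and fail fast when something is missing, so the same code works locally (values from a `.env` file) and in GitHub Actions (values from repository secrets). The sketch below assumes three hypothetical variable names, `SUPABASE_URL`, `SUPABASE_SERVICE_KEY`, and `HYROX_SOURCE_URL`; the real names are up to the implementer.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; .env.example would list these with empty values.
REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "HYROX_SOURCE_URL")


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str
    source_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report only the variable names, never their values.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_SERVICE_KEY"],
        source_url=env["HYROX_SOURCE_URL"],
    )
```

Passing a plain dictionary as `env` makes the loader easy to unit test without touching real credentials.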
Important:
-I will add production credentials myself (do NOT include secrets)
-Code must not include any obfuscation or hidden external calls
-Prefer simple, maintainable architecture over overly complex solutions
Bonus if you have experience with:
-Large datasets (500k+ rows)
-Data validation / QA checks
-Search optimization (name normalization)
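To make the large-dataset and validation items above more concrete, here is a minimal sketch of row-level QA checks plus fixed-size batching, so that 500k+ rows can be loaded in chunks rather than in one request. The column names `athlete_name` and `total_time`, the accepted time formats, and the helper names are assumptions for illustration only.

```python
from itertools import islice
from typing import Iterable, Iterator


def parse_time_to_seconds(value: str) -> int:
    """Convert "H:MM:SS" or "MM:SS" into whole seconds; raise ValueError otherwise."""
    parts = value.strip().split(":")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized time format: {value!r}")
    numbers = [int(p) for p in parts]
    # Every field after the first must be a valid minutes/seconds value.
    if any(n >= 60 for n in numbers[1:]):
        raise ValueError(f"time field out of range: {value!r}")
    seconds = 0
    for n in numbers:
        seconds = seconds * 60 + n
    return seconds


def validate_rows(rows: Iterable[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Split rows into valid rows and (row, reason) rejects for a QA report."""
    valid, rejected = [], []
    for row in rows:
        name = (row.get("athlete_name") or "").strip()
        if not name:
            rejected.append((row, "missing athlete_name"))
            continue
        try:
            seconds = parse_time_to_seconds(row.get("total_time") or "")
        except ValueError as exc:
            rejected.append((row, str(exc)))
            continue
        valid.append({**row, "athlete_name": name, "total_seconds": seconds})
    return valid, rejected


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Keeping rejected rows together with a reason, rather than silently dropping them, gives each weekly run a simple QA summary to log or review.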
Budget:
-Open to proposals based on experience