Data Platform (data/platform)
Last Updated: 2026-01-18
This section documents the Rocket Club data platform that lives in this monorepo under data/platform/.
What it is
A Prefect-orchestrated ETL pipeline that:
- Extracts rocketry data from multiple public sources
- Normalizes/cleans it into graph-ready CSVs
- Loads it into Neo4j
- (Optionally) generates TypeScript/Zod schema artifacts for the web app
Where the source of truth lives
- Prefect flows: data/platform/flows/
- Shared pipeline code: data/platform/src/
- Extractors: data/platform/src/data_sources/
- Cleaning helpers / registries: data/platform/src/libraries/
- Lambdas: data/platform/lambda/ (e.g. ORK processor)
Local development (minimal)
Install (recommended)
From repo root:
npx nx sync data-platform
Or inside the folder:
cd data/platform && uv sync --group dev
Run tests
npx nx test data-platform
Run flows locally
From data/platform/:
uv run python -m flows.data_extraction
uv run python -m flows.data_cleaning
uv run python -m flows.graph_load
uv run python -m flows.main_pipeline
Configuration overview
Neo4j (local dev)
Set these env vars (for example via data/platform/.env.local):
- NEO4J_URI
- NEO4J_USERNAME
- NEO4J_PASSWORD
- NEO4J_DATABASE (optional; defaults to neo4j)
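As a rough illustration of how these variables could be read, the sketch below collects them into a small settings object and applies the documented neo4j default for the database name. The names Neo4jSettings and load_neo4j_settings are hypothetical and are not taken from data/platform/src/. Treating the first three variables as required is also an assumption, based only on NEO4J_DATABASE being the one marked optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Neo4jSettings:
    """Hypothetical container for the Neo4j connection settings."""
    uri: str
    username: str
    password: str
    database: str = "neo4j"


def load_neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Read the Neo4j variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Assumption: these three are required because only NEO4J_DATABASE
    # is documented as optional.
    required = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Neo4j settings: " + ", ".join(missing))

    return Neo4jSettings(
        uri=env["NEO4J_URI"],
        username=env["NEO4J_USERNAME"],
        password=env["NEO4J_PASSWORD"],
        # An unset or empty NEO4J_DATABASE falls back to the documented default.
        database=env.get("NEO4J_DATABASE") or "neo4j",
    )
```

Passing a plain dict instead of relying on os.environ keeps the function easy to exercise in tests without touching the real environment.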
S3 buckets (pipeline layers)
- BLOG_DATA_BUCKET_CACHE
- BLOG_DATA_BUCKET_RAW
- BLOG_DATA_BUCKET_CLEAN
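A pre-flight check along the following lines could confirm that every bucket variable is set before a flow starts. This is a minimal sketch: the helper resolve_layer_buckets is hypothetical, the layer names cache, raw, and clean are inferred from the variable suffixes, and whether the pipeline needs all three in every run is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Layer names are inferred from the variable suffixes (assumption).
LAYER_ENV_VARS = {
    "cache": "BLOG_DATA_BUCKET_CACHE",
    "raw": "BLOG_DATA_BUCKET_RAW",
    "clean": "BLOG_DATA_BUCKET_CLEAN",
}


def resolve_layer_buckets(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map each pipeline layer to its bucket name, failing fast on gaps."""
    env = os.environ if env is None else env

    # Collect every unset or empty variable so the error lists all of them.
    missing = [var for var in LAYER_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing bucket settings: " + ", ".join(missing))

    return {layer: env[var] for layer, var in LAYER_ENV_VARS.items()}
```

Reporting all missing variables in one error, rather than stopping at the first, saves a round trip when several are unset in a fresh .env.local.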