Data Platform (data/platform)

Last Updated: 2026-01-18

This section documents the Rocket Club data platform that lives in this monorepo under data/platform/.

What it is

A Prefect-orchestrated ETL pipeline that:

  • Extracts rocketry data from multiple public sources
  • Normalizes/cleans it into graph-ready CSVs
  • Loads it into Neo4j
  • (Optionally) generates TypeScript/Zod schema artifacts for the web app
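
To make "graph-ready CSVs" concrete, here is a minimal stdlib-only sketch of the normalize step. It is an illustration, not the code in data/platform/src/: the record fields, the node/relationship column names, and the MAKES relationship type are all hypothetical, and the real pipeline runs this kind of logic inside Prefect tasks.

```python
import csv
import io

# Hypothetical raw records; the real extractors' field names are not shown here.
RAW_MOTORS = [
    {"designation": " A1 ", "manufacturer": "Acme"},
    {"designation": "B2", "manufacturer": "acme "},
]


def to_graph_csvs(records):
    """Return (nodes_csv, rels_csv) text for a manufacturer -> motor graph."""
    nodes, rels = io.StringIO(), io.StringIO()
    node_writer = csv.writer(nodes, lineterminator="\n")
    rel_writer = csv.writer(rels, lineterminator="\n")
    # Assumed column layout; not a Neo4j import header format.
    node_writer.writerow(["id", "label", "name"])
    rel_writer.writerow(["start_id", "type", "end_id"])

    seen = set()
    for rec in records:
        # Trim whitespace and lowercase the id so duplicates collapse to one node.
        maker = rec["manufacturer"].strip()
        motor = rec["designation"].strip()
        maker_id = f"manufacturer:{maker.lower()}"
        motor_id = f"motor:{motor.lower()}"
        for node_id, label, name in (
            (maker_id, "Manufacturer", maker),
            (motor_id, "Motor", motor),
        ):
            if node_id not in seen:
                seen.add(node_id)
                node_writer.writerow([node_id, label, name])
        rel_writer.writerow([maker_id, "MAKES", motor_id])
    return nodes.getvalue(), rels.getvalue()
```

Keeping nodes and relationships in separate files, keyed by a stable id, is one common way to make the load step a straightforward bulk import.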

Where the source of truth lives

  • Prefect flows: data/platform/flows/
  • Shared pipeline code: data/platform/src/
  • Extractors: data/platform/src/data_sources/
  • Cleaning helpers / registries: data/platform/src/libraries/
  • Lambdas: data/platform/lambda/ (e.g. ORK processor)

Local development (minimal)

From repo root:

  • npx nx sync data-platform

Or run uv directly from data/platform/:

  • cd data/platform && uv sync --group dev

Run tests

  • npx nx test data-platform

Run flows locally

From data/platform/:

  • uv run python -m flows.data_extraction
  • uv run python -m flows.data_cleaning
  • uv run python -m flows.graph_load
  • uv run python -m flows.main_pipeline
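
If you want to run the individual stages back to back and stop at the first failure, a small wrapper along these lines may help. This is a sketch under assumptions: the stage order (extraction, cleaning, load) is inferred from the pipeline description, the helper names are hypothetical, and whether flows.main_pipeline already does the same thing is not stated here.

```python
import subprocess

# Module names taken from the commands listed above; the order is an assumption.
STAGES = ["data_extraction", "data_cleaning", "graph_load"]


def flow_command(stage):
    """Build the argv that mirrors `uv run python -m flows.<stage>`."""
    return ["uv", "run", "python", "-m", f"flows.{stage}"]


def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each stage in order and return the first non-zero exit code, else 0."""
    for stage in stages:
        result = runner(flow_command(stage), check=False)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Call run_stages() with data/platform/ as the working directory so that the flows package resolves the same way as in the commands above. The runner parameter exists only so the function can be exercised without launching real processes.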

Configuration overview

Neo4j (local dev)

Set these env vars (for example via data/platform/.env.local):

  • NEO4J_URI
  • NEO4J_USERNAME
  • NEO4J_PASSWORD
  • NEO4J_DATABASE (optional; defaults to neo4j)
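
As an illustration of how these variables might be consumed, the sketch below reads them into a small settings object and applies the neo4j default for NEO4J_DATABASE. The Neo4jSettings class and load_neo4j_settings function are hypothetical names, not the platform's actual code, and the sketch assumes the variables are already in the process environment (it does not parse .env.local itself).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    uri: str
    username: str
    password: str
    database: str = "neo4j"


def load_neo4j_settings(env=None):
    """Read Neo4j settings from env vars, failing fast on missing required ones."""
    env = os.environ if env is None else env
    required = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Neo4j env vars: " + ", ".join(missing))
    return Neo4jSettings(
        uri=env["NEO4J_URI"],
        username=env["NEO4J_USERNAME"],
        password=env["NEO4J_PASSWORD"],
        # Optional; an unset or empty value falls back to "neo4j".
        database=env.get("NEO4J_DATABASE") or "neo4j",
    )
```

Failing on all missing variables at once, rather than on the first one, tends to shorten the fix-and-retry loop when setting up a new machine.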

S3 buckets (pipeline layers)

Each pipeline layer (cache, raw, clean) has its own bucket setting:

  • BLOG_DATA_BUCKET_CACHE
  • BLOG_DATA_BUCKET_RAW
  • BLOG_DATA_BUCKET_CLEAN
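
A minimal sketch of resolving a layer to its bucket, assuming these names are environment variables that each hold a bucket name (the listing above does not say so explicitly). The helper names and the example object key are hypothetical.

```python
import os

# Layer name -> env var, taken from the list above.
LAYER_ENV_VARS = {
    "cache": "BLOG_DATA_BUCKET_CACHE",
    "raw": "BLOG_DATA_BUCKET_RAW",
    "clean": "BLOG_DATA_BUCKET_CLEAN",
}


def bucket_for(layer, env=None):
    """Return the bucket name configured for a pipeline layer."""
    env = os.environ if env is None else env
    var = LAYER_ENV_VARS[layer]  # KeyError for an unknown layer
    bucket = env.get(var)
    if not bucket:
        raise RuntimeError(f"{var} is not set")
    return bucket


def s3_uri(layer, key, env=None):
    """Build an s3:// URI for an object key within a layer's bucket."""
    return f"s3://{bucket_for(layer, env)}/{key.lstrip('/')}"
```

For example, with BLOG_DATA_BUCKET_RAW set to my-raw-bucket, s3_uri("raw", "motors/latest.json") gives s3://my-raw-bucket/motors/latest.json.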