Data Sources (Overview)

Last Updated: 2026-01-18

The extraction flow pulls from multiple public sources. Each source is implemented as a class that extends BaseDataSource and is described by a SourceSpecification.

Where sources live

  • Implementations: data/platform/src/data_sources/
  • Source docs (this section): apps/docs/src/data/pipelines/

Supported sources

  • Motors
  • Kits
  • Clubs
  • Vendors

Output conventions

  • In local development, source implementations typically write CSVs to a data_raw/<entity>/... folder.
  • In deployed runs, raw artifacts are uploaded to BLOG_DATA_BUCKET_RAW.
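
As a rough illustration of the two conventions above, the sketch below picks a destination for a raw artifact. It assumes that BLOG_DATA_BUCKET_RAW is an environment variable holding a bucket name and that files sit directly under the entity folder. Neither assumption is confirmed by this page, and raw_destination and RawDestination are hypothetical names, not part of the platform code.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, NamedTuple, Optional


class RawDestination(NamedTuple):
    """Where a raw artifact should go (hypothetical helper type)."""
    bucket: Optional[str]  # None means "write to the local filesystem"
    path: str              # local relative path, or object key inside the bucket


def raw_destination(
    entity: str,
    filename: str,
    env: Optional[Mapping[str, str]] = None,
) -> RawDestination:
    """Choose a local data_raw/ path or a bucket key for one artifact.

    Assumption: a deployed run is detected by BLOG_DATA_BUCKET_RAW being set
    in the environment. The real pipeline may decide this differently.
    """
    env = os.environ if env is None else env
    relative = PurePosixPath(entity) / filename
    bucket = env.get("BLOG_DATA_BUCKET_RAW")
    if bucket:
        # Deployed run: the key is relative to the bucket root.
        return RawDestination(bucket=bucket, path=str(relative))
    # Local development: keep everything under data_raw/<entity>/.
    return RawDestination(bucket=None, path=str(PurePosixPath("data_raw") / relative))
```

Passing env explicitly (for example env={}) keeps the helper easy to test without touching the real environment.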

Adding a new data source (checklist)

  1. Create a new module under data/platform/src/data_sources/
  2. Define a SourceSpecification (name, output filename, entity folder, required fields)
  3. Implement an extractor class (extends BaseDataSource)
  4. Register it in data/platform/src/data_sources/__init__.py
  5. Add tests under data/platform/tests/
  6. Add a docs page in apps/docs/src/data/pipelines/ and link it here
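
Steps 2 to 4 of the checklist can be sketched as follows. This is a self-contained illustration, not the platform's actual code: the SourceSpecification and BaseDataSource classes here are hypothetical stand-ins with guessed field and method names (fetch_rows, run), and ExampleVendorSource and DATA_SOURCES are invented for the example. The real definitions in data/platform/src/data_sources/ may differ.

```python
import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


# Hypothetical stand-in for the real SourceSpecification; the field names
# mirror the checklist (name, output filename, entity folder, required fields).
@dataclass(frozen=True)
class SourceSpecification:
    name: str
    output_filename: str
    entity_folder: str
    required_fields: tuple[str, ...]


# Hypothetical stand-in for the real BaseDataSource.
class BaseDataSource(ABC):
    spec: SourceSpecification

    @abstractmethod
    def fetch_rows(self) -> list[dict[str, str]]:
        """Return one dict per record pulled from the source."""

    def run(self, raw_root: Path) -> Path:
        """Validate rows against the spec and write them as a CSV."""
        rows = self.fetch_rows()
        for index, row in enumerate(rows):
            missing = [f for f in self.spec.required_fields if not row.get(f)]
            if missing:
                raise ValueError(f"{self.spec.name}: row {index} is missing {missing}")
        out_path = raw_root / self.spec.entity_folder / self.spec.output_filename
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(
                handle,
                fieldnames=list(self.spec.required_fields),
                extrasaction="ignore",  # drop columns the spec does not list
            )
            writer.writeheader()
            writer.writerows(rows)
        return out_path


# Step 3: an extractor class that extends BaseDataSource.
class ExampleVendorSource(BaseDataSource):
    spec = SourceSpecification(
        name="example_vendor",
        output_filename="example_vendor.csv",
        entity_folder="vendors",
        required_fields=("vendor_id", "name"),
    )

    def fetch_rows(self) -> list[dict[str, str]]:
        # A real extractor would request and parse the public source here.
        return [{"vendor_id": "v-001", "name": "Example Rocketry Supply"}]


# Step 4: one possible shape for the registry in __init__.py (an assumption).
DATA_SOURCES = {ExampleVendorSource.spec.name: ExampleVendorSource}
```

With these stand-ins, DATA_SOURCES["example_vendor"]().run(Path("data_raw")) would write data_raw/vendors/example_vendor.csv. Validating required fields before writing means a source that returns incomplete rows fails loudly instead of producing a partial CSV.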