Data Sources (Overview)
Last Updated: 2026-01-18
The extraction flow pulls from multiple public sources. Each source is implemented as a subclass of BaseDataSource paired with a SourceSpecification.
Where sources live
- Implementations: data/platform/src/data_sources/
- Source docs (this section): apps/docs/src/data/pipelines/
Supported sources
- Motors
- Kits
- Clubs
- Vendors
Output conventions
- Source implementations typically write CSVs into a local data_raw/<entity>/... folder for development.
- In deployed runs, raw artifacts are uploaded to BLOG_DATA_BUCKET_RAW.
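The conventions above can be sketched roughly as follows. This is an illustration only: the helper names (local_output_path, write_rows, raw_destination) are hypothetical, it assumes BLOG_DATA_BUCKET_RAW is read from the environment, and the <bucket>/<entity>/<filename> key layout is a guess rather than the pipeline's documented behavior. No upload is performed here.

```python
import csv
import os
from pathlib import Path


def local_output_path(entity: str, filename: str, root: str = "data_raw") -> Path:
    """Build the development path data_raw/<entity>/<filename>."""
    return Path(root) / entity / filename


def write_rows(path: Path, fieldnames: list[str], rows: list[dict]) -> Path:
    """Write rows to a CSV file, creating the entity folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def raw_destination(entity: str, filename: str) -> str:
    """Describe where a raw artifact would end up.

    Assumption: BLOG_DATA_BUCKET_RAW is an environment variable holding a
    bucket name, and deployed runs mirror the local folder layout.
    """
    bucket = os.environ.get("BLOG_DATA_BUCKET_RAW")
    if bucket:
        return f"{bucket}/{entity}/{filename}"
    return str(local_output_path(entity, filename))
```

For a local run, write_rows(local_output_path("motors", "motors.csv"), ["name"], rows) would create data_raw/motors/motors.csv relative to the working directory.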
Adding a new data source (checklist)
- Create a new module under data/platform/src/data_sources/
- Define a SourceSpecification (name, output filename, entity folder, required fields)
- Implement an extractor class (extends BaseDataSource)
- Register it in data/platform/src/data_sources/__init__.py
- Add tests under data/platform/tests/
- Add a docs page in apps/docs/src/data/pipelines/ and link it here
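As a rough illustration of the middle steps of the checklist, the sketch below defines stand-in versions of SourceSpecification and BaseDataSource so that it runs on its own. The real classes in data/platform/src/data_sources/ may have different fields and methods: the attribute names, the extract and validate methods, the ExampleClubSource class, and the REGISTRY dict are all assumptions made for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SourceSpecification:
    # Hypothetical field names mirroring the checklist:
    # name, output filename, entity folder, required fields.
    name: str
    output_filename: str
    entity_folder: str
    required_fields: tuple[str, ...]


class BaseDataSource(ABC):
    # Stand-in for the project's base class; the actual interface may differ.
    spec: SourceSpecification

    @abstractmethod
    def extract(self) -> Iterable[dict]:
        """Yield one dict per extracted record."""

    def validate(self, row: dict) -> bool:
        """Accept a row only if every required field is present and non-empty."""
        return all(row.get(f) not in (None, "") for f in self.spec.required_fields)


class ExampleClubSource(BaseDataSource):
    # Hypothetical source used only for illustration.
    spec = SourceSpecification(
        name="example_clubs",
        output_filename="example_clubs.csv",
        entity_folder="clubs",
        required_fields=("name", "url"),
    )

    def extract(self) -> Iterable[dict]:
        # A real extractor would fetch from a public source; these rows are canned.
        yield {"name": "Example Club", "url": "https://example.org"}
        yield {"name": "", "url": "https://example.org/missing-name"}


# Assumed registration pattern: a name-to-class mapping, as __init__.py might expose.
REGISTRY = {ExampleClubSource.spec.name: ExampleClubSource}

source = REGISTRY["example_clubs"]()
valid_rows = [row for row in source.extract() if source.validate(row)]
```

Keeping the required fields on the specification rather than in the extractor lets a test check any source's output against its own declaration without knowing how the data was fetched.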