Data Organization and Entity-Based Structure

Overview

The data_raw/ directory is now organized by entity type, making it easier to find and manage related data files.

Directory Structure

data_raw/
├── clubs/          # Rocketry club directories
│   ├── nar_clubs.csv
│   └── tripoli_clubs.csv
├── manufacturers/  # Manufacturer directories
│   └── tripoli_manufacturers.csv
├── vendors/        # Vendor directories
│   └── tripoli_vendors.csv
├── kits/           # Rocket kit catalogs
│   ├── estes_kits.csv
│   ├── estes_kits_details.csv
│   ├── jimz_estes_kits.csv
│   ├── loc_kits.csv
│   ├── loc_kits_details.csv
│   └── rocketarium.csv
└── motors/         # Motor specifications and data
    ├── motorcato.csv
    ├── rocket_motors.csv
    ├── thrust_samples.csv
    └── types.csv

Implementation

Base Class Support

The BaseDataSource class now supports entity-based organization through the SourceSpecification:

TRIPOLI_VENDORS_SPEC = SourceSpecification(
    name="tripoli_vendors",
    base_url="https://tripoli.org",
    required_fields=["name"],
    output_filename="tripoli_vendors.csv",
    split_by_field="category",  # Split records by category field
    split_file_mapping={  # Map category values to entity folder and filename
        "Manufacturer": "manufacturers/tripoli_manufacturers.csv",
        "Vendors": "vendors/tripoli_vendors.csv",
    },
)

Key Features

  1. Entity Folders - Use the entity_folder parameter to organize files by entity type
  2. Split Files - Use split_by_field and split_file_mapping to split records into multiple files
  3. Path Support in Mapping - File mappings can include subdirectories (e.g., "manufacturers/file.csv")
  4. Automatic Directory Creation - Directories are created automatically if they don't exist
  5. Backward Compatible - Sources without entity_folder save to data_raw/ root
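
The path-related features above (entity folders, subdirectories in filenames, automatic directory creation, and the data_raw/ root fallback) can be sketched roughly as follows. This is a minimal illustration, not the actual BaseDataSource code: the helper name resolve_output_path, its signature, and the file legacy.csv are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_output_path(
    data_root: Path,
    output_filename: str,
    entity_folder: Optional[str] = None,
) -> Path:
    """Hypothetical helper: build the target CSV path and create its folder."""
    # Without an entity folder, fall back to the data root (backward compatible).
    base = data_root / entity_folder if entity_folder else data_root
    # The filename may itself carry a subdirectory, e.g. "manufacturers/file.csv".
    target = base / output_filename
    # Create any missing directories so callers never have to.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Demonstration in a throwaway temporary directory.
demo_root = Path(tempfile.mkdtemp()) / "data_raw"
clubs_path = resolve_output_path(demo_root, "nar_clubs.csv", entity_folder="clubs")
legacy_path = resolve_output_path(demo_root, "legacy.csv")  # hypothetical file
nested_path = resolve_output_path(demo_root, "manufacturers/tripoli_manufacturers.csv")
```

Here clubs_path ends in clubs/nar_clubs.csv, legacy_path sits directly in the data root, and nested_path lands in a manufacturers/ folder that is created on demand.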

Example: Tripoli Vendors

The Tripoli vendors extractor demonstrates the split file feature with separate entity folders:

  • Extracts both manufacturers and vendors in a single extraction
  • Automatically splits records by the category field
  • Saves to separate entity folders:
    • data_raw/manufacturers/tripoli_manufacturers.csv (76 records)
    • data_raw/vendors/tripoli_vendors.csv (239 records)
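
One way the split could work is sketched below. The functions split_records and write_groups are hypothetical names, the sample records are made up, and skipping records whose category has no mapping is an assumption; the real extractor may behave differently.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def split_records(records, split_by_field, split_file_mapping):
    """Group records by the relative output path mapped from split_by_field."""
    groups = defaultdict(list)
    for record in records:
        relative_path = split_file_mapping.get(record.get(split_by_field))
        if relative_path is None:
            continue  # Assumption: records with an unmapped value are skipped.
        groups[relative_path].append(record)
    return dict(groups)


def write_groups(data_root, groups, fieldnames):
    """Write each group to its own CSV, creating entity folders as needed."""
    for relative_path, rows in groups.items():
        target = Path(data_root) / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


# Mapping copied from TRIPOLI_VENDORS_SPEC; the records are invented samples.
mapping = {
    "Manufacturer": "manufacturers/tripoli_manufacturers.csv",
    "Vendors": "vendors/tripoli_vendors.csv",
}
sample_records = [
    {"name": "Example Motors", "category": "Manufacturer"},
    {"name": "Example Hobby Shop", "category": "Vendors"},
    {"name": "Another Example Shop", "category": "Vendors"},
]

groups = split_records(sample_records, "category", mapping)
demo_root = Path(tempfile.mkdtemp()) / "data_raw"
write_groups(demo_root, groups, fieldnames=["name", "category"])
```

Because the mapping values include the entity folder, a single extraction run can populate both manufacturers/ and vendors/ without any extra configuration.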

Benefits

  1. Better Organization - Related data files are grouped together
  2. Easier Navigation - Find files by entity type rather than by source
  3. Scalability - Easy to add new entity types as needed
  4. Clarity - Clear separation between different types of data

Migration

Existing CSV files have been moved to their respective entity folders:

  • Club files → clubs/
  • Manufacturer files → manufacturers/
  • Vendor files → vendors/
  • Kit files → kits/
  • Motor files → motors/

Adding New Entity Types

To add a new entity type:

  1. Choose a folder name under data_raw/ (e.g., data_raw/launches/); creating it by hand is optional, since missing directories are created automatically
  2. Update the source specification with entity_folder="launches"
  3. The base class handles the rest automatically
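
As a rough illustration of these steps, a specification for a new "launches" entity might look like the sketch below. The SourceSpecification dataclass here is a stand-in that includes only the fields mentioned in this document; the spec name, URL, filename, and the expected_path helper are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List, Optional


@dataclass
class SourceSpecification:
    """Stand-in for the project's class, limited to fields named in this document."""
    name: str
    base_url: str
    required_fields: List[str]
    output_filename: str
    entity_folder: Optional[str] = None
    split_by_field: Optional[str] = None
    split_file_mapping: Dict[str, str] = field(default_factory=dict)


# Hypothetical source: every value below is a placeholder for illustration.
EXAMPLE_LAUNCHES_SPEC = SourceSpecification(
    name="example_launches",
    base_url="https://example.org",
    required_fields=["name"],
    output_filename="example_launches.csv",
    entity_folder="launches",
)


def expected_path(spec: SourceSpecification, data_root: str = "data_raw") -> PurePosixPath:
    """Show where a spec's output would land, without touching the filesystem."""
    root = PurePosixPath(data_root)
    if spec.entity_folder:
        root = root / spec.entity_folder
    return root / spec.output_filename


print(expected_path(EXAMPLE_LAUNCHES_SPEC))  # data_raw/launches/example_launches.csv
```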

Source Specifications

All data source specifications have been updated to use entity folders:

  • NAR Clubs → entity_folder="clubs"
  • Tripoli Clubs → entity_folder="clubs"
  • Tripoli Vendors → Split files to separate entity folders:
    • Manufacturers → manufacturers/tripoli_manufacturers.csv
    • Vendors → vendors/tripoli_vendors.csv
  • Estes Kits → entity_folder="kits"
  • JimZ Estes Kits → entity_folder="kits" (historical reference data)
  • LOC Precision → entity_folder="kits"
  • Rocketarium → entity_folder="kits"