Data Organization and Entity-Based Structure
Overview
The data_raw/ directory is now organized by entity type, making it easier to find and manage related data files.
Directory Structure
data_raw/
├── clubs/ # Rocketry club directories
│ ├── nar_clubs.csv
│ └── tripoli_clubs.csv
├── manufacturers/ # Manufacturer directories
│ └── tripoli_manufacturers.csv
├── vendors/ # Vendor directories
│ └── tripoli_vendors.csv
├── kits/ # Rocket kit catalogs
│ ├── estes_kits.csv
│ ├── estes_kits_details.csv
│ ├── jimz_estes_kits.csv
│ ├── loc_kits.csv
│ ├── loc_kits_details.csv
│ └── rocketarium.csv
└── motors/ # Motor specifications and data
  ├── motorcato.csv
  ├── rocket_motors.csv
  ├── thrust_samples.csv
  └── types.csv
Implementation
Base Class Support
The BaseDataSource class now supports entity-based organization through the SourceSpecification:
TRIPOLI_VENDORS_SPEC = SourceSpecification(
    name="tripoli_vendors",
    base_url="https://tripoli.org",
    required_fields=["name"],
    output_filename="tripoli_vendors.csv",
    split_by_field="category",  # Split records by category field
    split_file_mapping={  # Map category values to entity folder and filename
        "Manufacturer": "manufacturers/tripoli_manufacturers.csv",
        "Vendors": "vendors/tripoli_vendors.csv",
    },
)
Key Features
- Entity Folders - Use the entity_folder parameter to organize files by entity type
- Split Files - Use split_by_field and split_file_mapping to split records into multiple files
- Path Support in Mapping - File mappings can include subdirectories (e.g., "manufacturers/file.csv")
- Automatic Directory Creation - Directories are created automatically if they don't exist
- Backward Compatible - Sources without entity_folder save to the data_raw/ root
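As a rough illustration of how the split-file behaviour could work, the sketch below groups records by a field and writes each group to its mapped path. The function name write_split_records and its signature are hypothetical and are not taken from the actual BaseDataSource code. Skipping records whose value has no mapping is also an assumption.

```python
import csv
from collections import defaultdict
from pathlib import Path


def write_split_records(records, split_by_field, split_file_mapping, data_root="data_raw"):
    """Group records by a field and write each group to its mapped CSV path.

    Hypothetical helper for illustration; not the project's real implementation.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record.get(split_by_field)].append(record)

    written = {}
    for value, rows in groups.items():
        relative_path = split_file_mapping.get(value)
        if relative_path is None:
            continue  # Assumption: records with an unmapped value are skipped
        target = Path(data_root) / relative_path
        # Create the entity folder (e.g. data_raw/manufacturers/) if it is missing.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Use the union of keys, in first-seen order, as the CSV header.
        fieldnames = list(dict.fromkeys(key for row in rows for key in row))
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(rows)
        written[value] = target
    return written
```

Because the mapping values are relative paths, one source can feed several entity folders without any extra configuration.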
Example: Tripoli Vendors
The Tripoli vendors extractor demonstrates the split file feature with separate entity folders:
- Extracts both manufacturers and vendors in a single extraction
- Automatically splits records by the category field
- Saves to separate entity folders:
  - data_raw/manufacturers/tripoli_manufacturers.csv (76 records)
  - data_raw/vendors/tripoli_vendors.csv (239 records)
Benefits
- Better Organization - Related data files are grouped together
- Easier Navigation - Find files by entity type rather than source
- Scalability - Easy to add new entity types as needed
- Clarity - Clear separation between different types of data
Migration
Existing CSV files have been moved to their respective entity folders:
- Club files → clubs/
- Manufacturer files → manufacturers/
- Vendor files → vendors/
- Kit files → kits/
- Motor files → motors/
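For anyone repeating this move on an older checkout, a one-off script along the following lines might do the job. It is only a sketch: migrate_flat_files and ENTITY_FOLDERS are hypothetical names, and the mapping is an illustrative subset based on the directory tree above, not the project's actual migration code.

```python
import shutil
from pathlib import Path

# Assumption: illustrative subset of filename -> entity folder pairs,
# read off the directory structure listed earlier in this document.
ENTITY_FOLDERS = {
    "nar_clubs.csv": "clubs",
    "tripoli_clubs.csv": "clubs",
    "tripoli_manufacturers.csv": "manufacturers",
    "tripoli_vendors.csv": "vendors",
    "estes_kits.csv": "kits",
    "rocket_motors.csv": "motors",
}


def migrate_flat_files(data_root="data_raw", mapping=ENTITY_FOLDERS):
    """Move CSV files from the data root into their entity folders."""
    root = Path(data_root)
    moved = []
    for filename, folder in mapping.items():
        source = root / filename
        if not source.is_file():
            continue  # Already migrated, or never present in this checkout
        destination = root / folder / filename
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination))
        moved.append(destination)
    return moved
```

Files that are already in place are skipped, so running the script twice should be harmless.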
Adding New Entity Types
To add a new entity type:
- Create the folder in data_raw/ (e.g., data_raw/launches/)
- Update the source specification with entity_folder="launches"
- The base class handles the rest automatically
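The steps above can be pictured with a small sketch of how an output path might be derived from entity_folder. ExampleSpec is a hypothetical stand-in for SourceSpecification that carries only the fields used here, and resolve_output_path is an assumed helper name. The real base class may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ExampleSpec:
    """Hypothetical stand-in for SourceSpecification (illustration only)."""
    name: str
    output_filename: str
    entity_folder: Optional[str] = None


def resolve_output_path(spec: ExampleSpec, data_root: str = "data_raw") -> Path:
    """Return the CSV path for a spec, creating the target folder if needed."""
    folder = Path(data_root)
    if spec.entity_folder:
        # New-style source: write into data_raw/<entity_folder>/
        folder = folder / spec.entity_folder
    # Without entity_folder the file stays in the data_raw/ root.
    folder.mkdir(parents=True, exist_ok=True)
    return folder / spec.output_filename


if __name__ == "__main__":
    # Hypothetical source name and filename, used only for this example.
    launches_spec = ExampleSpec(
        name="example_launches",
        output_filename="example_launches.csv",
        entity_folder="launches",
    )
    print(resolve_output_path(launches_spec))
```

A spec that omits entity_folder resolves to the data_raw/ root, which matches the backward-compatible behaviour described under Key Features.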
Source Specifications
All data source specifications have been updated to use entity folders:
- NAR Clubs → entity_folder="clubs"
- Tripoli Clubs → entity_folder="clubs"
- Tripoli Vendors → Split files to separate entity folders:
  - Manufacturers → manufacturers/tripoli_manufacturers.csv
  - Vendors → vendors/tripoli_vendors.csv
- Estes Kits → entity_folder="kits"
- JimZ Estes Kits → entity_folder="kits" (Historical reference data)
- LOC Precision → entity_folder="kits"
- Rocketarium → entity_folder="kits"