Graph Load Flow
Last Updated: 2026-01-18
Loads the clean layer into Neo4j. This is the step that materializes nodes/relationships and enforces constraints.
Source of truth
- Flow code: data/platform/flows/graph_load.py
Inputs / outputs
- Input: clean CSVs in BLOG_DATA_BUCKET_CLEAN
- Output: Neo4j updated (constraints + nodes + relationships)
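Each clean CSV is typically written to Neo4j in batches rather than row by row. Below is a minimal sketch of that batching step, assuming every file has a header row. `iter_batches` and its default batch size of 1000 are hypothetical, not taken from graph_load.py. Downloading the object from BLOG_DATA_BUCKET_CLEAN is left out: the function accepts any text stream.

```python
import csv
import io
from typing import Iterator, TextIO


def iter_batches(stream: TextIO, batch_size: int = 1000) -> Iterator[list[dict[str, str]]]:
    """Yield lists of row dicts (at most batch_size each) from a CSV with a header row.

    Hypothetical helper for illustration; the real flow may batch differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    reader = csv.DictReader(stream)  # header row supplies the dict keys
    batch: list[dict[str, str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


if __name__ == "__main__":
    sample = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
    for rows in iter_batches(sample, batch_size=2):
        print(rows)
```

Every value arrives as a string, so any type conversion (dates, integers) has to happen before or during the Cypher write.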
Loading strategy (high level)
- Create constraints (id uniqueness) before ingest
- Load nodes in dependency order
- Load relationships after nodes exist
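The three-phase strategy above can be sketched as a plan of Cypher statements. Everything concrete here is hypothetical: the labels (Author, Tag, Post, Comment), the dependency map, the `start_id`/`end_id` row fields, and the constraint names. The constraint syntax is Neo4j 5's `CREATE CONSTRAINT ... IF NOT EXISTS ... REQUIRE ... IS UNIQUE`. Dependency order is computed with the standard library's `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter

# Hypothetical schema for illustration only: label -> labels that must be loaded first.
NODE_DEPENDENCIES: dict[str, set[str]] = {
    "Author": set(),
    "Tag": set(),
    "Post": {"Author"},
    "Comment": {"Post", "Author"},
}

# Hypothetical relationships: (type, start label, end label).
RELATIONSHIPS = [("WROTE", "Author", "Post"), ("ON_POST", "Comment", "Post")]


def constraint_statement(label: str) -> str:
    """Id-uniqueness constraint (Neo4j 5 syntax); IF NOT EXISTS makes reruns safe."""
    return (
        f"CREATE CONSTRAINT {label.lower()}_id_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
    )


def node_load_statement(label: str) -> str:
    """Upsert one batch (passed as $rows) as nodes keyed by id."""
    # Labels cannot be query parameters, so they are interpolated; they must come
    # from a fixed schema like the one above, never from input data.
    return f"UNWIND $rows AS row MERGE (n:{label} {{id: row.id}}) SET n += row"


def relationship_load_statement(rel_type: str, start_label: str, end_label: str) -> str:
    """Link nodes that already exist; MATCH (not MERGE) skips rows with missing endpoints."""
    return (
        f"UNWIND $rows AS row "
        f"MATCH (a:{start_label} {{id: row.start_id}}) "
        f"MATCH (b:{end_label} {{id: row.end_id}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )


def load_plan(deps: dict[str, set[str]]) -> list[str]:
    """Order labels so each comes after its dependencies (raises graphlib.CycleError on cycles)."""
    return list(TopologicalSorter(deps).static_order())


def full_plan(deps: dict[str, set[str]], relationships: list[tuple[str, str, str]]) -> list[str]:
    """Constraints first, then node loads in dependency order, then relationships."""
    order = load_plan(deps)
    statements = [constraint_statement(label) for label in order]
    statements += [node_load_statement(label) for label in order]
    statements += [relationship_load_statement(*rel) for rel in relationships]
    return statements


if __name__ == "__main__":
    for stmt in full_plan(NODE_DEPENDENCIES, RELATIONSHIPS):
        print(stmt)
```

Creating the constraints first matters for more than data quality. A uniqueness constraint is backed by an index, so the `MERGE ... {id: row.id}` lookups stay fast as the graph grows. With the official Neo4j Python driver (5.x), each batched statement could be run as `driver.execute_query(stmt, rows=batch, database_=database)`.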
Configuration
Neo4j:
- NEO4J_URI
- NEO4J_USERNAME
- NEO4J_PASSWORD
- NEO4J_DATABASE (optional)
S3:
- BLOG_DATA_BUCKET_CLEAN
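The variable names below come from this configuration list. The `GraphLoadConfig` dataclass, the `load_config` function, the fail-fast check, and treating an empty NEO4J_DATABASE as unset are all hypothetical. This is a sketch, not necessarily how graph_load.py reads its settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Required variables taken from the list above; NEO4J_DATABASE is optional.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "BLOG_DATA_BUCKET_CLEAN")


@dataclass(frozen=True)
class GraphLoadConfig:
    """Hypothetical container for the flow's settings."""

    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: Optional[str]  # None -> let the server use its default database
    clean_bucket: str


def load_config(env: Mapping[str, str] = os.environ) -> GraphLoadConfig:
    """Read settings, failing fast if any required variable is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return GraphLoadConfig(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
        neo4j_database=env.get("NEO4J_DATABASE") or None,  # empty string treated as unset
        clean_bucket=env["BLOG_DATA_BUCKET_CLEAN"],
    )
```

Accepting a plain mapping instead of reading `os.environ` directly keeps the function easy to test. Checking all required variables up front reports every missing one in a single error, so a misconfigured run fails before it touches S3 or Neo4j.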
Verification
After a run, validate basic counts and spot-check relationships:
MATCH (n) RETURN labels(n) AS label, count(n) AS count;
MATCH ()-[r]->() RETURN type(r) AS type, count(r) AS count;
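When expected totals are known, the results of the two count queries can be compared mechanically. One source of expected totals is the number of data rows in each clean CSV, which assumes a one-file-per-label mapping. Below is a minimal sketch: `count_mismatches` is hypothetical and works on plain dicts shaped like the query records, so it leaves out fetching them (for example through the Neo4j driver).

```python
from typing import Iterable, Mapping


def count_mismatches(
    expected: Mapping[str, int],
    rows: Iterable[Mapping[str, object]],
    key_field: str = "label",
) -> dict[str, tuple[int, int]]:
    """Return {name: (expected, actual)} for every name whose counts differ.

    Hypothetical helper. rows mirror the records of the count queries above, e.g.
    {"label": ["Post"], "count": 42} or, with key_field="type", {"type": "WROTE", "count": 7}.
    labels(n) returns a list, so multi-label nodes are keyed by their sorted labels joined with ':'.
    """
    actual: dict[str, int] = {}
    for row in rows:
        key = row[key_field]
        if isinstance(key, (list, tuple)):
            key = ":".join(sorted(key))
        actual[str(key)] = actual.get(str(key), 0) + int(row["count"])
    names = set(expected) | set(actual)  # also flag labels/types that appear on only one side
    return {
        name: (expected.get(name, 0), actual.get(name, 0))
        for name in sorted(names)
        if expected.get(name, 0) != actual.get(name, 0)
    }
```

A node count lower than the CSV row count is not always a bug. If loads upsert on id, duplicate ids in a file collapse into one node, so a mismatch is a prompt to spot-check rather than proof of a failed load.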