Graph Load Flow

Last Updated: 2026-01-18

This flow loads the clean layer into Neo4j. It is the step that creates uniqueness constraints and materializes nodes and relationships.

Source of truth

  • Flow code: data/platform/flows/graph_load.py

Inputs / outputs

  • Input: clean CSVs in BLOG_DATA_BUCKET_CLEAN
  • Output: Neo4j updated (constraints + nodes + relationships)

Loading strategy (high level)

  1. Create constraints (id uniqueness) before ingest
  2. Load nodes in dependency order
  3. Load relationships after nodes exist
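
The ordering above can be sketched as a small plan builder. This is a minimal illustration and is not taken from graph_load.py: the labels, relationship type, constraint names, and the helper names constraint_statements and build_load_plan are all hypothetical, and the constraint syntax assumes Neo4j 4.4 or later with a property key named id.

```python
from typing import List, Tuple


def constraint_statements(labels: List[str]) -> List[str]:
    """Build one idempotent uniqueness constraint per node label.

    Assumes the unique key is a property called `id` (hypothetical) and
    the FOR ... REQUIRE syntax available in Neo4j 4.4 and later.
    """
    return [
        f"CREATE CONSTRAINT {label.lower()}_id_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE"
        for label in labels
    ]


def build_load_plan(
    node_labels: List[str], relationship_types: List[str]
) -> List[Tuple[str, str]]:
    """Return (phase, item) pairs in the order the steps should run."""
    plan: List[Tuple[str, str]] = []
    # 1. Constraints first, so duplicate ids are rejected during ingest.
    plan += [("constraint", stmt) for stmt in constraint_statements(node_labels)]
    # 2. Nodes next, in the order the caller supplies.
    plan += [("nodes", label) for label in node_labels]
    # 3. Relationships last, once both endpoints can be matched.
    plan += [("relationships", rel) for rel in relationship_types]
    return plan


if __name__ == "__main__":
    # Hypothetical labels and relationship type, for illustration only.
    for phase, item in build_load_plan(["Author", "Post"], ["WROTE"]):
        print(phase, "|", item)
```

Because the constraints use IF NOT EXISTS, rerunning the plan against a database that already has them should not fail on the first step.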

Configuration

Neo4j:

  • NEO4J_URI
  • NEO4J_USERNAME
  • NEO4J_PASSWORD
  • NEO4J_DATABASE (optional)

S3:

  • BLOG_DATA_BUCKET_CLEAN
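
One way to read these settings is to collect them into a single object and fail early when a required one is missing. The sketch below assumes the settings arrive as environment variables; the GraphLoadConfig class and load_config function are hypothetical names, not the flow's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Every setting listed above except NEO4J_DATABASE, which is optional.
REQUIRED = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "BLOG_DATA_BUCKET_CLEAN",
)


@dataclass(frozen=True)
class GraphLoadConfig:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    clean_bucket: str
    # None means "let the server pick its default database".
    neo4j_database: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> GraphLoadConfig:
    """Read settings from a mapping, raising if a required one is unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return GraphLoadConfig(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
        clean_bucket=env["BLOG_DATA_BUCKET_CLEAN"],
        # Treat an empty string the same as unset.
        neo4j_database=env.get("NEO4J_DATABASE") or None,
    )
```

Passing the mapping as an argument keeps the function easy to exercise with a plain dict instead of the real environment.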

Verification

After a run, check node counts per label combination and relationship counts per type:

MATCH (n) RETURN labels(n) AS labels, count(n) AS count;
MATCH ()-[r]->() RETURN type(r) AS type, count(r) AS count;
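
The counts returned by these queries can also be compared against expected values in code. The sketch below assumes expected counts are known ahead of time (for example, row counts from the clean CSVs, assuming one row per node or relationship); the function name and the sample numbers are hypothetical.

```python
from typing import Dict, List


def find_count_mismatches(
    expected: Dict[str, int], observed: Dict[str, int]
) -> List[str]:
    """Return one message per label or type whose observed count differs.

    A name missing from `observed` is treated as a count of zero.
    """
    problems: List[str] = []
    for name, want in sorted(expected.items()):
        got = observed.get(name, 0)
        if got != want:
            problems.append(f"{name}: expected {want}, found {got}")
    return problems


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    expected = {"Author": 2, "Post": 3, "WROTE": 3}
    observed = {"Post": 3, "WROTE": 2}
    for line in find_count_mismatches(expected, observed):
        print(line)
```

Note that labels(n) returns a list, so a node carrying several labels needs a deliberate choice of key before its count is placed in the observed mapping.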