Manufacturer Registry

Last Updated: 2026-01-18

The manufacturer registry maps every source-specific manufacturer name or ID to a single canonical entry, so that ingesting multiple sources does not create duplicate manufacturers.

Problem it solves

Different sources refer to the same manufacturer in inconsistent ways (IDs, casing, legacy names). Without a registry, Neo4j would accumulate duplicate manufacturer nodes, and relationships that belong to one manufacturer would be split across them.

Example: “LOC Precision” may appear as LOC, Loc, LOC Precision, or a Tripoli-generated ID.

How it works

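A source name is resolved in order, stopping at the first step that succeeds:

  1. Exact match (case-insensitive) against known aliases
  2. Fuzzy match when the exact match fails (configurable threshold)
  3. Auto-create a new canonical entry when neither match succeeds and auto-creation is enabled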

Resolution flow (Mermaid source):

  graph LR
    A["Source A: LOC Precision"] --> R[Registry]
    B["Source B: loc-621319214668"] --> R
    R --> C["Canonical: LOC"]
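
The three resolution steps can be sketched in Python roughly as follows. This is an illustration only, not the code in manufacturer_registry.py: the class name, the `resolve` method, the `fuzzy_threshold` and `auto_create` parameters, the 0.85 default, and the choice of `difflib.SequenceMatcher` for fuzzy scoring are all assumptions made for the example.

```python
from difflib import SequenceMatcher


class ManufacturerRegistry:
    """Hypothetical sketch of the lookup order; not the real library API."""

    def __init__(self, aliases, fuzzy_threshold=0.85, auto_create=False):
        # aliases: {canonical_id: [alias, ...]} (assumed shape)
        self.fuzzy_threshold = fuzzy_threshold  # assumed default
        self.auto_create = auto_create
        self._index = {}  # case-folded alias -> canonical ID
        for canonical, names in aliases.items():
            self._index[canonical.casefold()] = canonical
            for name in names:
                self._index[name.casefold()] = canonical

    def resolve(self, raw_name):
        key = raw_name.strip().casefold()

        # Step 1: exact, case-insensitive match against known aliases.
        if key in self._index:
            return self._index[key]

        # Step 2: fuzzy match; accept the best-scoring alias only if it
        # reaches the configured threshold.
        best_alias, best_score = None, 0.0
        for alias in self._index:
            score = SequenceMatcher(None, key, alias).ratio()
            if score > best_score:
                best_alias, best_score = alias, score
        if best_alias is not None and best_score >= self.fuzzy_threshold:
            return self._index[best_alias]

        # Step 3: register the unknown name as a new canonical entry,
        # but only when auto-creation is enabled.
        if self.auto_create:
            canonical = raw_name.strip()
            self._index[key] = canonical
            return canonical
        return None


registry = ManufacturerRegistry(
    {"LOC": ["LOC Precision", "loc-621319214668"]}, auto_create=True
)
print(registry.resolve("loc precision"))  # exact match -> LOC
print(registry.resolve("LOC Precison"))   # typo, fuzzy match -> LOC
print(registry.resolve("Estes"))          # unknown, auto-created -> Estes
```

With auto-creation disabled, the sketch returns `None` for an unknown name, which leaves the caller to decide whether to skip the record or flag it for review.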

Configuration

Mappings are driven by YAML (no code changes required):

  • data/platform/config/manufacturer_mappings.yaml
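
To illustrate how a mapping file could drive lookups without code changes, the sketch below turns an already-loaded mapping into a case-insensitive alias index. The schema (a top-level `manufacturers` key with an `aliases` list per canonical ID) is purely hypothetical, as is the `build_alias_index` helper; the real layout of manufacturer_mappings.yaml may differ.

```python
# Hypothetical structure, as a YAML loader (for example PyYAML's
# yaml.safe_load) might return it. The assumed YAML would look like:
#
#   manufacturers:
#     LOC:
#       aliases:
#         - LOC Precision
#         - loc-621319214668
SAMPLE_CONFIG = {
    "manufacturers": {
        "LOC": {"aliases": ["LOC Precision", "loc-621319214668"]},
    }
}


def build_alias_index(config):
    """Map every case-folded alias (and canonical ID) to its canonical ID."""
    index = {}
    for canonical, entry in config.get("manufacturers", {}).items():
        for name in [canonical, *entry.get("aliases", [])]:
            owner = index.setdefault(name.casefold(), canonical)
            if owner != canonical:
                # The same alias listed under two manufacturers is ambiguous.
                raise ValueError(
                    f"alias {name!r} maps to both {owner!r} and {canonical!r}"
                )
    return index


index = build_alias_index(SAMPLE_CONFIG)
print(index["loc precision"])  # LOC
```

Under this assumed layout, adding a newly observed spelling means adding one line to the `aliases` list.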

Implementation

  • Library: data/platform/src/libraries/manufacturer_registry.py

Where it’s used

  • Primarily during Data Cleaning
  • Also helpful for any ingest step that needs stable manufacturer IDs
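
As a rough picture of a cleaning step, the sketch below rewrites each record's manufacturer field to its canonical ID before loading. The record shape, the `canonicalize_records` helper, and the dictionary-based `resolve` stand-in are all hypothetical; a real step would call the registry library instead.

```python
# Stand-in for the registry lookup (hypothetical); returns None if unknown.
ALIASES = {"loc": "LOC", "loc precision": "LOC", "loc-621319214668": "LOC"}


def resolve(name):
    return ALIASES.get(name.strip().casefold())


def canonicalize_records(records, resolve):
    """Return copies of the records with canonical manufacturer IDs.

    Names the resolver does not recognise are left unchanged.
    """
    cleaned = []
    for record in records:
        canonical = resolve(record["manufacturer"])
        cleaned.append(
            {**record, "manufacturer": canonical or record["manufacturer"]}
        )
    return cleaned


records = [
    {"source": "A", "manufacturer": "LOC Precision"},
    {"source": "B", "manufacturer": "loc-621319214668"},
]
for row in canonicalize_records(records, resolve):
    print(row["source"], row["manufacturer"])  # both rows end up as LOC
```

Because both records now carry the same ID, a later load keyed on that ID would attach them to one manufacturer rather than two.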