Gardener — Product Normalization

Gardener is an AI-powered product normalizer that transforms raw supplier product names into structured, searchable fields. It runs as a separate step before the Matcher and is the foundation for accurate product linking.

Taxonomy Classification

Maps raw product names to the site taxonomy (categories, colors, sizes) defined in YAML files.

Field Extraction

Extracts product_type, model_name, manufacturer_sku, color, size, and gender from unstructured text.

Two-Pass Pipeline

Deterministic synonym matching runs first at no LLM cost; an LLM (gpt-5-mini) handles only the products left unresolved.

PIM Integration

Dedicated normalization table with inline editing, re-normalization with prompt hints, and taxonomy candidate review.

How It Works

Gardener processes products one brand and one source at a time (Dealer and Oscar are always run separately). This isolation is critical for normalization quality: each run sees few-shot examples only from the same brand and source.

Pipeline Overview

┌─────────────────────────────────────────────────┐
│ 1. fetch_and_prepare                             │
│    Load taxonomy (YAML) + few-shot examples      │
│    Fetch unprocessed products for brand/source    │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│ 2. normalize (two-pass)                          │
│    Pass 1: Deterministic (synonym matching)      │
│    Pass 2: LLM (gpt-5-mini, single call)         │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│ 3. write_results                                 │
│    Update DB fields + taxonomy candidates        │
└─────────────────────────────────────────────────┘
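The normalize step in the diagram can be sketched as a small orchestrator. This is a minimal illustration, not Gardener's actual code: the function name `normalize_two_pass` and both resolver callables are hypothetical, and the sketch simplifies "not fully resolved" to a resolver returning `None`.

```python
from typing import Callable, Optional

# Hypothetical resolver shapes (assumptions for this sketch):
# Pass 1 returns normalized fields, or None when it cannot resolve a product.
Resolver = Callable[[str], Optional[dict]]
# Pass 2 takes a batch of raw names and returns one dict per name, in order.
BatchResolver = Callable[[list[str]], list[dict]]


def normalize_two_pass(
    raw_names: list[str],
    deterministic: Resolver,
    llm_batch: BatchResolver,
) -> list[dict]:
    """Run Pass 1 on every product, then send only the leftovers to Pass 2."""
    results: dict[int, dict] = {}
    unresolved: list[int] = []

    for i, name in enumerate(raw_names):
        fields = deterministic(name)
        if fields is None:
            unresolved.append(i)
        else:
            results[i] = {**fields, "method": "deterministic"}

    if unresolved:
        # A single batched call covers everything Pass 1 could not resolve.
        llm_fields = llm_batch([raw_names[i] for i in unresolved])
        for i, fields in zip(unresolved, llm_fields):
            results[i] = {**fields, "method": "llm"}

    # Return results in the original input order.
    return [results[i] for i in range(len(raw_names))]
```

The point of the shape is that the LLM callable is never invoked when Pass 1 resolves every product in the batch.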

Example

Input (raw from supplier):

"Куртка мембранна чоловіча Marmot PreCip Eco 41200 Nori L"

Output (normalized):

Field	Value	Source
`product_type`	Куртка	LLM extraction
`model_name`	PreCip Eco	LLM extraction
`manufacturer_sku`	41200	LLM extraction (from name)
`normalized_category`	clothing/jackets_hard	categories.yaml mapping
`normalized_color`	green	colors.yaml mapping
`original_color`	Nori	preserved from input
`normalized_size`	L	sizes.yaml mapping
`gender`	men	LLM extraction

Source Isolation

Gardener always runs with a source parameter:

# Normalize dealer products — uses dealer examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "dealer",
    "only_new": True,
})

# Normalize oscar products — uses oscar examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "oscar",
    "only_new": True,
})
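To make the isolation concrete, here is a minimal sketch of how few-shot examples could be filtered so that a dealer run never sees Oscar examples. The `NormalizedExample` record and the `select_few_shot` helper are hypothetical names introduced for illustration; only the brand/source filter and the cap of 10 come from this page.

```python
from dataclasses import dataclass

MAX_FEW_SHOT = 10  # this page states "up to 10" examples per run


@dataclass
class NormalizedExample:
    """Hypothetical record for an already-normalized product."""
    brand: str
    source: str  # "dealer" or "oscar"
    raw_name: str
    fields: dict


def select_few_shot(pool, brand, source, limit=MAX_FEW_SHOT):
    """Keep only examples with the same brand AND the same source."""
    matching = [ex for ex in pool if ex.brand == brand and ex.source == source]
    return matching[:limit]
```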

Two-Pass Normalization

Pass 1: Deterministic (No LLM)

Uses taxonomy YAML files with synonym matching and regex patterns:

colors.yaml — 47 colors with 500+ synonyms and regex patterns
sizes.yaml — 10 size groups (including clothing, footwear, sleeping bags, backpacks, containers)
categories.yaml — 258 categories with keywords

Coverage: ~60-70% of colors and sizes are resolved without LLM.
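A minimal sketch of what synonym and regex matching for colors might look like. The `COLORS` dictionary is a hypothetical in-memory stand-in for a small slice of colors.yaml (the real file layout is not shown on this page), and `match_color` is an illustrative name.

```python
import re
from typing import Optional

# Hypothetical stand-in for a few colors.yaml entries (assumed structure).
COLORS = {
    "green": {"synonyms": ["green", "зелений", "nori"], "patterns": [r"\bolive\w*"]},
    "black": {"synonyms": ["black", "чорний"], "patterns": [r"\bblk\b"]},
}


def match_color(raw_name: str) -> Optional[str]:
    """Return the canonical color, or None so the product falls through to Pass 2."""
    text = raw_name.lower()
    tokens = set(re.findall(r"\w+", text))  # \w is Unicode-aware for str patterns
    for color, spec in COLORS.items():
        # Whole-token synonym match first, then the regex patterns.
        if tokens & set(spec["synonyms"]):
            return color
        if any(re.search(pattern, text) for pattern in spec["patterns"]):
            return color
    return None
```

With this stand-in data, the Marmot example name from earlier resolves to `green` through the `nori` synonym, while a name containing an unknown color such as "Dusty Teal" returns `None`.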

Pass 2: LLM (gpt-5-mini)

Handles products that Pass 1 did not fully resolve, processing batches of 20-50 products in a single LLM call.

Model: Configured in contextunity.project.yaml per-tenant (typically openai:gpt-5-mini).

Features:

Few-shot examples: up to 10 already-normalized products of the same brand and source
Taxonomy context: full list of valid categories, colors, sizes from YAML
Field extraction: product_type, model_name, manufacturer_sku from raw name
Operator hints: custom prompt additions for re-normalization
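The features above can be sketched as a batching helper plus a prompt builder. This is an assumption-heavy illustration: the prompt wording, the `taxonomy` dictionary keys, and the names `batched` and `build_prompt` are all hypothetical, and no LLM client is called.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional

BATCH_SIZE = 50   # upper end of the 20-50 range stated above
MAX_EXAMPLES = 10  # "up to 10" few-shot examples


def batched(items: Iterable[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def build_prompt(batch, taxonomy, examples, hint: Optional[str] = None) -> str:
    """Assemble taxonomy context, few-shot examples, optional hint, and products."""
    parts = [
        "Normalize each product name into structured fields.",
        "Valid categories: " + ", ".join(taxonomy["categories"]),
        "Valid colors: " + ", ".join(taxonomy["colors"]),
        "Valid sizes: " + ", ".join(taxonomy["sizes"]),
    ]
    if examples:
        parts.append("Examples:")
        parts += [f"- {raw} -> {fields}" for raw, fields in examples[:MAX_EXAMPLES]]
    if hint:
        parts.append("Operator hint: " + hint)
    parts.append("Products:")
    parts += [f"{i}. {name}" for i, name in enumerate(batch, start=1)]
    return "\n".join(parts)
```

Note that plain chunking can leave a final batch smaller than 20 products; whether Gardener merges or sends such a remainder is not specified here.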

Matcher Integration

Gardener runs as a separate step before Matcher. The full matching flow:

Step 1a: Gardener(brand=X, source=dealer)  ← gpt-5-mini, new products only
Step 1b: Gardener(brand=X, source=oscar)   ← gpt-5-mini, new products only
Step 2:  Matcher Stage 1: Exact Match      ← EAN/SKU, free
Step 3:  Matcher Stage 2: Normalized Match ← brand+model+type+color+size, free
Step 4:  Matcher Stage 3: RLM             ← mercury-2, only remaining unmatched

Matcher Stages 1 and 2 (Steps 2-3) cover ~60-70% of matches, so the expensive RLM stage (Mercury-2) runs only on the remaining unmatched products.
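As an illustration of why normalized fields make the second Matcher stage free, here is a minimal sketch of key-based matching on brand+model+type+color+size. The function names and the rule that incomplete products are skipped are assumptions for this sketch, not the Matcher's documented behavior.

```python
from typing import Optional


def match_key(brand: str, fields: dict) -> Optional[tuple]:
    """Build a comparison key; return None if any component is missing."""
    parts = (
        brand,
        fields.get("model_name"),
        fields.get("product_type"),
        fields.get("normalized_color"),
        fields.get("normalized_size"),
    )
    if any(not part for part in parts):
        return None  # assumed: incomplete products fall through to the next stage
    return tuple(str(part).strip().lower() for part in parts)


def normalized_match(brand: str, dealer_items: dict, oscar_items: dict) -> dict:
    """Map dealer product ids to Oscar product ids that share the same key."""
    index = {}
    for oscar_id, fields in oscar_items.items():
        key = match_key(brand, fields)
        if key is not None:
            index.setdefault(key, oscar_id)

    matches = {}
    for dealer_id, fields in dealer_items.items():
        key = match_key(brand, fields)
        if key is not None and key in index:
            matches[dealer_id] = index[key]
    return matches
```

Because the comparison is a dictionary lookup on already-normalized values, it needs no model call; only products without a complete key or without a counterpart reach the later stage.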

Data Storage

DealerProduct

Normalized fields are written directly to DealerProduct columns:

normalized_category = "clothing/jackets_hard"
normalized_color    = "green"
normalized_size     = "L"
product_type        = "Куртка"
model_name          = "PreCip Eco"

# Metadata stored in enrichment JSON:
enrichment = {
    "gardener": {
        "version": "2.0",
        "normalized_at": "2026-03-05T06:00:00",
        "original_color": "Nori",
        "manufacturer_sku": "41200",
        "gender": "men",
        "method": "llm",          # deterministic | llm | manual | llm_with_hint
        "confidence": 0.92,
        "custom_hint": None,
        "taxonomy_candidates": []
    }
}

Oscar Product

Normalized data is stored in a single JSON field:

normalized_data = {
    "product_type": "Куртка",
    "model_name": "PreCip Eco",
    "manufacturer_sku": "41200",
    "normalized_category": "clothing/jackets_hard",
    "normalized_color": "green",
    "original_color": "Nori",
    "normalized_size": "L",
    "gender": "men",
    "gardener_version": "2.0",
    "normalized_at": "2026-03-05T06:00:00"
}
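The two storage shapes above carry the same information in different layouts. The following sketch shows one way a single normalization result could be mapped to each; the helper names are hypothetical, and the field lists are copied from the examples on this page rather than from a schema.

```python
GARDENER_VERSION = "2.0"

# Fields written as DealerProduct columns in the example above.
DEALER_COLUMNS = (
    "normalized_category", "normalized_color", "normalized_size",
    "product_type", "model_name",
)
# Fields that go into the enrichment JSON instead of columns.
DEALER_META = ("original_color", "manufacturer_sku", "gender")


def to_dealer_payload(result: dict, normalized_at: str, method: str, confidence: float):
    """Split a result into column values and the enrichment["gardener"] block."""
    columns = {name: result.get(name) for name in DEALER_COLUMNS}
    gardener = {
        "version": GARDENER_VERSION,
        "normalized_at": normalized_at,
        **{name: result.get(name) for name in DEALER_META},
        "method": method,
        "confidence": confidence,
        "custom_hint": None,
        "taxonomy_candidates": [],
    }
    return columns, {"gardener": gardener}


def to_oscar_payload(result: dict, normalized_at: str) -> dict:
    """Flatten everything into the single normalized_data JSON value."""
    data = {name: result.get(name) for name in DEALER_COLUMNS + DEALER_META}
    data["gardener_version"] = GARDENER_VERSION
    data["normalized_at"] = normalized_at
    return data
```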

PIM Normalization UI

The PIM provides a dedicated Normalization Table at /pim/normalization/ showing all normalized products from both sources.

Features

Feature	Description
Unified table	Both dealer and Oscar products, with source indicator
Filters	By brand, source (Dealer/Oscar), normalization status, gaps
Inline editing	Click any normalized field → edit → save (AJAX)
Status indicators	✅ fully normalized / ⚠️ has gaps / ❌ not normalized
Re-normalize	Select products → add prompt hint → re-run Gardener
Taxonomy candidates	New values for colors/sizes awaiting operator approval
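The three status indicators could be derived from how many normalized fields are filled. This is a sketch under an assumption: the page does not say which fields count toward "gaps", so `REQUIRED_FIELDS` below is a hypothetical choice, as is the function name.

```python
# Assumed set of fields that must be present for a product to be "fully normalized".
REQUIRED_FIELDS = (
    "normalized_category", "normalized_color", "normalized_size",
    "product_type", "model_name",
)


def normalization_status(fields: dict) -> str:
    """Classify a product as fully normalized, having gaps, or not normalized."""
    filled = [name for name in REQUIRED_FIELDS if fields.get(name)]
    if not filled:
        return "not_normalized"
    if len(filled) < len(REQUIRED_FIELDS):
        return "has_gaps"
    return "fully_normalized"
```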

Re-normalization with Prompt Hints

When Gardener makes a mistake, operators can re-normalize selected products with additional context:

Select one or more products in the normalization table
Click “🔄 Re-normalize”
Enter a prompt hint (e.g., “Nori is a green color for Marmot, not brown”)
The hint is injected into the Gardener prompt
Results are saved with method: "llm_with_hint"

The corrected product then serves as a few-shot example for future batches, creating a self-improving normalization loop.
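A minimal sketch of the bookkeeping after a hinted re-run, assuming the metadata lives in the `enrichment["gardener"]` block shown under Data Storage. The helper name `apply_renormalization` and the rule that a blank hint falls back to `method: "llm"` are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def apply_renormalization(
    gardener_meta: dict,
    hint: Optional[str],
    now: Optional[datetime] = None,
) -> dict:
    """Return a copy of the metadata updated for a re-normalization run."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    cleaned = hint.strip() if hint else None
    return {
        **gardener_meta,
        # A non-empty hint marks the result so it can be told apart later.
        "method": "llm_with_hint" if cleaned else "llm",
        "custom_hint": cleaned or None,
        "normalized_at": stamp,
    }
```

Keeping the hint next to the result makes it possible to audit why a corrected example ended up in the few-shot pool.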

Configuration

# Taxonomy files location (project env)
HARVESTER_CONFIG_DIR=/path/to/project/harvester/config

The underlying LLM (e.g., gpt-5-mini) is bound to the Gardener agent via the contextunity.project.yaml manifest.

Taxonomy Files

Gardener reads three YAML taxonomy files from the directory set by HARVESTER_CONFIG_DIR:

File	Content	Entries
`categories.yaml`	Product category tree with keywords	~258
`colors.yaml`	Color definitions with synonyms and regex patterns	47 colors, 500+ synonyms
`sizes.yaml`	Size groups: clothing, footwear, sleeping bags, backpacks, containers	10 groups

Taxonomy Candidates

When Gardener encounters a value not in the taxonomy, it creates a taxonomy candidate:

{
    "field": "color",
    "raw_value": "Dusty Teal",
    "suggested_parent": "teal",
    "confidence": 0.85,
    "product_id": 12345,
    "source": "dealer:gorgany"
}

Candidates appear in the PIM normalization UI for operator review. Approved values are added as synonyms to the corresponding YAML file.
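The approval step can be sketched as adding the raw value to the parent's synonym list. The in-memory `taxonomy` layout (field, then canonical value, then a `synonyms` list) is an assumption, since the YAML structure is not shown on this page; `approve_candidate` is a hypothetical name, and writing the change back to the YAML file is left out.

```python
def approve_candidate(taxonomy: dict, candidate: dict) -> bool:
    """Add the candidate's raw value as a synonym of its suggested parent.

    Returns True if a synonym was added, False if it was already present.
    """
    values = taxonomy[candidate["field"]]
    parent = candidate["suggested_parent"]
    if parent not in values:
        raise KeyError(f"unknown {candidate['field']} value: {parent}")

    synonym = candidate["raw_value"].strip().lower()
    synonyms = values[parent].setdefault("synonyms", [])
    if synonym in synonyms:
        return False
    synonyms.append(synonym)
    return True
```

Once "dusty teal" is a synonym of `teal`, later products with that color can be resolved by Pass 1 without an LLM call.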