Gardener — Product Normalization

Gardener is an AI-powered product normalizer that transforms raw supplier product names into structured, searchable fields. It runs as a separate step before the Matcher and is the foundation for accurate product linking.

Taxonomy Classification

Maps raw product names to site taxonomy (categories, colors, sizes) from YAML definitions.

Field Extraction

Extracts product_type, model_name, manufacturer_sku, color, size, and gender from unstructured text.

Two-Pass Pipeline

Deterministic synonym matching runs first at no LLM cost; an LLM (gpt-5-mini) handles only the products left unresolved.

PIM Integration

Dedicated normalization table with inline editing, re-normalization with prompt hints, and taxonomy candidate review.

How It Works

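Gardener processes products per brand and per source (Dealer and Oscar are handled separately). This isolation is critical for normalization quality, because few-shot examples are drawn only from already-normalized products of the same brand and source.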

Pipeline Overview

┌─────────────────────────────────────────────────┐
│ 1. fetch_and_prepare                            │
│    Load taxonomy (YAML) + few-shot examples     │
│    Fetch unprocessed products for brand/source  │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│ 2. normalize (two-pass)                         │
│    Pass 1: Deterministic (synonym matching)     │
│    Pass 2: LLM (gpt-5-mini, single call)        │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│ 3. write_results                                │
│    Update DB fields + taxonomy candidates       │
└─────────────────────────────────────────────────┘

Example

Input (raw from supplier):

"Куртка мембранна чоловіча Marmot PreCip Eco 41200 Nori L"

Output (normalized):

| Field | Value | Source |
| --- | --- | --- |
| product_type | Куртка | LLM extraction |
| model_name | PreCip Eco | LLM extraction |
| manufacturer_sku | 41200 | LLM extraction (from name) |
| normalized_category | clothing/jackets_hard | categories.yaml mapping |
| normalized_color | green | colors.yaml mapping |
| original_color | Nori | preserved from input |
| normalized_size | L | sizes.yaml mapping |
| gender | men | LLM extraction |

Source Isolation

Gardener always runs with a source parameter:

# Normalize dealer products: uses dealer examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "dealer",
    "only_new": True,
})

# Normalize oscar products: uses oscar examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "oscar",
    "only_new": True,
})
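To illustrate why the source parameter matters, the sketch below filters few-shot examples down to the same brand and source. `NormalizedExample` and `select_few_shot` are hypothetical names used only for illustration, not Gardener's actual API, and the sample records are made up; the cap of 10 follows the few-shot limit listed under Pass 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedExample:
    """Hypothetical record of an already-normalized product."""
    brand: str
    source: str  # "dealer" or "oscar"
    raw_name: str
    normalized_color: str


def select_few_shot(examples, brand, source, limit=10):
    """Return up to `limit` examples that match both brand and source."""
    matching = [e for e in examples if e.brand == brand and e.source == source]
    return matching[:limit]


# Made-up pool of previously normalized products
pool = [
    NormalizedExample("Marmot", "dealer", "Куртка Marmot PreCip Eco Nori L", "green"),
    NormalizedExample("Marmot", "oscar", "Marmot PreCip Eco Jacket Nori", "green"),
    NormalizedExample("OtherBrand", "dealer", "Рюкзак OtherBrand 22 Black", "black"),
]

dealer_examples = select_few_shot(pool, brand="Marmot", source="dealer")
oscar_examples = select_few_shot(pool, brand="Marmot", source="oscar")
```

Each run sees only examples written in the naming style of its own source, so dealer naming conventions never leak into Oscar prompts or the other way round.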

Two-Pass Normalization

Pass 1: Deterministic (No LLM)

Uses taxonomy YAML files with synonym matching and regex patterns:

  • colors.yaml — 47 colors with 500+ synonyms and regex patterns
  • sizes.yaml — 10 size groups (clothing, footwear, sleeping bags, backpacks, containers)
  • categories.yaml — 258 categories with keywords

Coverage: ~60-70% of colors and sizes are resolved without an LLM call.
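A minimal sketch of how a deterministic pass of this kind could resolve a color, assuming a tiny in-memory stand-in for colors.yaml. The dictionary layout, the synonym and pattern entries, and the `resolve_color` helper are illustrative assumptions; the real YAML schema is not shown in this document.

```python
import re

# Hypothetical stand-in for colors.yaml (the real schema may differ)
COLOR_TAXONOMY = {
    "green": {"synonyms": ["nori", "зелений"], "patterns": [r"\bgr(?:ee)?n\b"]},
    "black": {"synonyms": ["чорний"], "patterns": [r"\bbl(?:ac)?k\b"]},
}


def resolve_color(raw_name):
    """Return (normalized_color, matched_text), or None if unresolved."""
    tokens = re.findall(r"\w+", raw_name.lower())
    for color, entry in COLOR_TAXONOMY.items():
        # 1) exact token match against the canonical name or a synonym
        for token in tokens:
            if token == color or token in entry["synonyms"]:
                return color, token
        # 2) fall back to regex patterns for abbreviations and variants
        for pattern in entry["patterns"]:
            match = re.search(pattern, raw_name, flags=re.IGNORECASE)
            if match:
                return color, match.group(0)
    return None  # unresolved: would be left for the LLM pass


resolved = resolve_color("Куртка мембранна чоловіча Marmot PreCip Eco 41200 Nori L")
# resolved == ("green", "nori")
```

Products for which every such lookup succeeds never reach the LLM, which is where the cost saving comes from.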

Pass 2: LLM (gpt-5-mini)

Handles products that Pass 1 did not fully resolve, processing batches of 20-50 products in a single call.

Model: Configured in contextunity.project.yaml per-tenant (typically openai:gpt-5-mini).

Features:

  • Few-shot examples: up to 10 already-normalized products of the same brand and source
  • Taxonomy context: full list of valid categories, colors, sizes from YAML
  • Field extraction: product_type, model_name, manufacturer_sku from raw name
  • Operator hints: custom prompt additions for re-normalization
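The features above can be pictured as a prompt-assembly step. The sketch below is an assumption about how batching and prompt construction might look; `chunk`, `build_prompt`, and the prompt wording are hypothetical, and no actual LLM call is made.

```python
def chunk(products, batch_size=50):
    """Yield consecutive batches of at most `batch_size` products."""
    for start in range(0, len(products), batch_size):
        yield products[start:start + batch_size]


def build_prompt(batch, few_shot, taxonomy, hint=None):
    """Assemble one prompt covering the whole batch (hypothetical layout)."""
    lines = ["Normalize each product name into structured fields."]
    # Taxonomy context: valid values the model may choose from
    lines.append("Valid categories: " + ", ".join(taxonomy["categories"]))
    lines.append("Valid colors: " + ", ".join(taxonomy["colors"]))
    lines.append("Valid sizes: " + ", ".join(taxonomy["sizes"]))
    if few_shot:
        lines.append("Examples:")
        for raw, normalized in few_shot[:10]:  # at most 10 examples
            lines.append(f"- {raw} => {normalized}")
    if hint:
        lines.append("Operator hint: " + hint)
    lines.append("Products:")
    lines.extend(f"{i}. {name}" for i, name in enumerate(batch, start=1))
    return "\n".join(lines)


# Made-up inputs for illustration
taxonomy = {
    "categories": ["clothing/jackets_hard"],
    "colors": ["green", "black"],
    "sizes": ["S", "M", "L"],
}
few_shot = [("Куртка Marmot PreCip Eco Nori L", "color=green, size=L")]
products = [f"Product {n}" for n in range(120)]

batches = list(chunk(products))
prompt = build_prompt(
    batches[0], few_shot, taxonomy,
    hint="Nori is a green color for Marmot, not brown",
)
```

With 120 unresolved products and a batch size of 50, this yields three batches (50, 50, 20), each sent as one call.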

Matcher Integration

Gardener runs as a separate step before Matcher. The full matching flow:

Step 1a: Gardener(brand=X, source=dealer)     ← gpt-5-mini, new products only
Step 1b: Gardener(brand=X, source=oscar)      ← gpt-5-mini, new products only
Step 2:  Matcher Stage 1: Exact Match         ← EAN/SKU, free
Step 3:  Matcher Stage 2: Normalized Match    ← brand+model+type+color+size, free
Step 4:  Matcher Stage 3: RLM                 ← mercury-2, only remaining unmatched

Matcher Stages 1 and 2 (exact and normalized match) cover ~60-70% of matches, so only the remaining products need expensive RLM (mercury-2) calls.
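The normalized match stage compares brand, model, type, color, and size. How the comparison is implemented is not described here, so the sketch below makes two assumptions: values are compared by exact equality after case-folding, and only unambiguous one-to-one hits are accepted. `match_key` and `normalized_match` are hypothetical helpers.

```python
KEY_FIELDS = ("brand", "model_name", "product_type", "normalized_color", "normalized_size")


def match_key(product):
    """Build a comparison key from normalized fields; None if any is missing."""
    values = [product.get(field) for field in KEY_FIELDS]
    if any(v is None or str(v).strip() == "" for v in values):
        return None
    return tuple(str(v).strip().casefold() for v in values)


def normalized_match(dealer_products, oscar_products):
    """Pair dealer and Oscar products whose keys are identical."""
    index = {}
    for oscar in oscar_products:
        key = match_key(oscar)
        if key is not None:
            index.setdefault(key, []).append(oscar)

    matched, unmatched = [], []
    for dealer in dealer_products:
        candidates = index.get(match_key(dealer), [])
        if len(candidates) == 1:  # accept only unambiguous matches
            matched.append((dealer["id"], candidates[0]["id"]))
        else:
            unmatched.append(dealer["id"])  # left for the RLM stage
    return matched, unmatched


# Made-up records for illustration
dealer = [
    {"id": 1, "brand": "Marmot", "model_name": "PreCip Eco", "product_type": "Куртка",
     "normalized_color": "green", "normalized_size": "L"},
    {"id": 2, "brand": "Marmot", "model_name": "PreCip Eco", "product_type": "Куртка",
     "normalized_color": "green", "normalized_size": None},
]
oscar = [
    {"id": 10, "brand": "marmot", "model_name": "precip eco", "product_type": "куртка",
     "normalized_color": "green", "normalized_size": "L"},
]

matched, unmatched = normalized_match(dealer, oscar)
# matched == [(1, 10)], unmatched == [2]
```

A product with a gap in any key field, such as the missing size on dealer product 2, cannot match here, which is why Gardener's output quality directly limits how much work is left for the RLM stage.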

Data Storage

DealerProduct

Normalized fields are written directly to DealerProduct columns:

normalized_category = "clothing/jackets_hard"
normalized_color = "green"
normalized_size = "L"
product_type = "Куртка"
model_name = "PreCip Eco"

# Metadata stored in enrichment JSON:
enrichment = {
    "gardener": {
        "version": "2.0",
        "normalized_at": "2026-03-05T06:00:00",
        "original_color": "Nori",
        "manufacturer_sku": "41200",
        "gender": "men",
        "method": "llm",  # deterministic | llm | manual | llm_with_hint
        "confidence": 0.92,
        "custom_hint": None,  # serialized as null in the stored JSON
        "taxonomy_candidates": [],
    }
}

Oscar Product

Normalized data stored as a JSON field:

# catalogue.Product
normalized_data = {
    "product_type": "Куртка",
    "model_name": "PreCip Eco",
    "manufacturer_sku": "41200",
    "normalized_category": "clothing/jackets_hard",
    "normalized_color": "green",
    "original_color": "Nori",
    "normalized_size": "L",
    "gender": "men",
    "gardener_version": "2.0",
    "normalized_at": "2026-03-05T06:00:00",
}

PIM Normalization UI

The PIM provides a dedicated Normalization Table at /pim/normalization/ showing all normalized products from both sources.

Features

| Feature | Description |
| --- | --- |
| Unified table | Both dealer and Oscar products, with source indicator |
| Filters | By brand, source (Dealer/Oscar), normalization status, gaps |
| Inline editing | Click any normalized field → edit → save (AJAX) |
| Status indicators | ✅ fully normalized / ⚠️ has gaps / ❌ not normalized |
| Re-normalize | Select products → add prompt hint → re-run Gardener |
| Taxonomy candidates | New values for colors/sizes awaiting operator approval |
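As a rough sketch of how the three status indicators could be derived, the snippet below classifies a product by how many normalized fields are filled. Which fields count toward "fully normalized" is not specified in this document, so `TRACKED_FIELDS` is an assumption, and `normalization_status` and `missing_fields` are hypothetical helpers.

```python
# Assumed set of fields that count toward "fully normalized"
TRACKED_FIELDS = (
    "product_type",
    "model_name",
    "normalized_category",
    "normalized_color",
    "normalized_size",
)


def missing_fields(product):
    """List tracked fields that are empty or absent."""
    return [field for field in TRACKED_FIELDS if not product.get(field)]


def normalization_status(product):
    """Return 'normalized', 'gaps', or 'not_normalized'."""
    gaps = missing_fields(product)
    if not gaps:
        return "normalized"       # ✅ fully normalized
    if len(gaps) == len(TRACKED_FIELDS):
        return "not_normalized"   # ❌ not normalized
    return "gaps"                 # ⚠️ has gaps


full = {
    "product_type": "Куртка",
    "model_name": "PreCip Eco",
    "normalized_category": "clothing/jackets_hard",
    "normalized_color": "green",
    "normalized_size": "L",
}
partial = {**full, "normalized_size": None}

statuses = [normalization_status(p) for p in (full, partial, {})]
# statuses == ["normalized", "gaps", "not_normalized"]
```

The same `missing_fields` result could back the "gaps" filter, since it names exactly which fields an operator still needs to fix.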

Re-normalization with Prompt Hints

When Gardener makes a mistake, operators can re-normalize selected products with additional context:

  1. Select one or more products in the normalization table
  2. Click “🔄 Re-normalize”
  3. Enter a prompt hint (e.g., “Nori is a green color for Marmot, not brown”)
  4. The hint is injected into the Gardener prompt
  5. Results are saved with method: "llm_with_hint"

The corrected product then serves as a few-shot example for future batches, creating a self-improving normalization loop.

Configuration

# Taxonomy files location (project env)
HARVESTER_CONFIG_DIR=/path/to/project/harvester/config

The underlying LLM (e.g., gpt-5-mini) is bound to the Gardener agent via the contextunity.project.yaml manifest.

Taxonomy Files

Gardener reads three YAML taxonomy files from the project’s metadata directory:

| File | Content | Entries |
| --- | --- | --- |
| categories.yaml | Product category tree with keywords | ~258 |
| colors.yaml | Color definitions with synonyms and regex patterns | 47 colors, 500+ synonyms |
| sizes.yaml | Size groups: clothing, footwear, sleeping bags, backpacks, containers | 10 groups |
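A small sketch of locating the three files, assuming they sit directly inside the directory named by `HARVESTER_CONFIG_DIR`. The actual layout may differ (for example, a subdirectory), and `taxonomy_paths` is a hypothetical helper rather than part of Gardener.

```python
import os
from pathlib import Path

TAXONOMY_FILES = ("categories.yaml", "colors.yaml", "sizes.yaml")


def taxonomy_paths(env=None):
    """Map each taxonomy file name to its assumed path under HARVESTER_CONFIG_DIR."""
    env = os.environ if env is None else env
    config_dir = env.get("HARVESTER_CONFIG_DIR")
    if not config_dir:
        raise RuntimeError("HARVESTER_CONFIG_DIR is not set")
    base = Path(config_dir)
    return {name: base / name for name in TAXONOMY_FILES}


# Passing an explicit mapping keeps the example independent of the real environment
paths = taxonomy_paths({"HARVESTER_CONFIG_DIR": "/path/to/project/harvester/config"})
```

Failing fast when the variable is unset avoids silently normalizing against an empty taxonomy.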

Taxonomy Candidates

When Gardener encounters a value not in the taxonomy, it creates a taxonomy candidate:

{
"field": "color",
"raw_value": "Dusty Teal",
"suggested_parent": "teal",
"confidence": 0.85,
"product_id": 12345,
"source": "dealer:gorgany"
}

Candidates appear in the PIM normalization UI for operator review. Approved values are added as synonyms to the corresponding YAML file.
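The approval step can be sketched as adding the candidate's raw value to the synonym list of its suggested parent. The in-memory taxonomy layout and the `approve_candidate` helper below are assumptions for illustration; writing the result back to the YAML file is left out.

```python
def approve_candidate(taxonomy, candidate):
    """Add the candidate's raw value as a synonym of its suggested parent."""
    section = taxonomy.setdefault(candidate["field"], {})
    entry = section.setdefault(candidate["suggested_parent"], {})
    synonyms = entry.setdefault("synonyms", [])
    synonym = candidate["raw_value"].strip().lower()
    if synonym not in synonyms:  # approving twice must not duplicate
        synonyms.append(synonym)
    return taxonomy


# Hypothetical in-memory view of the color taxonomy
taxonomy = {"color": {"teal": {"synonyms": ["teal blue"]}}}

candidate = {
    "field": "color",
    "raw_value": "Dusty Teal",
    "suggested_parent": "teal",
    "confidence": 0.85,
    "product_id": 12345,
    "source": "dealer:gorgany",
}

approve_candidate(taxonomy, candidate)
approve_candidate(taxonomy, candidate)  # idempotent
# taxonomy["color"]["teal"]["synonyms"] == ["teal blue", "dusty teal"]
```

Once the synonym is in the taxonomy, the deterministic pass can resolve "Dusty Teal" on later runs without involving the LLM.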