Gardener is an AI-powered product normalizer that transforms raw supplier product names into structured, searchable fields. It runs as a separate step before the Matcher and is the foundation for accurate product linking.
Taxonomy Classification
Maps raw product names to site taxonomy (categories, colors, sizes) from YAML definitions.
Field Extraction
Extracts product_type, model_name, manufacturer_sku, color, size, and gender from unstructured text.
Two-Pass Pipeline
Deterministic synonym matching first (free), LLM for unresolved products (gpt-5-mini).
PIM Integration
Dedicated normalization table with inline editing, re-normalization with prompt hints, and taxonomy candidate review.
Gardener processes products per brand and per source (Dealer and Oscar are handled separately). This is critical for normalization quality, because each run uses few-shot examples from its own source only.
```
┌─────────────────────────────────────────────────┐
│  1. fetch_and_prepare                           │
│     Load taxonomy (YAML) + few-shot examples    │
│     Fetch unprocessed products for brand/source │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│  2. normalize (two-pass)                        │
│     Pass 1: Deterministic (synonym matching)    │
│     Pass 2: LLM (gpt-5-mini, single call)       │
└──────────────────────┬──────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────┐
│  3. write_results                               │
│     Update DB fields + taxonomy candidates      │
└─────────────────────────────────────────────────┘
```

Input (raw from supplier):

```
"Куртка мембранна чоловіча Marmot PreCip Eco 41200 Nori L"
```

Output (normalized):
| Field | Value | Source |
|---|---|---|
| product_type | Куртка | LLM extraction |
| model_name | PreCip Eco | LLM extraction |
| manufacturer_sku | 41200 | LLM extraction (from name) |
| normalized_category | clothing/jackets_hard | categories.yaml mapping |
| normalized_color | green | colors.yaml mapping |
| original_color | Nori | preserved from input |
| normalized_size | L | sizes.yaml mapping |
| gender | men | LLM extraction |
Gardener always runs with a source parameter:
```python
# Normalize dealer products: uses dealer examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "dealer",
    "only_new": True,
})

# Normalize oscar products: uses oscar examples only
await client.execute_agent("gardener", {
    "brand": "Marmot",
    "source": "oscar",
    "only_new": True,
})
```

Pass 1 (deterministic) uses the taxonomy YAML files with synonym matching and regex patterns.
Coverage: ~60-70% of colors and sizes are resolved without LLM.
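A minimal sketch of how a deterministic pass could resolve a color without an LLM call. The `COLORS` dictionary is a hypothetical in-memory stand-in for `colors.yaml`, and `resolve_color` is an illustrative name, not Gardener's actual API; only the `Nori` → `green` mapping comes from the example above.

```python
import re
from typing import Optional

# Hypothetical in-memory shape of colors.yaml (entries are illustrative).
COLORS = {
    "green": {"synonyms": ["nori", "forest", "olive"], "patterns": [r"green"]},
    "black": {"synonyms": ["jet", "onyx"], "patterns": [r"\bblk\b"]},
}


def resolve_color(raw: str) -> Optional[str]:
    """Return the canonical color for a raw value, or None if unresolved."""
    value = raw.strip().lower()
    # Exact synonym lookup first: cheapest and least ambiguous.
    for canonical, spec in COLORS.items():
        if value == canonical or value in spec["synonyms"]:
            return canonical
    # Fall back to regex patterns for spelling variants and compounds.
    for canonical, spec in COLORS.items():
        if any(re.search(pattern, value) for pattern in spec["patterns"]):
            return canonical
    return None  # unresolved: would be handed to the LLM pass
```

Values that return `None` here are the ones that would fall through to the LLM pass.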
Pass 2 (LLM) handles products that Pass 1 did not fully resolve. It processes batches of 20-50 products in a single call.
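The batching can be sketched as below, assuming the upper bound of 50 products per call. The `batches` helper is hypothetical, and the LLM call itself is omitted because its interface is not described here.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# 120 unresolved products split into batches, one LLM call per batch.
unresolved = [f"product-{i}" for i in range(120)]
sizes = [len(batch) for batch in batches(unresolved)]
```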
Model: Configured in contextunity.project.yaml per-tenant (typically openai:gpt-5-mini).
Features:
Gardener runs as a separate step before Matcher. The full matching flow:
```
Step 1a: Gardener(brand=X, source=dealer)    ← gpt-5-mini, new products only
Step 1b: Gardener(brand=X, source=oscar)     ← gpt-5-mini, new products only
Step 2:  Matcher Stage 1: Exact Match        ← EAN/SKU, free
Step 3:  Matcher Stage 2: Normalized Match   ← brand+model+type+color+size, free
Step 4:  Matcher Stage 3: RLM                ← mercury-2, only remaining unmatched
```

Steps 1-3 (everything before the RLM stage) cover ~60-70% of matches, significantly reducing expensive RLM (Mercury-2) calls.
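To illustrate why normalized fields make the second Matcher stage free, here is a sketch that compares products by a key built from brand, model, type, color, and size. The `match_key` function and the case-insensitive comparison are assumptions for illustration; the Matcher's real comparison rules are not described on this page.

```python
from typing import Dict, Optional, Tuple

KEY_FIELDS = ("brand", "model_name", "product_type",
              "normalized_color", "normalized_size")


def match_key(product: Dict[str, Optional[str]]) -> Optional[Tuple[str, ...]]:
    """Build a comparable key, or None if any key field is missing."""
    parts = []
    for field in KEY_FIELDS:
        value = product.get(field)
        if not value:
            return None  # incomplete normalization: cannot match on fields
        parts.append(value.strip().casefold())
    return tuple(parts)


dealer = {"brand": "Marmot", "model_name": "PreCip Eco", "product_type": "Куртка",
          "normalized_color": "green", "normalized_size": "L"}
oscar = {"brand": "MARMOT", "model_name": "precip eco", "product_type": "куртка",
         "normalized_color": "green", "normalized_size": "l"}
same_product = match_key(dealer) == match_key(oscar)
```

Products whose key is `None` or finds no counterpart would be the ones left for the RLM stage.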
Normalized fields are written directly to DealerProduct columns:
```python
normalized_category = "clothing/jackets_hard"
normalized_color = "green"
normalized_size = "L"
product_type = "Куртка"
model_name = "PreCip Eco"

# Metadata stored in enrichment JSON:
enrichment = {
    "gardener": {
        "version": "2.0",
        "normalized_at": "2026-03-05T06:00:00",
        "original_color": "Nori",
        "manufacturer_sku": "41200",
        "gender": "men",
        "method": "llm",  # deterministic | llm | manual | llm_with_hint
        "confidence": 0.92,
        "custom_hint": None,  # serialized as null in the stored JSON
        "taxonomy_candidates": [],
    }
}
```

Normalized data stored as a JSON field:
```python
normalized_data = {
    "product_type": "Куртка",
    "model_name": "PreCip Eco",
    "manufacturer_sku": "41200",
    "normalized_category": "clothing/jackets_hard",
    "normalized_color": "green",
    "original_color": "Nori",
    "normalized_size": "L",
    "gender": "men",
    "gardener_version": "2.0",
    "normalized_at": "2026-03-05T06:00:00",
}
```

The PIM provides a dedicated Normalization Table at /pim/normalization/ showing all normalized products from both sources.
| Feature | Description |
|---|---|
| Unified table | Both dealer and Oscar products, with source indicator |
| Filters | By brand, source (Dealer/Oscar), normalization status, gaps |
| Inline editing | Click any normalized field → edit → save (AJAX) |
| Status indicators | ✅ fully normalized / ⚠️ has gaps / ❌ not normalized |
| Re-normalize | Select products → add prompt hint → re-run Gardener |
| Taxonomy candidates | New values for colors/sizes awaiting operator approval |
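The three status indicators could be derived along these lines. This is a sketch under an assumption: the page does not say which fields count toward "fully normalized", so the `REQUIRED` tuple and the `normalization_status` function are hypothetical.

```python
from typing import Dict, Optional

# Assumed set of fields that must be filled for a product to count as complete.
REQUIRED = ("product_type", "model_name", "normalized_category",
            "normalized_color", "normalized_size")


def normalization_status(product: Dict[str, Optional[str]]) -> str:
    """Classify a product by how many required fields are filled."""
    filled = [field for field in REQUIRED if product.get(field)]
    if len(filled) == len(REQUIRED):
        return "fully_normalized"
    if filled:
        return "has_gaps"
    return "not_normalized"
```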
When Gardener makes a mistake, operators can re-normalize selected products with additional context:
`method: "llm_with_hint"`

The corrected product then serves as a few-shot example for future batches, creating a self-improving normalization loop.
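One way such a loop could pick its few-shot examples is sketched below. The record shape, the `select_few_shot` name, the `limit` of 5, and the rule that operator-corrected examples come first are all assumptions; the page only states that examples are per source and that corrected products are reused.

```python
from typing import Dict, List

# Methods that indicate an operator reviewed or corrected the result.
CORRECTED_METHODS = {"manual", "llm_with_hint"}


def select_few_shot(examples: List[Dict[str, str]], brand: str,
                    source: str, limit: int = 5) -> List[Dict[str, str]]:
    """Pick examples for one brand and source, corrected ones first."""
    pool = [e for e in examples
            if e["brand"] == brand and e["source"] == source]
    # list.sort is stable: corrected examples move to the front,
    # the rest keep their original relative order.
    pool.sort(key=lambda e: e["method"] not in CORRECTED_METHODS)
    return pool[:limit]
```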
```bash
# Taxonomy files location (project env)
HARVESTER_CONFIG_DIR=/path/to/project/harvester/config
```

The underlying LLM (e.g., gpt-5-mini) is bound to the Gardener agent via the contextunity.project.yaml manifest.
Gardener reads three YAML taxonomy files from the project’s metadata directory:
| File | Content | Entries |
|---|---|---|
| categories.yaml | Product category tree with keywords | ~258 |
| colors.yaml | Color definitions with synonyms and regex patterns | 47 colors, 500+ synonyms |
| sizes.yaml | Size groups: clothing, footwear, sleeping bags, backpacks, containers | 10 groups |
When Gardener encounters a value not in the taxonomy, it creates a taxonomy candidate:
```json
{
  "field": "color",
  "raw_value": "Dusty Teal",
  "suggested_parent": "teal",
  "confidence": 0.85,
  "product_id": 12345,
  "source": "dealer:gorgany"
}
```

Candidates appear in the PIM normalization UI for operator review. Approved values are added as synonyms to the corresponding YAML file.
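The approval step can be sketched as follows, assuming the taxonomy is held in memory as a mapping from canonical value to synonym list. `approve_candidate` is a hypothetical helper, and writing the result back to the YAML file is left out.

```python
from typing import Dict, List


def approve_candidate(taxonomy: Dict[str, List[str]], candidate: dict) -> bool:
    """Add the candidate's raw value as a synonym of its suggested parent.

    Returns True if a synonym was added, False if it was already present.
    """
    parent = candidate["suggested_parent"]
    synonym = candidate["raw_value"].strip().lower()
    synonyms = taxonomy.setdefault(parent, [])
    if synonym in synonyms:
        return False
    synonyms.append(synonym)
    return True


colors = {"teal": ["turquoise"]}
candidate = {"field": "color", "raw_value": "Dusty Teal",
             "suggested_parent": "teal", "confidence": 0.85}
added = approve_candidate(colors, candidate)
```

Once the synonym is in place, the same raw value would be resolved deterministically on the next run instead of reaching the LLM.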