AI Assortment Data Enrichment: Brands, Types, Competitor Matching

Alexandre Point

June 5, 2026

Data & Retail

A retail product catalog is never truly complete. Brands are missing on certain SKUs, product types are not normalised, and no matching with competitor references has been established. These gaps are not oversights: they are structural. And they have a direct impact on the quality of your pricing, merchandising, and ranging decisions.


Finding 01

The incomplete product catalog: a structural problem, not an oversight

Ask any retail data team whether their product catalog is complete. The answer is almost always the same: "More or less. A few attributes are missing on certain product families, brands are not populated for some smaller supplier products, and competitor equivalents are only matched on the top 200 SKUs."

"More or less" is the problem.

Across a catalog of 30,000, 100,000, or 500,000 references, "more or less" means that thousands of products are invisible to your pricing algorithms, absent from your competitive positioning analysis, and poorly grouped in your performance reports. Decisions made on this basis are not wrong; they are incomplete. And in a competitive market, incompleteness has the same effect as error.

The problem is not a lack of team effort. It is scale. In theory, every product in your catalog has a GTIN (Global Trade Item Number) that should serve as a common key across systems. In practice, supplier labels arrive in dozens of inconsistent formats, brand fields are left blank on regional or own-label products, and no system automatically resolves these gaps. Manually enriching a retail product catalog at this volume generates structural data debt: for every product processed, three new ones arrive from suppliers with the same missing attributes. As we show in our analysis of retail product data silos, incomplete product master data is often a symptom of insufficient integration architecture, not a lack of human resources.

Mercio's Approach

Catalog enrichment as permanent infrastructure

At Mercio, product catalog data enrichment is not a one-off migration project. It is a permanent layer of our infrastructure, running continuously on every new product as well as updates to the existing catalog. Your data teams are not mobilised on data entry tasks. They analyse and decide.

Finding 02

Four product catalog gaps that impact your decisions in practice

An incomplete product catalog is not an abstract problem. Missing data has precise, measurable operational consequences that directly affect the quality of your pricing, merchandising, and ranging decisions. Here are four gaps that consistently surface across grocery, DIY, and general merchandise retailers, with their concrete impact.

1. The missing brand that skews your positioning analysis

Take a concrete example. In your catalog, the product "SEMI-SKIMMED UHT MILK 1L" from a regional supplier has no brand populated. For your pricing engine, this product belongs to no brand, so it is excluded or poorly weighted in brand-level positioning comparisons. For your merchandising reports, it is invisible in market share analysis by category. This problem is particularly acute for own-label and regional supplier products, which tend to arrive with the fewest structured attributes.

This is not a marginal product. It may be one of your highest-volume items in the category. But without a brand populated, it does not truly exist in your product intelligence.

2. The unnormalised product type that prevents any comparison

One supplier calls the product "ARLA SEMI-SKIMMED MILK 2L UHT". Another references it as "Semi-skimmed UHT milk 2 litres Arla". A third uses "ARLA / UHT Milk / Semi-skimmed / 2L". These are the same product (or directly comparable products), but in your database they share no common ground that an algorithm can exploit. Result: you cannot aggregate sales on this reference type, nor build a coherent pricing policy across the category.

Before normalisation
  • ARLA SEMI-SKIMMED MILK 2L UHT
  • Semi-skimmed UHT milk 2 litres Arla
  • ARLA / UHT Milk / Semi-skimmed / 2L
After normalisation

Normalised type: SEMI-SKIMMED UHT MILK 2L
Comparable, aggregatable, and exploitable by your pricing algorithms and merchandising reports.
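To make "common ground that an algorithm can exploit" concrete, here is a minimal rule-based sketch in Python that reduces the three labels above to the same brand and normalised type. It is an illustration only, not Mercio's method: the vocabulary tuples and the `normalise_label` function are hypothetical, and hand-written rules like these must be extended for every new supplier format, which is precisely the scaling problem this article describes.

```python
import re

# Hypothetical vocabulary, hand-written for this one example only.
KNOWN_BRANDS = ("ARLA",)
TYPE_TERMS = ("SEMI-SKIMMED", "UHT", "MILK")  # canonical word order

# A volume written as "2L", "2 L", "2 litres" or "2 liters".
VOLUME_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(?:L|LITRES?|LITERS?)\b", re.IGNORECASE)


def normalise_label(label: str) -> tuple[str, str]:
    """Reduce a raw supplier label to (brand, normalised product type)."""
    text = label.upper()
    match = VOLUME_RE.search(text)
    volume = match.group(1).replace(",", ".") + "L" if match else ""
    # Tokenise on anything other than letters, digits and hyphens ("/", spaces).
    tokens = set(re.split(r"[^A-Z0-9-]+", text))
    brand = next((b for b in KNOWN_BRANDS if b in tokens), "")
    # Emit type terms in a fixed order, whatever order the supplier used.
    words = [t for t in TYPE_TERMS if t in tokens]
    if volume:
        words.append(volume)
    return brand, " ".join(words)


labels = [
    "ARLA SEMI-SKIMMED MILK 2L UHT",
    "Semi-skimmed UHT milk 2 litres Arla",
    "ARLA / UHT Milk / Semi-skimmed / 2L",
]
for raw in labels:
    print(normalise_label(raw))  # ('ARLA', 'SEMI-SKIMMED UHT MILK 2L') for all three
```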

3. The missing need state that undermines your ranging recommendations

A customer buys coffee capsules. Depending on the format, they may purchase a pack of 10, 36, or 60 capsules. These three references are different responses to the same need, but if your catalog does not explicitly group them within the same need state, your recommendation engine treats them as independent products. Your substitution recommendations are approximate. Your cannibalization analysis across pack sizes is impossible. And your ranging decisions rely on data that does not reflect the reality of purchasing behaviour.
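The sketch below assumes the need state has already been assigned to each SKU (the SKU codes, need state names and prices are hypothetical). It shows why the grouping matters: once the three capsule packs share a key, they can be compared on price per capsule, a common starting point for substitution and cannibalization analysis across pack sizes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    sku_id: str
    need_state: str  # shared by every format that answers the same need
    pack_size: int   # number of capsules in the pack
    price: float     # shelf price of the whole pack


def group_by_need_state(skus: list[Sku]) -> dict[str, list[Sku]]:
    """Group SKUs that answer the same need, whatever their pack size."""
    groups: dict[str, list[Sku]] = defaultdict(list)
    for sku in skus:
        groups[sku.need_state].append(sku)
    return dict(groups)


def price_per_capsule(group: list[Sku]) -> dict[str, float]:
    """Normalise prices to one capsule so packs of 10, 36 and 60 can be compared."""
    return {s.sku_id: round(s.price / s.pack_size, 3) for s in group}


# Hypothetical SKUs and prices, for illustration only.
catalog = [
    Sku("ESP-10", "ESPRESSO CAPSULES / INTENSE", 10, 3.50),
    Sku("ESP-36", "ESPRESSO CAPSULES / INTENSE", 36, 10.80),
    Sku("ESP-60", "ESPRESSO CAPSULES / INTENSE", 60, 16.20),
    Sku("LUN-10", "LUNGO CAPSULES / MILD", 10, 3.20),
]
groups = group_by_need_state(catalog)
print(price_per_capsule(groups["ESPRESSO CAPSULES / INTENSE"]))
```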

4. The absent competitor matching that leaves your pricing operating blind

Your pricing engine needs to know, for each SKU in your catalog, which products are equivalent at your competitors. Without this matching, it cannot calculate your relative positioning or trigger the right adjustments at the right time. Building this retail competitor matching manually on a few hundred key references is feasible. Across 30,000 products, spanning Tesco, Sainsbury's, Asda, and the major discounters, with shared suppliers but different label conventions by retailer, this is a task no human team can keep current.

Before

No competitor equivalent identified for 94% of the catalog. Pricing operates without a competitive reference frame on the vast majority of SKUs.

After

Matches established across the entire catalog, with a confidence score and human review on ambiguous cases.

Mercio's Approach

Traceable enrichments, with a confidence score on every attribute

For every enriched attribute (brand, normalised product type, need state, competitor match), Mercio calculates a confidence score. High-confidence enrichments are applied automatically. Ambiguous cases are submitted for human validation before being integrated into the catalog. You know, for every attribute, which method produced it and at what level of certainty.
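One way to express this routing policy in code is sketched below. The per-attribute thresholds, the rule that need states are always reviewed, and the `Enrichment` record are assumptions for illustration; the article does not publish Mercio's actual thresholds. Keeping a `method` field on every record is what makes each attribute traceable to the step that produced it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrichment:
    sku_id: str
    attribute: str     # "brand", "product_type", "need_state" or "competitor_match"
    value: str
    confidence: float  # 0.0 to 1.0
    method: str        # which step produced the value, kept for traceability


# Hypothetical policy: thresholds per attribute, plus attributes always reviewed.
AUTO_APPLY_THRESHOLD = {"brand": 0.90, "product_type": 0.90, "competitor_match": 0.95}
ALWAYS_REVIEW = {"need_state"}


def route(enrichments: list[Enrichment]) -> tuple[list[Enrichment], list[Enrichment]]:
    """Split enrichments into (applied automatically, queued for human validation)."""
    applied: list[Enrichment] = []
    review: list[Enrichment] = []
    for e in enrichments:
        # Attributes without a configured threshold require full certainty.
        threshold = AUTO_APPLY_THRESHOLD.get(e.attribute, 1.0)
        if e.attribute not in ALWAYS_REVIEW and e.confidence >= threshold:
            applied.append(e)
        else:
            review.append(e)
    return applied, review


batch = [
    Enrichment("SKU-1", "brand", "ARLA", 0.97, "label+supplier"),
    Enrichment("SKU-2", "brand", "ARLA", 0.62, "neighbours"),
    Enrichment("SKU-3", "need_state", "UHT MILK", 0.99, "grouping"),
]
applied, review = route(batch)
print([e.sku_id for e in applied], [e.sku_id for e in review])  # ['SKU-1'] ['SKU-2', 'SKU-3']
```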

Finding 03

How AI-powered catalog enrichment works in practice

AI-powered product catalog enrichment is not a new concept. What has changed is the maturity of large language models (LLMs) and vector semantic search techniques. These models generate dense numerical representations of product labels (embeddings) that capture semantic proximity across label variants, languages, and input conventions. Combined with advances in NLP (Natural Language Processing), they make it possible to automate the enrichment of missing attributes at scale: inferring values from context, identifying matches between catalogs without references being letter-for-letter identical, and processing supplier labels in multiple languages simultaneously. Where a rule-based approach requires a new script for every format variant, AI adapts. Where manual enrichment hits a ceiling at a few thousand SKUs, AI scales to the full catalog.
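The core operation behind semantic matching is comparing vectors by cosine similarity. A production system would obtain dense vectors from an embedding model; since the article names no specific model, this sketch substitutes character trigram counts as a toy stand-in. Trigrams capture spelling overlap only and, unlike multilingual embeddings, will not match a label across languages, but the ranking step is the same.

```python
import math
from collections import Counter


def trigram_vector(label: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    text = " ".join(label.lower().split())  # lowercase, collapse whitespace
    padded = f"  {text}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "ARLA SEMI-SKIMMED MILK 2L UHT"
candidates = ["ORGANIC OAT DRINK 1L", "Semi-skimmed UHT milk 2 litres Arla"]
q = trigram_vector(query)
ranked = sorted(candidates, key=lambda c: cosine(q, trigram_vector(c)), reverse=True)
for c in ranked:
    print(f"{cosine(q, trigram_vector(c)):.2f}  {c}")
```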

As confirmed by Forrester's analysis of generative AI in commerce, product data enrichment is one of the first concrete opportunities of genAI for retail organisations, well ahead of consumer-facing use cases. A better-enriched catalog also improves SEO discoverability and e-commerce conversion rates: structured and complete attributes allow your products to be correctly faceted, filtered, and indexed in your internal search engine and by Google.

Product types normalised at scale

Product labels are normalised in batch via Gemini AI, with no rules to write manually for each supplier format. "ARLA SEMI-SKIMMED MILK 2L UHT" becomes "SEMI-SKIMMED UHT MILK 2L", whatever the retailer or input convention.

Brands imputed from context

The brand is inferred from the category, the supplier code, and already-enriched neighbouring products, with a confidence score calculated for each imputation.

Need states recommended

Proposed with a confidence score and submitted for validation before being applied. Business judgement stays in the loop on uncertain cases.

Competitor matches verified

With commercial plausibility checks to avoid linking two products whose prices are too far apart to be genuinely comparable in your market.
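A minimal version of such a plausibility check compares unit prices (per litre here) rather than shelf prices, so that a 1 L and a 2 L pack are not rejected merely because of their size. The 1.5 maximum ratio and the prices are hypothetical values chosen for illustration.

```python
def unit_price(shelf_price: float, quantity: float) -> float:
    """Price per unit of measure (e.g. per litre), so pack sizes can be compared."""
    return shelf_price / quantity


def price_plausible(ours: float, theirs: float, max_ratio: float = 1.5) -> bool:
    """Keep a candidate match only if unit prices are within max_ratio of each other."""
    if ours <= 0 or theirs <= 0:
        return False  # missing or invalid price: plausibility cannot be confirmed
    return max(ours, theirs) / min(ours, theirs) <= max_ratio


# Hypothetical prices: our 2 L UHT milk against two competitor candidates.
ours = unit_price(1.90, 2.0)         # 0.95 per litre
candidate_a = unit_price(1.05, 1.0)  # 1 L pack, 1.05 per litre
candidate_b = unit_price(4.50, 2.0)  # 2.25 per litre, e.g. a premium line
print(price_plausible(ours, candidate_a), price_plausible(ours, candidate_b))  # True False
```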

What AI does not replace: business judgement. That is why enrichments produced by these methods are designed to be reviewed, validated, or rejected by your teams, never applied blindly.

Solution 04

What catalog enrichment unlocks for your teams

Impact by team: before and after catalog enrichment

A reliably and comprehensively enriched product catalog is not an end in itself. What it enables is what matters.

Pricing
  • Without catalog enrichment: competitive comparison limited to 5–10% of SKUs with manual matching
  • With enriched catalog: pricing engine fed by real competitor equivalents across the entire catalog

Merchandising
  • Without catalog enrichment: incomplete brand share analysis, with products without brands excluded from reports
  • With enriched catalog: performance reports by product type genuinely covering the full catalog

Category Management
  • Without catalog enrichment: ranging decisions based on supplier label organisation, not purchasing behaviour
  • With enriched catalog: ranging and de-listing decisions based on the true need state structure

Data teams
  • Without catalog enrichment: 30–50% of time spent manually correcting attributes downstream of analysis
  • With enriched catalog: time freed for analysis and decision-making, not manual reconciliation

This is the same principle we describe in our analysis of AI agent-driven dynamic pricing strategies: decision-making agility is only possible if data is already prepared, enriched, and reliable at the moment the decision must be made. An enriched catalog is also the prerequisite for operational MDM (Master Data Management): without reliable, normalised attributes, product data governance cannot scale.

Mercio's Approach

Data that reflects the reality of your catalog at the moment you decide

Because Mercio's enrichment is continuously synchronised, your positioning analysis, price elasticity studies, and category profitability reports are based on a catalog that reflects the actual state of your data, not a stale copy from the last manual export.

  • Attribute anomalies are detected and flagged automatically, before they impact your decisions.
  • Every team (pricing, merchandising, data) consumes the same single source of truth, at the same time.
  • The enriched catalog is fully traceable: you know, for every attribute, which method produced it and at what confidence level.
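The article does not specify which anomalies are checked. As an assumed example, the sketch below flags two simple inconsistencies on a hypothetical SKU record: a missing brand, and a volume written in the label that contradicts the structured volume attribute.

```python
import re

# Volume in litres as written in a label, e.g. "2L" or "1.5 L".
VOLUME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*L\b", re.IGNORECASE)


def find_anomalies(sku: dict) -> list[str]:
    """Return human-readable issues for one SKU record (hypothetical schema)."""
    issues = []
    if not sku.get("brand"):
        issues.append("missing brand")
    match = VOLUME_RE.search(sku.get("label", ""))
    volume = sku.get("volume_l")
    # Compare the volume parsed from the label with the structured attribute.
    if match and volume is not None and float(match.group(1)) != float(volume):
        issues.append(f"label says {match.group(1)} L but volume attribute is {volume} L")
    return issues


print(find_anomalies({"label": "ARLA SEMI-SKIMMED MILK 2L UHT", "brand": "ARLA", "volume_l": 1.0}))
```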

Finding 05

The challenge of scale and freshness in product data

What distinguishes effective catalog enrichment from a one-off project is the ability to maintain it over time. New products arrive every week. Suppliers update their label formats. Competitors refresh their ranges. A catalog enriched at a single point in time degrades progressively without a continuous pipeline to keep pace.

-40%

This is the average degradation in attribute completeness observed across our grocery and DIY retail clients over 6 months without a continuous enrichment pipeline. Incoming new products arrive with the same gaps as existing ones, and the product data quality debt accumulates faster than teams can resolve it.

Most manual or semi-automated approaches cannot sustain the pace long-term. The product data normalisation backlog grows faster than teams can work through it, because enrichment is treated as a project rather than a process. Every new supplier onboarding, every range refresh, every seasonal product introduction resets the clock on completeness.

Building this capability in-house is possible, but as we show in our analysis of retail product data silos, the problem is not the competence of data teams: it is that this work is never finished, and it diverts skilled analysts from higher-value work.

Mercio's Approach

A continuous enrichment pipeline, not a one-off migration

At Mercio, catalog enrichment does not stop once the initial catalog has been processed. Every new incoming product is enriched automatically at the point of integration. Every supplier label update triggers a reassessment of the affected attributes. Inference rules adapt to new formats without manual rewriting.
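The article does not describe how label updates are detected. One common approach, assumed here, is to fingerprint the inputs each enrichment depends on and reassess only the SKUs whose fingerprint has changed since the last run. The function names, the record layout and the choice of SHA-256 are illustrative.

```python
import hashlib


def fingerprint(label: str, supplier_code: str) -> str:
    """Stable hash of the inputs an enrichment depends on (case and spacing ignored)."""
    payload = f"{supplier_code}\x1f{' '.join(label.upper().split())}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def skus_to_reassess(incoming: dict[str, tuple[str, str]],
                     stored: dict[str, str]) -> dict[str, str]:
    """Return {sku_id: new_fingerprint} for SKUs that are new or whose inputs changed.

    incoming: sku_id -> (label, supplier_code) from the latest supplier feed
    stored:   sku_id -> fingerprint saved after the last successful enrichment
    """
    changed = {}
    for sku_id, (label, supplier_code) in incoming.items():
        fp = fingerprint(label, supplier_code)
        if stored.get(sku_id) != fp:
            changed[sku_id] = fp
    return changed


# First run: nothing stored yet, so every SKU is enriched; fingerprints are then saved.
feed = {"SKU-1": ("ARLA SEMI-SKIMMED MILK 2L UHT", "SUP-01"),
        "SKU-2": ("WHOLE UHT MILK 1L", "SUP-02")}
stored = skus_to_reassess(feed, {})
# Next feed: only SKU-2's supplier label changed, so only SKU-2 is reassessed.
feed["SKU-2"] = ("WHOLE MILK UHT 1 LITRE", "SUP-02")
print(sorted(skus_to_reassess(feed, stored)))  # ['SKU-2']
```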

The result: a catalog that moves closer to completeness progressively, in a measurable way, without mobilising your teams on manual data entry tasks.

Solution 06

What Mercio does in practice: AI-powered catalog enrichment at scale

At Mercio, automated catalog enrichment is a permanent layer of our infrastructure, not a one-off migration project. Here is how we build and operate your enriched product master data, covering the full product lifecycle with complete traceability.

Product data gap audit before any intervention

Before launching enrichment, Mercio performs a full diagnostic of your catalog: attribute coverage rate by family (brand, product type, need state), percentage of SKUs with no competitor matching, and product categories presenting the most normalisation inconsistencies. You know exactly where the most costly gaps are and in what order to address them for maximum impact on your pricing and merchandising performance.
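The coverage part of such an audit reduces to simple counting. This sketch, on a hypothetical three-SKU extract, computes the share of SKUs with each attribute populated per product family and lists the lowest-coverage cells first; the attribute names and record layout are assumptions.

```python
from collections import defaultdict

ATTRIBUTES = ("brand", "product_type", "need_state", "competitor_match")


def coverage_by_family(skus: list[dict]) -> dict[str, dict[str, float]]:
    """Share of SKUs with a non-empty value, per product family and attribute."""
    totals: dict[str, int] = defaultdict(int)
    filled: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sku in skus:
        family = sku["family"]
        totals[family] += 1
        for attr in ATTRIBUTES:
            if sku.get(attr):  # None, "" or absent all count as missing
                filled[family][attr] += 1
    return {fam: {a: filled[fam][a] / n for a in ATTRIBUTES} for fam, n in totals.items()}


def worst_gaps(report: dict[str, dict[str, float]], top_n: int = 3) -> list[tuple[float, str, str]]:
    """Lowest-coverage (family, attribute) cells first: where to start enriching."""
    cells = [(rate, fam, attr) for fam, attrs in report.items() for attr, rate in attrs.items()]
    return sorted(cells)[:top_n]


# Hypothetical extract of a catalog, for illustration only.
skus = [
    {"family": "Dairy", "brand": "ARLA", "product_type": "SEMI-SKIMMED UHT MILK 2L",
     "need_state": None, "competitor_match": "COMP-001"},
    {"family": "Dairy", "brand": "", "product_type": "SEMI-SKIMMED UHT MILK 1L",
     "need_state": None, "competitor_match": None},
    {"family": "Coffee", "brand": "BRAND-X", "product_type": None,
     "need_state": "ESPRESSO CAPSULES", "competitor_match": None},
]
report = coverage_by_family(skus)
print(worst_gaps(report))
```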

Product type normalisation at scale, without manual rules

Our models analyse each product label in context, processing labels in batch via Gemini AI and cross-referencing category, supplier, neighbouring products, and sales history to infer the corresponding normalised product type. "ARLA SEMI-SKIMMED MILK 2L UHT" and "Semi-skimmed UHT milk 2L Arla" automatically become the same normalised type, aggregatable and comparable. This processing applies to the entire catalog and updates with every new supplier entry.

Missing brand imputation with confidence scoring

For every SKU with no brand populated, Mercio cross-references the product label, supplier code, category, and neighbouring products to propose an imputation with a confidence score. Imputations above the confidence threshold are applied automatically. Cases below that threshold are submitted to a simple human validation interface, where your teams confirm or correct in a few clicks, without opening the ERP or PIM.
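A simplified version of neighbour-based brand imputation is sketched below: a majority vote among already-branded SKUs that share the target's supplier code and category, with a smoothed vote share as the confidence score and a 0.8 auto-apply threshold. The heuristic, the threshold, and all names and codes are hypothetical; the article does not detail Mercio's scoring.

```python
from collections import Counter
from typing import Optional


def impute_brand(target: dict, catalog: list[dict],
                 threshold: float = 0.8) -> tuple[Optional[str], float, str]:
    """Propose a brand for `target` from already-enriched neighbouring products.

    Neighbours share the target's supplier code and category and have a brand.
    Returns (brand, confidence, "auto" | "review").
    """
    neighbours = [
        p["brand"] for p in catalog
        if p.get("brand")
        and p["supplier_code"] == target["supplier_code"]
        and p["category"] == target["category"]
    ]
    if not neighbours:
        return None, 0.0, "review"
    brand, votes = Counter(neighbours).most_common(1)[0]
    # Smoothed vote share: one agreeing neighbour gives 0.5, four out of four give 0.8.
    confidence = votes / (len(neighbours) + 1)
    # The brand name appearing in the raw label is treated as confirmation.
    if brand.upper() in target["label"].upper():
        confidence = 1.0
    return brand, round(confidence, 2), "auto" if confidence >= threshold else "review"


# Hypothetical catalog: four branded UHT milks from supplier SUP-042, plus noise.
catalog = [
    {"label": f"UHT MILK VARIANT {i}", "supplier_code": "SUP-042",
     "category": "UHT MILK", "brand": "VALLEY DAIRY"} for i in range(4)
] + [
    {"label": "SALTED BUTTER 250G", "supplier_code": "SUP-042",
     "category": "BUTTER", "brand": "VALLEY DAIRY"},
    {"label": "UHT MILK 1L", "supplier_code": "SUP-099",
     "category": "UHT MILK", "brand": "OTHER BRAND"},
]
target_a = {"label": "SEMI-SKIMMED UHT MILK 1L", "supplier_code": "SUP-042", "category": "UHT MILK"}
target_b = {"label": "WHOLE UHT MILK 1L", "supplier_code": "SUP-099", "category": "UHT MILK"}
print(impute_brand(target_a, catalog))  # ('VALLEY DAIRY', 0.8, 'auto')
print(impute_brand(target_b, catalog))  # ('OTHER BRAND', 0.5, 'review')
```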

Competitor matching built and maintained across the entire catalog

Mercio establishes matches between your SKUs and equivalent competitor references using a combination of multilingual semantic search, attribute similarity, and price plausibility checks to filter out false positives whose prices are too far apart to be genuinely comparable. Matching covers the full catalog, updates with every new competitor crawl, and exposes a confidence score per match. Your pricing teams know exactly what each competitive comparison is based on.
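How the three signals might be combined is sketched below, assuming a weighted sum of semantic and attribute similarity with price plausibility acting as a hard veto. The weights, the 1.5 price-ratio limit and the candidate scores are hypothetical, not Mercio's published parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSignals:
    semantic: float          # label similarity in [0, 1], e.g. cosine of embeddings
    attributes: float        # share of comparable structured attributes that agree
    unit_price_ratio: float  # max(unit prices) / min(unit prices), always >= 1


# Hypothetical weights and limit; in practice they would be tuned on validated matches.
W_SEMANTIC, W_ATTRIBUTES = 0.6, 0.4
MAX_PRICE_RATIO = 1.5


def attribute_agreement(ours: dict, theirs: dict,
                        keys: tuple[str, ...] = ("brand", "product_type", "volume")) -> float:
    """Fraction of attributes, present on both sides, whose values are equal."""
    compared = [k for k in keys if ours.get(k) and theirs.get(k)]
    if not compared:
        return 0.0
    return sum(ours[k] == theirs[k] for k in compared) / len(compared)


def match_confidence(s: MatchSignals) -> float:
    """Weighted score; an implausible price gap vetoes the match outright."""
    if s.unit_price_ratio > MAX_PRICE_RATIO:
        return 0.0
    return W_SEMANTIC * s.semantic + W_ATTRIBUTES * s.attributes


candidates = {
    "COMP-A": MatchSignals(semantic=0.92, attributes=1.0, unit_price_ratio=1.08),
    "COMP-B": MatchSignals(semantic=0.95, attributes=0.67, unit_price_ratio=2.40),
    "COMP-C": MatchSignals(semantic=0.70, attributes=0.67, unit_price_ratio=1.10),
}
best = max(candidates, key=lambda k: match_confidence(candidates[k]))
print(best, round(match_confidence(candidates[best]), 3))  # COMP-A 0.952
```

Treating price as a veto rather than as one more weight prevents a near-identical label from outweighing a price gap that makes the comparison meaningless.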

Need state recommendations with business validation

Mercio automatically identifies need state groupings (packs of 10, 36, and 60 capsules for the same coffee range; 500 ml, 1 L, and 2 L formats for the same milk variety) and submits them for validation before application. Once validated, these groupings directly feed your ranging recommendation engines, your cannibalization analysis, and your range extension decisions.

A continuous enrichment pipeline, not a fixed-date project

Every new product entering your catalog is automatically submitted to the Mercio pipeline: type normalisation, brand imputation if missing, competitor matching attempt, need state proposal. The catalog does not degrade between update projects. It improves continuously, in a measurable and traceable way.
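The per-product flow described above can be sketched as an ordered list of stages, each proposing an attribute with a confidence score: proposals above a threshold are applied with a provenance record, and the rest go to a review queue. The stage functions here are hypothetical stubs standing in for the real models, and the 0.90 threshold is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Product:
    sku_id: str
    label: str
    attributes: dict[str, str] = field(default_factory=dict)
    # attribute -> (stage that produced it, confidence): the traceability record
    provenance: dict[str, tuple[str, float]] = field(default_factory=dict)


# A stage inspects a product and may propose (attribute, value, confidence).
Proposal = Optional[tuple[str, str, float]]
Stage = Callable[[Product], Proposal]

AUTO_APPLY = 0.90  # hypothetical threshold


# Hypothetical stub stages standing in for the real models.
def normalise_type(p: Product) -> Proposal:
    # Stub: only upper-cases the label and collapses whitespace.
    return "product_type", " ".join(p.label.upper().split()), 0.95


def impute_brand(p: Product) -> Proposal:
    if "brand" in p.attributes:
        return None  # already populated by the supplier
    return "brand", p.label.split()[0].upper(), 0.60  # stub: first word as a weak guess


def match_competitor(p: Product) -> Proposal:
    return None  # stub: no competitor candidate found


def propose_need_state(p: Product) -> Proposal:
    return "need_state", p.attributes.get("product_type", ""), 0.70


PIPELINE: list[Stage] = [normalise_type, impute_brand, match_competitor, propose_need_state]


def on_new_product(p: Product, review_queue: list[tuple[str, str, str, float]]) -> Product:
    """Run every stage on an incoming product; apply or queue each proposal."""
    for stage in PIPELINE:
        proposal = stage(p)
        if proposal is None:
            continue
        attr, value, confidence = proposal
        if confidence >= AUTO_APPLY:
            p.attributes[attr] = value
            p.provenance[attr] = (stage.__name__, confidence)
        else:
            review_queue.append((p.sku_id, attr, value, confidence))
    return p


queue: list[tuple[str, str, str, float]] = []
product = on_new_product(Product("SKU-1", "arla semi-skimmed milk 2l uht"), queue)
print(product.attributes)  # {'product_type': 'ARLA SEMI-SKIMMED MILK 2L UHT'}
print(queue)
```

In this run the stub product type is applied automatically, while the brand guess and the need state proposal wait in the review queue, in line with need states being validated before they are applied.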

Your pricing teams should not be operating on a catalog where 94% of SKUs have no competitor matching. Your merchandising teams should not be producing brand share analyses riddled with unpopulated brands. Your data analysts should not be spending their mornings manually normalising supplier labels before they can aggregate a single sales figure.

A comprehensive, normalised, and continuously enriched product catalog is the prerequisite for your pricing, merchandising, and ranging tools to produce reliable recommendations across all your references, not just the few hundred your teams had time to process manually.

Who is Mercio's enrichment infrastructure designed for?

  • Your catalog has missing brands or unnormalised product types across a significant share of your SKUs
  • Your competitor matching is only established on your 200 to 500 most important references
  • Your pricing teams cannot compare their prices to real competitor equivalents across the full catalog
  • Your ranging decisions rely on supplier label organisation, not on a need state structure
  • Your catalog enrichment relies on manual exports or scripts maintained by one or two people
  • You have tried to build this pipeline in-house and failed to keep it up to date at the pace of supplier entries
  • Your data teams spend more than 30% of their time correcting or reconciling product attributes downstream of analysis

Optimise your product catalog data with Mercio, starting now

Let's talk about what Mercio can do for your catalog, whatever your number of references, markets, or source systems.

Get in touch with Mercio

Assortment Data Enrichment: FAQ

What is the difference between rule-based assortment data enrichment and AI-powered enrichment?

Rule-based enrichment relies on manual scripts: a pattern is defined for each supplier label format and applied mechanically. This works for low volumes and stable formats, but it does not scale. AI-powered assortment data enrichment uses language models and semantic search techniques to infer missing attributes from context (category, supplier, neighbouring products, labels in multiple languages) without each case having to be defined in advance. The result: broader, faster enrichment that applies to assortments with hundreds of thousands of references, with a confidence score calculated for each product attribute.

Why does a missing brand attribute in a retail assortment affect competitive positioning analysis?

A pricing engine or merchandising analysis tool filters and aggregates product data using structured attributes. When no brand is specified for an SKU, that product becomes invisible in brand positioning comparisons, absent from category market share reports, and often excluded or incorrectly weighted by algorithms. In an assortment of 50,000 SKUs, missing brands can affect thousands of products, some of which represent significant volumes. Decisions made on this basis are not wrong: they are based on partial information, which has the same operational effect as an error.

How can you build reliable competitor matching across an entire assortment?

Manual competitor matching is feasible for a few hundred priority items, but it cannot keep an entire catalog permanently up to date. The AI approach combines several signals: semantic similarity of labels, agreement of structured attributes (brand, capacity, normalised product type), and price consistency between matched items. A confidence score is calculated for each match. Ambiguous cases (very similar labels that may still refer to different products, suspicious price gaps) are submitted for human review before being applied. Mercio maintains this matching continuously: when a competitor updates their product range, new matches are identified automatically.

Should AI-powered assortment data enrichment be validated by a human team before being applied?

This depends on the confidence level calculated for each enriched attribute. High-confidence enrichments, for example a brand that can be inferred with certainty from the supplier label and neighbouring products, can be applied automatically. Enrichments for ambiguous cases or for high-impact business attributes (competitor matching, need states) are subject to human validation. The goal is not to eliminate business judgement, but to focus teams' attention on the decisions that truly require it, rather than on repetitive, low-value data entry tasks.

Can Mercio continuously update assortment data enrichment when suppliers change their formats?

This is precisely what distinguishes a sustainable assortment data enrichment infrastructure from a one-off project. Mercio treats enrichment as a continuous pipeline: every new product entering the assortment is enriched automatically upon integration, and every supplier label update triggers a re-evaluation of the relevant attributes. The semantic models adapt to new formats without manual rule rewriting. The enriched assortment stays fresh without mobilising data teams for repetitive manual tasks, and without relying on the few individuals who know the scripts.