Data Enrichment Flow
How the MXP Catalog Enrichment pipeline discovers and fills missing product attributes using Gemini AI on GCP
The Catalog Enrichment pipeline is a GCP-native, fully automated system that detects and fills gaps in product attribute data — missing colors, materials, descriptions, and other fields that degrade search ranking and recommendation quality.
The pipeline consumes the product snapshot already produced by the GD Indexing Pipeline, so no separate catalog export is needed. Each tenant has its own isolated pipeline instance. The pipeline runs daily via Cloud Scheduler and is orchestrated end-to-end by Google Cloud Workflows.
Overview
Products flow from your PIM or commerce platform into the product snapshot produced by the GD Indexing Pipeline. The Catalog Enrichment pipeline then fills missing attributes using Gemini AI and imports the results into Vertex AI Retail, where they feed search and recommendations. Every AI-generated value carries a confidence score, and merchandisers can revert any change at any time.
Pipeline stages
Step 1 — Catalog Analytics
Reads the product snapshot and produces attribute coverage statistics: which fields are sparse, how many products are missing each attribute, and the top-N most common values per field. These statistics surface configuration gaps and improve enrichment quality over time.
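The coverage computation can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual implementation: `attribute_coverage` and its inputs are hypothetical names, and the real stage reads the snapshot from BigQuery rather than an in-memory list.

```python
from collections import Counter

def attribute_coverage(products, fields, top_n=3):
    """Per-field coverage stats over a product snapshot (illustrative).

    For each field, reports how many products are missing it and the
    top-N most common values among products that do have it.
    """
    stats = {}
    total = len(products)
    for field in fields:
        values = [p[field] for p in products if p.get(field)]
        counts = Counter(values)
        stats[field] = {
            "missing": total - len(values),          # products lacking the field
            "top_values": counts.most_common(top_n), # most frequent known values
        }
    return stats
```

The "missing" counts identify which attributes are worth enriching; the top values can seed prompts or validation rules for later runs.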
Step 2 — Gemini Enrichment Processor
The core enrichment stage. For each product in the snapshot:
- Reads the product's existing attributes and the per-tenant enrichment rules from Cloud Storage
- Checks Cloud SQL for any previously stored enrichment results (avoids re-enriching unchanged products)
- Calls Gemini AI in parallel batches to fill in missing attribute values and replace incorrect ones
- Applies a two-phase generate → validate pattern to catch hallucinations and assign a confidence level
- Writes only changed products to the updates table — incremental by design
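The two-phase generate → validate pattern can be sketched as follows. This is a simplified model of the flow, with the Gemini calls stubbed out as plain callables; the function and rule names are hypothetical, and the real processor batches model calls and persists results to Cloud SQL.

```python
def enrich_product(product, rules, generate, validate):
    """Two-phase generate -> validate enrichment for one product (sketch).

    `generate(product, field)` proposes a value for a missing field;
    `validate(product, field, value)` independently re-checks the
    proposal and returns a confidence in [0, 1]. Only proposals that
    clear the configured confidence threshold are kept.
    """
    changes = {}
    for field in rules["required_fields"]:
        if product.get(field):
            continue  # attribute already present; nothing to enrich
        proposed = generate(product, field)
        confidence = validate(product, field, proposed)
        if confidence >= rules.get("min_confidence", 0.7):
            changes[field] = {"value": proposed, "confidence": confidence}
    return changes  # empty dict => unchanged product, nothing written
```

Running validation as a separate pass against the original product data is what catches hallucinated values: a proposal the validator cannot confirm from the product's own evidence gets a low confidence and is dropped rather than written to the updates table.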
Step 3 — Product Importer
Reads the updates table and performs an incremental import back into Vertex AI Retail, updating only the products that changed. The enriched attributes become immediately available for search and recommendations.
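The incremental import amounts to batching the changed products and sending each batch to the catalog. A minimal sketch, with the Vertex AI Retail import call stood in by an injected `import_batch` callable (the real client library handles request formatting and retries):

```python
def import_updates(updated_products, import_batch, batch_size=100):
    """Send only changed products to the catalog, in fixed-size batches.

    `import_batch(batch)` is a placeholder for the Vertex AI Retail
    product import call; batch_size here is illustrative, not the
    API's actual limit.
    """
    batches_sent = 0
    for start in range(0, len(updated_products), batch_size):
        import_batch(updated_products[start:start + batch_size])
        batches_sent += 1
    return batches_sent
```

Because only rows from the updates table are imported, a run that enriches a handful of products touches only those products in the index, keeping daily runs cheap even for large catalogs.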
GCP services
| Service | Role |
|---|---|
| Google Cloud Workflows | Sequences and monitors all three pipeline stages end-to-end; one instance per tenant |
| Cloud Scheduler | Triggers the workflow on a daily cron (default: 02:00 UTC) |
| Google Kubernetes Engine | Runs each pipeline stage as a short-lived Kubernetes Job; also hosts the long-running Catalog Enrichment API (attribute configuration, review UI backend) |
| Vertex AI Retail API | Destination for enriched product imports |
| Gemini AI | Generates AI attribute values per product |
| BigQuery | Stores product snapshots (from GD Indexing Pipeline), enrichment updates, and analytics results |
| Cloud Storage | Hosts per-tenant enrichment configuration and rule definitions |
| Cloud SQL (PostgreSQL) | Persists enrichment results and per-product enrichment history |
| Artifact Registry | Stores Docker images for the Catalog Analytics and Gemini Enrichment jobs |
Per-tenant isolation
Each tenant has its own Cloud Workflow instance, its own enrichment configuration in Cloud Storage, and its own set of BigQuery tables and Cloud SQL tables. Running or updating one tenant's pipeline has no impact on other tenants.
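One way to picture the isolation model is as a per-tenant resource-naming scheme, sketched below. All of these names are hypothetical examples for illustration; the actual resource names and layout are deployment-specific.

```python
def tenant_resources(tenant_id: str) -> dict:
    """Illustrative per-tenant resource names (all hypothetical).

    Each tenant gets its own workflow, config prefix, and tables, so
    running or updating one tenant's pipeline never touches another's.
    """
    return {
        "workflow": f"catalog-enrichment-{tenant_id}",            # Cloud Workflows instance
        "config_prefix": f"gs://enrichment-config/{tenant_id}/",  # Cloud Storage rules
        "bq_updates_table": f"{tenant_id}_enrichment_updates",    # BigQuery updates table
        "sql_schema": tenant_id,                                  # Cloud SQL schema
    }
```

Deriving every resource name from the tenant ID makes cross-tenant interference structurally impossible: no stage ever holds a reference to another tenant's tables or configuration.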