Automated Competitor Price Scraper with n8n, Supabase, and RAG
Monitoring competitor pricing at scale is a core requirement for ecommerce teams, pricing analysts, and marketplace operators. This guide documents a production-ready n8n workflow template that ingests scraped product data, converts it into vector embeddings, persists those vectors in Supabase, and exposes them to a Retrieval-Augmented Generation (RAG) agent for context-aware analysis, reporting, and alerts.
The article is organized as a technical reference, with an overview of the data flow, architecture, and node-by-node configuration, followed by setup steps, scaling guidance, and troubleshooting notes.
1. Workflow overview
At a high level, this n8n workflow performs the following tasks:
- Accepts scraped competitor product data through a Webhook Trigger (typically from a crawler or third-party scraper).
- Splits long product descriptions or HTML content into text chunks optimized for embeddings.
- Generates OpenAI embeddings for each chunk.
- Persists the embeddings and associated metadata into a Supabase vector table.
- Exposes the vector store to a RAG agent via a Supabase Vector Tool for retrieval of relevant context.
- Uses an Anthropic chat model to perform analysis, summarization, or commentary on price changes.
- Appends structured results to Google Sheets for logging, dashboards, and downstream BI tools.
- Sends Slack alerts whenever the RAG agent encounters runtime errors.
The template is designed to be production-ready, but you can easily customize individual nodes for specific pricing strategies, product categories, or internal reporting formats.
2. Architecture and data flow
The workflow can be viewed as a linear pipeline with a retrieval and analysis layer on top:
- Ingress: A `Webhook` node receives POST requests containing product metadata, pricing information, and raw text or HTML content.
- Preprocessing: A `Text Splitter` node segments large content into overlapping chunks to preserve local context.
- Vectorization: An `Embeddings` node calls OpenAI's `text-embedding-3-small` model to generate dense vector representations for each chunk.
- Storage: A `Supabase Insert` node writes the vectors and metadata into a Supabase vector table (index name `competitor_price_scraper`).
- Retrieval: A combination of `Supabase Query` and `Vector Tool` nodes exposes relevant vector documents to a RAG agent.
- Context management: A `Window Memory` node maintains short-term interaction history for multi-turn analysis sessions.
- Reasoning: A `Chat Model` node connected to Anthropic acts as the LLM backend for the RAG agent.
- RAG orchestration: A `RAG Agent` node combines retrieved context, memory, and instructions to generate structured outputs.
- Logging and observability: An `Append Sheet` node writes results to Google Sheets, while a `Slack Alert` node reports errors.
Each component is decoupled so you can adjust chunking, embedding models, or storage strategies without rewriting the full pipeline.
3. Node-by-node breakdown
3.1 Webhook Trigger
The Webhook node is the entry point of the workflow.
- HTTP method: `POST`
- Example path: `/competitor-price-scraper`
Configure your crawler, scraping service, or scheduled job to send JSON payloads to this endpoint. A typical payload should include:
{ "product_id": "SKU-12345", "url": "https://competitor.example/product/123", "price": 49.99, "currency": "USD", "timestamp": "2025-09-01T12:00:00Z", "raw_text": "Full product title and description..."
}
Required fields depend on your downstream use, but for most price-intelligence scenarios you should provide:
- `product_id` – Your internal SKU or a stable product identifier.
- `url` – Canonical competitor product URL.
- `price` and `currency` – Current observed price and ISO currency code.
- `timestamp` – ISO 8601 timestamp of the scrape.
- `raw_text` or HTML – Full product title and description, or a cleaned text extraction.
Edge cases:
- If `raw_text` is missing or very short, the workflow can still log price-only data, but embeddings may be less useful.
- Ensure the payload size stays within your n8n instance and reverse proxy limits, especially when sending full HTML.
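For reference, a scraper-side sender can be as simple as a single `fetch` call. The sketch below assumes the example path above and a Node 18+ runtime; the base URL is a placeholder for your own n8n instance:

```typescript
// Minimal scraper-side sender (TypeScript, Node 18+ with global fetch).
// The base URL is an assumption; substitute your n8n webhook URL.
const WEBHOOK_URL = "https://n8n.example.com/webhook/competitor-price-scraper";

async function sendSnapshot(snapshot: {
  product_id: string;
  url: string;
  price: number;
  currency: string;
  timestamp: string;
  raw_text?: string;
}): Promise<void> {
  const res = await fetch(WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(snapshot),
  });
  if (!res.ok) {
    throw new Error(`Webhook rejected snapshot: ${res.status} ${await res.text()}`);
  }
}

// Example usage:
await sendSnapshot({
  product_id: "SKU-12345",
  url: "https://competitor.example/product/123",
  price: 49.99,
  currency: "USD",
  timestamp: new Date().toISOString(),
  raw_text: "Full product title and description...",
});
```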
3.2 Text Splitter
The Text Splitter node normalizes large bodies of text into smaller, overlapping segments so embeddings capture local semantics.
- Recommended parameters:
  - `chunkSize`: 400
  - `chunkOverlap`: 40
With this configuration, each chunk contains up to 400 characters, and consecutive chunks overlap by 40 characters. This overlap helps preserve continuity for descriptions that span multiple chunks.
Configuration notes:
- For shorter, highly structured content, you can reduce `chunkSize` to minimize unnecessary splitting.
- For very long pages, keep `chunkSize` moderate to avoid excessive token usage when generating embeddings.
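To make the chunking behavior concrete, here is a simplified character-based splitter that mirrors the `chunkSize`/`chunkOverlap` semantics. It is an illustrative sketch, not the Text Splitter node's actual implementation:

```typescript
// Simplified character-based splitter mirroring chunkSize/chunkOverlap semantics.
// Illustrative only; the n8n Text Splitter node's internals may differ.
function splitText(text: string, chunkSize = 400, chunkOverlap = 40): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // how far the window advances each pass
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```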
3.3 Embeddings (OpenAI)
The Embeddings node transforms each text chunk into a numeric vector using OpenAI.
- Model: `text-embedding-3-small`
For each chunk, the node:
- Sends the chunk text to the OpenAI embeddings endpoint.
- Receives a vector representation.
- Combines this vector with the original content and metadata for insertion into Supabase.
Metadata best practices:
- Include `product_id`, `url`, `price`, `currency`, and `timestamp`.
- Optionally add `competitor_name` or other keys used for filtering and deduplication.
Error handling: If embeddings fail due to rate limits or transient network issues, configure retries with exponential backoff in n8n, or wrap this node in error branches that route failures to Slack.
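If you prototype this step outside n8n, an equivalent call with the official `openai` package, wrapped in a simple exponential backoff, looks roughly like this (the retry counts and delays are illustrative assumptions, not template defaults):

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Embed a batch of chunks with exponential backoff on transient failures.
async function embedChunks(chunks: string[], maxRetries = 3): Promise<number[][]> {
  for (let attempt = 0; ; attempt++) {
    try {
      const res = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: chunks,
      });
      return res.data.map((d) => d.embedding);
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```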
3.4 Supabase Insert & Vector Index
The Supabase Insert node persists each embedding and its metadata into a Supabase table configured for vector search.
- Index name: `competitor_price_scraper`
A minimal schema for the vector table can look like:
- `id` (uuid)
- `content` (text)
- `embedding` (vector)
- `metadata` (jsonb)
- `created_at` (timestamp)
Key points:
- Ensure the `embedding` column dimension matches the OpenAI embedding model dimension (1536 for `text-embedding-3-small`).
- Store the original chunk text in `content` for inspection and debugging.
- Use `metadata` to store all identifying fields needed for filtering, deduplication, and analytics.
Deduplication and upserts: You can implement a composite uniqueness strategy in Supabase such as `product_id` + `competitor_name` + `timestamp`, or rely on an upsert pattern to avoid storing multiple identical snapshots, as sketched below.
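A minimal client-side equivalent using `@supabase/supabase-js` might look like the following. It assumes `product_id`, `competitor_name`, and `timestamp` have been promoted out of the `metadata` jsonb into dedicated columns covered by a unique constraint, which goes beyond the minimal schema above:

```typescript
import { createClient } from "@supabase/supabase-js";

// The service key must stay server-side; never ship it to client code.
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!,
);

// Upsert one embedded chunk. The onConflict columns assume dedicated
// product_id/competitor_name/timestamp columns with a unique constraint --
// an assumption, not part of the template's minimal schema.
async function storeChunk(
  content: string,
  embedding: number[],
  metadata: Record<string, unknown>,
) {
  const { error } = await supabase.from("competitor_price_scraper").upsert(
    {
      content,
      embedding,
      metadata,
      product_id: metadata.product_id,
      competitor_name: metadata.competitor_name,
      timestamp: metadata.timestamp,
    },
    { onConflict: "product_id,competitor_name,timestamp" },
  );
  if (error) throw error;
}
```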
3.5 Supabase Query & Vector Tool
The Supabase Query node retrieves the most similar vectors for a given query embedding. The Vector Tool node then exposes this retrieval capability to the RAG agent.
Typical flow:
- The RAG agent or a preceding node constructs a query (for example, “show recent price changes for SKU-12345”).
- The workflow generates an embedding for this query or uses the RAG agent’s internal retrieval mechanism.
- The Supabase Query node runs a similarity search against `competitor_price_scraper` and returns the top matches.
- The Vector Tool node formats these results as context documents for the RAG agent.
Tuning retrieval quality:
- If results look irrelevant, verify that content and metadata are correctly saved and that your vector index is built and used.
- Adjust the number of retrieved documents or similarity thresholds in the Supabase Query node as needed.
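Similarity search against pgvector is typically exposed through a Postgres function called via RPC. The `match_documents` function below is a common convention and an assumption here, not something the template guarantees; define an equivalent function in your Supabase project before relying on it:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!,
);

// Retrieve the top-k most similar chunks for a query embedding.
// "match_documents" is a hypothetical pgvector helper you would define
// yourself in Supabase; its name and parameters are assumptions.
async function retrieveContext(queryEmbedding: number[], matchCount = 5) {
  const { data, error } = await supabase.rpc("match_documents", {
    query_embedding: queryEmbedding,
    match_count: matchCount,
  });
  if (error) throw error;
  return data; // e.g. rows of { content, metadata, similarity }
}
```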
3.6 Window Memory
The Window Memory node maintains a limited history of recent interactions between the analyst and the RAG agent.
This is particularly useful when:
- An analyst asks follow-up questions about a specific product or trend.
- You want the agent to maintain conversational continuity without re-sending full context each time.
Keep the window small enough to avoid unnecessary token usage while still capturing the last few turns of the conversation.
3.7 Chat Model (Anthropic)
The Chat Model node is configured to use Anthropic’s API as the language model backend for the RAG agent.
Responsibilities:
- Generate instruction-following, analysis-oriented responses.
- Interpret retrieved context, metadata, and user instructions.
- Produce concise or detailed summaries suitable for logging in Google Sheets.
The model is not called directly by most workflow nodes. Instead, it is wired into the RAG Agent node as the primary LLM.
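To verify the Anthropic credential independently of n8n, you can make a one-off call with the official `@anthropic-ai/sdk` package. The model id below is a placeholder; substitute whichever Claude model your account has access to:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// One-off smoke test of the Anthropic credential outside n8n.
// The model id is a placeholder, not a template requirement.
const message = await anthropic.messages.create({
  model: "claude-3-5-sonnet-latest",
  max_tokens: 256,
  messages: [{ role: "user", content: "Reply with OK if you can read this." }],
});
console.log(message.content);
```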
3.8 RAG Agent
The RAG Agent node orchestrates retrieval and reasoning:
- Receives a system or user instruction, for example, “Summarize any significant price changes for this product compared to previous snapshots.”
- Uses the Vector Tool to retrieve relevant context from Supabase.
- Optionally includes Window Memory to maintain conversational continuity.
- Calls the Chat Model node to generate a structured response.
- Outputs a status summary that is passed to the Google Sheets node.
Error routing: If the RAG Agent throws an error (for example, due to invalid inputs or LLM issues), the workflow routes the error branch to the Slack Alert node for immediate notification.
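It often helps to instruct the agent to emit a fixed output shape so the Google Sheets columns stay stable. The shape below is illustrative, not mandated by the template:

```json
{
  "product_id": "SKU-12345",
  "current_price": 49.99,
  "previous_price": 54.99,
  "change_pct": -9.09,
  "significant": true,
  "summary": "Price dropped ~9% versus the last snapshot."
}
```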
3.9 Append Sheet (Google Sheets) & Slack Alert
The Append Sheet node logs structured output to a designated Google Sheet.
- Sheet name: `Log` (or any name you configure)
Typical entries can include:
- Product identifiers and URLs.
- Current and previous prices, where available.
- RAG agent summaries or anomaly flags.
- Timestamps and workflow run IDs for traceability.
The Slack Alert node is used for error reporting:
- Example channel: `#alerts`
- The alert payload includes the error message and, optionally, workflow metadata so you can triage quickly.
This pattern ensures that failures in embedding, Supabase operations, or the RAG agent do not go unnoticed.
4. Configuration and credentials
4.1 Required credentials
Before running the template, provision the following credentials in n8n:
- OpenAI API key for embeddings.
- Supabase project URL and service key for vector storage and queries.
- Anthropic API key for the Chat Model node.
- Google Sheets OAuth2 credentials for the Append Sheet node.
- Slack token for sending alerts.
Store all secrets in n8n’s credential store. Do not expose Supabase service keys to any client-side code.
4.2 Supabase vector table setup
Define a table in Supabase with at least:
- `id` (uuid)
- `content` (text)
- `embedding` (vector)
- `metadata` (jsonb)
- `created_at` (timestamp)
Ensure the vector index (`competitor_price_scraper`) is created on the `embedding` column, and that the column is declared with the dimension of `text-embedding-3-small` (1536).
5. Step-by-step setup in n8n
1. Import the workflow template. Create or reuse an n8n instance and import the provided workflow JSON template for the automated competitor price scraper.
2. Configure credentials. Add and test:
   - OpenAI API key.
   - Supabase URL and service key.
   - Anthropic API key.
   - Google Sheets OAuth2 connection.
   - Slack token and default channel.
3. Prepare the Supabase vector table. Create the table with the minimal schema described above and configure the vector index `competitor_price_scraper`.