Automated Customer Insights from Trustpilot with n8n
Trustpilot reviews contain rich qualitative data about customer experience, product quality, and recurring issues. Manually reading and tagging hundreds or thousands of reviews is slow, inconsistent, and difficult to scale. This reference guide documents a reusable n8n workflow template that automates the full pipeline:
- Scrape Trustpilot review pages for a specific company
- Extract structured review fields (author, date, rating, text, etc.)
- Convert review text to OpenAI embeddings
- Store vectors and metadata in Qdrant (vector database)
- Cluster semantically similar reviews with K-means
- Use an LLM to generate concise insights, sentiment labels, and improvement suggestions
- Export the final insights to Google Sheets or downstream dashboards
The result is a repeatable, scalable workflow for automated customer insights that can be reused across multiple brands or products with minimal configuration.
1. Workflow Overview
The n8n template implements an end-to-end pipeline with eight main phases:
- Initialization – Set the target company identifier (Trustpilot URL slug).
- Scraping – Use an HTML-capable node to fetch Trustpilot review pages.
- Parsing – Extract structured fields such as author, date, rating, title, text, URL, and country.
- Embedding & Storage – Generate embeddings with OpenAI and insert them into a Qdrant collection.
- Insights Trigger – Invoke a sub-workflow with a specified date range for analysis.
- Clustering – Retrieve embeddings from Qdrant and run a K-means clustering algorithm.
- Cluster Aggregation – Fetch the original reviews for each cluster.
- LLM Insights & Export – Use an LLM to generate insights, sentiment, and improvement suggestions, then export to Google Sheets.
Each step is implemented as one or more n8n nodes, connected in sequence to form a reproducible pipeline. The workflow can be run on demand or scheduled, depending on your monitoring needs.
2. Architecture & Data Flow
The workflow is orchestrated entirely within n8n, using external services for embeddings, vector storage, and spreadsheet export.
2.1 High-level components
- n8n – Core automation engine that coordinates triggers, scraping, data transformation, and downstream integrations.
- Trustpilot – Source of customer reviews, accessed via HTML scraping.
- OpenAI Embeddings – Converts review text into dense semantic vectors for clustering and similarity analysis.
- Qdrant – Vector database that stores embeddings plus rich metadata for filtered queries.
- Clustering logic (K-means) – Groups similar reviews into a small number of coherent clusters.
- LLM (e.g., OpenAI gpt-4o-mini) – Consumes grouped reviews and generates insights, sentiment labels, and improvement recommendations.
- Google Sheets – Destination for exporting the final insights in a tabular format.
2.2 Data pipeline sequence
- Input – The workflow starts with a companyId that corresponds to the Trustpilot slug (for example, www.freddiesflowers.com).
- Scraping & Parsing – HTML content is fetched from Trustpilot and parsed into structured JSON objects, one per review.
- Embedding – The review body (and optionally the title) is passed to an OpenAI embeddings model such as text-embedding-3-small.
- Vector storage – The resulting embeddings, along with review metadata, are stored in a Qdrant collection (for example, trustpilot_reviews).
- Query & clustering – Based on a date range or other filters, a subset of points is retrieved from Qdrant and clustered via K-means.
- Cluster aggregation – For each cluster, all associated reviews are grouped to form a coherent input set for the LLM.
- Insight generation – The LLM processes each cluster and outputs a structured JSON with insight text, sentiment label, and suggested improvements.
- Export – The results are combined with cluster metadata and written to Google Sheets for reporting or further analysis.
3. Node-by-Node Breakdown
The exact node naming may vary in your instance, but the logical responsibilities are consistent across implementations.
3.1 Initialization Nodes
- Start / Manual Trigger
Used to kick off the workflow. In production you may also attach a Cron node for scheduled runs.
- Set Company Parameters
A Set or Function node defines:
  - companyId – the Trustpilot slug, for example www.freddiesflowers.com.
  - Optional additional parameters such as page range, maximum reviews, or date filters (if you extend the template).
3.2 Scraping & Parsing Nodes
- HTTP Request / HTML Node (Trustpilot Scraper)
Fetches HTML for one or more Trustpilot review pages. Typical responsibilities:
  - Construct Trustpilot URLs using companyId and page indices.
  - Handle pagination by iterating over pages until a configured limit is reached.
  - Respect rate limits and add delays between requests.
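The URL-construction step above can be sketched in a Python Code node. The `?page=N` query pattern is an assumption based on Trustpilot's public page structure, so verify it against the live site before relying on it:

```python
# Sketch: build paginated Trustpilot review URLs for a company slug.
# The "?page=N" query pattern is an assumption; verify against the live site.
def build_review_urls(company_id: str, max_pages: int) -> list[str]:
    base = f"https://www.trustpilot.com/review/{company_id}"
    return [base if page == 1 else f"{base}?page={page}"
            for page in range(1, max_pages + 1)]
```

The HTTP Request node then iterates over this list, one request per item, with a Wait node between requests for politeness.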
- HTML Parsing / Code Node (Review Extractor)
Parses the HTML response and extracts structured review data. The node outputs one item per review with fields such as:
  - author
  - review_date
  - rating (typically numeric)
  - title
  - text (main review body)
  - review_url
  - country
  - company_id (propagated from companyId)
This normalization step is crucial for reliable downstream embedding and metadata-based filtering.
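A minimal sketch of that normalization, assuming hypothetical raw field names ("body", "url", "date") coming out of the HTML parser — map them to whatever your parsing step actually yields:

```python
# Sketch of the Review Extractor's normalization step. The raw field names
# ("body", "url", "date") are hypothetical placeholders.
def normalize_review(raw: dict, company_id: str) -> dict:
    return {
        "author": raw.get("author", "").strip(),
        "review_date": raw.get("date", ""),   # ISO date string, e.g. "2024-05-01"
        "rating": int(raw.get("rating", 0)),
        "title": raw.get("title", "").strip(),
        "text": raw.get("body", "").strip(),
        "review_url": raw.get("url", ""),
        "country": raw.get("country", ""),
        "company_id": company_id,
    }
```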
3.3 Embedding & Qdrant Storage Nodes
- OpenAI Embeddings Node
Uses your OpenAI credentials to create embeddings for each review. Typical configuration:
  - Model: text-embedding-3-small (or another embedding model you select).
  - Input field: review text, optionally concatenated with the title.
The node outputs a vector representation for each review item.
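A small helper along these lines can assemble the input text before it reaches the embeddings node; the title-then-body concatenation format is a design choice, not part of the template:

```python
# Sketch: assemble the text passed to the embeddings node, concatenating
# title and body when both are present.
def embedding_input(review: dict) -> str:
    parts = [review.get("title", ""), review.get("text", "")]
    return "\n".join(p.strip() for p in parts if p.strip())
```

Including the title usually helps, since reviewers often put the core complaint or compliment there.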
- Qdrant Insert Node
Writes the embedding vectors and metadata into Qdrant. Configuration details:
  - Collection name: for example trustpilot_reviews.
  - Vector field: the embedding output from the previous node.
  - Payload / metadata: fields such as company_id, review_date, author, rating, country, review_url, and raw or cleaned text.
Optionally, an earlier node can clear existing points for a specific company or date range before inserting new ones.
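One way to shape a point for Qdrant's upsert endpoint is sketched below. Deriving a deterministic ID from review_url (so repeat runs overwrite rather than duplicate) is an assumption on our part, not the template's documented behavior:

```python
# Sketch: shape one point for Qdrant's points upsert API
# (PUT /collections/trustpilot_reviews/points).
import uuid

def qdrant_point(vector: list[float], review: dict) -> dict:
    return {
        # Deterministic UUID from the review URL keeps re-runs idempotent.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, review["review_url"])),
        "vector": vector,
        "payload": {
            "company_id": review["company_id"],
            "review_date": review["review_date"],
            "author": review["author"],
            "rating": review["rating"],
            "country": review["country"],
            "review_url": review["review_url"],
            "text": review["text"],
        },
    }
```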
3.4 Insights Sub-workflow Trigger
- Workflow Trigger / Sub-workflow Node
A separate sub-workflow is responsible for generating insights from stored vectors. It typically accepts:
  - companyId
  - start_date and end_date (the date range for analysis)
These parameters are passed to Qdrant queries to scope which reviews are included in clustering.
3.5 Qdrant Retrieval & Clustering Nodes
- Qdrant Search / Retrieve Node
Queries Qdrant for embeddings matching the given filters. Common filters:
  - company_id == companyId
  - review_date within the provided range
The node returns vectors and associated metadata for all matching reviews.
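Those filters map onto Qdrant's filter syntax roughly as follows; this sketch assumes review_date is indexed as a datetime payload field (supported in recent Qdrant versions) and that "must" clauses combine with AND:

```python
# Sketch: Qdrant filter object scoping retrieval to one company and one
# date range. Assumes review_date has a datetime payload index.
def qdrant_filter(company_id: str, start_date: str, end_date: str) -> dict:
    return {
        "must": [
            {"key": "company_id", "match": {"value": company_id}},
            {"key": "review_date", "range": {"gte": start_date, "lte": end_date}},
        ]
    }
```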
- Clustering Node (K-means via Code / Python)
A Code or Python node runs a K-means clustering algorithm over the retrieved embeddings. Implementation details:
  - Configured for up to 5 clusters in the reference template.
  - Clusters with fewer than 3 reviews are filtered out to reduce noise.
Output items typically include:
  - Cluster identifier (for example, cluster_id)
  - Associated review IDs or indices
  - Optional cluster centroid vector (if you persist it)
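For illustration, a stdlib-only K-means with the same up-to-5-clusters and minimum-3-reviews behavior might look like the sketch below; the template may instead use scikit-learn or a JavaScript implementation:

```python
# Minimal stdlib K-means sketch plus the small-cluster noise filter.
import random
from math import dist

def kmeans(vectors: list[list[float]], k: int, iters: int = 20, seed: int = 42) -> list[int]:
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)          # random initial centroids
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clusters_of(labels: list[int], min_size: int = 3) -> dict[int, list[int]]:
    # Group indices by label and drop clusters below min_size.
    groups: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    return {c: idxs for c, idxs in groups.items() if len(idxs) >= min_size}
```

In production, prefer a library implementation with multiple restarts; plain K-means is sensitive to initialization.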
3.6 Cluster Aggregation Nodes
- Grouping / Code Node (Cluster Review Aggregator)
For each cluster, this node:
  - Collects the full review texts and metadata belonging to that cluster.
  - Prepares a structured payload for the LLM, often as an array of reviews with rating, date, and text.
This ensures the LLM receives enough context to identify recurring themes rather than isolated comments.
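A sketch of that aggregation step, keeping only the fields the analyst prompt needs (the exact field selection is a choice, not the template's fixed contract):

```python
# Sketch: compact one cluster's reviews into a JSON array for the LLM prompt.
import json

def cluster_payload(cluster_reviews: list[dict]) -> str:
    return json.dumps(
        [{"rating": r["rating"], "date": r["review_date"], "text": r["text"]}
         for r in cluster_reviews],
        ensure_ascii=False,
    )
```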
3.7 LLM Insight Generation Nodes
- LLM Node (Customer Insights Agent)
Uses a model such as OpenAI gpt-4o-mini to analyze each cluster. The node is typically configured with:
  - A system prompt describing the role: for example, “You are a customer insights analyst summarizing Trustpilot reviews.”
  - Instructions to output a JSON object with:
    - Insight – a short paragraph summarizing the theme of the cluster.
    - Sentiment – one of strongly negative, negative, neutral, positive, strongly positive.
    - Suggested Improvements – concise, tactical recommendations based on the feedback.
The node returns structured data for each cluster that can be easily consumed by downstream tools.
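Because LLM output occasionally drifts from the requested schema, a validation step like the following sketch is worth adding before export. The key names mirror the fields described above; adjust them to your actual prompt:

```python
# Sketch: validate the LLM's JSON output before passing it downstream.
import json

ALLOWED_SENTIMENTS = {
    "strongly negative", "negative", "neutral", "positive", "strongly positive",
}

def parse_insight(llm_output: str) -> dict:
    data = json.loads(llm_output)
    missing = {"Insight", "Sentiment", "Suggested Improvements"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["Sentiment"].lower() not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['Sentiment']}")
    return data
```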
3.8 Export to Google Sheets Nodes
- Google Sheets Node
Writes the LLM output and metadata into a spreadsheet. Typical columns include:
- Cluster ID
- Insight
- Sentiment
- Suggested Improvements
- Company ID
- Date range used for the analysis
- Optional aggregate metrics such as average rating per cluster
This makes it easy to share insights with non-technical stakeholders and integrate with BI dashboards.
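A sketch of the row mapping feeding the Google Sheets node; the column order here is an assumption and should match your own spreadsheet configuration:

```python
# Sketch: flatten one cluster's results into the row layout described above.
def sheet_row(cluster_id: int, insight: dict, company_id: str,
              date_range: str, ratings: list[int]) -> list:
    # Average rating per cluster as an optional aggregate metric.
    avg_rating = round(sum(ratings) / len(ratings), 2) if ratings else None
    return [
        cluster_id,
        insight["Insight"],
        insight["Sentiment"],
        insight["Suggested Improvements"],
        company_id,
        date_range,
        avg_rating,
    ]
```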
4. Setup & Configuration Checklist
Before running the template, complete the following configuration steps.
4.1 n8n Environment
- Deploy an n8n instance (cloud or self-hosted).
- Ensure outbound network access to:
- Trustpilot (for scraping)
- OpenAI API
- Your Qdrant instance
- Google APIs for Sheets
4.2 Credentials
- OpenAI
- Add OpenAI credentials in n8n.
- Select an embeddings model, for example text-embedding-3-small.
- Qdrant
- Provision a Qdrant instance (cloud or self-hosted).
- Create a collection, for example trustpilot_reviews, with a vector size matching your embedding model.
- Configure Qdrant credentials in n8n.
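For reference, a collection-creation body along these lines works with Qdrant's REST API (PUT /collections/trustpilot_reviews); 1536 is the output dimension of text-embedding-3-small, so change it if you pick another model:

```python
# Sketch: request body for creating the Qdrant collection.
# 1536 = output dimension of text-embedding-3-small.
COLLECTION_CONFIG = {
    "vectors": {
        "size": 1536,
        "distance": "Cosine",
    }
}
```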
- Google Sheets
- Connect a Google account in n8n.
- Configure the Google Sheets node with:
- Target spreadsheet
- Worksheet name
- Column mapping for the exported fields
4.3 Workflow Parameters
- Set companyId to the Trustpilot slug of the company to analyze, for example www.freddiesflowers.com.
- Optionally configure:
- Number of pages or maximum reviews to scrape.
- Date range for the insights sub-workflow.
- Whether to clear existing Qdrant points for this company before inserting new ones.
After configuration, run the workflow once to validate scraping, embeddings, and Qdrant insertion, then schedule it or trigger it as needed.
5. Best Practices & Implementation Notes
5.1 Rate Limiting & Polite Scraping
When scraping Trustpilot, follow good scraping hygiene:
- Respect robots rules and site usage policies.
- Use n8n’s pagination features to control the number of requests per run.
- Introduce delays between page requests to avoid overloading the site.
- Consider caching or deduplicating already-processed reviews to minimize repeated scraping.
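A deduplication helper might look like this sketch, keyed by review_url; persisting the seen set between runs (for example in n8n workflow static data, or by querying Qdrant for existing IDs) is left to you:

```python
# Sketch: skip reviews already processed in earlier runs, keyed by review_url.
def dedupe(reviews: list[dict], seen: set[str]) -> list[dict]:
    fresh = []
    for r in reviews:
        url = r.get("review_url")
        if url and url not in seen:
            seen.add(url)
            fresh.append(r)
    return fresh
```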
5.2 Metadata Strategy in Qdrant
Rich metadata significantly improves query flexibility and insight quality. At minimum, store the following fields as Qdrant payload:
- company_id
- review_date
- author
- rating
- country
- review_url
- text (or a cleaned version)
This enables:
- Date-range scoped analyses.
- Filtering by rating or geography.
- Linking back to the original review for manual inspection.
5.3 Cluster Size & Validation
The reference workflow uses K-means with up to 5 clusters, then filters out clusters containing fewer than 3 reviews. Practical guidelines:
- For small datasets, too many clusters lead to noisy, low-signal groups.
- For larger datasets, increasing the cluster count can surface more granular themes.
