Automated Customer Insights from Trustpilot with n8n
Trustpilot reviews contain rich qualitative data about customer experience, product quality, and recurring issues. Manually reading and tagging hundreds or thousands of reviews is slow, inconsistent, and difficult to scale. This reference guide documents a reusable n8n workflow template that automates the full pipeline:
- Scrape Trustpilot review pages for a specific company
- Extract structured review fields (author, date, rating, text, etc.)
- Convert review text to OpenAI embeddings
- Store vectors and metadata in Qdrant (vector database)
- Cluster semantically similar reviews with K-means
- Use an LLM to generate concise insights, sentiment labels, and improvement suggestions
- Export the final insights to Google Sheets or downstream dashboards
The result is a repeatable, scalable workflow for automated customer insights that can be reused across multiple brands or products with minimal configuration.
1. Workflow Overview
The n8n template implements an end-to-end pipeline with eight main phases:
- Initialization – Set the target company identifier (Trustpilot URL slug).
- Scraping – Use an HTML-capable node to fetch Trustpilot review pages.
- Parsing – Extract structured fields such as author, date, rating, title, text, URL, and country.
- Embedding & Storage – Generate embeddings with OpenAI and insert them into a Qdrant collection.
- Insights Trigger – Invoke a sub-workflow with a specified date range for analysis.
- Clustering – Retrieve embeddings from Qdrant and run a K-means clustering algorithm.
- Cluster Aggregation – Fetch the original reviews for each cluster.
- LLM Insights & Export – Use an LLM to generate insights, sentiment, and improvement suggestions, then export to Google Sheets.
Each step is implemented as one or more n8n nodes, connected in sequence to form a reproducible pipeline. The workflow can be run on demand or scheduled, depending on your monitoring needs.
2. Architecture & Data Flow
The workflow is orchestrated entirely within n8n, using external services for embeddings, vector storage, and spreadsheet export.
2.1 High-level components
- n8n – Core automation engine that coordinates triggers, scraping, data transformation, and downstream integrations.
- Trustpilot – Source of customer reviews, accessed via HTML scraping.
- OpenAI Embeddings – Converts review text into dense semantic vectors for clustering and similarity analysis.
- Qdrant – Vector database that stores embeddings plus rich metadata for filtered queries.
- Clustering logic (K-means) – Groups similar reviews into a small number of coherent clusters.
- LLM (e.g., OpenAI gpt-4o-mini) – Consumes grouped reviews and generates insights, sentiment labels, and improvement recommendations.
- Google Sheets – Destination for exporting the final insights in a tabular format.
2.2 Data pipeline sequence
- Input – The workflow starts with a companyId that corresponds to the Trustpilot slug (for example, www.freddiesflowers.com).
- Scraping & Parsing – HTML content is fetched from Trustpilot and parsed into structured JSON objects, one per review.
- Embedding – The review body (and optionally the title) is passed to an OpenAI embeddings model such as text-embedding-3-small.
- Vector storage – The resulting embeddings, along with review metadata, are stored in a Qdrant collection (for example, trustpilot_reviews).
- Query & clustering – Based on a date range or other filters, a subset of points is retrieved from Qdrant and clustered via K-means.
- Cluster aggregation – For each cluster, all associated reviews are grouped to form a coherent input set for the LLM.
- Insight generation – The LLM processes each cluster and outputs a structured JSON with insight text, sentiment label, and suggested improvements.
- Export – The results are combined with cluster metadata and written to Google Sheets for reporting or further analysis.
3. Node-by-Node Breakdown
The exact node naming may vary in your instance, but the logical responsibilities are consistent across implementations.
3.1 Initialization Nodes
- Start / Manual Trigger
Used to kick off the workflow. In production you may also attach a Cron node for scheduled runs.
- Set Company Parameters
A Set or Function node defines:
  - companyId – the Trustpilot slug, for example www.freddiesflowers.com.
  - Optional additional parameters such as page range, maximum reviews, or date filters (if you extend the template).
3.2 Scraping & Parsing Nodes
- HTTP Request / HTML Node (Trustpilot Scraper)
Fetches HTML for one or more Trustpilot review pages. Typical responsibilities:
  - Construct Trustpilot URLs using companyId and page indices.
  - Handle pagination by iterating over pages until a configured limit is reached.
  - Respect rate limits and add delays between requests.
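The URL-construction step above can be sketched in a Python Code node. The `?page=N` query pattern is an assumption based on Trustpilot's public page structure, so verify it against the live site before relying on it:

```python
# Sketch: build paginated Trustpilot review URLs for a company slug.
# The "?page=N" query pattern is an assumption; verify against the live site.
def build_review_urls(company_id: str, max_pages: int) -> list[str]:
    base = f"https://www.trustpilot.com/review/{company_id}"
    return [base if page == 1 else f"{base}?page={page}"
            for page in range(1, max_pages + 1)]
```

The HTTP Request node then iterates over this list, one request per item, with a Wait node between requests for politeness.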
- HTML Parsing / Code Node (Review Extractor)
Parses the HTML response and extracts structured review data. The node outputs one item per review with fields such as:
  - author
  - review_date
  - rating (typically numeric)
  - title
  - text (main review body)
  - review_url
  - country
  - company_id (propagated from companyId)
This normalization step is crucial for reliable downstream embedding and metadata-based filtering.
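A minimal sketch of that normalization, assuming hypothetical raw field names ("body", "url", "date") coming out of the HTML parser — map them to whatever your parsing step actually yields:

```python
# Sketch of the Review Extractor's normalization step. The raw field names
# ("body", "url", "date") are hypothetical placeholders.
def normalize_review(raw: dict, company_id: str) -> dict:
    return {
        "author": raw.get("author", "").strip(),
        "review_date": raw.get("date", ""),   # ISO date string, e.g. "2024-05-01"
        "rating": int(raw.get("rating", 0)),
        "title": raw.get("title", "").strip(),
        "text": raw.get("body", "").strip(),
        "review_url": raw.get("url", ""),
        "country": raw.get("country", ""),
        "company_id": company_id,
    }
```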
3.3 Embedding & Qdrant Storage Nodes
- OpenAI Embeddings Node
Uses your OpenAI credentials to create embeddings for each review. Typical configuration:
  - Model: text-embedding-3-small (or another embedding model you select).
  - Input field: review text, optionally concatenated with the title.
The node outputs a vector representation for each review item.
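A small helper along these lines can assemble the input text before it reaches the embeddings node; the title-then-body concatenation format is a design choice, not part of the template:

```python
# Sketch: assemble the text passed to the embeddings node, concatenating
# title and body when both are present.
def embedding_input(review: dict) -> str:
    parts = [review.get("title", ""), review.get("text", "")]
    return "\n".join(p.strip() for p in parts if p.strip())
```

Including the title usually helps, since reviewers often put the core complaint or compliment there.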
- Qdrant Insert Node
Writes the embedding vectors and metadata into Qdrant. Configuration details:
  - Collection name: for example trustpilot_reviews.
  - Vector field: the embedding output from the previous node.
  - Payload / metadata: fields such as company_id, review_date, author, rating, country, review_url, and raw or cleaned text.
Optionally, an earlier node can clear existing points for a specific company or date range before inserting new ones.
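One way to shape a point for Qdrant's upsert endpoint is sketched below. Deriving a deterministic ID from review_url (so repeat runs overwrite rather than duplicate) is an assumption on our part, not the template's documented behavior:

```python
# Sketch: shape one point for Qdrant's points upsert API
# (PUT /collections/trustpilot_reviews/points).
import uuid

def qdrant_point(vector: list[float], review: dict) -> dict:
    return {
        # Deterministic UUID from the review URL keeps re-runs idempotent.
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, review["review_url"])),
        "vector": vector,
        "payload": {
            "company_id": review["company_id"],
            "review_date": review["review_date"],
            "author": review["author"],
            "rating": review["rating"],
            "country": review["country"],
            "review_url": review["review_url"],
            "text": review["text"],
        },
    }
```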
3.4 Insights Sub-workflow Trigger
- Workflow Trigger / Sub-workflow Node
A separate sub-workflow is responsible for generating insights from stored vectors. It typically accepts:
  - companyId
  - start_date and end_date (the date range for analysis)
These parameters are passed to Qdrant queries to scope which reviews are included in clustering.
3.5 Qdrant Retrieval & Clustering Nodes
- Qdrant Search / Retrieve Node
Queries Qdrant for embeddings matching the given filters. Common filters:
  - company_id == companyId
  - review_date within the provided range
The node returns vectors and associated metadata for all matching reviews.
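Those filters map onto Qdrant's filter syntax roughly as follows; this sketch assumes review_date is indexed as a datetime payload field (supported in recent Qdrant versions) and that "must" clauses combine with AND:

```python
# Sketch: Qdrant filter object scoping retrieval to one company and one
# date range. Assumes review_date has a datetime payload index.
def qdrant_filter(company_id: str, start_date: str, end_date: str) -> dict:
    return {
        "must": [
            {"key": "company_id", "match": {"value": company_id}},
            {"key": "review_date", "range": {"gte": start_date, "lte": end_date}},
        ]
    }
```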
- Clustering Node (K-means via Code / Python)
A Code or Python node runs a K-means clustering algorithm over the retrieved embeddings. Implementation details:
  - Configured for up to 5 clusters in the reference template.
  - Clusters with fewer than 3 reviews are filtered out to reduce noise.
Output items typically include:
  - Cluster identifier (for example, cluster_id)
  - Associated review IDs or indices
  - Optional cluster centroid vector (if you persist it)
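For illustration, a stdlib-only K-means with the same up-to-5-clusters and minimum-3-reviews behavior might look like the sketch below; the template may instead use scikit-learn or a JavaScript implementation:

```python
# Minimal stdlib K-means sketch plus the small-cluster noise filter.
import random
from math import dist

def kmeans(vectors: list[list[float]], k: int, iters: int = 20, seed: int = 42) -> list[int]:
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)          # random initial centroids
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def clusters_of(labels: list[int], min_size: int = 3) -> dict[int, list[int]]:
    # Group indices by label and drop clusters below min_size.
    groups: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    return {c: idxs for c, idxs in groups.items() if len(idxs) >= min_size}
```

In production, prefer a library implementation with multiple restarts; plain K-means is sensitive to initialization.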
3.6 Cluster Aggregation Nodes
- Grouping / Code Node (Cluster Review Aggregator)
For each cluster, this node:
  - Collects the full review texts and metadata belonging to that cluster.
  - Prepares a structured payload for the LLM, often as an array of reviews with rating, date, and text.
This ensures the LLM receives enough context to identify recurring themes rather than isolated comments.
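A sketch of that aggregation step, keeping only the fields the analyst prompt needs (the exact field selection is a choice, not the template's fixed contract):

```python
# Sketch: compact one cluster's reviews into a JSON array for the LLM prompt.
import json

def cluster_payload(cluster_reviews: list[dict]) -> str:
    return json.dumps(
        [{"rating": r["rating"], "date": r["review_date"], "text": r["text"]}
         for r in cluster_reviews],
        ensure_ascii=False,
    )
```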
3.7 LLM Insight Generation Nodes
- LLM Node (Customer Insights Agent)
Uses a model such as OpenAI gpt-4o-mini to analyze each cluster. The node is typically configured with:
  - A system prompt describing the role: for example, “You are a customer insights analyst summarizing Trustpilot reviews.”
  - Instructions to output a JSON object with:
    - Insight – a short paragraph summarizing the theme of the cluster.
    - Sentiment – one of strongly negative, negative, neutral, positive, strongly positive.
    - Suggested Improvements – concise, tactical recommendations based on the feedback.
The node returns structured data for each cluster that can be easily consumed by downstream tools.
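Because LLM output occasionally drifts from the requested schema, a validation step like the following sketch is worth adding before export. The key names mirror the fields described above; adjust them to your actual prompt:

```python
# Sketch: validate the LLM's JSON output before passing it downstream.
import json

ALLOWED_SENTIMENTS = {
    "strongly negative", "negative", "neutral", "positive", "strongly positive",
}

def parse_insight(llm_output: str) -> dict:
    data = json.loads(llm_output)
    missing = {"Insight", "Sentiment", "Suggested Improvements"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["Sentiment"].lower() not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['Sentiment']}")
    return data
```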
3.8 Export to Google Sheets Nodes
- Google Sheets Node
Writes the LLM output and metadata into a spreadsheet. Typical columns include:
- Cluster ID
- Insight
- Sentiment
- Suggested Improvements
- Company ID
- Date range used for the analysis
- Optional aggregate metrics such as average rating per cluster
This makes it easy to share insights with non-technical stakeholders and integrate with BI dashboards.
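A sketch of the row mapping feeding the Google Sheets node; the column order here is an assumption and should match your own spreadsheet configuration:

```python
# Sketch: flatten one cluster's results into the row layout described above.
def sheet_row(cluster_id: int, insight: dict, company_id: str,
              date_range: str, ratings: list[int]) -> list:
    # Average rating per cluster as an optional aggregate metric.
    avg_rating = round(sum(ratings) / len(ratings), 2) if ratings else None
    return [
        cluster_id,
        insight["Insight"],
        insight["Sentiment"],
        insight["Suggested Improvements"],
        company_id,
        date_range,
        avg_rating,
    ]
```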
4. Setup & Configuration Checklist
Before running the template, complete the following configuration steps.
4.1 n8n Environment
- Deploy an n8n instance (cloud or self-hosted).
- Ensure outbound network access to:
- Trustpilot (for scraping)
- OpenAI API
- Your Qdrant instance
- Google APIs for Sheets
4.2 Credentials
- OpenAI
- Add OpenAI credentials in n8n.
- Select an embeddings model, for example text-embedding-3-small.
- Qdrant
- Provision a Qdrant instance (cloud or self-hosted).
- Create a collection, for example trustpilot_reviews, with a vector size matching your embedding model.
- Configure Qdrant credentials in n8n.
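For reference, a collection-creation body along these lines works with Qdrant's REST API (PUT /collections/trustpilot_reviews); 1536 is the output dimension of text-embedding-3-small, so change it if you pick another model:

```python
# Sketch: request body for creating the Qdrant collection.
# 1536 = output dimension of text-embedding-3-small.
COLLECTION_CONFIG = {
    "vectors": {
        "size": 1536,
        "distance": "Cosine",
    }
}
```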
- Google Sheets
- Connect a Google account in n8n.
- Configure the Google Sheets node with:
- Target spreadsheet
- Worksheet name
- Column mapping for the exported fields
4.3 Workflow Parameters
- Set companyId to the Trustpilot slug of the company to analyze, for example www.freddiesflowers.com.
- Optionally configure:
- Number of pages or maximum reviews to scrape.
- Date range for the insights sub-workflow.
- Whether to clear existing Qdrant points for this company before inserting new ones.
After configuration, run the workflow once to validate scraping, embeddings, and Qdrant insertion, then schedule it or trigger it as needed.
5. Best Practices & Implementation Notes
5.1 Rate Limiting & Polite Scraping
When scraping Trustpilot, follow good scraping hygiene:
- Respect robots rules and site usage policies.
- Use n8n’s pagination features to control the number of requests per run.
- Introduce delays between page requests to avoid overloading the site.
- Consider caching or deduplicating already-processed reviews to minimize repeated scraping.
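A deduplication helper might look like this sketch, keyed by review_url; persisting the seen set between runs (for example in n8n workflow static data, or by querying Qdrant for existing IDs) is left to you:

```python
# Sketch: skip reviews already processed in earlier runs, keyed by review_url.
def dedupe(reviews: list[dict], seen: set[str]) -> list[dict]:
    fresh = []
    for r in reviews:
        url = r.get("review_url")
        if url and url not in seen:
            seen.add(url)
            fresh.append(r)
    return fresh
```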
5.2 Metadata Strategy in Qdrant
Rich metadata significantly improves query flexibility and insight quality. At minimum, store the following fields as Qdrant payload:
- company_id
- review_date
- author
- rating
- country
- review_url
- text (or a cleaned version)
This enables:
- Date-range scoped analyses.
- Filtering by rating or geography.
- Linking back to the original review for manual inspection.
5.3 Cluster Size & Validation
The reference workflow uses K-means with up to 5 clusters, then filters out clusters containing fewer than 3 reviews. Practical guidelines:
- For small datasets, too many clusters lead to noisy, low-signal groups.
- For larger datasets, increasing the cluster count can surface more granular themes.
