Scrape Trustpilot Reviews to Google Sheets with n8n

Automating the collection of Trustpilot reviews into Google Sheets gives you a reliable data source for monitoring customer sentiment, tracking issues, and feeding review data into downstream systems. This reference-style guide documents a ready-to-use n8n workflow that scrapes Trustpilot reviews, parses the embedded JSON, and appends or updates rows in Google Sheets with deduplication based on review_id.

1. Workflow overview

This n8n template is designed as a scheduled or manually triggered pipeline that:

Requests paginated Trustpilot review pages for a specific company slug (company_id).
Extracts and parses the __NEXT_DATA__ JSON structure that contains the review data.
Normalizes each review into a consistent schema.
Writes reviews into one or two Google Sheets tabs using appendOrUpdate with review_id as the unique key.

The workflow supports two output formats:

A general archival sheet with standard review fields.
A HelpfulCrowd-compatible sheet for direct import into HelpfulCrowd.

2. Architecture and data flow

The workflow is organized as a linear pipeline of nodes, from trigger to Sheets output:

Schedule Trigger / Manual Trigger – entry point that controls when the workflow runs.
Global (Set) – defines global configuration values, including company_id and max_page.
Get reviews (HTTP Request) – fetches Trustpilot review pages with pagination.
Parse reviews (Code) – parses the HTML response, extracts __NEXT_DATA__, and returns an array of review objects.
Split Out – converts the reviews array into individual items (one item per review).
General edits (Set) – maps and normalizes fields for a generic Google Sheet.
HelpfulCrowd edits (Set) – maps and normalizes fields to match HelpfulCrowd’s import schema.
General sheet (Google Sheets) – writes to the general archival sheet using appendOrUpdate.
HelpfulCrowd Sheets (Google Sheets) – writes to the HelpfulCrowd-formatted sheet using appendOrUpdate.

Logical data flow:

Trigger node starts the workflow.
Global configuration is attached to each item and used downstream by the HTTP Request node.
HTTP Request node returns one item per page, each containing HTML content.
Code node transforms each HTML payload into an array of review objects.
Split Out node flattens that array into separate n8n items.
Set nodes project each review into the exact column layout expected by each Google Sheet.
Google Sheets nodes write or update rows based on review_id.

3. Prerequisites

An n8n instance (Cloud or self-hosted) with access to the public internet.
Google Sheets credentials configured in n8n using OAuth.
Permission to access and scrape the specific Trustpilot pages (see legal section below).
Basic understanding of n8n nodes, credentials, parameters, and how to modify node configurations.

4. Node-by-node breakdown

4.1 Trigger: Schedule Trigger / Manual Trigger

Purpose: Control when the workflow runs.

Manual Trigger is typically used during initial setup and testing.
Schedule Trigger is used for production runs, for example every hour or once per day.

Key considerations:

Use the Schedule Trigger in production to continuously fetch new reviews.
Choose an interval that respects Trustpilot rate limits and your own operational needs.

4.2 Global configuration: Global (Set)

Node type: Set

Purpose: Provide reusable configuration values to downstream nodes.

Typical fields set in this node:

company_id – the company slug used in Trustpilot URLs. Example: for https://www.trustpilot.com/review/n8n.io, the slug is n8n.io.
max_page – maximum number of pages to fetch during pagination.

These values are referenced by the HTTP Request node via expressions like {{ $json.company_id }} and {{ $json.max_page }}.

4.3 HTTP Request: Get reviews

Node type: HTTP Request

Purpose: Fetch HTML pages containing Trustpilot reviews with built-in pagination.

Base URL:

https://trustpilot.com/review/{{ $json.company_id }}

Pagination configuration (core parameters):

Query or path parameter for page The page parameter increments using: {{ $pageCount + 1 }}
maxRequests Set to: {{ $json.max_page }} This caps the total number of page requests, regardless of what the site returns.
requestInterval (ms) A delay between requests, for example 5000 (5 seconds), to reduce rate pressure and lower the chance of being blocked.
paginationCompleteWhen Set to receiveSpecificStatusCodes, which means pagination stops when a configured status code is received.
statusCodesWhenComplete Set to 404, so the workflow stops when it hits a non-existent page.

Edge cases and behavior:

If Trustpilot returns a 404 before reaching max_page, the workflow stops early, which is expected and prevents unnecessary requests.
If Trustpilot changes its behavior and returns a different status code for “no more pages”, the pagination logic may need to be updated.
If the site uses redirects or HTML shims instead of 404, the current paginationCompleteWhen configuration might not trigger as expected and may require adjustment.

Optional headers:

To reduce the risk of being treated as a bot, consider setting a realistic User-Agent header in this node, especially if you encounter missing content or captchas.

4.4 Code node: Parse reviews

Node type: Code

Purpose: Parse the HTML from Trustpilot, locate the __NEXT_DATA__ script tag, extract the embedded JSON, and return an array of review objects.

The Code node uses cheerio to work with the HTML and to locate the script tag:

// Key parsing approach (simplified)
const cheerio = require('cheerio');
const $ = cheerio.load(content);
const scriptTag = $('#__NEXT_DATA__');
const reviewsRaw = JSON.parse(scriptTag.html());
return reviewsRaw.props.pageProps.reviews || [];

Core logic:

The HTML response from the HTTP Request node is loaded into cheerio.
The script tag with id __NEXT_DATA__ is selected.
The script content is parsed as JSON, which contains the page’s data, including reviews.
The node returns reviewsRaw.props.pageProps.reviews or an empty array if that path is missing.

Error handling:

The template includes a try/catch block so that malformed HTML, missing script tags, or invalid JSON do not crash the entire workflow.
On error, the node should fail gracefully and can return an empty set of reviews for that page, depending on how the code is written in the template.

Data extracted per review:

Date
Author
Review text / body
Review title / heading
Rating
Location
review_id (used as a unique identifier in Sheets)

Environment note:

n8n’s Code node supports require('cheerio') in environments where external modules are available.
In some self-hosted n8n setups, you must ensure that cheerio is installed and accessible to the runtime, otherwise the Code node will fail when calling require('cheerio').

4.5 Split Out: expand review array

Node type: Item splitting node (for example Item Lists / “Split Out”)

Purpose: Convert the array of reviews returned by the Code node into one item per review, which is the standard pattern for processing each review independently in downstream nodes.

Behavior:

Input: one item containing an array of review objects.
Output: multiple items, each corresponding to a single review object.

This step is required so that each Set node and each Google Sheets node operates on individual reviews.

4.6 Field mapping: General edits (Set)

Node type: Set

Purpose: Map raw review fields into a normalized schema suitable for a general archival Google Sheet.

Typical mapped fields:

Date – standardized review date.
Author – reviewer name or identifier.
Body – full review text.
Heading – review title.
Rating – numeric rating value.
Location – reviewer location if available.
review_id – unique identifier used for deduplication in Sheets.

The exact field names should match the column headers in your general Google Sheet tab.

4.7 Field mapping: HelpfulCrowd edits (Set)

Node type: Set

Purpose: Transform the same review objects into the schema expected by HelpfulCrowd’s import format.

Typical mapped fields:

product_id
rating
title
feedback – review body text.
customer_name
status – for example, published or pending (depending on how you map it).
review_date
verified – indicates whether the review is verified, if this information is available or inferred.
review_id – still used as the unique key for Sheets operations.

Ensure that these field names align exactly with your HelpfulCrowd import template and with the column names in the corresponding Google Sheet tab.

4.8 Google Sheets: General sheet

Node type: Google Sheets

Operation: appendOrUpdate

Purpose: Write normalized reviews into a general archival sheet while avoiding duplicates.

Key parameters:

Authentication Use your configured Google Sheets OAuth credentials.
Spreadsheet / Sheet Point to the cloned sample spreadsheet and select the general tab.
Operation Set to appendOrUpdate so that existing rows are updated when a matching key is found.
matchingColumns Set to review_id. This is critical for deduplication.

Behavior:

If a row with the same review_id already exists, that row is updated with the latest data.
If no matching review_id is found, a new row is appended.
This allows the workflow to be safely re-run without creating duplicate entries for the same review.

4.9 Google Sheets: HelpfulCrowd Sheets

Node type: Google Sheets

Operation: appendOrUpdate

Purpose: Write reviews into a second sheet that matches HelpfulCrowd’s import schema.

Key parameters:

Authentication Use the same Google Sheets OAuth credentials as the general sheet node.
Spreadsheet / Sheet Select the HelpfulCrowd-formatted tab in the cloned spreadsheet.
Operation appendOrUpdate, identical behavior to the general sheet node.
matchingColumns Set to review_id to maintain consistency and avoid duplicates.

5. Step-by-step configuration

5.1 Clone the sample spreadsheet

The template assumes you are using a spreadsheet that contains two tabs:

A general review sheet.
A HelpfulCrowd-formatted sheet.

Steps:

Clone the provided sample spreadsheet into your own Google account.
In n8n, open each Google Sheets node and:
- Select your Google Sheets OAuth credentials.
- Point the node to the cloned spreadsheet and the correct tab.

5.2 Configure global variables

Open the Global (Set) node and set the following fields:

company_id Set to the Trustpilot company slug, for example: n8n.io.
max_page Set to the maximum number of pages to fetch, for example 10. This value is referenced by the HTTP Request node for pagination.

5.3 Verify HTTP Request pagination

Open the Get reviews node and confirm the following:

URL is set to: https://trustpilot.com/review/{{ $json.company_id }}
Pagination is enabled:
- Page parameter increments with {{ $pageCount + 1 }}.
- maxRequests is {{ $json.max_page }}.
- requestInterval is configured (for example 5000 ms).
- paginationCompleteWhen is receiveSpecificStatusCodes with statusCodesWhenComplete = 404.

5.4 Validate the Code node behavior

Open the Parse reviews Code node and

Find n8n Templates with AI Search

Scrape Trustpilot Reviews to Google Sheets

Scrape Trustpilot Reviews to Google Sheets with n8n

1. Workflow overview

2. Architecture and data flow

3. Prerequisites

4. Node-by-node breakdown

4.1 Trigger: Schedule Trigger / Manual Trigger

4.2 Global configuration: Global (Set)

4.3 HTTP Request: Get reviews

4.4 Code node: Parse reviews

4.5 Split Out: expand review array

4.6 Field mapping: General edits (Set)

4.7 Field mapping: HelpfulCrowd edits (Set)

4.8 Google Sheets: General sheet

4.9 Google Sheets: HelpfulCrowd Sheets

5. Step-by-step configuration

5.1 Clone the sample spreadsheet

5.2 Configure global variables

5.3 Verify HTTP Request pagination

5.4 Validate the Code node behavior

Leave a Reply Cancel reply

Find n8n Templates with AI Search

Scrape Trustpilot Reviews to Google Sheets with n8n

1. Workflow overview

2. Architecture and data flow

3. Prerequisites

4. Node-by-node breakdown

4.1 Trigger: Schedule Trigger / Manual Trigger

4.2 Global configuration: Global (Set)

4.3 HTTP Request: Get reviews

4.4 Code node: Parse reviews

4.5 Split Out: expand review array

4.6 Field mapping: General edits (Set)

4.7 Field mapping: HelpfulCrowd edits (Set)

4.8 Google Sheets: General sheet

4.9 Google Sheets: HelpfulCrowd Sheets

5. Step-by-step configuration

5.1 Clone the sample spreadsheet

5.2 Configure global variables

5.3 Verify HTTP Request pagination

5.4 Validate the Code node behavior

Leave a Reply Cancel reply

AI-Powered n8n Workflows