Generate AI-Ready llms.txt Files from Screaming Frog Crawls With n8n (So You Never Copy-Paste Again)
Picture this: you have a huge website, a looming AI project, and a blinking cursor asking for “high quality URLs for training.” So you open your crawl export, start scanning URLs, and realize you are three minutes in and already regretting every life choice that led you here.
Good news, you do not have to do that. This n8n workflow takes a Screaming Frog crawl, filters the good stuff, formats everything nicely, and spits out a clean llms.txt file that LLMs will love. No manual sorting, no spreadsheet rage, just automation doing what automation does best.
In this guide, you will see what an llms.txt file is, how Screaming Frog and n8n work together, how the workflow is built, and how to customize it for your own site. Same technical details as the original tutorial, just with fewer yawns and more clarity.
First things first: what is an llms.txt file?
An llms.txt file is a simple text index that tells large language models which pages on your site are worth their attention. Think of it as a curated reading list for your website.
Each line usually contains:
- A title
- A URL
- An optional short description
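For example, two entries (with hypothetical URLs) might look like this:

```txt
- [How to Bake Sourdough](https://example.com/sourdough): A practical guide to ingredients, technique, and troubleshooting.
- [Choosing a Dutch Oven](https://example.com/dutch-oven)
```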
By feeding an llms.txt file into your content discovery or ingestion pipeline, you:
- Help LLMs find your best pages faster
- Improve the quality of content used for training or querying
- Make prompts and results more relevant, especially for large sites
In other words, it is a tiny file with a big impact on LLM performance and content selection.
Why Screaming Frog + n8n is a great combo
Screaming Frog is the workhorse that crawls your website and collects page-level data. n8n is the automation brain that turns that data into a polished llms.txt file.
Screaming Frog gives you:
- URLs and titles
- Meta descriptions
- Status codes
- Indexability
- Content types
- Word counts
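For context, the relevant columns of an internal_html.csv export from an English-language Screaming Frog install look roughly like this (the real file contains many more columns, and the values here are hypothetical):

```csv
Address,Title 1,Meta Description 1,Status Code,Indexability,Content Type,Word Count
https://example.com/sourdough,How to Bake Sourdough,A practical guide to sourdough baking,200,Indexable,text/html; charset=UTF-8,1450
```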
n8n then:
- Parses the Screaming Frog CSV export
- Maps and normalizes the fields you care about
- Filters out junk and non-indexable pages
- Optionally runs a text classifier with an LLM
- Formats everything into clean llms.txt lines
- Exports or uploads the finished file automatically
The result is a repeatable workflow you can run every time you crawl a site, instead of reinventing your process every project.
How the n8n workflow works (high-level overview)
Before we dive into setup, here is the general flow of the n8n template:
- Form trigger – You fill in the website name, a short description, and upload your Screaming Frog CSV.
- Extract CSV – The CSV is parsed into JSON records for n8n to process.
- Field mapping – Key columns like URL, title, status code, and word count are normalized.
- Filtering – Only indexable, status 200, HTML pages (plus any extra filters you add) are kept.
- (Optional) LLM classifier – A text classifier can further separate high-value content from everything else.
- Formatting – Each selected URL is turned into a formatted llms.txt row.
- Concatenation – The rows are combined and prefixed with the website name and description.
- Export – A UTF-8 llms.txt file is created and either downloaded or uploaded to cloud storage.
Once set up, your main job is to upload a fresh Screaming Frog export and let the workflow do the boring parts.
Step-by-step: set up the n8n workflow
1. Start with the form trigger
The workflow kicks off with a form node. This is where you provide the basic context and the crawl data:
- Website name – Used as the main heading at the top of the llms.txt file.
- Short website description – Appears as the first lines of the file, giving LLMs a quick overview.
- Screaming Frog export – Typically internal_html.csv (recommended) or internal_all.csv.
Once the form is submitted, n8n has everything it needs to start building your index.
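In n8n terms, the submission arrives as a single item: the text answers sit in the json payload and the CSV travels as a binary attachment. The field names below are illustrative and depend on how the form node labels them:

```json
{
  "json": {
    "Website name": "My Example Website",
    "Website description": "A short description of the website"
  },
  "binary": {
    "Screaming Frog export": {
      "fileName": "internal_html.csv",
      "mimeType": "text/csv"
    }
  }
}
```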
2. Extract and parse the Screaming Frog CSV
Next, an Extract node reads the uploaded CSV and turns each row into a JSON object. This is what allows later nodes to filter and transform data programmatically.
The workflow is designed to be friendly to multilingual Screaming Frog setups. It checks multiple possible column names so it works whether your Screaming Frog UI is in English, French, German, Spanish, or Italian.
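After extraction, each row becomes one item whose json payload carries the original column names as keys. For an English-UI export, a parsed row might look like this (hypothetical values):

```json
{
  "Address": "https://example.com/sourdough",
  "Title 1": "How to Bake Sourdough",
  "Meta Description 1": "A practical guide to ingredients, technique, and troubleshooting.",
  "Status Code": "200",
  "Indexability": "Indexable",
  "Content Type": "text/html; charset=UTF-8",
  "Word Count": "1450"
}
```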
3. Map the important fields
To avoid dealing with every possible Screaming Frog column name later, the workflow normalizes the key fields into a consistent schema. A Set node creates the following properties:
- url ← Address
- title ← Title 1
- description ← Meta Description 1
- statut ← Status Code
- indexability ← Indexability
- content_type ← Content Type
- word_count ← Word Count
From this point onward, every node in the workflow can rely on these consistent field names, no matter how Screaming Frog labels them.
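If you prefer to see the normalization as code, here is a rough sketch of an equivalent n8n Code node (JavaScript, "Run Once for All Items" mode). The template itself may implement this differently, and the column-name lists are where you would add your localized variants:

```js
// Sketch: normalize Screaming Frog columns into the field names
// used by the rest of the workflow. Add localized column names
// (e.g. from a French or German UI) to each list as needed.
const pick = (row, names) => {
  for (const name of names) {
    if (row[name] !== undefined && row[name] !== '') return row[name];
  }
  return '';
};

return $input.all().map((item) => {
  const row = item.json;
  return {
    json: {
      url: pick(row, ['Address']),
      title: pick(row, ['Title 1']),
      description: pick(row, ['Meta Description 1']),
      statut: pick(row, ['Status Code']),
      indexability: pick(row, ['Indexability']),
      content_type: pick(row, ['Content Type']),
      word_count: Number(pick(row, ['Word Count'])) || 0,
    },
  };
});
```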
4. Filter out junk and non-indexable URLs
This is where quality control happens. The workflow applies several filters to keep only pages that are:
- Status code = 200
- Indexability = indexable (localized values are supported)
- Content type contains text/html
You can also layer on extra filters if you want to be more selective:
- Minimum word count, for example greater than 300, to avoid very thin pages
- Include or exclude specific paths or folders to focus on certain sections
- Exclude paginated URLs or anything with query parameters if they are not useful for training
This step alone saves a lot of manual cleanup later, and your future self will thank you.
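As a rough sketch, the same logic (plus a couple of the optional extras above) could live in a single Code node instead of Filter nodes; the exact node setup in the template may differ:

```js
// Keep only indexable, 200-status HTML pages, with two optional
// extra filters (minimum word count, no query parameters) as examples.
const MIN_WORDS = 300; // example threshold; set to 0 to disable

return $input.all().filter((item) => {
  const p = item.json;
  return (
    String(p.statut) === '200' &&
    String(p.indexability).toLowerCase().startsWith('index') && // 'Indexable'; adjust for your locale
    String(p.content_type).includes('text/html') &&
    Number(p.word_count) >= MIN_WORDS &&
    !String(p.url).includes('?') // drop URLs with query parameters
  );
});
```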
5. (Optional) Use an LLM text classifier for smarter selection
The template includes a deactivated Text Classifier node. When you enable it, the workflow sends details like URL, title, description, and word count to a language model.
The classifier then splits content into two groups:
- useful_content – Pages that look important or high quality.
- other_content – Everything else.
This extra layer is especially handy on large sites where simple filters are not enough to find the truly valuable pages.
Important notes:
- Only activate this node if you are comfortable with using an LLM API and the associated costs.
- For very large sites, pair it with a Loop Over Items node to avoid timeouts and keep API usage manageable.
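If you do enable the classifier, each category needs a short description the model can work with. Something along these lines (illustrative wording, tune it to your own site) is a reasonable starting point:

```txt
useful_content:
  Pages with substantial, original content that would help an LLM answer
  questions about the site: guides, product or service pages, documentation,
  in-depth articles.

other_content:
  Thin, duplicate, or purely navigational pages: tag archives, pagination,
  login pages, boilerplate legal pages, anything with little unique text.
```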
6. Format each line of the llms.txt file
Once you have your final list of URLs, the workflow formats each one into a neat llms.txt row. The template uses this pattern:
```
- [{{ title }}]({{ url }}){{ description ? ': ' + description : '' }}
```
For example:
```
- [How to Bake Sourdough](https://example.com/sourdough): A practical guide to ingredients, technique, and troubleshooting.
```
If there is no description, the part after the URL is simply omitted, so the format stays clean.
7. Combine rows and build the final llms.txt content
All the formatted rows are then concatenated into a single block of text. Before the list, the workflow prepends the website name and short description you provided in the form.
The final structure looks like this:
```
# My Example Website
> A short description of the website
- [Title 1](url1): description
- [Title 2](url2)
```
That heading and description give LLMs a bit of context about what they are looking at, instead of just dumping a list of URLs.
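Put together, the formatting and concatenation steps amount to something like the following Code node sketch. In the actual template this is split across expression-based nodes, and the form node and field names below are assumptions, so treat this as a conceptual equivalent rather than the template's exact implementation:

```js
// Build the final llms.txt content from the filtered, normalized items.
// 'On form submission' and the field names are placeholders; match them
// to your own form trigger node.
const form = $('On form submission').first().json;
const siteName = form['Website name'];
const siteDescription = form['Website description'];

const rows = $input.all().map(({ json: p }) => {
  const desc = p.description ? `: ${p.description}` : '';
  return `- [${p.title}](${p.url})${desc}`;
});

const content = `# ${siteName}\n> ${siteDescription}\n${rows.join('\n')}\n`;

return [{ json: { content } }];
```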
8. Export or upload the llms.txt file
The final step uses a Convert To File node to generate a UTF-8 text file named llms.txt.
You can:
- Download the file directly from the n8n UI, or
- Swap the last node for a cloud storage node, for example Google Drive, OneDrive, or S3, to automatically upload and store the file.
If you run this regularly, automating the upload is a nice way to stop hunting for files in your downloads folder.
Best practices for Screaming Frog and llms.txt automation
- Prefer internal_html.csv when possible since it is already scoped to HTML pages.
- Localize your mappings if your Screaming Frog interface is not in English. The template already supports common translations, so you usually just need to verify column names.
- Start small with a limited crawl to test your filters and, if used, the classifier behavior before scaling up.
- Use a clear naming convention when automating uploads, for example site-name-llms-YYYYMMDD.txt.
- Keep an eye on LLM costs if you enable the text classifier, especially on very large sites.
Troubleshooting common issues
If your workflow runs but the llms.txt file is empty or the workflow fails, check the following:
- Make sure the uploaded file is actually a Screaming Frog export and that it includes the expected columns.
- Temporarily disable the text classifier node to confirm that the basic filters alone still produce results.
- Use Set nodes or other logging inside n8n to preview intermediate outputs and verify that fields like url, title, and indexability are mapped correctly.
Most issues come down to column naming mismatches or filters that are a bit too strict.
When this workflow is a perfect fit
This n8n template shines whenever you need a reliable, repeatable way to turn a website crawl into an llms.txt file for AI or SEO work. Typical use cases include:
- SEO teams preparing content sets for semantic analysis or LLM-powered audits.
- Data teams building prioritized web corpora for LLM fine-tuning or evaluation.
- Site owners who want a curated, human-readable index of their most important pages.
If you are tired of manually sorting URLs in spreadsheets, this workflow is basically your new favorite coworker.
Try the n8n llms.txt workflow template
To get started:
- Download the n8n workflow template.
- Run a Screaming Frog crawl and export internal_html.csv (or internal_all.csv if needed).
- Upload the CSV through the form node, fill in the website name and description, and let the workflow generate your llms.txt in minutes.
If you want help tuning filters, designing classifier prompts, or wiring up automatic uploads to your storage of choice, reach out to the author or leave a comment to explore consulting options.
