Generate AI-Ready llms.txt Files from Screaming Frog Crawls With n8n (So You Never Copy-Paste Again)
Picture this: you have a huge website, a looming AI project, and a blinking cursor asking for “high quality URLs for training.” So you open your crawl export, start scanning URLs, and realize you are three minutes in and already regretting every life choice that led you here.
Good news, you do not have to do that. This n8n workflow takes a Screaming Frog crawl, filters the good stuff, formats everything nicely, and spits out a clean llms.txt file that LLMs will love. No manual sorting, no spreadsheet rage, just automation doing what automation does best.
In this guide, you will see what an llms.txt file is, how Screaming Frog and n8n work together, how the workflow is built, and how to customize it for your own site. Same technical details as the original tutorial, just with fewer yawns and more clarity.
First things first: what is an llms.txt file?
An llms.txt file is a simple text index that tells large language models which pages on your site are worth their attention. Think of it as a curated reading list for your website.
Each line usually contains:
- A title
- A URL
- An optional short description
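For example, two entries (with hypothetical URLs) might look like this:

```txt
- [How to Bake Sourdough](https://example.com/sourdough): A practical guide to ingredients, technique, and troubleshooting.
- [Choosing a Dutch Oven](https://example.com/dutch-oven)
```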
By feeding an llms.txt file into your content discovery or ingestion pipeline, you:
- Help LLMs find your best pages faster
- Improve the quality of content used for training or querying
- Make prompts and results more relevant, especially for large sites
In other words, it is a tiny file with a big impact on LLM performance and content selection.
Why Screaming Frog + n8n is a great combo
Screaming Frog is the workhorse that crawls your website and collects page-level data. n8n is the automation brain that turns that data into a polished llms.txt file.
Screaming Frog gives you:
- URLs and titles
- Meta descriptions
- Status codes
- Indexability
- Content types
- Word counts
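For context, the relevant columns of an internal_html.csv export from an English-language Screaming Frog install look roughly like this (the real file contains many more columns, and the values here are hypothetical):

```csv
Address,Title 1,Meta Description 1,Status Code,Indexability,Content Type,Word Count
https://example.com/sourdough,How to Bake Sourdough,A practical guide to sourdough baking,200,Indexable,text/html; charset=UTF-8,1450
```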
n8n then:
- Parses the Screaming Frog CSV export
- Maps and normalizes the fields you care about
- Filters out junk and non-indexable pages
- Optionally runs a text classifier with an LLM
- Formats everything into clean llms.txt lines
- Exports or uploads the finished file automatically
The result is a repeatable workflow you can run every time you crawl a site, instead of reinventing your process every project.
How the n8n workflow works (high-level overview)
Before we dive into setup, here is the general flow of the n8n template:
- Form trigger – You fill in the website name, a short description, and upload your Screaming Frog CSV.
- Extract CSV – The CSV is parsed into JSON records for n8n to process.
- Field mapping – Key columns like URL, title, status code, and word count are normalized.
- Filtering – Only indexable, status 200, HTML pages (plus any extra filters you add) are kept.
- (Optional) LLM classifier – A text classifier can further separate high-value content from everything else.
- Formatting – Each selected URL is turned into a formatted llms.txt row.
- Concatenation – The rows are combined and prefixed with the website name and description.
- Export – A UTF-8 llms.txt file is created and either downloaded or uploaded to cloud storage.
Once set up, your main job is to upload a fresh Screaming Frog export and let the workflow do the boring parts.
Step-by-step: set up the n8n workflow
1. Start with the form trigger
The workflow kicks off with a form node. This is where you provide the basic context and the crawl data:
- Website name – Used as the main heading at the top of the llms.txt file.
- Short website description – Appears as the first lines of the file, giving LLMs a quick overview.
- Screaming Frog export – Typically internal_html.csv (recommended) or internal_all.csv.
Once the form is submitted, n8n has everything it needs to start building your index.
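In n8n terms, the submission arrives as a single item: the text answers sit in the json payload and the CSV travels as a binary attachment. The field names below are illustrative and depend on how the form node labels them:

```json
{
  "json": {
    "Website name": "My Example Website",
    "Website description": "A short description of the website"
  },
  "binary": {
    "Screaming Frog export": {
      "fileName": "internal_html.csv",
      "mimeType": "text/csv"
    }
  }
}
```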
2. Extract and parse the Screaming Frog CSV
Next, an Extract node reads the uploaded CSV and turns each row into a JSON object. This is what allows later nodes to filter and transform data programmatically.
The workflow is designed to be friendly to multilingual Screaming Frog setups. It checks multiple possible column names so it works whether your Screaming Frog UI is in English, French, German, Spanish, or Italian.
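After extraction, each row becomes one item whose json payload carries the original column names as keys. For an English-UI export, a parsed row might look like this (hypothetical values):

```json
{
  "Address": "https://example.com/sourdough",
  "Title 1": "How to Bake Sourdough",
  "Meta Description 1": "A practical guide to ingredients, technique, and troubleshooting.",
  "Status Code": "200",
  "Indexability": "Indexable",
  "Content Type": "text/html; charset=UTF-8",
  "Word Count": "1450"
}
```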
3. Map the important fields
To avoid dealing with every possible Screaming Frog column name later, the workflow normalizes the key fields into a consistent schema. A Set node creates the following properties:
- url ← Address
- title ← Title 1
- description ← Meta Description 1
- statut ← Status Code
- indexability ← Indexability
- content_type ← Content Type
- word_count ← Word Count
From this point onward, every node in the workflow can rely on these consistent field names, no matter how Screaming Frog labels them.
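If you prefer to see the normalization as code, here is a rough sketch of an equivalent n8n Code node (JavaScript, "Run Once for All Items" mode). The template itself may implement this differently, and the column-name lists are where you would add your localized variants:

```js
// Sketch: normalize Screaming Frog columns into the field names
// used by the rest of the workflow. Add localized column names
// (e.g. from a French or German UI) to each list as needed.
const pick = (row, names) => {
  for (const name of names) {
    if (row[name] !== undefined && row[name] !== '') return row[name];
  }
  return '';
};

return $input.all().map((item) => {
  const row = item.json;
  return {
    json: {
      url: pick(row, ['Address']),
      title: pick(row, ['Title 1']),
      description: pick(row, ['Meta Description 1']),
      statut: pick(row, ['Status Code']),
      indexability: pick(row, ['Indexability']),
      content_type: pick(row, ['Content Type']),
      word_count: Number(pick(row, ['Word Count'])) || 0,
    },
  };
});
```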
4. Filter out junk and non-indexable URLs
This is where quality control happens. The workflow applies several filters to keep only pages that are:
- Status code = 200
- Indexability = indexable (localized values are supported)
- Content type contains text/html
You can also layer on extra filters if you want to be more selective:
- Minimum word count, for example greater than 300, to avoid very thin pages
- Include or exclude specific paths or folders to focus on certain sections
- Exclude paginated URLs or anything with query parameters if they are not useful for training
This step alone saves a lot of manual cleanup later, and your future self will thank you.
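As a rough sketch, the same logic (plus a couple of the optional extras above) could live in a single Code node instead of Filter nodes; the exact node setup in the template may differ:

```js
// Keep only indexable, 200-status HTML pages, with two optional
// extra filters (minimum word count, no query parameters) as examples.
const MIN_WORDS = 300; // example threshold; set to 0 to disable

return $input.all().filter((item) => {
  const p = item.json;
  return (
    String(p.statut) === '200' &&
    String(p.indexability).toLowerCase().startsWith('index') && // 'Indexable'; adjust for your locale
    String(p.content_type).includes('text/html') &&
    Number(p.word_count) >= MIN_WORDS &&
    !String(p.url).includes('?') // drop URLs with query parameters
  );
});
```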
5. (Optional) Use an LLM text classifier for smarter selection
The template includes a deactivated Text Classifier node. When you enable it, the workflow sends details like URL, title, description, and word count to a language model.
The classifier then splits content into two groups:
- useful_content – Pages that look important or high quality.
- other_content – Everything else.
This extra layer is especially handy on large sites where simple filters are not enough to find the truly valuable pages.
Important notes:
- Only activate this node if you are comfortable with using an LLM API and the associated costs.
- For very large sites, pair it with a Loop Over Items node to avoid timeouts and keep API usage manageable.
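If you do enable the classifier, each category needs a short description the model can work with. Something along these lines (illustrative wording, tune it to your own site) is a reasonable starting point:

```txt
useful_content:
  Pages with substantial, original content that would help an LLM answer
  questions about the site: guides, product or service pages, documentation,
  in-depth articles.

other_content:
  Thin, duplicate, or purely navigational pages: tag archives, pagination,
  login pages, boilerplate legal pages, anything with little unique text.
```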
6. Format each line of the llms.txt file
Once you have your final list of URLs, the workflow formats each one into a neat llms.txt row. The template uses this pattern:
```
- [{{ title }}]({{ url }}){{ description ? ': ' + description : '' }}
```
For example:
```
- [How to Bake Sourdough](https://example.com/sourdough): A practical guide to ingredients, technique, and troubleshooting.
```
If there is no description, the part after the URL is simply omitted, so the format stays clean.
7. Combine rows and build the final llms.txt content
All the formatted rows are then concatenated into a single block of text. Before the list, the workflow prepends the website name and short description you provided in the form.
The final structure looks like this:
```
# My Example Website
> A short description of the website
- [Title 1](url1): description
- [Title 2](url2)
```
That heading and description give LLMs a bit of context about what they are looking at, instead of just dumping a list of URLs.
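Put together, the formatting and concatenation steps amount to something like the following Code node sketch. In the actual template this is split across expression-based nodes, and the form node and field names below are assumptions, so treat this as a conceptual equivalent rather than the template's exact implementation:

```js
// Build the final llms.txt content from the filtered, normalized items.
// 'On form submission' and the field names are placeholders; match them
// to your own form trigger node.
const form = $('On form submission').first().json;
const siteName = form['Website name'];
const siteDescription = form['Website description'];

const rows = $input.all().map(({ json: p }) => {
  const desc = p.description ? `: ${p.description}` : '';
  return `- [${p.title}](${p.url})${desc}`;
});

const content = `# ${siteName}\n> ${siteDescription}\n${rows.join('\n')}\n`;

return [{ json: { content } }];
```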
8. Export or upload the llms.txt file
The final step uses a Convert To File node to generate a UTF-8 text file named llms.txt.
You can:
- Download the file directly from the n8n UI, or
- Swap the last node for a cloud storage node, for example Google Drive, OneDrive, or S3, to automatically upload and store the file.
If you run this regularly, automating the upload is a nice way to stop hunting for files in your downloads folder.
Best practices for Screaming Frog and llms.txt automation
- Prefer internal_html.csv when possible since it is already scoped to HTML pages.
- Localize your mappings if your Screaming Frog interface is not in English. The template already supports common translations, so you usually just need to verify column names.
- Start small with a limited crawl to test your filters and, if used, the classifier behavior before scaling up.
- Use a clear naming convention when automating uploads, for example site-name-llms-YYYYMMDD.txt.
- Keep an eye on LLM costs if you enable the text classifier, especially on very large sites.
Troubleshooting common issues
If your workflow runs but the llms.txt file is empty or the workflow fails, check the following:
- Make sure the uploaded file is actually a Screaming Frog export and that it includes the expected columns.
- Temporarily disable the text classifier node to confirm that the basic filters alone still produce results.
- Use Set nodes or other logging inside n8n to preview intermediate outputs and verify that fields like url, title, and indexability are mapped correctly.
Most issues come down to column naming mismatches or filters that are a bit too strict.
When this workflow is a perfect fit
This n8n template shines whenever you need a reliable, repeatable way to turn a website crawl into an llms.txt file for AI or SEO work. Typical use cases include:
- SEO teams preparing content sets for semantic analysis or LLM-powered audits.
- Data teams building prioritized web corpora for LLM fine-tuning or evaluation.
- Site owners who want a curated, human-readable index of their most important pages.
If you are tired of manually sorting URLs in spreadsheets, this workflow is basically your new favorite coworker.
Try the n8n llms.txt workflow template
To get started:
- Download the n8n workflow template.
- Run a Screaming Frog crawl and export internal_html.csv (or internal_all.csv if needed).
- Upload the CSV through the form node, fill in the website name and description, and let the workflow generate your llms.txt in minutes.
If you want help tuning filters, designing classifier prompts, or wiring up automatic uploads to your storage of choice, reach out to the author or leave a comment to explore consulting options.
