Convert Natural-Language Email Questions into SQL with n8n and LangChain
Imagine asking, “Show me last week’s budget emails” or “Pull up everything in thread 123” and getting an instant answer, without ever touching SQL. That is exactly what this n8n workflow template helps you do.
In this guide, we will walk through a reusable n8n workflow that:
- Takes a plain-English question about your emails
- Uses an AI agent (LangChain + Ollama) to turn it into a valid PostgreSQL query
- Runs that SQL against your email metadata
- Returns clean, readable results
All of this happens with strict schema awareness, so the AI never invents columns or uses invalid operators. Let us break it down in a friendly, practical way so you can plug it into your own n8n setup.
When Should You Use This n8n + LangChain Workflow?
This workflow is perfect if:
- You have a PostgreSQL database with email metadata (subjects, senders, dates, threads, attachments, and so on).
- People on your team are not comfortable writing SQL, but still need to search and filter emails in flexible ways.
- You want natural-language search over large email archives without building a full custom UI.
Instead of teaching everyone SELECT, WHERE, and ILIKE, you let them type questions like a normal person. The workflow quietly handles the translation into safe, schema-respecting SQL.
Why Not Just Let AI “Guess” the SQL?
It is tempting to throw a model at your problem and say, “Here is what I want, please give me SQL.” The catch is that a naive approach often:
- References columns that do not exist
- Uses the wrong operators for data types
- Generates unsafe or destructive queries
This workflow solves those headaches by:
- Extracting the real database schema and giving it to the model as ground truth
- Using a strict system prompt that clearly defines what the AI can and cannot do
- Validating the generated SQL before it ever hits your database
- Executing queries only when they are syntactically valid and safe
The result is a more predictable, reliable, and audit-friendly way to use AI for SQL generation.
High-Level Overview: How the Workflow Works
The n8n workflow is split into two main parts that work together:
- Schema extraction – runs manually or on a schedule to keep an up-to-date snapshot of your database structure.
- Runtime query handling – kicks in whenever someone asks a question via chat or another trigger.
Here is the basic flow in plain language:
- Grab the list of tables and columns from PostgreSQL and save that schema as a JSON file.
- When a user asks a natural-language question, load that schema file.
- Send the schema, the current date, and the user question to a LangChain agent running on Ollama.
- Get back a single, raw SQL statement from the AI, then clean and verify it.
- Run the SQL with a Postgres node and format the results for the user.
Let us go deeper into each part and the key n8n nodes that make it all work.
Part 1: Schema Extraction Workflow
This part runs outside of user requests. Think of it as preparing the map so the AI never gets lost. You can trigger it manually or set it on a schedule whenever your schema changes.
Key n8n Nodes for Schema Extraction
- List all tables in the database
Use a Postgres node to run:
SELECT table_name FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'public';
This gives you the list of all public tables that the AI is allowed to query.
- List all columns for each table
For every table returned above, run another query to fetch metadata like:
- column_name
- data_type
- is_nullable
- Whether it is an array or not
Make sure you also include the table name in the output so you can reconstruct the full schema later; an example query appears after this list.
- Convert to JSON and save locally
Once you have all tables and columns, merge them into a single JSON structure. Then use a file node to save it somewhere predictable, for example:
/files/pgsql-{workflow.id}.json
This file becomes the source of truth that you pass to the AI agent.
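Here is a sketch of the column query from the second step. It assumes the standard information_schema views and the public schema used above, so adjust the schema name if yours differs:
SELECT table_name, column_name, data_type, is_nullable,
(data_type = 'ARRAY') AS is_array
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY table_name, ordinal_position;
Grouping these rows by table_name gives you the per-table structure to merge into the JSON file.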
After this step, you have a neat JSON snapshot of your database schema that your runtime workflow can quickly load without hitting the database every time.
Part 2: Runtime Query Workflow
This is the fun part. A user types something like “recent emails about projects from Sarah with attachments” and the workflow turns it into a useful SQL query and a readable response.
Runtime Path: Step-by-Step
- Trigger (chat or workflow)
The workflow starts when someone sends a natural-language question via an n8n Chat trigger or another custom trigger.
- Load the schema from file
Use a file node to read the JSON schema you saved earlier. This gives the model an exact list of allowed tables, columns, and data types.
- AI Agent (LangChain + Ollama)
Pass three key pieces of information to the LangChain agent:
- The full schema JSON
- The current date (useful for queries like “yesterday” or “last week”)
- The user’s natural-language prompt
The agent is configured with a strict system prompt that tells it:
- What tables and columns exist
- Which operators to use for each data type
- That it must output only a single SQL statement ending with a semicolon
- Extract and verify the SQL
Parse the AI response to:
- Pull out the raw SQL string
- Confirm that it is the right kind of statement (for example, a SELECT)
- Ensure it ends with a semicolon; if not, append one
- Postgres node
Feed the cleaned SQL into a Postgres node. This node runs the query against your database and returns the rows.
- Format the query results
Finally, turn the raw rows into something friendly: a text summary, a markdown table, or another format that fits your chat or UI. Then send that back to the user.
From the user’s perspective, they just asked a question and got an answer. Behind the scenes, you have a carefully controlled AI agent and a safe SQL execution path.
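As a quick illustration of why the current date is passed in at step three: a prompt like “emails received yesterday” should resolve to an absolute window rather than a fuzzy phrase. A minimal sketch of the expected output, assuming the date column used in the examples later in this guide:
SELECT * FROM emails_metadata
WHERE date >= CURRENT_DATE - INTERVAL '1 day'
AND date < CURRENT_DATE
ORDER BY date DESC;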
Prompt Engineering: Getting the AI to Behave
The system prompt you give to LangChain is the heart of this setup. If you get this right, the agent becomes predictable and safe. If you are too vague, it will start improvising columns and structures that do not exist.
What to Include in the System Prompt
Here are the types of constraints that work well:
- Embed the exact schema
Put the JSON schema in a code block so the model can only reference what is listed. This is your “do not invent anything” anchor.
- Whitelist specific metadata fields
For example, you might explicitly state that only fields like emails_metadata.id and emails_metadata.thread_id are valid in certain contexts.
- Operator rules per data type
Spell out which operators to use for each type (a combined example follows this list), such as:
- ILIKE for text searches
- BETWEEN, >, < for timestamps and dates
- @> or ANY for arrays
- Explicit handling for NULL checks
- Strict output rules
Be very clear, for example:
- “Output ONLY the raw SQL statement ending with a semicolon.”
- “Do not include explanations or markdown, only SQL.”
- “Default to SELECT * FROM unless the user asks for specific fields.”
These instructions drastically reduce hallucinations and make it much easier to validate and execute the generated SQL.
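To make the operator rules concrete, here is the kind of combined filter you want the agent to produce. The labels array column is hypothetical and only illustrates the @> operator; the other columns match the examples below:
SELECT * FROM emails_metadata
WHERE email_subject ILIKE '%invoice%'
AND date BETWEEN '2024-01-01' AND '2024-01-31'
AND labels @> ARRAY['finance']
AND attachments IS NOT NULL;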
Example Prompts and SQL Outputs
Here are two concrete examples to show what you are aiming for.
User prompt: “recent emails about projects from Sarah with attachments”
SELECT * FROM emails_metadata
WHERE (email_subject ILIKE '%project%' OR email_text ILIKE '%project%')
AND email_from ILIKE '%sarah%'
AND attachments IS NOT NULL
ORDER BY date DESC;
User prompt: “emails in thread 123”
SELECT * FROM emails_metadata
WHERE thread_id = '123';
Notice how these queries:
- Use ILIKE for text searches
- Respect actual column names like email_subject, email_text, email_from, attachments, and thread_id
- End with a semicolon as required
Keeping Things Safe: Validation and Guardrails
Even with a strong prompt, it is smart to layer in extra safety checks inside n8n.
Recommended Safety Checks
- Column name validation
Before executing the SQL, you can parse the query and compare all referenced columns to your saved schema JSON. If anything is not in the schema, reject or correct the query.
- Block destructive queries
If you want this to be read-only, you can:
- Reject any non-SELECT statements in your validation step
- Or use a PostgreSQL user with read-only permissions so even a rogue query cannot modify data (an example role setup follows this list)
- Limit result size
To avoid huge result sets, you can:
- Enforce a default LIMIT if the user did not specify one
- Or cap the maximum allowed limit
- Log generated queries
Store every generated SQL statement along with the original prompt. This helps with debugging, auditing, and improving your prompt over time.
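If you go the database-permissions route, a minimal sketch of a read-only role looks like this; the role name, password, and database name are placeholders to replace with your own:
CREATE ROLE email_reader LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE your_email_db TO email_reader;
GRANT USAGE ON SCHEMA public TO email_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO email_reader;
Point the Postgres node credentials at this role, and even a query that slips past validation cannot modify data.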
Testing and Debugging Your Workflow
Once everything is wired up, it is worth spending a bit of time testing different scenarios so you can trust the system in production.
- Start with simple questions
Try prompts like “emails received yesterday” and inspect both the SQL and the returned rows to ensure they match your expectations.
- Refresh the schema after changes
Whenever you add or modify tables and columns, run the schema extraction section manually or via a scheduled trigger so the JSON stays current.
- Tighten the prompt if it invents columns
If you see made-up fields, adjust the system prompt with stronger negative instructions and examples of what is not allowed.
- Test edge cases
Ask for:
- Date ranges, like “emails from last month”
- Array filters
- Null checks
Confirm that the operators and conditions are correct; a sample date-range query follows this list.
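For example, “emails from last month” usually means whole-month boundaries rather than the last 30 days, so this is roughly the SQL you want to see (a sketch using the date column from the earlier examples):
SELECT * FROM emails_metadata
WHERE date >= date_trunc('month', CURRENT_DATE) - INTERVAL '1 month'
AND date < date_trunc('month', CURRENT_DATE)
ORDER BY date DESC;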
Ideas for Extending the Workflow
Once the basic version is running smoothly, you can start layering on more features.
- Field selection
Teach the agent to return only specific columns when users ask, for example “show subject and sender for yesterday’s emails.”
- Pagination
Add OFFSET and LIMIT support so users can page through results like “next 50 emails” (a combined example follows this list).
- Conversational follow-ups
Keep context between queries. For example, after “show me last week’s emails” the user might say “only the ones with attachments” or “just the ones from Sarah” and you can refine the previous query.
- Audit dashboard
Build a small dashboard that displays:
- Generated queries
- Response times
- Error rates
This helps you monitor performance and usage patterns.
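As a sketch of where field selection and pagination end up, a follow-up like “show subject and sender for yesterday’s emails, next 50” could translate into something like this, reusing the columns from the earlier examples:
SELECT email_subject, email_from, date
FROM emails_metadata
WHERE date >= CURRENT_DATE - INTERVAL '1 day'
ORDER BY date DESC
LIMIT 50 OFFSET 50;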
Why This Pattern Is So Useful
At its core, this n8n + LangChain workflow gives non-technical users a safe way to query email metadata in plain English. The key ingredients are:
- An authoritative, extracted schema that the model must follow
- A carefully crafted system prompt that locks down behavior and output format
- Validation logic that inspects the generated SQL before execution
- A read-only Postgres user or other safeguards for extra protection
The nice thing is that this pattern is not limited to email. You can reuse the same idea for support tickets, CRM data, analytics, or any other structured dataset you want to expose through natural language.
Ready to Try It Yourself?
If this sounds like something you want in your toolkit, you have a couple of easy next steps:
- Implement the schema extraction flow in your own n8n instance.
- Set up the LangChain + Ollama agent with the prompt rules described above.
- Wire in your Postgres connection and test with a few simple questions.
If you would like a bit of help, you have options. I can:
- Provide a step-by-step checklist you can follow inside n8n, or
- Share a cleaned n8n JSON export that you can import and customize
Just decide which approach fits you better and have your Postgres schema handy. With that, it is straightforward to adapt the system prompt and nodes to your specific setup.
