Build an AI Agent to Chat with YouTube in n8n
This guide documents a production-ready n8n workflow template that builds an AI agent capable of “chatting” with YouTube. The workflow integrates the YouTube Data API, Apify, and OpenAI to:
- Query channels and videos
- Aggregate and analyze comments
- Trigger video transcription
- Evaluate thumbnails with image analysis
- Maintain conversational context in Postgres
The focus here is on a technical, node-level breakdown so you can understand, adapt, and extend the workflow in your own n8n instance.
1. Overview and Capabilities
The workflow exposes a chat-style interface on top of multiple YouTube-related tools. A single agent node decides which tool to call based on user input; its core capabilities are listed below.
1.1 Core Features
- Channel inspection: Retrieve channel metadata by handle or URL, including:
  - `channel_id`
  - Channel title
  - Channel description
- Video discovery: Search or list videos for a given channel with sorting options (for example by `date` or `viewCount`).
- Video detail enrichment: Fetch detailed video information such as:
  - Title and description
  - Statistics (views, likes, etc.)
  - Content details, including `contentDetails.duration` to help filter out Shorts
- Comment aggregation: Pull comment threads via the YouTube Data API, paginate across pages, flatten threads, and feed them into an LLM for sentiment and insight extraction.
- Video transcription: Trigger an Apify transcription actor (or equivalent provider) using the video URL, then analyze the resulting text.
- Thumbnail and image analysis: Send thumbnail URLs to OpenAI image analysis tools for design critique and optimization suggestions.
- Conversation memory: Persist chat context in a Postgres database so the agent can reference prior messages and previous tool outputs.
1.2 Intended Users
This template is designed for users who are already comfortable with:
- n8n workflow design and credential management
- REST APIs (in particular YouTube Data API)
- LLM-based agents and prompt configuration
2. Architecture & Data Flow
The workflow is organized around an agent pattern. The agent receives user queries from a chat trigger, plans which tools to call, and then returns a synthesized answer.
2.1 High-Level Components
- Chat Trigger: A webhook-based entry point that accepts incoming chat messages and optional metadata (for example session identifiers).
- OpenAI Chat Model Node: The LLM that interprets user requests, calls tools, and generates responses.
- Agent Node (LangChain-style): Wraps the OpenAI model and exposes a set of tools. It outputs a command specifying which tool to run next.
- Switch Node (Tool Router): Routes agent commands such as `get_channel_details`, `video_details`, `comments`, `search`, `videos`, `analyze_thumbnail`, and `video_transcription` to the appropriate implementation nodes.
- HTTP Request Nodes: Implement the YouTube Data API and Apify calls. Each node is configured with query parameters and credentials.
- OpenAI Image / Analysis Nodes: Handle thumbnail and text analysis using OpenAI models.
- Postgres Node (Optional Memory): Stores conversation history that the agent can reference across multiple requests.
2.2 Execution Flow
1. The chat trigger receives a user message via webhook.
2. The message and context are passed to the agent node.
3. The agent decides which tool to call and outputs a command identifier.
4. The Switch node evaluates this command and routes execution to the appropriate HTTP or wrapper node.
5. Tool results are returned to the agent, which may chain additional tools or respond directly to the user.
6. Optionally, conversation state and results are persisted in Postgres for future interactions.
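The routing step can be pictured as a plain dispatch table. This is a minimal sketch of what the Switch node does, not n8n's implementation; the tool names are the workflow's commands, while the handler functions are hypothetical stand-ins for the HTTP Request nodes.

```python
# Hypothetical stand-ins for the HTTP Request nodes behind each tool.
def get_channel_details(args):
    return {"channel_id": "UC_example"}

def comments(args):
    return {"comments": []}

# Maps agent command identifiers to tool implementations, as the Switch node does.
TOOL_ROUTER = {
    "get_channel_details": get_channel_details,
    "comments": comments,
    # ... video_details, search, videos, analyze_thumbnail, video_transcription
}

def route(command: str, args: dict) -> dict:
    """Dispatch one agent command to its tool; unknown commands are an error."""
    handler = TOOL_ROUTER.get(command)
    if handler is None:
        raise ValueError(f"Unknown tool command: {command}")
    return handler(args)
```

The agent loop then simply feeds each tool's return value back into the model until it produces a final answer instead of another command.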
3. Prerequisites & Required Services
3.1 Platform Requirements
- Running n8n instance (self-hosted or n8n Cloud)
- Basic familiarity with n8n node configuration and credential management
3.2 External APIs and Keys
- Google Cloud / YouTube Data API
- Google Cloud project with the YouTube Data API enabled
- API key for YouTube Data API requests
- OpenAI
- OpenAI API key
- Access to the models you intend to use for text and image analysis (including multimodal if using image analysis)
- Apify (or equivalent transcription provider)
- Apify API token to run the transcription actor
- Postgres (optional but recommended)
- Postgres instance and credentials for storing chat memory
4. Setup & Configuration Steps
4.1 Configure API Credentials
- YouTube Data API
- In Google Cloud Console, enable the YouTube Data API.
- Create an API key and restrict it appropriately.
- OpenAI
- Generate an API key in your OpenAI account.
- Confirm that the account has access to the models used for both text and image analysis.
- Apify
- Create an API token for the transcription actor.
- Add credentials to n8n
- Open Credentials in n8n.
- Create entries for YouTube API key, OpenAI, and Apify.
- Reference these credentials in the corresponding HTTP Request and OpenAI nodes, replacing any placeholders in the imported workflow.
4.2 Import the Workflow Template
- Export or download the provided n8n workflow JSON/template.
- In n8n, use Import from file or Import from JSON to load the template.
- Confirm that the workflow includes:
- Chat trigger node
- OpenAI chat model node
- Agent node
- Switch (router) node
- HTTP Request nodes for YouTube and Apify
- Optional Postgres node for memory
4.3 Configure Chat Trigger & Agent
- Chat trigger
- Set up the webhook URL that external clients will call.
- Define the expected payload structure (for example `message`, `session_id`).
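Assuming the two fields mentioned above, an incoming webhook request body might look like this (adapt the shape to whatever your chat client sends):

```json
{
  "message": "Summarize the comments on https://youtu.be/dQw4w9WgXcQ",
  "session_id": "user-42-session-7"
}
```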
- Agent system prompt
- Configure the agent node with a system prompt that defines it as a YouTube assistant.
- Include clear instructions on when and how to call each tool, referencing their exact tool names such as `get_channel_details`, `comments`, and `video_transcription`.
- Postgres memory (optional)
- Connect the agent to the Postgres node if you want persistent conversation memory.
- Ensure the schema and retention policy are configured as required.
4.4 Update HTTP Request Nodes
For every HTTP Request node that calls YouTube or Apify:
- Select the correct credential from the dropdown (YouTube API key, Apify token).
- Verify the base URL and resource paths match the APIs you are using.
- Check query parameters such as:
  - `part` (for example `snippet`, `contentDetails`, `statistics`)
  - `maxResults`
  - `order` or `sort` values (for example `date`, `viewCount`)
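To sanity-check what an HTTP Request node should send, it helps to see the full query string spelled out. A sketch for the videos endpoint, using the public YouTube Data API v3 parameter names (the API key is a placeholder):

```python
from urllib.parse import urlencode

def videos_url(video_ids, api_key="YOUR_API_KEY"):
    """Build the videos.list request URL an HTTP Request node would send."""
    params = {
        "part": "snippet,contentDetails,statistics",  # which field groups to return
        "id": ",".join(video_ids),                    # comma-separated video IDs
        "maxResults": 50,                             # API maximum per page
        "key": api_key,
    }
    return "https://www.googleapis.com/youtube/v3/videos?" + urlencode(params)
```

Comparing this URL against the node's rendered request is a quick way to spot a missing `part` value or an unencoded comma.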
4.5 Validate Common Flows
Before exposing the workflow to end users, test the main tool paths:
- Channel details: Use a handle or channel URL to test the `get_channel_details` command and confirm that the `channel_id` is correctly extracted.
- Comments: Call `comments` with a valid `video_id`. Confirm pagination is working and that the Edit Fields node is flattening threads correctly into a clean structure for analysis.
- Transcription: Trigger `video_transcription` for a video URL and verify that the Apify actor completes and returns text.
- Thumbnail analysis: Provide a thumbnail URL to the `analyze_thumbnail` tool and confirm OpenAI returns structured feedback.
5. Node-by-Node Functional Breakdown
5.1 Channel & Video Retrieval Tools
5.1.1 Channel Details Tool
Purpose: Convert a channel handle or URL into a canonical `channel_id` and retrieve channel metadata.
- Input: Channel handle (for example `@channelName`) or full channel URL.
- Process: HTTP Request node calls the YouTube Data API with appropriate parameters.
- Output: `channel_id`, title, description, and related snippet data.
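One way to implement the handle-to-ID lookup is the `forHandle` parameter on the channels endpoint of the YouTube Data API v3; a hedged sketch (the key is a placeholder, and you should confirm the parameter against the current API reference):

```python
from urllib.parse import urlencode

def channel_details_url(handle: str, api_key: str = "YOUR_API_KEY") -> str:
    """Build a channels.list request that resolves an @handle to a channel."""
    params = {
        "part": "snippet,id",
        "forHandle": handle.lstrip("@"),  # accepts the handle with or without '@'
        "key": api_key,
    }
    return "https://www.googleapis.com/youtube/v3/channels?" + urlencode(params)
```

The response's `items[0].id` is the canonical `channel_id` the other tools need.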
5.1.2 Videos Listing Tool
Purpose: Fetch a list of videos for a given `channel_id`.
- Input: `channel_id` and sorting option (for example `date` or `viewCount`).
- Process: HTTP Request node queries the YouTube Data API to list videos.
- Output: Video IDs and associated metadata, which can be passed to the video details tool.
Note: YouTube search endpoints may return Shorts. To exclude Shorts, you should:
- Pass video IDs to the video details tool.
- Inspect `contentDetails.duration` for each video.
- Filter out entries with durations shorter than 60 seconds if you do not want Shorts.
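The duration filter above requires parsing the ISO 8601 strings the API returns (for example `PT59S` or `PT4M13S`). A minimal sketch, assuming durations under a day and a hypothetical item shape that mirrors the API's `contentDetails` nesting:

```python
import re

# Matches hours/minutes/seconds in ISO 8601 durations like "PT1H2M3S".
# Day-long videos ("P1DT...") are not handled here.
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")

def duration_seconds(iso_duration: str) -> int:
    """Convert an ISO 8601 duration string into total seconds."""
    match = _DURATION_RE.match(iso_duration)
    if not match:
        raise ValueError(f"Unrecognized duration: {iso_duration}")
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

def drop_shorts(videos):
    """Keep only videos of 60 seconds or longer."""
    return [
        v for v in videos
        if duration_seconds(v["contentDetails"]["duration"]) >= 60
    ]
```

In n8n this logic would live in a Code or Filter node between the video details tool and the agent.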
5.1.3 Video Details Tool
Purpose: Enrich a set of video IDs with full details and statistics.
- Input: One or more `video_id` values.
- Process: HTTP Request node calls the videos endpoint with `part` fields like `snippet`, `contentDetails`, `statistics`.
- Output: Detailed video metadata including duration, which is key for Shorts filtering.
5.2 Comments Aggregation & Analysis
5.2.1 Comments Fetch Tool
Purpose: Retrieve comment threads for a specific video.
- Input: `video_id`.
- Process:
  - HTTP Request node calls the `commentThreads` endpoint.
  - Configured to return up to 100 comments per request via `maxResults`.
  - Pagination is handled either within the node (looping over `nextPageToken`) or by the agent's plan, which repeatedly calls the tool until all pages are retrieved.
- Output: Raw comment threads including top-level comments and replies.
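The pagination loop is simple: keep requesting pages until the response no longer contains `nextPageToken`. A sketch with the actual HTTP call injected as a stand-in function (in the workflow this is the HTTP Request node hitting `commentThreads`):

```python
def fetch_all_comment_threads(video_id, fetch_page):
    """Collect every comment thread by following nextPageToken to the end.

    `fetch_page(video_id, page_token)` stands in for one API call and must
    return a dict shaped like a commentThreads response.
    """
    items, page_token = [], None
    while True:
        page = fetch_page(video_id, page_token)  # up to 100 items per call
        items.extend(page.get("items", []))
        page_token = page.get("nextPageToken")
        if not page_token:
            return items
```

Note that every page is a separate quota-charged API call, so very popular videos can consume quota quickly.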
5.2.2 Comment Flattening & Transformation
Purpose: Convert nested comment threads into a structure that is easy for the LLM to process.
- Node type: Edit Fields node in n8n.
- Behavior:
- Flattens each thread into a single item or text blob.
- Combines top-level comments with their replies.
- Produces a clean representation (for example concatenated text) suitable for sentiment and theme analysis.
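As a rough equivalent of what the Edit Fields node produces, here is a sketch that flattens one thread into a text blob, assuming the `commentThreads` response shape (a `topLevelComment` plus optional `replies`):

```python
def flatten_thread(thread: dict) -> str:
    """Render a comment thread (top-level comment + replies) as plain text."""
    top = thread["snippet"]["topLevelComment"]["snippet"]
    lines = [f'{top["authorDisplayName"]}: {top["textDisplay"]}']
    # Replies, when present, sit alongside "snippet" under "replies.comments".
    for reply in thread.get("replies", {}).get("comments", []):
        snip = reply["snippet"]
        lines.append(f'  > {snip["authorDisplayName"]}: {snip["textDisplay"]}')
    return "\n".join(lines)
```

Joining the per-thread blobs with blank lines gives the LLM a compact, readable corpus for sentiment and theme extraction.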
5.2.3 LLM-based Comment Analysis
Purpose: Use OpenAI to extract themes, pain points, sentiment, and actionable insights from the flattened comments.
- Input: Structured or concatenated comment text from the Edit Fields node.
- Process: OpenAI chat model node with a prompt tailored for comment analysis.
- Output: Summaries, sentiment breakdown, and key insights that the agent can present back to the user.
5.3 Transcription Flow
5.3.1 Transcription Trigger Tool
Purpose: Request a full transcription for a given video using Apify or a similar transcription service.
- Input: Video URL.
- Process: HTTP Request node calls the Apify transcription actor with the video URL as input.
- Output: A transcription text payload once the Apify actor finishes.
Usage notes:
- Transcription cost typically scales with video length, so long videos can be significantly more expensive.
- Ensure the input URL is in a format accepted by the Apify actor.
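A hedged sketch of the request behind this tool, using Apify's synchronous `run-sync-get-dataset-items` endpoint. The actor ID here is a placeholder, and the `videoUrls` input field is an assumption; check which input schema your chosen transcription actor actually expects.

```python
import json
from urllib import request

APIFY_BASE = "https://api.apify.com/v2/acts"

def transcription_request(video_url,
                          actor_id="someuser~youtube-transcript",  # placeholder actor
                          token="YOUR_TOKEN"):
    """Build the POST request that runs the actor and returns its dataset items."""
    url = f"{APIFY_BASE}/{actor_id}/run-sync-get-dataset-items?token={token}"
    body = json.dumps({"videoUrls": [video_url]}).encode()  # assumed input field
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})
```

In n8n this maps directly onto an HTTP Request node: POST method, JSON body, and the Apify token supplied via credentials rather than hard-coded in the URL.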
5.3.2 Transcription Analysis
Purpose: Analyze the returned transcript for content repurposing
