How to Build a YouTube RAG Pipeline in 10 Minutes
Every RAG pipeline starts with data. The best data for many AI applications lives in YouTube videos — lectures, interviews, conference talks, earnings calls — but it's trapped inside audio and video files where no embedding model can reach it.
This guide shows you how to build a complete YouTube → Vector DB pipeline with Trawl in under 10 minutes. No YouTube API quotas. No rate limits. No third-party libraries that break every month.
What you'll build
By the end of this tutorial, you'll have a working pipeline that:
- Searches YouTube for videos on any topic
- Extracts full transcripts with timestamps
- Chunks them into token-bounded segments optimized for embeddings
- Loads everything into Pinecone (or any vector store)
- Queries your knowledge base with natural language
Prerequisites: Python 3.10+, a Trawl API key (free tier works), and a Pinecone account.
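The OpenAI and Pinecone clients used in Step 5 read their API keys from environment variables, so export them up front (placeholder values shown; `OPENAI_API_KEY` and `PINECONE_API_KEY` are the names both SDKs look for by default):

```shell
# OpenAI() and Pinecone() pick these up automatically
export OPENAI_API_KEY="sk-your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"
```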
Step 1: Install the SDK
pip install trawl-sdk openai pinecone-client
Step 2: Search YouTube
Use Trawl's YouTube search to find relevant videos. This uses the innertube API internally — no YouTube Data API quota consumed.
curl "https://api.gettrawl.com/api/search?q=transformer+architecture+explained&max_results=5"

Or, with the Python SDK:

from trawl import TrawlClient
client = TrawlClient(api_key="trawl_your_key")
# Search YouTube — returns real-time results
results = client.search.youtube("transformer architecture explained", max_results=5)
print(f"Found {len(results.results)} videos:\n")
for video in results.results:
    print(f"  📺 {video.title}")
    print(f"     {video.channel} · {video.published_at}")
    print(f"     https://youtube.com/watch?v={video.video_id}\n")
Tip: The search endpoint is public — no API key required. But using the SDK with a key gives you higher rate limits and access to authenticated endpoints.
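For quick experiments without the SDK, the same public search request can be built with nothing but the standard library (swap in `urllib.request.urlopen` or `requests` to actually fetch it):

```python
from urllib.parse import urlencode

# Build the public search URL shown in the curl example above
params = {"q": "transformer architecture explained", "max_results": 5}
url = "https://api.gettrawl.com/api/search?" + urlencode(params)
print(url)  # https://api.gettrawl.com/api/search?q=transformer+architecture+explained&max_results=5
```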
Step 3: Extract a Transcript
Try extracting a transcript from any video — this is the data that feeds your embeddings:
curl -X POST "https://api.gettrawl.com/api/transcripts/preview" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=8jPQjjsBbIc"
  }'

Step 4: Bulk download as JSONL
Trawl's bulk download endpoint extracts up to 30 transcripts in one request and returns them as a ZIP file. The jsonl format gives you token-bounded chunks ready for embedding — no preprocessing needed.
import zipfile
import json
from io import BytesIO
# Grab video IDs from search results
video_ids = [v.video_id for v in results.results]
# Download all transcripts as JSONL chunks
zip_bytes = client.bulk.download(video_ids, format="jsonl")
# Parse the ZIP
chunks = []
with zipfile.ZipFile(BytesIO(zip_bytes)) as zf:
    for name in zf.namelist():
        if name.endswith(".jsonl"):
            for line in zf.read(name).decode().strip().split("\n"):
                if line:
                    chunks.append(json.loads(line))
print(f"✅ Extracted {len(chunks)} chunks from {len(video_ids)} videos")
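The bulk endpoint accepts up to 30 videos per request. If your search returns more than that, a small helper (hypothetical, not part of the SDK) keeps each call under the cap:

```python
def batch_ids(video_ids: list[str], size: int = 30) -> list[list[str]]:
    """Split a list of video IDs into request-sized batches."""
    return [video_ids[i:i + size] for i in range(0, len(video_ids), size)]

ids = [f"vid{i}" for i in range(65)]
print([len(b) for b in batch_ids(ids)])  # [30, 30, 5]
```

Feed each batch to client.bulk.download in turn and merge the resulting chunk lists.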
Each JSONL chunk looks like this:
{
  "video_id": "abc123",
  "title": "Transformers Explained",
  "channel": "3Blue1Brown",
  "source_url": "https://youtube.com/watch?v=abc123",
  "platform": "youtube",
  "language": "en",
  "chunk_index": 0,
  "timestamp_start": 0.0,
  "timestamp_end": 45.2,
  "text": "Welcome to this deep dive into the transformer architecture...",
  "token_count": 487
}
Why JSONL? Each line is a self-contained JSON document with full metadata. The `token_count` field tells you exactly how many tokens each chunk contains — no need to estimate. The `timestamp_start` and `timestamp_end` fields let you link back to the exact moment in the video.
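Those timestamps make it easy to send users to the exact moment a chunk came from. A minimal sketch, assuming source_url is a plain watch URL as in the example above (appending `&t=<seconds>s` makes YouTube seek to that point):

```python
def deep_link(chunk: dict) -> str:
    """Build a YouTube URL that jumps to the chunk's start time."""
    return f"{chunk['source_url']}&t={int(chunk['timestamp_start'])}s"

chunk = {
    "source_url": "https://youtube.com/watch?v=abc123",
    "timestamp_start": 45.2,
}
print(deep_link(chunk))  # https://youtube.com/watch?v=abc123&t=45s
```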
Step 5: Generate embeddings and load into Pinecone
Now we take those chunks, generate embeddings with OpenAI, and upsert them into Pinecone.
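One assumption in the code below: a Pinecone index named youtube-transcripts already exists, with dimension 1536 (the output size of text-embedding-3-small). If you haven't created one, here's a one-time setup sketch; the serverless cloud and region are arbitrary examples, adjust to your account:

```python
import os

INDEX_NAME = "youtube-transcripts"
DIMENSION = 1536  # text-embedding-3-small returns 1536-dimensional vectors

def ensure_index() -> None:
    """Create the Pinecone index if it doesn't exist yet (idempotent)."""
    from pinecone import Pinecone, ServerlessSpec
    pc = Pinecone()  # reads PINECONE_API_KEY from the environment
    if INDEX_NAME not in pc.list_indexes().names():
        pc.create_index(
            name=INDEX_NAME,
            dimension=DIMENSION,
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )

if os.environ.get("PINECONE_API_KEY"):
    ensure_index()
```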
from openai import OpenAI
import pinecone
openai_client = OpenAI()
# Generate embeddings for all chunks
print("Generating embeddings...")
texts = [c["text"] for c in chunks]
response = openai_client.embeddings.create(
    input=texts,
    model="text-embedding-3-small"
)
embeddings = response.data
# Connect to Pinecone
pc = pinecone.Pinecone()
index = pc.Index("youtube-transcripts")
# Build vectors with rich metadata
vectors = []
for chunk, embedding in zip(chunks, embeddings):
    vectors.append({
        "id": f"{chunk['video_id']}-{chunk['chunk_index']}",
        "values": embedding.embedding,
        "metadata": {
            "video_id": chunk["video_id"],
            "title": chunk["title"],
            "channel": chunk.get("channel", ""),
            "text": chunk["text"],
            "timestamp_start": chunk["timestamp_start"],
            "timestamp_end": chunk["timestamp_end"],
            "source_url": chunk.get("source_url", ""),
        }
    })
# Upsert in batches of 100
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i+100])
print(f"✅ Loaded {len(vectors)} vectors into Pinecone")
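One caveat on the embeddings call above: OpenAI caps a single embeddings request (2,048 input items at the time of writing, plus a per-request token limit), so larger corpora should be embedded in batches. A sketch of the loop, with the batch size as an assumed tunable:

```python
def embed_in_batches(client, texts: list[str], batch_size: int = 256) -> list:
    """Embed texts in fixed-size batches, preserving input order."""
    out = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(
            input=texts[i:i + batch_size],
            model="text-embedding-3-small",
        )
        out.extend(d.embedding for d in resp.data)
    return out
```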
Step 6: Query your knowledge base
Now you can ask natural language questions and get answers grounded in real YouTube content:
def ask(question: str, top_k: int = 3):
    """Query the YouTube knowledge base."""
    # Embed the question
    q_embedding = openai_client.embeddings.create(
        input=[question],
        model="text-embedding-3-small"
    ).data[0].embedding
    # Search Pinecone
    results = index.query(
        vector=q_embedding,
        top_k=top_k,
        include_metadata=True
    )
    print(f"\n🔍 Question: {question}\n")
    for i, match in enumerate(results["matches"], 1):
        meta = match["metadata"]
        score = match["score"]
        print(f"  {i}. [{score:.3f}] {meta['title']}")
        print(f"     📺 {meta['source_url']}")
        print(f"     ⏱️ {meta['timestamp_start']:.0f}s - {meta['timestamp_end']:.0f}s")
        print(f"     💬 {meta['text'][:150]}...")
        print()
# Try it!
ask("How do attention mechanisms work in transformers?")
ask("What is the difference between self-attention and cross-attention?")
ask("Why are transformers better than RNNs for long sequences?")
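Retrieval is the "R" in RAG; to generate answers, stitch the matches into a grounded prompt for a chat model. A sketch (the prompt format is one reasonable choice, not a Trawl API):

```python
def build_prompt(question: str, matches: list[dict]) -> str:
    """Format retrieved chunks as citable context for a chat model."""
    context = "\n\n".join(
        f"[{m['metadata']['title']} @ {m['metadata']['timestamp_start']:.0f}s]\n"
        f"{m['metadata']['text']}"
        for m in matches
    )
    return (
        "Answer the question using only the context below. "
        "Cite the video title and timestamp you relied on.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Pass the result to openai_client.chat.completions.create and the model's answer stays grounded in the transcripts, with timestamps you can cite.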
Why Trawl for RAG
| Feature | YouTube Data API | youtube-transcript-api | Trawl |
|---|---|---|---|
| Daily quota | 10,000 units | No quota (IP blocks instead) | Unlimited |
| JSONL chunks | No | No | Built in |
| Bulk download | No | Manual loop | 30 in one ZIP |
| Timestamps | No | Yes | Yes + metadata |
| Token counts | No | No | Per chunk |
| SDK | Yes | Yes | Yes (Python + JS) |
| Search included | Yes (quota) | No | Yes (no quota) |
Going further
Add podcast content to the same pipeline
# Search podcasts — same API, same schema
podcasts = client.search.podcasts("machine learning")
print(f"Found {len(podcasts.results)} podcasts")
Add earnings calls
# Earnings call transcripts for financial RAG
earnings = client.earnings.search("AAPL")
transcript = client.earnings.get_transcript("AAPL", 2026, 1)
Use the unified search
# Search across ALL sources at once
import requests
results = requests.get(
    "https://api.gettrawl.com/api/search/unified",
    params={"q": "transformer architecture", "sources": "all", "max_per_source": 3}
).json()
for r in results["results"]:
    print(f"[{r['source_type']}] {r['title']}")
Full pipeline in 20 lines
Here's the entire pipeline condensed:
from trawl import TrawlClient
from openai import OpenAI
import pinecone, zipfile, json
from io import BytesIO
tc = TrawlClient(api_key="trawl_your_key")
ai = OpenAI()
pc = pinecone.Pinecone()
idx = pc.Index("youtube-rag")
# 1. Search + extract + chunk
videos = tc.search.youtube("AI agents tutorial", max_results=5)
zb = tc.bulk.download([v.video_id for v in videos.results], format="jsonl")
chunks = []
with zipfile.ZipFile(BytesIO(zb)) as zf:
    for n in zf.namelist():
        if n.endswith(".jsonl"):
            chunks += [json.loads(l) for l in zf.read(n).decode().strip().split("\n") if l]
# 2. Embed + load
embs = ai.embeddings.create(input=[c["text"] for c in chunks], model="text-embedding-3-small").data
idx.upsert(vectors=[{"id": f"{c['video_id']}-{c['chunk_index']}", "values": e.embedding, "metadata": {"text": c["text"], "title": c["title"]}} for c, e in zip(chunks, embs)])
print(f"Done! {len(chunks)} chunks in Pinecone.")
That's it. YouTube → Trawl → Embeddings → Pinecone → RAG. 10 minutes. Zero quotas.
Drop Into Your Framework
You don't have to wire up embeddings and vector stores yourself. If you're already using a framework, Trawl plugs straight in.
LangChain
from langchain_trawl import TrawlLoader
loader = TrawlLoader(urls=[
    "https://youtube.com/watch?v=...",
    "https://youtube.com/watch?v=...",
])
docs = loader.load()
# Each doc has .page_content (transcript text) and .metadata (title, language, segments)
# Drop directly into any vector store
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())
LlamaIndex
from trawl_llamaindex import TrawlReader
from llama_index.core import VectorStoreIndex
reader = TrawlReader()
documents = reader.load_data(urls=["https://youtube.com/watch?v=..."])
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What was the main argument?")
CrewAI
from crewai import Agent, Task, Crew
from trawl_crewai import TrawlSearchTool, TrawlTranscriptTool
researcher = Agent(
    role="Content Researcher",
    goal="Find and analyze YouTube content on any topic",
    tools=[TrawlSearchTool(), TrawlTranscriptTool()],
)
task = Task(
    description="Find the top 5 videos about RAG pipelines and summarize the key techniques discussed",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
MCP (No Code)
Don't want to write code? Add Trawl to Claude Desktop and just ask:
"Search YouTube for RAG pipeline tutorials, extract the top 3 transcripts, and summarize the key techniques they cover."
78 tools. Zero lines of code. Set up MCP →
Ready to build? Get your free API key → or read the full API docs →