AI/ML Engineer · Easy
Tags: rag · ai · tutorial · vector-db

How to Build a YouTube RAG Pipeline in 10 Minutes

Trawl Team

Every RAG pipeline starts with data. The best data for many AI applications lives in YouTube videos — lectures, interviews, conference talks, earnings calls — but it's trapped inside audio and video files where no embedding model can reach it.

This guide shows you how to build a complete YouTube → Vector DB pipeline with Trawl in under 10 minutes. No YouTube API quotas. No rate limits. No third-party libraries that break every month.


What you'll build

By the end of this tutorial, you'll have a working pipeline that:

  1. Searches YouTube for videos on any topic
  2. Extracts full transcripts with timestamps
  3. Chunks them into token-bounded segments optimized for embeddings
  4. Loads everything into Pinecone (or any vector store)
  5. Queries your knowledge base with natural language

Prerequisites: Python 3.10+, a Trawl API key (free tier works), and a Pinecone account.
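Before you start, export your keys as environment variables so they stay out of source code. The OpenAI and Pinecone clients read OPENAI_API_KEY and PINECONE_API_KEY automatically; TRAWL_API_KEY is just a convention used in this tutorial, so pass it to TrawlClient however you prefer.

```shell
# The OpenAI and Pinecone SDKs pick these up automatically.
# TRAWL_API_KEY is an assumed name for this tutorial, not an SDK default.
export TRAWL_API_KEY="trawl_your_key"
export OPENAI_API_KEY="sk-..."
export PINECONE_API_KEY="pcsk_..."
```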


Step 1: Install the SDK

pip install trawl-sdk openai pinecone

Step 2: Search YouTube

Use Trawl's YouTube search to find relevant videos. This uses the innertube API internally — no YouTube Data API quota consumed.

from trawl import TrawlClient

client = TrawlClient(api_key="trawl_your_key")

# Search YouTube — returns real-time results
results = client.search.youtube("transformer architecture explained", max_results=5)

print(f"Found {len(results.results)} videos:\n")
for video in results.results:
    print(f"  📺 {video.title}")
    print(f"     {video.channel} · {video.published_at}")
    print(f"     https://youtube.com/watch?v={video.video_id}\n")

Tip: The search endpoint is public — no API key required. But using the SDK with a key gives you higher rate limits and access to authenticated endpoints.


Step 3: Extract a Transcript

Try extracting a transcript from any video — this is the data that feeds your embeddings:

Extract transcript
curl -X POST "https://api.gettrawl.com/api/transcripts/preview" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=8jPQjjsBbIc"}'

Step 4: Bulk download as JSONL

Trawl's bulk download endpoint extracts up to 30 transcripts in one request and returns them as a ZIP file. The jsonl format gives you token-bounded chunks ready for embedding — no preprocessing needed.

import zipfile
import json
from io import BytesIO

# Grab video IDs from search results
video_ids = [v.video_id for v in results.results]

# Download all transcripts as JSONL chunks
zip_bytes = client.bulk.download(video_ids, format="jsonl")

# Parse the ZIP
chunks = []
with zipfile.ZipFile(BytesIO(zip_bytes)) as zf:
    for name in zf.namelist():
        if name.endswith(".jsonl"):
            for line in zf.read(name).decode().strip().split("\n"):
                if line:
                    chunks.append(json.loads(line))

print(f"✅ Extracted {len(chunks)} chunks from {len(video_ids)} videos")
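Since the bulk endpoint accepts at most 30 IDs per request, larger jobs need to be split into multiple calls. A minimal batching helper (the commented client.bulk.download call mirrors the step above and assumes the same SDK method):

```python
def batched(items, size=30):
    """Yield fixed-size batches; the bulk endpoint caps each request at 30 IDs."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: download transcripts for more than 30 videos
# zip_parts = [client.bulk.download(batch, format="jsonl")
#              for batch in batched(video_ids)]
```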

Each JSONL chunk looks like this:

{
  "video_id": "abc123",
  "title": "Transformers Explained",
  "channel": "3Blue1Brown",
  "source_url": "https://youtube.com/watch?v=abc123",
  "platform": "youtube",
  "language": "en",
  "chunk_index": 0,
  "timestamp_start": 0.0,
  "timestamp_end": 45.2,
  "text": "Welcome to this deep dive into the transformer architecture...",
  "token_count": 487
}

Why JSONL? Each line is a self-contained JSON document with full metadata. The token_count field tells you exactly how many tokens each chunk contains, so there's no need to estimate. The timestamp_start and timestamp_end fields let you link back to the exact moment in the video.
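Those timestamps also make citation links easy: YouTube's standard t= URL parameter starts playback at a given second. A small helper, assuming the chunk schema shown above:

```python
def chunk_link(chunk):
    """Build a YouTube deep link that starts playback at the chunk's first second."""
    return f"https://youtube.com/watch?v={chunk['video_id']}&t={int(chunk['timestamp_start'])}s"

chunk = {"video_id": "abc123", "timestamp_start": 45.2}
print(chunk_link(chunk))  # https://youtube.com/watch?v=abc123&t=45s
```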


Step 5: Generate embeddings and load into Pinecone

Now we take those chunks, generate embeddings with OpenAI, and upsert them into Pinecone.
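One assumption in the code below is that a "youtube-transcripts" index already exists. If it doesn't, create one sized to match the embedding model's output dimension (1536 for text-embedding-3-small, 3072 for text-embedding-3-large). This sketch uses the current Pinecone SDK's has_index/create_index; the serverless cloud and region values are placeholders you should adjust:

```python
# Default output dimensions for OpenAI's text-embedding-3 models
EMBED_DIMS = {"text-embedding-3-small": 1536, "text-embedding-3-large": 3072}

def ensure_index(pc, name, model="text-embedding-3-small"):
    """Create a serverless index sized for the embedding model, if it's missing."""
    if not pc.has_index(name):
        # Imported lazily so the helper is importable without pinecone installed
        from pinecone import ServerlessSpec
        pc.create_index(
            name=name,
            dimension=EMBED_DIMS[model],
            metric="cosine",
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholders
        )
    return pc.Index(name)
```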

from openai import OpenAI
import pinecone

openai_client = OpenAI()

# Generate embeddings for all chunks
print("Generating embeddings...")
texts = [c["text"] for c in chunks]
response = openai_client.embeddings.create(
    input=texts,
    model="text-embedding-3-small"
)
embeddings = response.data

# Connect to Pinecone
pc = pinecone.Pinecone()
index = pc.Index("youtube-transcripts")

# Build vectors with rich metadata
vectors = []
for chunk, embedding in zip(chunks, embeddings):
    vectors.append({
        "id": f"{chunk['video_id']}-{chunk['chunk_index']}",
        "values": embedding.embedding,
        "metadata": {
            "video_id": chunk["video_id"],
            "title": chunk["title"],
            "channel": chunk.get("channel", ""),
            "text": chunk["text"],
            "timestamp_start": chunk["timestamp_start"],
            "timestamp_end": chunk["timestamp_end"],
            "source_url": chunk.get("source_url", ""),
        }
    })

# Upsert in batches of 100
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i+100])

print(f"✅ Loaded {len(vectors)} vectors into Pinecone")
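The single embeddings.create call above is fine for a handful of videos, but OpenAI caps the number of inputs per request (2,048 items for these models), so larger corpora should be embedded in batches. A sketch that preserves chunk order:

```python
def embed_in_batches(client, texts, model="text-embedding-3-small", batch_size=256):
    """Embed texts in order, batch_size inputs per request, to stay under API limits."""
    vectors = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(input=texts[i:i + batch_size], model=model)
        vectors.extend(item.embedding for item in resp.data)
    return vectors

# Usage: embeddings = embed_in_batches(openai_client, texts)
```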

Step 6: Query your knowledge base

Now you can ask natural language questions and get answers grounded in real YouTube content:

def ask(question: str, top_k: int = 3):
    """Query the YouTube knowledge base."""
    # Embed the question
    q_embedding = openai_client.embeddings.create(
        input=[question],
        model="text-embedding-3-small"
    ).data[0].embedding

    # Search Pinecone
    results = index.query(
        vector=q_embedding,
        top_k=top_k,
        include_metadata=True
    )

    print(f"\n🔍 Question: {question}\n")
    for i, match in enumerate(results["matches"], 1):
        meta = match["metadata"]
        score = match["score"]
        print(f"  {i}. [{score:.3f}] {meta['title']}")
        print(f"     📺 {meta['source_url']}")
        print(f"     ⏱️  {meta['timestamp_start']:.0f}s - {meta['timestamp_end']:.0f}s")
        print(f"     💬 {meta['text'][:150]}...")
        print()

# Try it!
ask("How do attention mechanisms work in transformers?")
ask("What is the difference between self-attention and cross-attention?")
ask("Why are transformers better than RNNs for long sequences?")
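The raw second counts in the output are serviceable, but a small formatter makes the citations more readable:

```python
def fmt_ts(seconds):
    """Render a second count as M:SS (or H:MM:SS for long videos)."""
    s = int(seconds)
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h}:{m:02d}:{sec:02d}" if h else f"{m}:{sec:02d}"

print(fmt_ts(45.2))   # 0:45
print(fmt_ts(3725))   # 1:02:05
```

Swap it into the print loop above, e.g. `fmt_ts(meta['timestamp_start'])` instead of the `:.0f` format spec.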

Why Trawl for RAG

Feature         | YouTube Data API | youtube-transcript-api | Trawl
Daily quota     | 10,000 units     | IP blocks              | Unlimited
JSONL chunks    | No               | No                     | Built in
Bulk download   | No               | Manual loop            | 30 in one ZIP
Timestamps      | No               | Yes                    | Yes + metadata
Token counts    | No               | No                     | Per chunk
SDK             | Yes              | Yes                    | Yes (Python + JS)
Search included | Yes (quota)      | No                     | Yes (no quota)

Going further

Add podcast content to the same pipeline

# Search podcasts — same API, same schema
podcasts = client.search.podcasts("machine learning")
print(f"Found {len(podcasts.results)} podcasts")

Add earnings calls

# Earnings call transcripts for financial RAG
earnings = client.earnings.search("AAPL")
transcript = client.earnings.get_transcript("AAPL", 2026, 1)

Use the unified search

# Search across ALL sources at once
import requests
results = requests.get(
    "https://api.gettrawl.com/api/search/unified",
    params={"q": "transformer architecture", "sources": "all", "max_per_source": 3}
).json()

for r in results["results"]:
    print(f"[{r['source_type']}] {r['title']}")
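When you mix sources, it helps to see how the hits are distributed before tuning max_per_source. A quick tally over the result list, using the source_type field shown above:

```python
from collections import Counter

def count_by_source(results):
    """Tally unified-search hits per source_type."""
    return Counter(r["source_type"] for r in results)

sample = [{"source_type": "youtube"}, {"source_type": "podcast"}, {"source_type": "youtube"}]
print(count_by_source(sample))  # Counter({'youtube': 2, 'podcast': 1})
```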

Full pipeline in 20 lines

Here's the entire pipeline condensed:

from trawl import TrawlClient
from openai import OpenAI
import pinecone, zipfile, json
from io import BytesIO

tc = TrawlClient(api_key="trawl_your_key")
ai = OpenAI()
pc = pinecone.Pinecone()
idx = pc.Index("youtube-rag")

# 1. Search + extract + chunk
videos = tc.search.youtube("AI agents tutorial", max_results=5)
zb = tc.bulk.download([v.video_id for v in videos.results], format="jsonl")
chunks = []
with zipfile.ZipFile(BytesIO(zb)) as zf:
    for n in zf.namelist():
        if n.endswith(".jsonl"):
            chunks += [json.loads(l) for l in zf.read(n).decode().strip().split("\n") if l]

# 2. Embed + load
embs = ai.embeddings.create(input=[c["text"] for c in chunks], model="text-embedding-3-small").data
idx.upsert(vectors=[{"id": f"{c['video_id']}-{c['chunk_index']}", "values": e.embedding, "metadata": {"text": c["text"], "title": c["title"]}} for c, e in zip(chunks, embs)])
print(f"Done! {len(chunks)} chunks in Pinecone.")

That's it. YouTube → Trawl → Embeddings → Pinecone → RAG. 10 minutes. Zero quotas.


Drop Into Your Framework

You don't have to wire up embeddings and vector stores yourself. If you're already using a framework, Trawl plugs straight in.

LangChain

from langchain_trawl import TrawlLoader

loader = TrawlLoader(urls=[
    "https://youtube.com/watch?v=...",
    "https://youtube.com/watch?v=...",
])
docs = loader.load()

# Each doc has .page_content (transcript text) and .metadata (title, language, segments)
# Drop directly into any vector store
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(docs, OpenAIEmbeddings())

LlamaIndex

from trawl_llamaindex import TrawlReader
from llama_index.core import VectorStoreIndex

reader = TrawlReader()
documents = reader.load_data(urls=["https://youtube.com/watch?v=..."])

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What was the main argument?")

CrewAI

from crewai import Agent, Task, Crew
from trawl_crewai import TrawlSearchTool, TrawlTranscriptTool

researcher = Agent(
    role="Content Researcher",
    goal="Find and analyze YouTube content on any topic",
    tools=[TrawlSearchTool(), TrawlTranscriptTool()],
)

task = Task(
    description="Find the top 5 videos about RAG pipelines and summarize the key techniques discussed",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()

MCP (No Code)

Don't want to write code? Add Trawl to Claude Desktop and just ask:

"Search YouTube for RAG pipeline tutorials, extract the top 3 transcripts, and summarize the key techniques they cover."

78 tools. Zero lines of code. Set up MCP →


Ready to build? Get your free API key → or read the full API docs →