
I Tracked How a Topic Goes From ArXiv Paper to Reddit Hype to Market Mover — in One API Call Per Source

Trawl Team

Information Has a Lifecycle. Track It.

Every market-moving narrative follows the same path: an academic paper gets published, niche YouTube creators pick it up, Reddit threads start forming, mainstream news publishes articles, and finally Google Trends spikes. By the time your Bloomberg terminal lights up, the signal has been visible for months — if you knew where to look.

I built a tool that tracks this cascade across six sources through Trawl's unified API. For any topic, it tells you exactly where in the lifecycle that narrative sits right now.

The Cascade Model

Information flows through five predictable stages:

Stage  Name                   Signal
1      Academic               Topic exists only in papers. Very early, no public awareness.
2      Early Adopter          Niche YouTube creators and podcasters are covering it.
3      Social Amplification   Reddit threads are forming. Retail awareness growing.
4      Mainstream             News outlets are publishing articles. Broad awareness.
5      Peak / Saturation      Google Trends spike and Wikipedia pageview surge. Likely priced in.

The alpha is in the gap between stages. A topic at Stage 2 (niche YouTube coverage) that's headed to Stage 4 (mainstream news) represents an information advantage. A topic already at Stage 5 is priced in — you're late.

Step 1: Search Academic Papers

Start with the source of truth. Has this topic been published in academic literature?
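A minimal sketch with Trawl's Python client, using the same `papers.search` call as the full pipeline below (assumes the `trawl` package is installed):

```python
from trawl import TrawlClient

client = TrawlClient()

# Academic papers mentioning the topic -- the Stage 1 signal
papers = client.papers.search(q="GLP-1 weight loss")
print(f"{len(papers.results)} papers found")
```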

Step 2: Check Social Amplification

Is Reddit talking about it? When social platforms start amplifying, you're entering Stage 3.
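Same pattern, same client — the Reddit endpoint takes `query` rather than `q`, as in the full pipeline below:

```python
from trawl import TrawlClient

client = TrawlClient()

# Reddit threads on the topic -- the Stage 3 signal
reddit = client.reddit.search(query="GLP-1 weight loss")
print(f"{len(reddit.results)} Reddit threads found")
```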

Step 3: Mainstream Coverage

Has news media picked it up? If yes, you're at Stage 4 or beyond.
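The news check mirrors the call in the full pipeline below:

```python
from trawl import TrawlClient

client = TrawlClient()

# News articles on the topic -- the Stage 4 signal
news = client.news.search(q="GLP-1 weight loss")
print(f"{len(news.results)} news articles found")
```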

Step 4: Check Saturation

Wikipedia pageviews are a proxy for mass public awareness. A spike here means Stage 5 — the narrative is saturated.

Wikipedia pageviews
curl "https://api.gettrawl.com/api/pageviews/GLP-1?article=GLP-1"

The Lifecycle Detection Algorithm

The stage is determined by which sources have results:

Stage 5: News + (Google Trends spike OR Wikipedia surge)  -> Saturated
Stage 4: News coverage detected                           -> Mainstream
Stage 3: Reddit threads forming                           -> Social amplification
Stage 2: YouTube/podcast coverage + papers                -> Early adopter
Stage 1: Papers only                                      -> Academic

The tool also computes a trend direction for each source. It splits the date range in half and compares item density in each period. If the second half has 30%+ more items, the trend is "increasing." If 30%+ fewer, "decreasing." This tells you whether a narrative is gaining or losing momentum within each stage.
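The split-half comparison can be sketched as a small standalone function (the name, signature, and "flat" label are illustrative; the actual implementation is in the linked repo):

```python
from datetime import datetime

def trend_direction(dates: list[datetime], start: datetime, end: datetime) -> str:
    """Compare item density in the two halves of [start, end].

    30%+ more items in the second half -> "increasing";
    30%+ fewer -> "decreasing"; otherwise -> "flat".
    """
    midpoint = start + (end - start) / 2
    first = sum(1 for d in dates if d < midpoint)
    second = sum(1 for d in dates if d >= midpoint)
    if first == 0:
        # No baseline to compare against: any second-half activity counts as growth
        return "increasing" if second else "flat"
    ratio = second / first
    if ratio >= 1.3:
        return "increasing"
    if ratio <= 0.7:
        return "decreasing"
    return "flat"
```

For example, 2 mentions in the first half of the window and 6 in the second half gives a ratio of 3.0, well past the 1.3 threshold, so the trend reads "increasing."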

The Full Pipeline

from trawl import TrawlClient

client = TrawlClient()

def analyze_cascade(topic: str) -> dict:
    """Track a narrative across all sources and determine lifecycle stage."""

    # Search all sources
    papers = client.papers.search(q=topic)
    reddit = client.reddit.search(query=topic)
    news = client.news.search(q=topic)
    youtube = client.search(q=topic)

    # Check saturation signal. Note: this is a coarse proxy -- a successful
    # pageviews fetch only confirms a Wikipedia article exists for the topic.
    # Proper spike detection would compare recent pageviews against a baseline.
    try:
        pageviews = client.pageviews.get(topic)
        has_pageview_spike = True
    except Exception:
        has_pageview_spike = False

    # Determine stage
    has_papers = len(papers.results) > 0
    has_youtube = len(youtube.results) > 0
    has_reddit = len(reddit.results) > 0
    has_news = len(news.results) > 0

    if has_news and has_pageview_spike:
        stage = 5  # Saturated
    elif has_news:
        stage = 4  # Mainstream
    elif has_reddit:
        stage = 3  # Social amplification
    elif has_youtube and has_papers:
        stage = 2  # Early adopter
    else:
        stage = 1  # Academic

    # Build cascade timeline (earliest mention per source)
    timeline = []
    sources = {
        "papers": papers.results,
        "youtube": youtube.results,
        "reddit": reddit.results,
        "news": news.results,
    }

    for name, results in sources.items():
        if results:
            dates = [r.date for r in results if hasattr(r, 'date') and r.date]
            if dates:
                earliest = min(dates)
                timeline.append({
                    "source": name,
                    "earliest": earliest,
                    "count": len(results),
                })

    timeline.sort(key=lambda x: x["earliest"])

    stage_names = {
        1: "Academic",
        2: "Early Adopter",
        3: "Social Amplification",
        4: "Mainstream",
        5: "Peak / Saturation",
    }

    return {
        "topic": topic,
        "stage": stage,
        "stage_name": stage_names[stage],
        "timeline": timeline,
        "source_counts": {name: len(r) for name, r in sources.items()},
    }

# Run it
result = analyze_cascade("GLP-1 weight loss")
print(f"Topic: {result['topic']}")
print(f"Stage: {result['stage']} - {result['stage_name']}")
print()
print("Cascade Timeline:")
for entry in result["timeline"]:
    print(f"  {entry['source']:10s}  {entry['earliest']}  ({entry['count']} results)")

Why This Matters

Every narrative that moved markets in the last decade — AI, crypto, GLP-1 drugs, meme stocks — followed this exact cascade. The information was available at Stage 1 or 2 if you were looking in the right places. By Stage 5, the trade is crowded.

The key insight is that Trawl's unified API makes cross-source temporal analysis trivial. What would normally require six different APIs, six different auth flows, and six different response formats is just six GET requests to the same base URL. No API keys. No authentication. One base URL.

Non-Code Options

Trawl's MCP server works inside Claude Desktop — describe what you're tracking and it handles the API calls:

"Track the narrative lifecycle of 'GLP-1 weight loss' across academic papers, YouTube, Reddit, and news. What stage is it in? Is it accelerating or decelerating?"

Claude searches all sources, compares earliest mention dates, and gives you a lifecycle assessment. You get the analysis without writing a line of code.

Source Code

The full implementation with parallel fetching, trend detection, and Rich terminal output:

github.com/trawlhq/examples/tree/master/narrative-cascade