
I Built a Prediction Market Trading Bot That Trawls 5 Data Sources for Geopolitical Intelligence

Trawl Team

There's a prediction market on Polymarket: "Will there be a US-Iran ceasefire in 2026?"

To trade this intelligently, I don't just need the latest headline. I need what Reuters is reporting, what Bloomberg analysts are saying on YouTube, what the "War on the Rocks" podcast covered this week, whether defense contractors filed any 8-Ks, and what academic research says about geopolitical risk.

That's 5 different data sources. Previously, that meant 5 different APIs, 5 different auth systems, and 500+ lines of integration code.

I replaced all of it with one package.

pip install trawl-sdk

The Architecture

The bot has two layers, each using different Trawl sources:

Real-time layer (polls every 5-30 minutes):

  • News via client.news.search() — breaking headlines in 65+ languages
  • SEC Filings via client.filings.search() — material event disclosures
  • Earnings via client.earnings.search() — management tone and guidance

Background layer (refreshes every 2-24 hours):

  • YouTube via client.search.youtube() + client.transcripts.preview() — expert consensus
  • Podcasts via client.search.podcasts() — deep macro analysis

The real-time layer tells the bot what just happened. The background layer tells it how to interpret it.
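That two-layer cadence can be sketched as a single loop. This is a minimal sketch, not the bot's actual scheduler: the fetch_realtime / fetch_background callbacks are hypothetical stand-ins for the Trawl calls listed above, and the injectable sleep and clock exist only to make the loop testable.

```python
import time

REALTIME_INTERVAL = 5 * 60        # news, filings, earnings: every 5 minutes
BACKGROUND_INTERVAL = 2 * 3600    # YouTube, podcasts: every 2 hours

def run(fetch_realtime, fetch_background, *, sleep=time.sleep,
        now=time.time, max_cycles=None):
    """Poll the real-time layer every cycle; refresh the background
    layer only when its longer interval has elapsed."""
    last_background = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        fetch_realtime()                      # what just happened
        if last_background is None or now() - last_background >= BACKGROUND_INTERVAL:
            fetch_background()                # how to interpret it
            last_background = now()
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(REALTIME_INTERVAL)
```

In production you'd likely replace the loop with a real scheduler (cron, APScheduler, or the Watch + Webhook system covered later), but the shape is the same: fast cycle for signal, slow cycle for context.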

The Iran Ceasefire Backtest

I ran the full pipeline against live Trawl data. Here's what one cycle pulled:

News: 15 articles, 58K chars

Try it — search for breaking news on any geopolitical topic:

from trawl import TrawlClient
client = TrawlClient()

queries = ["Iran ceasefire", "US Iran war", "Trump Iran", "Strait of Hormuz"]

for query in queries:
    results = client.news.search(query, max_results=5)
    for article in results["articles"][:3]:
        text = client.news.get_article_text(article["url"])
        print(f"[{article['source_name']}] {article['title'][:60]}")
        print(f"  {len(text['text']):,} chars extracted")

Results:

  • Breitbart: "President Trump: Iran Has Asked for a Ceasefire" (3,684 chars)
  • Foreign Policy: "What a U.S. Operation to Get Iran's Uranium Would Look Like" (9,290 chars)
  • Al Jazeera: "Iran authorities await war 'victory'" (9,162 chars)

YouTube: 9 videos, 486K chars of analyst transcripts

This is the data dimension no other API gave me. Search for analyst videos on any topic, then extract the full transcript from each result:

videos = client.search.youtube("Iran war analysis 2026", max_results=3)

for video in videos.results:
    url = f"https://youtube.com/watch?v={video.video_id}"
    transcript = client.transcripts.preview(url)
    text = " ".join(s.text for s in transcript.segments)
    print(f"[{video.channel}] {video.title}")
    print(f"  {len(transcript.segments)} segments, {len(text):,} chars")

Or hit the transcript endpoint directly:

# Extract video transcript
curl -X POST "https://api.gettrawl.com/api/transcripts/preview" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=8jPQjjsBbIc"}'

Results:

[ABC News] Trump: Iran War "Nearing Completion"
  220 segments, 8,101 chars

[Bloomberg Television] Trump to Address Nation on Iran War
  724 segments, 36,406 chars

[Hindustan Times] US-Iran War LIVE | Peace Or War?
  8,758 segments, 311,512 chars

311,512 characters from a single live stream. An entire book of expert commentary, timestamped and structured.

SEC Filings: 80 8-K filings from defense contractors

Check what defense contractors are disclosing:

for ticker in ["LMT", "RTX", "NOC", "GD"]:
    filings = client.filings.search(ticker=ticker, form_type="8-K")
    print(f"{ticker}: {len(filings['results'])} 8-K filings")

Or via the REST endpoint:

# Search SEC filings
curl "https://api.gettrawl.com/api/filings/search?ticker=LMT&form_type=8-K&max_results=5"

Lockheed Martin, Raytheon, Northrop Grumman, General Dynamics — all with recent material event disclosures. When defense contractors file 8-Ks during an active conflict, that's a signal.
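One way to turn those counts into an actual signal, as a sketch: flag any ticker with an 8-K filed in the last few days. Note the filing_date field name is an assumption about the search response shape, not a documented Trawl field; rename it to match what client.filings.search() actually returns.

```python
from datetime import date, timedelta

def recent_8k_tickers(filings_by_ticker, days=7, today=None):
    """Tickers with at least one 8-K filed in the last `days` days."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    return [
        ticker
        for ticker, filings in filings_by_ticker.items()
        # "filing_date" (ISO yyyy-mm-dd) is a hypothetical field name
        if any(date.fromisoformat(f["filing_date"]) >= cutoff for f in filings)
    ]
```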

Podcasts + Papers

Search podcasts for deep macro analysis that headlines miss, and academic research for the long-view context:

# Podcasts
pods = client.search.podcasts("War on the Rocks", max_results=1)
episodes = client.podcasts.episodes(pods.results[0].id, max_results=2)

# Academic papers
papers = client.papers.search("geopolitical risk prediction markets")

War on the Rocks had an episode titled "What It Was Like to Be Under Incoming Fire" — primary-source analysis that headlines miss entirely.

AI Entity Extraction

Try the AI — paste any geopolitical text and extract the key players, organizations, and topics:

AI entity extraction & summarization
curl -X POST "https://api.gettrawl.com/api/ai/preview/summarize" \
  -H "Content-Type: application/json" \
  -d '{
  "text": "President Trump said Iran has asked for a ceasefire through intermediaries. IAEA chief Rafael Grossi confirmed inspectors still have access to declared nuclear sites. Former sanctions negotiator Richard Nephew at Columbia University warned that any deal would need to address enrichment thresholds beyond the original JCPOA framework."
}'
The same extraction from the SDK (combined_text is the concatenated news and transcript text gathered in the earlier steps):

entities = client.ai.preview_entities(combined_text[:30000])
ent = entities["entities"]

print(f"People: {[p['name'] for p in ent['people']]}")
print(f"Organizations: {[o['name'] for o in ent['organizations']]}")
print(f"Topics: {[t['name'] for t in ent['topics']]}")

Output:

People: ['Donald Trump', 'Rafael Grossi', 'Richard Nephew']
Organizations: ['United States', 'Iran', 'IAEA', 'Columbia University']
Topics: ['Iran Nuclear Program', 'US Military Operations', 'Middle East Politics']

The AI identified Rafael Grossi (IAEA chief) and Richard Nephew (former sanctions negotiator) as key figures — context a keyword search would miss.

The Numbers

Metric                       Value
Total data points            42
Trawl API calls              42
Total content extracted      552,717 characters
Sources used                 5 (news, YouTube, podcasts, SEC filings, papers)
YouTube transcript data      486,361 chars (88% of total)
Pipeline execution time      ~2 minutes

The Headline Scoring Engine

Headlines are scored on 4 dimensions:

market_impact = (
    relevance * 0.35 +            # How related to the market?
    severity * 0.30 +             # Policy decision (1.0) → rumor (0.2)
    novelty * 0.20 +              # New info vs. rehash?
    source_credibility * 0.15     # Reuters (0.95) → blog (0.40)
)

SEC filings get automatic 0.99 credibility (authoritative by definition). The novelty scorer uses embedding cosine similarity against the last 200 headlines.
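Here's a minimal, self-contained sketch of that scoring path: cosine similarity over the last 200 headline embeddings for novelty, then the weighted sum. The embedding vectors are assumed inputs from whatever sentence-embedding model you use; only the weights and the 200-headline window come from the pipeline above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def novelty(embedding, recent_embeddings):
    """1.0 = brand new; near 0.0 = rehash of a recent headline."""
    if not recent_embeddings:
        return 1.0
    return 1.0 - max(cosine(embedding, e) for e in recent_embeddings[-200:])

def market_impact(relevance, severity, novelty_score, source_credibility):
    # Same weights as the formula above; they sum to 1.0.
    return (relevance * 0.35 + severity * 0.30
            + novelty_score * 0.20 + source_credibility * 0.15)
```

A headline identical to a recent one scores ~0 novelty and its impact drops accordingly; an SEC filing would enter with source_credibility pinned at 0.99.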

Real-Time Alerts

Instead of polling, use Trawl's Watch + Webhook system:

client = TrawlClient(api_key="trawl_YOUR_KEY")

webhook = client.webhooks.create(
    url="https://your-bot.com/hooks/trawl",
    events=["watch.content.new"],
)

watch = client.watches.create(
    name="Iran War Monitor",
    source_type="news_keyword",
    source_config={"keywords": ["Iran ceasefire", "Strait of Hormuz"]},
    auto_extract=True,
    poll_interval_minutes=5,
)

client.watches.attach_webhook(watch["id"], webhook["id"])

Trawl pushes events to you — no polling needed.
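On the receiving end, a minimal stdlib-only handler might look like this sketch. The payload shape ({"event": ..., "data": ...}) is an assumption about what the webhook sends, not a documented schema; check the webhook docs for the exact fields.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def dispatch(body: bytes):
    """Return the event's data dict for a new-content event, else None."""
    event = json.loads(body or b"{}")
    if event.get("event") == "watch.content.new":
        return event.get("data", {})
    return None

class TrawlHook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = dispatch(self.rfile.read(length))
        if data is not None:
            print(f"New content: {data.get('title', '?')}")  # score/trade here
        self.send_response(200)   # ack fast; do heavy work asynchronously
        self.end_headers()

# To serve: HTTPServer(("", 8080), TrawlHook).serve_forever()
```

Acknowledging with a 200 immediately and processing off-thread matters here: a slow handler can cause webhook senders to retry or drop events.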

Automate With Integrations

The pipeline above is a single script. But you can also plug Trawl into multi-agent frameworks, Slack, and Claude directly.

CrewAI — Autonomous Research Agent

from crewai import Agent, Task, Crew
from trawl_crewai import TrawlSearchTool, TrawlNewsTool, TrawlTranscriptTool

researcher = Agent(
    role="Prediction Market Analyst",
    goal="Gather signal from podcasts, news, and YouTube to inform market positions",
    tools=[TrawlSearchTool(), TrawlNewsTool(), TrawlTranscriptTool()],
)

task = Task(
    description="Research the current state of AI regulation — search political podcasts, news, and YouTube for recent commentary. Summarize the consensus view.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()

The agent decides which sources to hit and in what order. It'll search news first, pull YouTube transcripts for deeper context, then synthesize — without you writing the orchestration logic.

Slack Alerts

If you've added the Trawl Slack app, slash commands give you quick lookups without leaving your trading channel:

/trawl-news AI regulation executive order
/trawl-search political podcast AI policy
/trawl-sentiment NVDA

Pair these with the Watch + Webhook system from earlier to push breaking signals directly into Slack.

MCP — Ask Claude

With the Trawl MCP server configured, you can ask Claude to run the same queries conversationally:

"Search the last week of political podcasts and news for any mentions of AI regulation. What's the consensus — is new legislation likely this quarter?"

Claude calls the Trawl tools behind the scenes — search, transcript extraction, news — and returns a synthesized answer with sources.

Set up MCP →

The Broader Point

This isn't really about prediction markets. It's about the fact that most useful information is locked inside content — videos, podcasts, earnings calls, filings, articles — and accessing it programmatically has been a nightmare of per-source integrations.

Trawl's bet is that a unified content API is the right abstraction. One schema, one SDK, 15+ sources. Whether you're building a trading bot, a compliance monitor, a RAG pipeline, or a research tool — you need the same content, just for different reasons.

The Iran backtest was 42 API calls and 552K characters of structured intelligence. Zero web scraping.

Try It

The full code is on GitHub.

git clone https://github.com/trawlhq/trawl-examples
cd trawl-examples/trading-bot
pip install -r requirements.txt

# Run the Iran backtest
python backtest_iran.py

# Run the full pipeline
python main.py --json

No API keys needed for the backtest.