I Built a Prediction Market Trading Bot That Trawls 5 Data Sources for Geopolitical Intelligence
There's a prediction market on Polymarket: "Will there be a US-Iran ceasefire in 2026?"
To trade this intelligently, I don't just need the latest headline. I need what Reuters is reporting, what Bloomberg analysts are saying on YouTube, what the "War on the Rocks" podcast covered this week, whether defense contractors filed any 8-Ks, and what academic research says about geopolitical risk.
That's 5 different data sources. Previously, 5 different APIs, 5 different auth systems, 500+ lines of integration code.
I replaced all of it with one package.
```bash
pip install trawl-sdk
```
The Architecture
The bot has two layers, each using different Trawl sources:
Real-time layer (polls every 5-30 minutes):
- News via `client.news.search()` — breaking headlines in 65+ languages
- SEC Filings via `client.filings.search()` — material event disclosures
- Earnings via `client.earnings.search()` — management tone and guidance
Background layer (refreshes every 2-24 hours):
- YouTube via `client.search.youtube()` + `client.transcripts.preview()` — expert consensus
- Podcasts via `client.search.podcasts()` — deep macro analysis
The real-time layer tells the bot what just happened. The background layer tells it how to interpret it.
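The two cadences can share one loop: track when each source last ran and refresh only the ones that are due. A minimal sketch — the function names and the exact intervals here are illustrative, not part of the Trawl SDK:

```python
# Sketch of the two-layer polling scheduler. Each source gets its own
# cadence; one loop asks which sources are due and refreshes only those.

# Poll intervals in minutes: the real-time layer is tight,
# the background layer is slow.
INTERVALS = {
    "news": 5,
    "filings": 30,
    "earnings": 30,
    "youtube": 120,      # 2 hours
    "podcasts": 1440,    # 24 hours
}

def due_sources(now_min: float, last_run: dict) -> list[str]:
    """Return the sources whose poll interval has elapsed."""
    return [
        source
        for source, interval in INTERVALS.items()
        if now_min - last_run.get(source, float("-inf")) >= interval
    ]

# At t=0 nothing has ever run, so everything is due;
# five minutes after a full refresh, only news is.
print(due_sources(0, {}))
print(due_sources(5, {s: 0 for s in INTERVALS}))  # ['news']
```

In the real bot the loop sleeps a minute between checks and records `last_run[source] = now` after each successful fetch.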
The Iran Ceasefire Backtest
I ran the full pipeline against live Trawl data. Here's what one cycle pulled:
News: 15 articles, 58K chars
Try it — search for breaking news on any geopolitical topic:
```bash
curl "https://api.gettrawl.com/api/news/search?q=Iran+ceasefire&max_results=5"
```

```python
from trawl import TrawlClient

client = TrawlClient()
queries = ["Iran ceasefire", "US Iran war", "Trump Iran", "Strait of Hormuz"]
for query in queries:
    results = client.news.search(query, max_results=5)
    for article in results["articles"][:3]:
        text = client.news.get_article_text(article["url"])
        print(f"[{article['source_name']}] {article['title'][:60]}")
        print(f"  {len(text['text']):,} chars extracted")
```
Results:
- Breitbart: "President Trump: Iran Has Asked for a Ceasefire" (3,684 chars)
- Foreign Policy: "What a U.S. Operation to Get Iran's Uranium Would Look Like" (9,290 chars)
- Al Jazeera: "Iran authorities await war 'victory'" (9,162 chars)
YouTube: 9 videos, 486K chars of analyst transcripts
This is the data dimension no other API gave me. Search for analyst videos on any topic:
```bash
curl "https://api.gettrawl.com/api/search?q=Iran+war+analysis+2026&max_results=5"
```

Then extract the full transcript from any video:

```bash
curl -X POST "https://api.gettrawl.com/api/transcripts/preview" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.youtube.com/watch?v=8jPQjjsBbIc"}'
```

```python
videos = client.search.youtube("Iran war analysis 2026", max_results=3)
for video in videos.results:
    url = f"https://youtube.com/watch?v={video.video_id}"
    transcript = client.transcripts.preview(url)
    text = " ".join(s.text for s in transcript.segments)
    print(f"[{video.channel}] {video.title}")
    print(f"  {len(transcript.segments)} segments, {len(text):,} chars")
```
```
[ABC News] Trump: Iran War "Nearing Completion"
  220 segments, 8,101 chars
[Bloomberg Television] Trump to Address Nation on Iran War
  724 segments, 36,406 chars
[Hindustan Times] US-Iran War LIVE | Peace Or War?
  8,758 segments, 311,512 chars
```
311,512 characters from a single live stream. An entire book of expert commentary, timestamped and structured.
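A transcript that size won't fit a single model call, so before scoring I split it on segment boundaries. A minimal sketch — the 8,000-character budget is an arbitrary choice for illustration, not a Trawl or model constraint:

```python
# Split a long transcript into chunks on segment boundaries so each
# chunk fits a downstream model call. A single oversized segment
# becomes its own chunk rather than being split mid-sentence.
def chunk_segments(segments: list[str], max_chars: int = 8_000) -> list[str]:
    chunks, current, size = [], [], 0
    for seg in segments:
        # Start a new chunk if adding this segment would overflow.
        if current and size + len(seg) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(seg)
        size += len(seg) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks

segments = [f"sentence {i}." for i in range(1000)]
chunks = chunk_segments(segments, max_chars=500)
print(f"{len(chunks)} chunks, longest {max(len(c) for c in chunks)} chars")
```

Joining the chunks back together reproduces the original transcript, so nothing is lost — only re-batched.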
SEC Filings: 80 8-K filings from defense contractors
Check what defense contractors are disclosing:
```bash
curl "https://api.gettrawl.com/api/filings/search?ticker=LMT&form_type=8-K&max_results=5"
```

```python
for ticker in ["LMT", "RTX", "NOC", "GD"]:
    filings = client.filings.search(ticker=ticker, form_type="8-K")
    print(f"{ticker}: {len(filings['results'])} 8-K filings")
```
Lockheed Martin, Raytheon, Northrop Grumman, General Dynamics — all with recent material event disclosures. When defense contractors file 8-Ks during an active conflict, that's a signal.
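One way to turn that intuition into a signal is to compare recent filing counts against each ticker's historical baseline. This heuristic and its threshold are my own illustration, not part of the bot's published code:

```python
# Hypothetical heuristic: flag a ticker when its recent 8-K count is
# well above its historical rate for the same window length.
# The 2x threshold is illustrative, not tuned.
def filing_anomaly(recent_count: int, baseline_per_period: float,
                   ratio_threshold: float = 2.0) -> bool:
    """True when recent filing volume is at least `ratio_threshold`
    times the baseline."""
    if baseline_per_period <= 0:
        # Any filing from a normally silent company is notable.
        return recent_count > 0
    return recent_count / baseline_per_period >= ratio_threshold

# A contractor that averages 2 8-Ks a month suddenly files 5:
print(filing_anomaly(5, 2.0))  # True
print(filing_anomaly(2, 2.0))  # False
```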
Podcasts + Papers
Search podcasts for deep macro analysis that headlines miss:
```bash
curl "https://api.gettrawl.com/api/podcasts/search?q=War+on+the+Rocks"
```

And academic research for the long-view context:

```bash
curl "https://api.gettrawl.com/api/papers/search?q=geopolitical+risk+prediction+markets"
```

```python
# Podcasts
pods = client.search.podcasts("War on the Rocks", max_results=1)
episodes = client.podcasts.episodes(pods.results[0].id, max_results=2)

# Academic papers
papers = client.papers.search("geopolitical risk prediction markets")
```
War on the Rocks had an episode titled "What It Was Like to Be Under Incoming Fire" — primary-source analysis that headlines miss entirely.
AI Entity Extraction
Try the AI — paste any geopolitical text and extract the key players, organizations, and topics:
```bash
curl -X POST "https://api.gettrawl.com/api/ai/preview/summarize" \
  -H "Content-Type: application/json" \
  -d '{"text": "President Trump said Iran has asked for a ceasefire through intermediaries. IAEA chief Rafael Grossi confirmed inspectors still have access to declared nuclear sites. Former sanctions negotiator Richard Nephew at Columbia University warned that any deal would need to address enrichment thresholds beyond the original JCPOA framework."}'
```

```python
# combined_text is the concatenated output of the earlier pipeline steps
entities = client.ai.preview_entities(combined_text[:30000])
ent = entities["entities"]
print(f"People: {[p['name'] for p in ent['people']]}")
print(f"Organizations: {[o['name'] for o in ent['organizations']]}")
print(f"Topics: {[t['name'] for t in ent['topics']]}")
```
```
People: ['Donald Trump', 'Rafael Grossi', 'Richard Nephew']
Organizations: ['United States', 'Iran', 'IAEA', 'Columbia University']
Topics: ['Iran Nuclear Program', 'US Military Operations', 'Middle East Politics']
```
The AI identified Rafael Grossi (IAEA chief) and Richard Nephew (former sanctions negotiator) as key figures — context a keyword search would miss.
The Numbers
| Metric | Value |
|---|---|
| Total data points | 42 |
| Trawl API calls | 42 |
| Total content extracted | 552,717 characters |
| Sources used | 5 (news, YouTube, podcasts, SEC filings, papers) |
| YouTube transcript data | 486,361 chars (88% of total) |
| Pipeline execution time | ~2 minutes |
The Headline Scoring Engine
Headlines are scored on 4 dimensions:
```python
market_impact = (
    relevance * 0.35 +           # How related to the market?
    severity * 0.30 +            # Policy decision (1.0) → rumor (0.2)
    novelty * 0.20 +             # New info vs. rehash?
    source_credibility * 0.15    # Reuters (0.95) → blog (0.40)
)
```
SEC filings get automatic 0.99 credibility (authoritative by definition). The novelty scorer uses embedding cosine similarity against the last 200 headlines.
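The novelty mechanics are simple: embed the headline, take its maximum cosine similarity against the recent window, and invert it. Here's a self-contained sketch using bag-of-words counts as a stand-in for real embedding vectors (the production scorer uses embeddings; only the cosine math is the same):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novelty(headline: str, recent: list[str]) -> float:
    """1 - max similarity to the recent headline window.
    Bag-of-words here; the real scorer compares embeddings."""
    if not recent:
        return 1.0
    vec = Counter(headline.lower().split())
    best = max(cosine(vec, Counter(h.lower().split())) for h in recent)
    return 1.0 - best

recent = ["iran asks for ceasefire", "oil prices spike on hormuz fears"]
print(novelty("iran asks for ceasefire", recent))  # 0.0 — pure rehash
print(novelty("defense stocks rally", recent))     # 1.0 — no overlap
```

In the bot, `recent` is the rolling window of the last 200 headlines; anything scoring near zero is dropped before it can trigger a trade.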
Real-Time Alerts
Instead of polling, use Trawl's Watch + Webhook system:
```python
client = TrawlClient(api_key="trawl_YOUR_KEY")

webhook = client.webhooks.create(
    url="https://your-bot.com/hooks/trawl",
    events=["watch.content.new"],
)

watch = client.watches.create(
    name="Iran War Monitor",
    source_type="news_keyword",
    source_config={"keywords": ["Iran ceasefire", "Strait of Hormuz"]},
    auto_extract=True,
    poll_interval_minutes=5,
)

client.watches.attach_webhook(watch["id"], webhook["id"])
```
Trawl pushes events to you — no polling needed.
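On the receiving side, your endpoint just parses the delivery and extracts the URLs worth scoring. The payload field names below (`type`, `items`, `url`) are assumptions for illustration — check Trawl's webhook docs for the actual schema; only the dispatch logic is the point:

```python
import json

def handle_event(raw_body: str) -> list[str]:
    """Parse a webhook delivery and return the URLs worth scoring.
    NOTE: the payload shape here is a guess, not Trawl's documented schema."""
    event = json.loads(raw_body)
    if event.get("type") != "watch.content.new":
        return []  # ignore event types we didn't subscribe to
    return [item["url"] for item in event.get("items", [])]

body = json.dumps({
    "type": "watch.content.new",
    "items": [{"url": "https://example.com/iran-ceasefire"}],
})
print(handle_event(body))  # ['https://example.com/iran-ceasefire']
```

Mount this behind whatever web framework you already run; the bot then scores each returned URL with the same engine used in the polling path.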
Automate With Integrations
The pipeline above is a single script. But you can also plug Trawl into multi-agent frameworks, Slack, and Claude directly.
CrewAI — Autonomous Research Agent
```python
from crewai import Agent, Task, Crew
from trawl_crewai import TrawlSearchTool, TrawlNewsTool, TrawlTranscriptTool

researcher = Agent(
    role="Prediction Market Analyst",
    goal="Gather signal from podcasts, news, and YouTube to inform market positions",
    tools=[TrawlSearchTool(), TrawlNewsTool(), TrawlTranscriptTool()],
)

task = Task(
    description=(
        "Research the current state of AI regulation — search political "
        "podcasts, news, and YouTube for recent commentary. "
        "Summarize the consensus view."
    ),
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```
The agent decides which sources to hit and in what order. It'll search news first, pull YouTube transcripts for deeper context, then synthesize — without you writing the orchestration logic.
Slack Alerts
If you've added the Trawl Slack app, slash commands give you quick lookups without leaving your trading channel:
```
/trawl-news AI regulation executive order
/trawl-search political podcast AI policy
/trawl-sentiment NVDA
```
Pair these with the Watch + Webhook system from earlier to push breaking signals directly into Slack.
MCP — Ask Claude
With the Trawl MCP server configured, you can ask Claude to run the same queries conversationally:
"Search the last week of political podcasts and news for any mentions of AI regulation. What's the consensus — is new legislation likely this quarter?"
Claude calls the Trawl tools behind the scenes — search, transcript extraction, news — and returns a synthesized answer with sources.
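For Claude Desktop, MCP servers are registered in the `mcpServers` section of the config file. The JSON shape below is Claude Desktop's standard format, but the package name and command are my assumptions — check Trawl's docs for the actual values:

```json
{
  "mcpServers": {
    "trawl": {
      "command": "npx",
      "args": ["-y", "trawl-mcp"],
      "env": { "TRAWL_API_KEY": "trawl_YOUR_KEY" }
    }
  }
}
```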
The Broader Point
This isn't really about prediction markets. It's about the fact that most useful information is locked inside content — videos, podcasts, earnings calls, filings, articles — and accessing it programmatically has been a nightmare of per-source integrations.
Trawl's bet is that a unified content API is the right abstraction. One schema, one SDK, 15+ sources. Whether you're building a trading bot, a compliance monitor, a RAG pipeline, or a research tool — you need the same content, just for different reasons.
The Iran backtest was 42 API calls and 552K characters of structured intelligence. Zero web scraping.
Try It
The full code is on GitHub.
```bash
git clone https://github.com/trawlhq/trawl-examples
cd trawl-examples/trading-bot
pip install -r requirements.txt

# Run the Iran backtest
python backtest_iran.py

# Run the full pipeline
python main.py --json
```
No API keys needed for the backtest.