I Built a Compliance Monitor That Tracks Any Stock Across 5 Data Sources — In 50 Lines of Python
I was building a prediction market trading bot. The bot needed to understand what was happening around any company — not just headlines, but SEC filings, earnings call transcripts, YouTube analyst reactions, and news coverage. All cross-referenced, all in real-time.
Here's what that used to look like:
- GDELT API for news monitoring (300 lines of query building, response parsing, deduplication)
- feedparser for RSS feeds (200 lines of polling logic)
- httpx + regex for article text extraction (120 lines of HTML scraping that broke every time a site changed)
- YouTube Data API for video search (separate API key, separate rate limits)
- SEC EDGAR API for filings (another integration, another auth pattern)
That's 500+ lines of glue code across 5 different APIs, each with its own auth, rate limits, response formats, and failure modes.
Then I found Trawl.
The Replacement
pip install trawl-sdk
from trawl import TrawlClient
client = TrawlClient() # No API key needed
That's the setup. Here's what it replaces.
Before: 120 Lines of HTML Scraping
This is actual production code from our trading bot. It fetches article text from news URLs:
# article_fetcher.py — the old way (120 lines, abbreviated)
import re

import httpx

BLOCKED_DOMAINS = frozenset({"wsj.com", "ft.com", "bloomberg.com"})
_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)..."
}

def _is_blocked(url):
    return any(domain in url for domain in BLOCKED_DOMAINS)

async def fetch(url):
    if _is_blocked(url):
        return None
    async with httpx.AsyncClient(timeout=10, headers=_HEADERS) as client:
        resp = await client.get(url)
    # Remove scripts and styles (re.S so .*? matches across newlines)
    text = re.sub(r"<script[^>]*>.*?</script>", "", resp.text, flags=re.S)
    text = re.sub(r"<style[^>]*>.*?</style>", "", text, flags=re.S)
    # Try to find article content
    article_match = re.search(r"<article[^>]*>(.*?)</article>", text, flags=re.S)
    if article_match:
        text = article_match.group(1)
    # Extract paragraphs
    paragraphs = re.findall(r"<p[^>]*>(.*?)</p>", text, flags=re.S)
    text = "\n\n".join(paragraphs)
    # Remove remaining HTML tags
    text = re.sub(r"<[^>]+>", "", text)
    # Decode the most common entities
    text = text.replace("&amp;", "&").replace("&lt;", "<")
    return text[:3000]
This broke regularly. Sites change their HTML. Paywalls appear. New domains need blocklisting.
After: 1 Line
result = client.news.get_article_text(url)
text = result["text"] # Clean article text, no HTML, no regex
That's it. 120 lines of fragile code replaced by one SDK call.
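In a real scan you still want to guard each call so one paywalled or dead URL doesn't kill the whole run. A minimal sketch of a defensive batch wrapper — the error-handling policy here is mine, not the SDK's; the only API assumed is the `news.get_article_text(url)` call shown above:

```python
def fetch_articles(client, urls):
    """Fetch article text for each URL, recording None for failures
    instead of crashing the scan. `client` is any object exposing
    news.get_article_text(url) -> {"text": ...}."""
    results = {}
    for url in urls:
        try:
            results[url] = client.news.get_article_text(url)["text"]
        except Exception:
            # Paywalled, blocked, or a transient network error — skip it
            results[url] = None
    return results
```

The caller can then filter out the `None` entries and keep going with whatever came back.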
Try It Live — Explore Any Ticker
Before diving into the code, explore the data yourself. Change the ticker and hit Run:
curl "https://api.gettrawl.com/api/filings/search?ticker=NVDA&form_type=&max_results=5"

curl "https://api.gettrawl.com/api/news/search?q=NVDA+earnings+guidance&max_results=5"

curl -X POST "https://api.gettrawl.com/api/ai/preview/summarize" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "NVIDIA reported Q4 revenue of $22.1 billion, up 265% YoY. Data center revenue hit $18.4 billion. CEO Jensen Huang highlighted inference demand as the next growth driver. The company announced new Blackwell GPU architecture."
  }'

The Full Compliance Monitor
Here's the complete monitor. It takes a stock ticker and pulls intelligence from 5 sources:
from trawl import TrawlClient

def scan_company(ticker):
    client = TrawlClient()

    # 1. SEC Filings — material event disclosures
    filings = client.filings.search(ticker=ticker)
    for f in filings["results"][:3]:
        print(f"[SEC] {f['form_type']} — {f['description']}")

    # 2. News — GDELT + NewsAPI + RSS, deduplicated
    news = client.news.search(ticker, max_results=5)
    for a in news["articles"][:3]:
        print(f"[{a['source_name']}] {a['title'][:60]}")

    # 3. YouTube — analyst commentary with transcripts
    videos = client.search.youtube(f"{ticker} stock analysis", max_results=3)
    for v in videos.results:
        transcript = client.transcripts.preview(
            f"https://youtube.com/watch?v={v.video_id}"
        )
        text = " ".join(s.text for s in transcript.segments)
        print(f"[{v.channel}] {v.title[:50]} ({len(text):,} chars)")

    # 4. Earnings — speaker-segmented transcripts
    earnings = client.earnings.search(ticker)
    if earnings["results"]:
        latest = earnings["results"][0]
        t = client.earnings.get_transcript(ticker, latest["year"], latest["quarter"])
        print(f"[EARNINGS] Q{latest['quarter']} {latest['year']} — {len(t['participants'])} speakers")

    # 5. Academic papers
    papers = client.papers.search(ticker, max_results=3)
    for p in papers["results"]:
        print(f"[PAPER] {p['title'][:60]}")

scan_company("NVDA")
~15 Trawl API calls. Under a minute. 5 data sources. Zero web scraping.
Adding AI Analysis
Trawl has built-in AI endpoints that work on raw text — no auth required:
# Combine everything gathered during the scan — here `all_texts` is the
# list of article bodies, transcripts, and filing descriptions collected above
combined = "\n".join(all_texts)
# Summarize everything
summary = client.ai.preview_summarize(combined)
print(f"Summary: {summary['summary']}")
print(f"Topics: {[t['name'] for t in summary['topics']]}")
# Extract entities and stock tickers
entities = client.ai.preview_entities(combined)
ent = entities["entities"]
print(f"People: {[p['name'] for p in ent['people']]}")
print(f"Tickers mentioned: {[t['symbol'] for t in ent['tickers']]}")
The AI layer identified related tickers automatically — if you're scanning NVDA, it finds mentions of AMD, TSLA, and META in the analyst coverage.
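If you want those related tickers as data rather than a printout, a small post-processing helper does it — this assumes the response shape used above (`entities` → `tickers` → `symbol`), so verify against the live API:

```python
def related_tickers(entities_response, scanned):
    """Extract ticker symbols from an entities response, dropping the
    ticker we were already scanning. Returns a sorted, deduplicated list."""
    tickers = entities_response.get("entities", {}).get("tickers", [])
    return sorted({t["symbol"] for t in tickers if t["symbol"] != scanned.upper()})
```

Feed the result back into `scan_company` and you get a one-hop graph of companies the analyst coverage keeps mentioning together.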
Set Up Real-Time Alerts
Don't want to poll manually? Use Trawl's Watch Subscriptions + Webhooks:
client = TrawlClient(api_key="trawl_YOUR_KEY")

# Create a webhook endpoint
webhook = client.webhooks.create(
    url="https://your-server.com/hooks/trawl",
    events=["watch.content.new"],
)

# Monitor a ticker — Trawl polls and pushes to you
watch = client.watches.create(
    name="NVDA Compliance Watch",
    source_type="sec_ticker",
    source_config={"ticker": "NVDA"},
    auto_extract=True,
    poll_interval_minutes=15,
)

# Connect them
client.watches.attach_webhook(watch["id"], webhook["id"])
Now Trawl checks for new SEC filings every 15 minutes and pushes to your webhook with an HMAC-SHA256 signed payload.
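On the receiving end, verify that signature before trusting the payload. A minimal sketch with Python's standard library — the header name and exact signing format are assumptions here, so check Trawl's webhook docs for the real scheme:

```python
import hashlib
import hmac

def verify_webhook(payload, signature_header, secret):
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    to the signature Trawl sent. `payload` is the raw bytes of the body,
    `signature_header` the hex digest from the (assumed) signature header."""
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking timing information
    return hmac.compare_digest(expected, signature_header)
```

Call this in your webhook handler before parsing the JSON; reject the request with a 401 if it returns False.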
What This Replaced
| Before (Custom Code) | Lines | After (Trawl SDK) |
|---|---|---|
| article_fetcher.py — regex HTML scraping | 120 | client.news.get_article_text(url) |
| gdelt_monitor.py — GDELT API wrapper | 300+ | client.news.search(query) |
| rss_monitor.py — RSS feed polling | 200+ | Included in news.search() |
| YouTube transcript extraction | N/A | client.transcripts.preview(url) |
| SEC filing search | Separate integration | client.filings.search(ticker) |
| Entity extraction | N/A | client.ai.preview_entities(text) |
500+ lines of custom code replaced with ~50 lines of SDK calls.
Integrate With Your Workflow
No-Code Automation (Make / Zapier)
Set up a Make.com scenario that checks SEC filings daily and sends alerts to Slack:
- Trigger: Schedule (daily at 8am)
- Module: Trawl "Search Filings" → ticker: AAPL, form_type: 8-K
- Filter: Only new filings since yesterday
- Action: Send to Slack channel #compliance-alerts
Same flow works in Zapier — connect Trawl to 7,000+ apps.
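If you'd rather run the same daily check from a cron job than from Make, the "only new filings since yesterday" filter is a few lines of Python. The `filed_date` field name and ISO date format are assumptions about the response shape — adjust to match what the API actually returns:

```python
from datetime import date, timedelta

def new_since_yesterday(filings, today=None):
    """Keep only filings dated yesterday or later. Assumes each filing
    dict carries a 'filed_date' string in ISO format (YYYY-MM-DD), which
    makes string comparison equivalent to date comparison."""
    today = today or date.today()
    cutoff = (today - timedelta(days=1)).isoformat()
    return [f for f in filings if f.get("filed_date", "") >= cutoff]
```

Pipe the result into your Slack client of choice and you have the whole Make scenario in one script.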
Slack Bot
/trawl-intelligence AAPL
/trawl-earnings MSFT
/trawl-news SEC enforcement action
Obsidian
Pull intelligence reports directly into your vault:
Run the Trawl: Pull Intelligence Report command for AAPL, which creates:
Trawl Reports/AAPL Intelligence Report.md
Every report includes earnings, insider trades, SEC filings, congressional trading, and news — formatted as searchable Markdown with YAML frontmatter.
MCP
Connect Trawl's MCP server to your AI assistant and ask in plain English:
"Check if there have been any Form 4 insider transactions for AAPL in the last 30 days, and cross-reference with congressional trading activity."
Try It
pip install trawl-sdk
python -c "
from trawl import TrawlClient
client = TrawlClient()
news = client.news.search('NVDA', max_results=3)
for a in news['articles']:
    print(f\"[{a.get('source_name', '')}] {a['title'][:60]}\")
"
No API key. No signup. It just works.
The full code is on GitHub. There's also a Streamlit dashboard and a Marimo notebook you can run.