
I Built a Compliance Monitor That Tracks Any Stock Across 5 Data Sources — In 50 Lines of Python

Trawl Team

I was building a prediction market trading bot. The bot needed to understand what was happening around any company — not just headlines, but SEC filings, earnings call transcripts, YouTube analyst reactions, and news coverage. All cross-referenced, all in real-time.

Here's what that used to look like:

  • GDELT API for news monitoring (300 lines of query building, response parsing, deduplication)
  • feedparser for RSS feeds (200 lines of polling logic)
  • httpx + regex for article text extraction (120 lines of HTML scraping that broke every time a site changed)
  • YouTube Data API for video search (separate API key, separate rate limits)
  • SEC EDGAR API for filings (another integration, another auth pattern)

That's 500+ lines of glue code across 5 different APIs, each with its own auth, rate limits, response formats, and failure modes.

Then I found Trawl.

The Replacement

pip install trawl-sdk
from trawl import TrawlClient
client = TrawlClient()  # No API key needed

That's the setup. Here's what it replaces.

Before: 120 Lines of HTML Scraping

This is actual production code from our trading bot. It fetches article text from news URLs:

# article_fetcher.py — the old way (120 lines, abbreviated)

import re

import httpx

BLOCKED_DOMAINS = frozenset({"wsj.com", "ft.com", "bloomberg.com"})

_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)..."
}

async def fetch(url):
    if _is_blocked(url):  # checks BLOCKED_DOMAINS (helper omitted here)
        return None

    async with httpx.AsyncClient(timeout=10, headers=_HEADERS) as client:
        resp = await client.get(url)

    # Remove scripts and styles (DOTALL so the patterns span newlines)
    text = re.sub(r"<script[^>]*>.*?</script>", "", resp.text, flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<style[^>]*>.*?</style>", "", text, flags=re.DOTALL | re.IGNORECASE)

    # Try to find article content
    article_match = re.search(r"<article[^>]*>(.*?)</article>", text, flags=re.DOTALL)
    if article_match:
        text = article_match.group(1)

    # Extract paragraphs
    paragraphs = re.findall(r"<p[^>]*>(.*?)</p>", text, flags=re.DOTALL)
    text = "\n\n".join(paragraphs)

    # Remove remaining HTML tags
    text = re.sub(r"<[^>]+>", "", text)

    # Decode entities (only the two most common — part of why this broke)
    text = text.replace("&amp;", "&").replace("&lt;", "<")

    return text[:3000]

This broke regularly. Sites change their HTML. Paywalls appear. New domains need blocklisting.

After: 1 Line

result = client.news.get_article_text(url)
text = result["text"]  # Clean article text, no HTML, no regex

That's it. 120 lines of fragile code replaced by one SDK call.
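In production you may still want a thin wrapper around that call to preserve the old fetcher's contract (return `None` on failure, cap the length). A minimal sketch — the function name, the exception handling, and the fallback behavior are my own, not part of the SDK:

```python
def safe_article_text(client, url, max_chars=3000):
    """Fetch article text via the Trawl client, returning None instead of raising.

    Mirrors the old fetcher's behavior (None on failure, capped length)
    so it can drop into existing call sites unchanged.
    """
    try:
        result = client.news.get_article_text(url)
    except Exception:
        return None
    text = result.get("text") if isinstance(result, dict) else None
    return text[:max_chars] if text else None
```

Because the old `fetch()` also returned `None` for blocked or broken pages, callers that already check for `None` need no changes.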

Try It Live — Explore Any Ticker

Before diving into the code, explore the data yourself. Swap in any ticker and run the requests:

Search SEC filings
curl "https://api.gettrawl.com/api/filings/search?ticker=NVDA&form_type=&max_results=5"
AI summarization
curl -X POST "https://api.gettrawl.com/api/ai/preview/summarize" \
  -H "Content-Type: application/json" \
  -d '{
  "text": "NVIDIA reported Q4 revenue of $22.1 billion, up 265% YoY. Data center revenue hit $18.4 billion. CEO Jensen Huang highlighted inference demand as the next growth driver. The company announced new Blackwell GPU architecture."
}'
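The same summarize call works from Python's standard library with no SDK installed. A sketch of building that request with `urllib.request` — the send itself is commented out so nothing fires without a network connection:

```python
import json
import urllib.request

payload = {
    "text": (
        "NVIDIA reported Q4 revenue of $22.1 billion, up 265% YoY. "
        "Data center revenue hit $18.4 billion."
    )
}

# Build the POST request matching the curl example above
req = urllib.request.Request(
    "https://api.gettrawl.com/api/ai/preview/summarize",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send it:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     summary = json.load(resp)
```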

The Full Compliance Monitor

Here's the complete monitor. It takes a stock ticker and pulls intelligence from 5 sources:

from trawl import TrawlClient

def scan_company(ticker):
    client = TrawlClient()

    # 1. SEC Filings — material event disclosures
    filings = client.filings.search(ticker=ticker)
    for f in filings["results"][:3]:
        print(f"[SEC] {f['form_type']}: {f['description']}")

    # 2. News — GDELT + NewsAPI + RSS, deduplicated
    news = client.news.search(ticker, max_results=5)
    for a in news["articles"][:3]:
        print(f"[{a['source_name']}] {a['title'][:60]}")

    # 3. YouTube — analyst commentary with transcripts
    videos = client.search.youtube(f"{ticker} stock analysis", max_results=3)
    for v in videos.results:
        transcript = client.transcripts.preview(
            f"https://youtube.com/watch?v={v.video_id}"
        )
        text = " ".join(s.text for s in transcript.segments)
        print(f"[{v.channel}] {v.title[:50]} ({len(text):,} chars)")

    # 4. Earnings — speaker-segmented transcripts
    earnings = client.earnings.search(ticker)
    if earnings["results"]:
        latest = earnings["results"][0]
        t = client.earnings.get_transcript(ticker, latest["year"], latest["quarter"])
        print(f"[EARNINGS] Q{latest['quarter']} {latest['year']}: {len(t['participants'])} speakers")

    # 5. Academic papers
    papers = client.papers.search(ticker, max_results=3)
    for p in papers["results"]:
        print(f"[PAPER] {p['title'][:60]}")

scan_company("NVDA")

~15 Trawl API calls. Under a minute. 5 data sources. Zero web scraping.
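The monitor prints as it goes; to feed the AI endpoints in the next section you also need the raw text gathered into one list. A minimal sketch of that collection step — the helper, the `url` field on articles, and the exact fields kept are my assumptions, following the response shapes used above:

```python
def collect_texts(client, ticker, max_results=5):
    """Gather plain text from news and filings into one list for AI analysis."""
    all_texts = []

    # Article bodies from the deduplicated news search
    news = client.news.search(ticker, max_results=max_results)
    for article in news["articles"]:
        result = client.news.get_article_text(article["url"])
        if result.get("text"):
            all_texts.append(result["text"])

    # Filing descriptions from SEC search
    filings = client.filings.search(ticker=ticker)
    for filing in filings["results"][:3]:
        all_texts.append(f"{filing['form_type']}: {filing['description']}")

    return all_texts
```

The same pattern extends to transcript segments and earnings calls: append each source's text to `all_texts` as you print it.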

Adding AI Analysis

Trawl has built-in AI endpoints that work on raw text — no auth required:

# Combine all content we gathered in the monitor above
# (article texts, transcript text, filing descriptions — collected into all_texts)
combined = "\n".join(all_texts)

# Summarize everything
summary = client.ai.preview_summarize(combined)
print(f"Summary: {summary['summary']}")
print(f"Topics: {[t['name'] for t in summary['topics']]}")

# Extract entities and stock tickers
entities = client.ai.preview_entities(combined)
ent = entities["entities"]
print(f"People: {[p['name'] for p in ent['people']]}")
print(f"Tickers mentioned: {[t['symbol'] for t in ent['tickers']]}")

The AI layer identified related tickers automatically — if you're scanning NVDA, it finds mentions of AMD, TSLA, and META in the analyst coverage.

Set Up Real-Time Alerts

Don't want to poll manually? Use Trawl's Watch Subscriptions + Webhooks:

client = TrawlClient(api_key="trawl_YOUR_KEY")

# Create a webhook endpoint
webhook = client.webhooks.create(
    url="https://your-server.com/hooks/trawl",
    events=["watch.content.new"],
)

# Monitor a ticker — Trawl polls and pushes to you
watch = client.watches.create(
    name="NVDA Compliance Watch",
    source_type="sec_ticker",
    source_config={"ticker": "NVDA"},
    auto_extract=True,
    poll_interval_minutes=15,
)

# Connect them
client.watches.attach_webhook(watch["id"], webhook["id"])

Now Trawl checks for new SEC filings every 15 minutes and pushes to your webhook with an HMAC-SHA256 signed payload.
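On the receiving end, verify that signature before trusting the payload. A minimal sketch of constant-time HMAC-SHA256 verification with the standard library — the header name `X-Trawl-Signature` and hex encoding are my assumptions; check the webhook docs for the exact scheme:

```python
import hashlib
import hmac

def verify_signature(payload: bytes, signature_hex: str, secret: str) -> bool:
    """Recompute the HMAC-SHA256 of the raw request body and compare it
    in constant time against the signature header sent with the webhook."""
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

# In your webhook handler (framework-agnostic):
# if not verify_signature(raw_body, headers["X-Trawl-Signature"], WEBHOOK_SECRET):
#     return 401  # reject unsigned or tampered payloads
```

`hmac.compare_digest` matters here: a plain `==` comparison leaks timing information an attacker could use to forge signatures byte by byte.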

What This Replaced

| Before (Custom Code) | Lines | After (Trawl SDK) |
| --- | --- | --- |
| article_fetcher.py — regex HTML scraping | 120 | client.news.get_article_text(url) |
| gdelt_monitor.py — GDELT API wrapper | 300+ | client.news.search(query) |
| rss_monitor.py — RSS feed polling | 200+ | Included in news.search() |
| YouTube transcript extraction | N/A | client.transcripts.preview(url) |
| SEC filing search | Separate integration | client.filings.search(ticker) |
| Entity extraction | N/A | client.ai.preview_entities(text) |

500+ lines of custom code replaced with ~50 lines of SDK calls.

Integrate With Your Workflow

No-Code Automation (Make / Zapier)

Set up a Make.com scenario that checks SEC filings daily and sends alerts to Slack:

  1. Trigger: Schedule (daily at 8am)
  2. Module: Trawl "Search Filings" → ticker: AAPL, form_type: 8-K
  3. Filter: Only new filings since yesterday
  4. Action: Send to Slack channel #compliance-alerts

Same flow works in Zapier — connect Trawl to 7,000+ apps.

Slack Bot

/trawl-intelligence AAPL
/trawl-earnings MSFT
/trawl-news SEC enforcement action

Obsidian

Pull intelligence reports directly into your vault:

Trawl: Pull Intelligence Report → AAPL
Creates: Trawl Reports/AAPL Intelligence Report.md

Every report includes earnings, insider trades, SEC filings, congressional trading, and news — formatted as searchable Markdown with YAML frontmatter.

MCP

"Check if there have been any Form 4 insider transactions for AAPL in the last 30 days, and cross-reference with congressional trading activity."

Set up MCP →

Try It

pip install trawl-sdk

python -c "
from trawl import TrawlClient
client = TrawlClient()
news = client.news.search('NVDA', max_results=3)
for a in news['articles']:
    print(f\"[{a.get('source_name', '')}] {a['title'][:60]}\")
"

No API key. No signup. It just works.

The full code is on GitHub. There's also a Streamlit dashboard and a Marimo notebook you can run.