Quant Trader · Advanced
Tags: backtesting, event-driven, alternative data, cross-source, quant

I Built a 'What Could You Have Known?' Tool That Pulls Every Signal Before Any Market Event

Trawl Team

Hindsight is 20/20 — But What If Your Data Was Too?

Every time a stock tanks on earnings, insiders dump shares, or a regulatory action hits — someone says "nobody could have seen that coming." But what if the signals were there all along, scattered across eight different data sources that nobody was looking at simultaneously?

I built a backtester that answers one question: for any ticker and date, what could you have known beforehand? It pulls every pre-event signal from congressional trades, insider filings, SEC disclosures, earnings calls, news, Reddit, lobbying activity, and patent filings — then computes a running conviction score that quantifies the information advantage.

The Signal Sources

The backtester pulls from eight sources in parallel, each contributing to a unified signal timeline:

| Source | Endpoint | Signal Type |
| --- | --- | --- |
| Congressional trades | /api/congress-trading/search | Directional (buy/sell) |
| Insider trades (Form 4) | /api/filings/form4/{ticker} | Directional (buy/sell) |
| SEC filings | /api/filings/search | Informational |
| Earnings calls | /api/earnings/search | Directional (beat/miss) |
| News sentiment | /api/news/search | Directional (keyword NLP) |
| Reddit mentions | /api/reddit/mentions/{ticker} | Directional (sentiment) |
| Lobbying activity | /api/lobbying/search | Thematic |
| Patent filings | /api/patents/search | Long-horizon bullish |
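The eight fetches are independent of each other, so they can be dispatched concurrently. A minimal sketch with `concurrent.futures` (the client and method names are the ones used in the full pipeline later in this post; the `fetch_all` helper itself is illustrative, not part of the SDK):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(client, ticker: str) -> dict:
    """Dispatch all eight source queries concurrently, keyed by source name."""
    tasks = {
        "congress": lambda: client.congress_trading.search(ticker=ticker),
        "insider":  lambda: client.filings.form4(ticker=ticker),
        "sec":      lambda: client.filings.search(ticker=ticker),
        "earnings": lambda: client.earnings.search(ticker=ticker),
        "news":     lambda: client.news.search(q=ticker),
        "reddit":   lambda: client.reddit.mentions(ticker=ticker),
        "lobbying": lambda: client.lobbying.search(client_name=ticker),
        "patents":  lambda: client.patents.search(assignee=ticker),
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        # .result() re-raises any exception from the worker thread
        return {name: fut.result() for name, fut in futures.items()}
```

Since the calls are network-bound, threads are enough here; no process pool is needed.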

Step 1: Search Congressional Trades

Politicians often trade before material events. Pull their trades for the lookback window.

Search congressional trades
curl "https://api.gettrawl.com/api/congress-trading/search?ticker=NVDA"

Step 2: Pull Insider Transactions

Form 4 filings show C-suite buying and selling. CEO and CFO trades carry extra weight.

Search Form 4 insider trades
curl "https://api.gettrawl.com/api/filings/form4/NVDA"

Step 3: Check News Sentiment

News articles are noisy but timely. Simple keyword matching classifies sentiment.

Search news articles
curl "https://api.gettrawl.com/api/news/search?q=NVDA"
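That keyword matching can be as simple as two word lists. A hypothetical sketch (the word lists are illustrative, not part of the Trawl API):

```python
BULLISH_KEYWORDS = {"upgrade", "beat", "record", "surge", "raises guidance"}
BEARISH_KEYWORDS = {"downgrade", "miss", "probe", "lawsuit", "cuts guidance"}

def classify_headline(headline: str) -> int:
    """Return +1 (bullish), -1 (bearish), or 0 (neutral) by keyword hits."""
    text = headline.lower()
    bull = sum(kw in text for kw in BULLISH_KEYWORDS)
    bear = sum(kw in text for kw in BEARISH_KEYWORDS)
    if bull > bear:
        return 1
    if bear > bull:
        return -1
    return 0  # tie or no hits: neutral
```

This will misclassify headlines with negation or sarcasm, which is exactly why news carries a 0.60 source weight rather than a 0.90.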

Step 4: Reddit Mentions

Retail sentiment on Reddit is noisy but occasionally captures signals that institutional research misses.

Search Reddit mentions
curl "https://api.gettrawl.com/api/reddit/mentions/NVDA"
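If the mentions endpoint returns per-post sentiment labels (the field names below are assumptions for illustration, not a documented schema), collapsing a day's mentions into one signal might look like:

```python
def reddit_signal(mentions: list[dict]) -> dict:
    """Collapse a day's Reddit mentions into one direction/strength pair.

    Direction follows the majority sentiment; strength grows with mention
    volume but is capped low, since Reddit is the noisiest source.
    """
    score = sum(
        1 if m.get("sentiment") == "bullish"
        else -1 if m.get("sentiment") == "bearish"
        else 0
        for m in mentions
    )
    direction = (score > 0) - (score < 0)      # sign of the net sentiment
    strength = min(0.5, 0.1 * len(mentions))   # volume-capped strength
    return {"direction": direction, "strength": strength}
```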

The Conviction Score Algorithm

Each signal contributes to a running conviction score using four factors:

contribution = direction * strength * source_weight * recency_factor

Direction: +1 (bullish), -1 (bearish), 0 (neutral). Determined by transaction type (buy/sell), keyword sentiment (upgrade/downgrade), or explicit sentiment labels.

Strength: 0.0-1.0 based on source-specific heuristics. C-suite insider sells score higher than random director trades. 8-K filings score higher than routine 10-Q filings.

Source Weight: Historical reliability of the source as an alpha signal:

| Source | Weight | Rationale |
| --- | --- | --- |
| Congress | 0.90 | Proven alpha in academic literature |
| Insider | 0.85 | Classic Seyhun-style signal |
| Earnings | 0.70 | Forward-looking guidance matters |
| News | 0.60 | Noisy but timely |
| SEC | 0.50 | Informational, rarely directional alone |
| Lobbying | 0.40 | Slow-moving, thematic |
| Patents | 0.35 | Long-horizon signal |
| Reddit | 0.30 | Very noisy, occasional alpha |

Recency Factor: Signals closer to the event weigh more. Range: 0.5 (earliest in lookback window) to 1.0 (day of event). Formula: 1 - (days_before / lookback_days) * 0.5.
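Putting the four factors together, the contribution formula can be sketched directly (the weight and strength values in the worked example mirror the tables above; the scenario itself is illustrative):

```python
def recency_factor(days_before: int, lookback_days: int) -> float:
    """1.0 on the event day, decaying linearly to 0.5 at the window edge."""
    return 1.0 - (days_before / lookback_days) * 0.5

def contribution(direction: int, strength: float, source_weight: float,
                 days_before: int, lookback_days: int = 30) -> float:
    """contribution = direction * strength * source_weight * recency_factor"""
    return (direction * strength * source_weight
            * recency_factor(days_before, lookback_days))

# A CEO sell (direction -1, strength 0.95) from the insider source
# (weight 0.85) three days before the event in a 30-day window:
# recency = 1 - (3/30) * 0.5 = 0.95
# contribution = -1 * 0.95 * 0.85 * 0.95 ≈ -0.767
```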

The final conviction score accumulates additively across all signals. Interpretation:

  • > +2.0 — Strongly Bullish
  • +0.5 to +2.0 — Mildly Bullish
  • -0.5 to +0.5 — Neutral / Mixed
  • -2.0 to -0.5 — Mildly Bearish
  • <= -2.0 — Strongly Bearish

The Full Pipeline

from trawl import TrawlClient
from datetime import datetime, timedelta

client = TrawlClient()

SOURCE_WEIGHTS = {
    "congress": 0.9, "insider": 0.85, "sec": 0.5,
    "earnings": 0.7, "news": 0.6, "reddit": 0.3,
    "lobbying": 0.4, "patents": 0.35,
}

def backtest(ticker: str, event_date: str, lookback_days: int = 30):
    """Pull all pre-event signals and compute conviction."""
    ed = datetime.strptime(event_date, "%Y-%m-%d")
    start = (ed - timedelta(days=lookback_days)).strftime("%Y-%m-%d")

    # Fetch all sources (sequential here for clarity; the full repo
    # version runs these in parallel)
    trades = client.congress_trading.search(ticker=ticker)
    insider = client.filings.form4(ticker=ticker)
    filings = client.filings.search(ticker=ticker)
    earnings = client.earnings.search(ticker=ticker)
    news = client.news.search(q=ticker)
    reddit = client.reddit.mentions(ticker=ticker)
    # Lobbying and patent records key on company names, so the ticker
    # is only a rough proxy here (e.g. "NVDA" vs. "NVIDIA Corp")
    lobbying = client.lobbying.search(client_name=ticker)
    patents = client.patents.search(assignee=ticker)

    # Build signal timeline
    signals = []

    # Congressional trades
    for t in (trades.results or []):
        if start <= str(t.transaction_date) <= event_date:
            direction = 1 if "purchase" in (t.type or "").lower() else -1
            signals.append({
                "date": str(t.transaction_date),
                "source": "congress",
                "description": f"{t.politician} {t.type}",
                "direction": direction,
                "strength": 0.8,
            })

    # Insider trades (C-suite gets bonus)
    for t in (insider.results or []):
        if start <= str(t.date) <= event_date:
            direction = 1 if "purchase" in (t.type or "").lower() else -1
            strength = 0.95 if any(
                x in (t.title or "").upper()
                for x in ("CEO", "CFO", "COO", "CTO")
            ) else 0.75
            signals.append({
                "date": str(t.date),
                "source": "insider",
                "description": f"{t.insider_name} ({t.title}) {t.type}",
                "direction": direction,
                "strength": strength,
            })

    # Sort chronologically and compute conviction
    signals.sort(key=lambda s: s["date"])
    conviction = 0.0

    for sig in signals:
        days_before = (ed - datetime.strptime(sig["date"], "%Y-%m-%d")).days
        recency = 1.0 - (days_before / lookback_days) * 0.5
        weight = SOURCE_WEIGHTS.get(sig["source"], 0.5)
        contribution = sig["direction"] * sig["strength"] * weight * recency
        conviction += contribution
        print(f"  {sig['date']}  {sig['source']:10s}  {sig['description']}")
        print(f"             conviction: {conviction:+.2f}")

    # Interpret
    if conviction > 2.0:
        label = "Strongly Bullish"
    elif conviction > 0.5:
        label = "Mildly Bullish"
    elif conviction > -0.5:
        label = "Neutral / Mixed"
    elif conviction > -2.0:
        label = "Mildly Bearish"
    else:
        label = "Strongly Bearish"

    print(f"\nFinal conviction: {conviction:+.2f} ({label})")
    print(f"Signals found: {len(signals)} across {len(set(s['source'] for s in signals))} sources")
    return conviction

# Example: What could you have known about NVDA before Jan 15?
backtest("NVDA", "2025-01-15", lookback_days=30)

Why This Matters

This is the killer demo for alternative data: proving the information advantage existed, and that you just needed to look. Markets are informationally inefficient in the short term. Signals from congressional trades, insider filings, lobbying disclosures, and patent activity often telegraph major events days or weeks before they happen. The problem is that these signals are scattered across dozens of sources and formats.

Trawl unifies them into a single API. This backtester shows what a unified signal timeline would have looked like before any given event. It's the difference between "nobody could have seen that coming" and "the data was there — across eight sources, all pointing the same direction."

Non-Code Options

Trawl's MCP server handles this naturally:

"For NVDA, pull all available signals from the 30 days before January 15, 2025 — congressional trades, insider selling, SEC filings, earnings, news, Reddit, lobbying, and patents. Build a timeline and tell me what the conviction looked like."

Claude fetches from all eight sources, builds the chronological signal timeline, and computes the conviction score. No code required.

Source Code

The full implementation with parallel fetching, Rich terminal output, and comprehensive scoring:

github.com/trawlhq/examples/tree/master/event-backtester