I Built a 'What Could You Have Known?' Tool That Pulls Every Signal Before Any Market Event
Hindsight is 20/20 — But What If Your Data Was Too?
Every time a stock tanks on earnings, insiders dump shares, or a regulatory action hits — someone says "nobody could have seen that coming." But what if the signals were there all along, scattered across eight different data sources that nobody was looking at simultaneously?
I built a backtester that answers one question: for any ticker and date, what could you have known beforehand? It pulls every pre-event signal from congressional trades, insider filings, SEC disclosures, earnings calls, news, Reddit, lobbying activity, and patent filings — then computes a running conviction score that quantifies the information advantage.
The Signal Sources
The backtester pulls from eight sources in parallel, each contributing to a unified signal timeline:
| Source | Endpoint | Signal Type |
|---|---|---|
| Congressional trades | /api/congress-trading/search | Directional (buy/sell) |
| Insider trades (Form 4) | /api/filings/form4/{ticker} | Directional (buy/sell) |
| SEC filings | /api/filings/search | Informational |
| Earnings calls | /api/earnings/search | Directional (beat/miss) |
| News sentiment | /api/news/search | Directional (keyword NLP) |
| Reddit mentions | /api/reddit/mentions/{ticker} | Directional (sentiment) |
| Lobbying activity | /api/lobbying/search | Thematic |
| Patent filings | /api/patents/search | Long-horizon bullish |
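Because the eight calls are independent of one another, they can be dispatched concurrently. A minimal sketch with `concurrent.futures` — the fetcher callables here are stubs standing in for the SDK/HTTP calls shown in the steps that follow:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetchers: dict) -> dict:
    """Run zero-argument fetchers concurrently; results keyed by source name."""
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        return {name: fut.result() for name, fut in futures.items()}

# Stubs standing in for real calls like client.congress_trading.search(ticker="NVDA")
results = fetch_all({
    "congress": lambda: ["trade-1", "trade-2"],
    "news": lambda: ["article-1"],
})
print(sorted(results))  # ['congress', 'news']
```

Each fetcher is I/O-bound, so a thread pool is enough; no multiprocessing needed.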
Step 1: Search Congressional Trades
Politicians often trade before material events. Pull their trades for the lookback window.
curl "https://api.gettrawl.com/api/congress-trading/search?ticker=NVDA"

Step 2: Pull Insider Transactions
Form 4 filings show C-suite buying and selling. CEO and CFO trades carry extra weight.
curl "https://api.gettrawl.com/api/filings/form4/NVDA?ticker=NVDA"

Step 3: Check News Sentiment
News articles are noisy but timely. Simple keyword matching classifies sentiment.
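"Simple keyword matching" can be made concrete with a few word sets. A minimal sketch; the keyword lists are illustrative assumptions, not Trawl's actual NLP:

```python
import re

BULLISH = {"upgrade", "beat", "surge", "record", "buyback"}
BEARISH = {"downgrade", "miss", "plunge", "probe", "lawsuit"}

def classify_headline(headline: str) -> int:
    """Return +1 (bullish), -1 (bearish), or 0 (neutral) from keyword counts."""
    words = set(re.findall(r"[a-z]+", headline.lower()))
    score = len(words & BULLISH) - len(words & BEARISH)
    return (score > 0) - (score < 0)

print(classify_headline("Analyst upgrade follows earnings beat"))  # 1
```

Crude, but it only needs to produce a direction for the scoring step below, not a precise sentiment value.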
curl "https://api.gettrawl.com/api/news/search?q=NVDA"

Step 4: Reddit Mentions
Retail sentiment on Reddit is noisy but occasionally captures signals that institutional research misses.
curl "https://api.gettrawl.com/api/reddit/mentions/NVDA?ticker=NVDA"

The Conviction Score Algorithm
Each signal contributes to a running conviction score using four factors:
contribution = direction * strength * source_weight * recency_factor
Direction: +1 (bullish), -1 (bearish), 0 (neutral). Determined by transaction type (buy/sell), keyword sentiment (upgrade/downgrade), or explicit sentiment labels.
Strength: 0.0-1.0 based on source-specific heuristics. C-suite insider sells score higher than random director trades. 8-K filings score higher than routine 10-Q filings.
Source Weight: Historical reliability of the source as an alpha signal:
| Source | Weight | Rationale |
|---|---|---|
| Congress | 0.90 | Proven alpha in academic literature |
| Insider | 0.85 | Classic Seyhun-style signal |
| Earnings | 0.70 | Forward-looking guidance matters |
| News | 0.60 | Noisy but timely |
| SEC | 0.50 | Informational, rarely directional alone |
| Lobbying | 0.40 | Slow-moving, thematic |
| Patents | 0.35 | Long-horizon signal |
| Reddit | 0.30 | Very noisy, occasional alpha |
Recency Factor: Signals closer to the event weigh more. Range: 0.5 (earliest in lookback window) to 1.0 (day of event). Formula: 1 - (days_before / lookback_days) * 0.5.
The final conviction score accumulates additively across all signals. Interpretation:
| Conviction | Interpretation |
|---|---|
| > +2.0 | Strongly Bullish |
| +0.5 to +2.0 | Mildly Bullish |
| -0.5 to +0.5 | Neutral / Mixed |
| -2.0 to -0.5 | Mildly Bearish |
| <= -2.0 | Strongly Bearish |
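The bands map directly to a small helper, mirroring the interpretation step in the full pipeline:

```python
def label(conviction: float) -> str:
    """Map a conviction score to its interpretation band."""
    if conviction > 2.0:
        return "Strongly Bullish"
    elif conviction > 0.5:
        return "Mildly Bullish"
    elif conviction > -0.5:
        return "Neutral / Mixed"
    elif conviction > -2.0:
        return "Mildly Bearish"
    return "Strongly Bearish"

print(label(1.2))  # Mildly Bullish
```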
The Full Pipeline
from trawl import TrawlClient
from datetime import datetime, timedelta
client = TrawlClient()
SOURCE_WEIGHTS = {
"congress": 0.9, "insider": 0.85, "sec": 0.5,
"earnings": 0.7, "news": 0.6, "reddit": 0.3,
"lobbying": 0.4, "patents": 0.35,
}
def backtest(ticker: str, event_date: str, lookback_days: int = 30):
    """Pull all pre-event signals and compute conviction."""
    ed = datetime.strptime(event_date, "%Y-%m-%d")
    start = (ed - timedelta(days=lookback_days)).strftime("%Y-%m-%d")

    # Fetch all sources (runs in parallel with trawl-sdk)
    trades = client.congress_trading.search(ticker=ticker)
    insider = client.filings.form4(ticker=ticker)
    filings = client.filings.search(ticker=ticker)
    earnings = client.earnings.search(ticker=ticker)
    news = client.news.search(q=ticker)
    reddit = client.reddit.mentions(ticker=ticker)
    lobbying = client.lobbying.search(client_name=ticker)
    patents = client.patents.search(assignee=ticker)

    # Build signal timeline
    signals = []

    # Congressional trades
    for t in (trades.results or []):
        if start <= str(t.transaction_date) <= event_date:
            direction = 1 if "purchase" in (t.type or "").lower() else -1
            signals.append({
                "date": str(t.transaction_date),
                "source": "congress",
                "description": f"{t.politician} {t.type}",
                "direction": direction,
                "strength": 0.8,
            })

    # Insider trades (C-suite gets bonus)
    for t in (insider.results or []):
        if start <= str(t.date) <= event_date:
            direction = 1 if "purchase" in (t.type or "").lower() else -1
            strength = 0.95 if any(
                x in (t.title or "").upper()
                for x in ("CEO", "CFO", "COO", "CTO")
            ) else 0.75
            signals.append({
                "date": str(t.date),
                "source": "insider",
                "description": f"{t.insider_name} ({t.title}) {t.type}",
                "direction": direction,
                "strength": strength,
            })

    # The remaining six sources follow the same append-to-timeline
    # pattern; omitted here for brevity.

    # Sort chronologically and compute conviction
    signals.sort(key=lambda s: s["date"])
    conviction = 0.0
    for sig in signals:
        days_before = (ed - datetime.strptime(sig["date"], "%Y-%m-%d")).days
        recency = 1.0 - (days_before / lookback_days) * 0.5
        weight = SOURCE_WEIGHTS.get(sig["source"], 0.5)
        contribution = sig["direction"] * sig["strength"] * weight * recency
        conviction += contribution
        print(f"  {sig['date']}  {sig['source']:10s} {sig['description']}")
        print(f"      conviction: {conviction:+.2f}")

    # Interpret
    if conviction > 2.0:
        label = "Strongly Bullish"
    elif conviction > 0.5:
        label = "Mildly Bullish"
    elif conviction > -0.5:
        label = "Neutral / Mixed"
    elif conviction > -2.0:
        label = "Mildly Bearish"
    else:
        label = "Strongly Bearish"

    print(f"\nFinal conviction: {conviction:+.2f} ({label})")
    print(f"Signals found: {len(signals)} across {len(set(s['source'] for s in signals))} sources")
    return conviction

# Example: What could you have known about NVDA before Jan 15?
backtest("NVDA", "2025-01-15", lookback_days=30)
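The pipeline above only materializes congress and insider signals; the other sources follow the same append-to-timeline shape. A sketch for news, using plain dicts as stand-in records (the field names and keyword lists are assumptions about the response shape, not Trawl's documented schema):

```python
BULLISH = {"upgrade", "beat", "surge", "record"}
BEARISH = {"downgrade", "miss", "plunge", "probe"}

def news_signal(article: dict) -> dict:
    """Convert one article record into a timeline signal dict."""
    words = set(article["title"].lower().split())
    score = len(words & BULLISH) - len(words & BEARISH)
    return {
        "date": article["date"],
        "source": "news",
        "description": article["title"],
        "direction": (score > 0) - (score < 0),
        "strength": 0.5,
    }

sig = news_signal({"date": "2025-01-10", "title": "NVDA surge continues"})
print(sig["direction"])  # 1
```

Each source needs only a per-record converter like this; the date filter, sort, and conviction loop stay shared.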
Why This Matters
This is the killer demo for alternative data: proof that the information advantage existed; you just needed to look. Markets are informationally inefficient in the short term. Signals from congressional trades, insider filings, lobbying disclosures, and patent activity often telegraph major events days or weeks before they happen. The problem is that these signals are scattered across dozens of sources and formats.
Trawl unifies them into a single API. This backtester shows what a unified signal timeline would have looked like before any given event. It's the difference between "nobody could have seen that coming" and "the data was there — across eight sources, all pointing the same direction."
Non-Code Options
Trawl's MCP server handles this naturally:
"For NVDA, pull all available signals from the 30 days before January 15, 2025 — congressional trades, insider selling, SEC filings, earnings, news, Reddit, lobbying, and patents. Build a timeline and tell me what the conviction looked like."
Claude fetches from all eight sources, builds the chronological signal timeline, and computes the conviction score. No code required.
Source Code
The full implementation with parallel fetching, Rich terminal output, and comprehensive scoring: