System Card · Patent pending · Working papers on SSRN

What’s actually inside,
on the label.

Market Scholar is a machine-learning forensic engine with its own AI credibility model. It has been validated against the academic theory of how market narratives move prices, and forward-tested in paper-trading books. Here are the results, with the caveats attached. Every number on this page is reproducible from the system’s own database.

Market Scholar
System Facts
Coverage: 193 US equities · all 11 GICS sectors
Per continuous audit · Amount
History under audit (Dec 2024 –): 18 mo
Daily forensic scorecards: 16,527
AI narrative observations: 669,707
News articles ingested (2 corpora): ~700K
Core scoring cadence (~90 jobs): 6×/day
Measurement layer · validated
Credibility model, test AUC (vs 0.500 random): 0.631
General-LLM baseline (FinBERT): 0.55–0.60
5-day return spread (top vs bottom): +7.62 pts
20-day directional spread: +21.1 pts
Shiller propositions supported: 4 / 4
Independent findings: 7
Trading layer · forward paper, ~2 mo
Total return, $50K book: +21.8%
Total return, $300K book: +4.6%
Combined book return: +7.1%
Win rate ($300K book): 54.0%
Sharpe ratio (annualized): 2.71
Profit factor: 1.90
Spreads are walk-forward forward-return differentials. The trading layer is paper-traded and does not trade the forensic verdict directly. Several raw narrative-state spreads do not survive beta adjustment; see “What we don’t claim.”
How to read this

Two layers. Kept honest and kept apart.

Most “AI for markets” blurs measurement and money until you can’t tell what was actually proven. We don’t. Market Scholar is built as two distinct layers, and we report each on its own terms.

The measurement layer

The forensic engine that scores every company: credibility, filing drift, coordination, decay, fair-value divergence. This is the part that’s been written up as academic working papers and validated against the data — it predicts the structure of a narrative, with effect sizes we report in full.

The trading layer

A separate, signal-driven book — named strategies, each with its own direction and a walk-forward edge grade, sized by realized performance. It does not trade the forensic verdicts directly. This is where the paper P&L is earned, and we judge it on book P&L, not story.

The discipline behind every figure: walk-forward training, time-ordered splits, and observation-time-only features — nothing the model couldn’t have known at the moment it scored.

Its own AI

It runs on language models. It doesn’t think like one.

Off-the-shelf models (Gemini, Claude) do the reading — parsing claims out of filings and articles. But the judgment is ours: a proprietary Narrative Credibility Classifier, retrained nightly on the system’s own outcome-labeled data.

  • 34 observation-time features — speaker authority, market regime, narrative phase, sector momentum
  • Trained on 67,941 outcome-derived labels, walk-forward with a 30-day buffer
  • On the one task that matters — “will the market validate this story?” — it beats general-purpose LLMs
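As a rough illustration of what "walk-forward with a 30-day buffer" can look like, here is a minimal sketch. The function name, the expanding training window, and the window lengths are assumptions made for the example, not Market Scholar's actual pipeline. The idea is that each training set ends at least 30 days before its test window begins, so labels derived from forward outcomes cannot overlap the test period.

```python
from datetime import date, timedelta

def walk_forward_splits(dates, train_days, test_days, buffer_days=30):
    """Return (train, test) date lists with a gap of buffer_days between them.

    Hypothetical helper: uses an expanding training window that grows by
    test_days on every step.
    """
    dates = sorted(dates)
    last = dates[-1]
    train_end = dates[0] + timedelta(days=train_days)
    splits = []
    while True:
        test_start = train_end + timedelta(days=buffer_days)
        test_end = test_start + timedelta(days=test_days)
        if test_start > last:
            break
        train = [d for d in dates if d < train_end]
        test = [d for d in dates if test_start <= d < test_end]
        if train and test:
            splits.append((train, test))
        train_end += timedelta(days=test_days)
    return splits

# Synthetic calendar: 200 consecutive days starting 2025-01-01.
calendar = [date(2025, 1, 1) + timedelta(days=i) for i in range(200)]
splits = walk_forward_splits(calendar, train_days=90, test_days=30)
for train, test in splits:
    print(train[-1], "->", test[0], "gap:", (test[0] - train[-1]).days, "days")
```

Every split is time-ordered: nothing in a test window is ever visible to the model trained for it.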
Directional skill · same task class
Market Scholar classifier: 0.631 (test AUC, walk-forward)
FinBERT: 0.55–0.60 (fine-tuned financial LLM)
GPT-3.5 / GPT-4 as predictor: r ≈ 0.05–0.10 (Lopez-Lira & Tang, 2023)
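For readers unfamiliar with AUC, the sketch below shows what the metric measures: the probability that a randomly chosen validated narrative is scored above a randomly chosen unvalidated one, with 0.500 as the chance level. The function and the toy labels are illustrative only, not the system's evaluation code. Note that AUC and a correlation r sit on different scales, so the rows above are not directly interchangeable.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. O(n^2), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = the market later validated the story, 0 = it did not.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly
```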
In an AI-flooded information world, the answer isn’t more LLMs on more text — it’s structure the model can’t read off the page. When a third-party sentiment feed tested at r = −0.020 against forward returns, we ripped it out and built our own keyword model at r = +0.214.
What we’ve proven

The textbook said stories move markets. We measured it.

Robert Shiller won the Nobel and built “Narrative Economics” on four claims — that narratives are quantifiable, decay over time, spread by contagion, and create mispricings that revert. Prior work tested them with Google Trends and word counts. We tested them at the granularity of one company on one day, and found support for all four.

CLAIM 1 Supported

Narratives are quantifiable

187 companies reduced to ~30 daily forensic features each — coordination, filing-drift, decay, credibility — across 12,747 daily classifications and 287,482 narrative observations.

CLAIM 2 Supported

Narratives have measurable lifecycles that decay

Stories in their “dying” phase (5–20% energy remaining) returned +4.01% over 5 days vs +1.18% for full-energy stories (n = 531 vs 6,170). Decay rate predicts returns at r = +0.072.

CLAIM 3 Supported

Narratives spread by contagion

Coordinated, high-drift coverage continued to drift +3.36% over 5 days vs +1.26% control. Themes lead each other on a clock: Edge-AI precedes Quantum by ~5 days (r = +0.61).
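A lead-lag relationship like the one above can be checked by correlating one theme's attention series with another's shifted forward in time. The sketch below is a generic version on synthetic data; the helper names and the 10-day search range are assumptions, and the source does not say how its lags were estimated.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

def lagged_corr(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a positive lag in days."""
    if not 0 < lag < len(leader) - 1:
        raise ValueError("lag must be positive and shorter than the series")
    return pearson(leader[:-lag], follower[lag:])

def best_lag(leader, follower, max_lag):
    """Return the lag in 1..max_lag with the highest lagged correlation."""
    return max(range(1, max_lag + 1),
               key=lambda k: lagged_corr(leader, follower, k))

# Synthetic data: the follower is the leader delayed by exactly 5 days.
rng = random.Random(0)
leader = [rng.gauss(0, 1) for _ in range(200)]
follower = [0.0] * 5 + leader[:-5]
print(best_lag(leader, follower, max_lag=10))
```

Real attention series are autocorrelated, so a scan like this should be read alongside a significance check rather than on the peak correlation alone.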

CLAIM 4 Supported

Narrative dynamics produce mispricings that revert

Exhausted narratives on moderately undervalued names returned +4.19% (61.7% up); on moderately overvalued names, +0.03% (43.4% up) — a +4.16-point reversion-to-fundamentals gap.
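The gaps quoted in these claims are plain differences between group mean returns, expressed in percentage points. A minimal sketch of that arithmetic, with hypothetical helper names, using the figures stated above:

```python
from statistics import mean

def group_summary(returns_pct):
    """Sample size, mean return (%), and share of positive outcomes."""
    return {
        "n": len(returns_pct),
        "mean": mean(returns_pct),
        "up_rate": sum(r > 0 for r in returns_pct) / len(returns_pct),
    }

def spread_points(group_a, group_b):
    """Difference in mean return between two groups, in percentage points."""
    return mean(group_a) - mean(group_b)

# Reproducing the stated gaps from the stated group means:
reversion_gap = round(4.19 - 0.03, 2)   # undervalued vs overvalued, exhausted
exhaustion_gap = round(4.01 - 1.18, 2)  # dying-phase vs full-energy
print(reversion_gap, exhaustion_gap)
```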

Seven independent findings

Each survives walk-forward, observation-time-only testing across the 17-month validation panel. Effect sizes for the novel narrative findings run r = 0.07–0.14.

01 · Credibility classifier
High- vs low-credibility narratives, 5-day forward return: +7.62 pts
n = 146 vs 6,894 · +21.1 pts directional at 20 days
02 · Fair-value corridor
Exhausted + undervalued vs exhausted + overvalued: +4.16 pts
n = 154 vs 281 · p < 0.001
03 · Coordination → continuation
High-coordination, high-drift coverage, 5-day return: +3.36%
n = 678 vs 4,288 control
04 · Sector rotation
Low-energy minus high-energy sectors, days 5–20: +3.03 pts
n = 505 sector-days · z ≈ 4
05 · Theme lead-lag
Edge-AI attention precedes Quantum by 5 days: r = +0.61
Defense → Nuclear r = +0.46
06 · Fair-value divergence
Pure-fundamentals gap vs forward return: r = +0.139
n = 7,667 · strengthens to 20 days
07 · Exhaustion bounce
Dying-phase narratives vs full-energy baseline: +2.83 pts
n = 531 vs 6,170 · p < 0.001

Source: Walsh (2026), “Forensic Narrative Classification and Equity Returns,” Market Prism Working Paper, SSRN preprint. Validation dataset: 187 US equities, 17 months, 287,482 narrative observations, five calibration regimes.

Where it earns its keep

Forward-tested, in live paper books.

The trading layer runs as live paper books mirrored to brokerage paper accounts — real fills, real slippage, no benefit of hindsight. Two books, two months, both net positive: every trade was opened after the model said so.

$300K book

100% forward · no backtest
Total return (~2 months): +4.6%
$300,000 → $313,885
Win rate: 54.0%
Avg / trade: +0.79%
Sharpe (ann.): 2.71
Profit factor: 1.90
322 closed trades. Longs won 55.6% of their trades, shorts 52.5%. The edge is in selection and sizing, held to book P&L rather than a cherry-picked benchmark.
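The book statistics above follow standard definitions. The sketch below shows the conventional formulas on made-up numbers. It assumes a zero risk-free rate and 252 trading days per year for the Sharpe ratio; the page does not state which conventions the system uses, so treat these as illustrative.

```python
import math
from statistics import mean, stdev

def win_rate(trade_pnls):
    """Share of closed trades with a positive P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losses)."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return math.inf if losses == 0 else gains / losses

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, annualized.

    Assumes a zero risk-free rate.
    """
    return mean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)

# Made-up trades and daily returns, for illustration only.
pnls = [100.0, -50.0, 200.0, -75.0, 50.0]
daily = [0.010, -0.005, 0.020, 0.000, 0.005]
print(win_rate(pnls), profit_factor(pnls), round(annualized_sharpe(daily), 2))
```

A Sharpe ratio annualized from roughly two months of daily returns carries a wide confidence interval, which is one reason to read it alongside the trade count.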

$50K book

Forward / live
Total return (~2 months): +21.8%
$50,000 → $60,879
Win rate: 53.8%
Avg / trade: +0.76%
Closed trades: ~600
Realized P&L: +$10.9K
A smaller, more concentrated book — fewer, larger positions, so the percentage return runs higher than the $300K book on the same signals.
Net profitable · Both books combined, ~2 months
$350,000 → $374,764 · +7.1%
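The combined figure is a capital-weighted result, which is why it sits much closer to the $300K book's return than to the $50K book's. A short check of the arithmetic using the balances stated above (the function name is just for illustration):

```python
def total_return_pct(start_balance, end_balance):
    """Simple total return over the period, in percent."""
    return 100.0 * (end_balance / start_balance - 1.0)

large_book = total_return_pct(300_000, 313_885)
small_book = total_return_pct(50_000, 60_879)
combined = total_return_pct(300_000 + 50_000, 313_885 + 60_879)

print(round(large_book, 1), round(small_book, 1), round(combined, 1))
```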
What we don’t claim

The caveats are part of the credibility.

A forensic engine that won’t name its own limits isn’t forensic. Here’s what the evidence does not support — stated by us, before anyone else has to.

Measurement, not magic.

Effect sizes for the novel narrative findings run r = 0.07–0.14 (R² of 0.5–2%) — squarely in the range of credible behavioral-finance research, not “physics-grade” prediction.
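The R² range follows directly from squaring the correlations: the share of return variance explained is r². A two-line check:

```python
def variance_explained_pct(r):
    """Percent of variance explained by a linear relationship with correlation r."""
    return 100.0 * r * r

low, high = variance_explained_pct(0.07), variance_explained_pct(0.14)
print(round(low, 2), round(high, 2))  # roughly 0.5% and 2%
```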

Some raw spreads are just beta.

In a separate beta-adjusted audit, four raw narrative-state return spreads (coordinated campaigns, rapid decay, energy transitions, high suspicion) shrink to statistical noise once market beta is removed. We publish that correction ourselves.
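To show what "removing market beta" means mechanically, here is a minimal sketch using a single-factor OLS beta. The source does not describe how the audit estimated beta, so the estimator, the function names, and the toy numbers are all assumptions. In the toy case the asset's returns are pure market exposure, so nothing is left after adjustment.

```python
from statistics import mean

def ols_beta(asset_returns, market_returns):
    """Slope of asset returns regressed on market returns (cov / var)."""
    ma, mm = mean(asset_returns), mean(market_returns)
    cov = sum((a - ma) * (m - mm) for a, m in zip(asset_returns, market_returns))
    var = sum((m - mm) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns are constant; beta is undefined")
    return cov / var

def beta_adjusted(asset_returns, market_returns, beta):
    """Return series with the market component (beta * market) removed."""
    return [a - beta * m for a, m in zip(asset_returns, market_returns)]

# Toy data: a stock that is exactly 1.5x the market, with no independent signal.
market = [0.010, -0.020, 0.015, 0.005, -0.010]
stock = [1.5 * m for m in market]

beta = ols_beta(stock, market)
residual = beta_adjusted(stock, market, beta)
print(round(beta, 3), [round(x, 6) for x in residual])
```

A raw spread that looks large between two groups can shrink in the same way if one group simply holds higher-beta names during a rising market.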

One market regime.

The validation window (Dec 2024 – May 2026) is a single mixed bull/choppy regime. Effects could attenuate — or invert — in an extended bear market. We say so in the paper.

Forensics ≠ a recommendation.

The scorecard measures how a story holds up against the record. The live trading book is a separate, signal-driven layer. We never sell the verdict as the trade.

The intellectual property

Three patents pending on the methods that make it work

U.S. Provisional 63/971,470

Multi-dimensional analyst & narrative credibility assessment

The credibility classifier — scoring who said what, in which regime, at which phase of which story.

U.S. Provisional 63/971,478

Narrative lifecycle tracking with decay monitoring

The energy-and-decay model: fitting each narrative its own half-life and exhaustion point.

U.S. Provisional 63/971,485

Inference-time temporal methodology

The third filing in the family covering the underlying forensic methodology.

The record of discovery

Every test — including null results and findings that later failed — is written to a timestamped scientific audit log. The working papers are posted as SSRN preprints with their methodology, sample sizes, and limitations in full.

  • Walsh (2026) — Forensic Narrative Classification and Equity Returns
  • Walsh (2026) — Narrative Lifecycle States as Attention & Risk-Loading Regimes
  • JEL classification G12, G14, G17, G40

See the system read a live company

Run a forensic audit on any ticker, or talk to us about deploying Market Scholar across research, compliance, and media intelligence.