Technical Methodology

NEXUS Alpha Signal Generation

Version 1.0 · Published 2026-05-21 · Coverage: 85 US equities · Validated: 2022–2025

1. Overview

NEXUS Alpha generates long-only investment signals (BUY, OVERWEIGHT, NEUTRAL) by applying FinBERT-based sentiment analysis to SEC EDGAR filings (10-K annual reports, 10-Q quarterly reports, and 8-K earnings releases), optionally supplemented by large language model (LLM) enrichment. Signals are targeted for delivery within 10 minutes of a new filing appearing on EDGAR.

Core thesis: Management language in regulatory filings contains predictive information about near-term stock performance. Positive sentiment in MD&A sections and earnings releases correlates with 60-day forward returns above the S&P 500. This edge persists after controlling for sector and market regime.

2. Data Sources

3. Signal Pipeline

SEC EDGAR RSS → Filing Download → Text Extraction → FinBERT Scoring → LLM Enrichment → Signal Engine → DB + API + Alert

3.1 Text Extraction

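Raw HTML from EDGAR is parsed using beautifulsoup4. For 10-K/10-Q filings, the MD&A section is isolated by regex pattern matching on its item header (Item 7 in a 10-K; Part I, Item 2 in a 10-Q). For 8-K filings, Items 2.02 and 7.01 are extracted. Input to the NLP stage is capped at 15,000 characters. This is several times FinBERT's 512-token context window, so the capped text is still chunked before scoring (Section 3.2); the cap bounds the number of chunks scored per filing.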
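A minimal sketch of the 10-K MD&A isolation step follows. It is not the production extractor: to stay dependency-free it swaps Python's standard-library `html.parser` in for beautifulsoup4, and the header regexes (start at "Item 7.", stop at "Item 7A." or "Item 8.") and the longest-span heuristic for skipping the table of contents are assumptions. Only the 15,000-character cap comes from this section.

```python
import re
from html.parser import HTMLParser

MAX_NLP_CHARS = 15_000  # input cap stated in Section 3.1

# Hypothetical 10-K header patterns: MD&A starts at "Item 7." and ends at "Item 7A." or "Item 8."
MDA_START = re.compile(r"item\s+7\.", re.IGNORECASE)
MDA_END = re.compile(r"item\s+(?:7a|8)\.", re.IGNORECASE)


class TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so headers split across tags still match the regexes.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def extract_10k_mda(html: str) -> str:
    """Return the longest 'Item 7.' to 'Item 7A./8.' span, capped at 15,000 chars."""
    text = html_to_text(html)
    best = ""
    for start in MDA_START.finditer(text):
        end = MDA_END.search(text, start.end())
        stop = end.start() if end else len(text)
        section = text[start.start():stop]
        # The table-of-contents entry repeats the header but is usually the shortest span.
        if len(section) > len(best):
            best = section
    return best[:MAX_NLP_CHARS]
```

Picking the longest candidate span is one common way to skip the table-of-contents entry; 10-Q and 8-K item headers would need their own patterns.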

3.2 FinBERT Sentiment Scoring

ProsusAI/finbert — a BERT model fine-tuned on 10,000 financial sentences from analyst reports and earnings calls. Outputs three class probabilities: Positive, Negative, Neutral. Net sentiment is computed as Positive − Negative, ranging from −1.0 to +1.0.

Long text is split into 512-token chunks, each scored independently; the per-chunk net sentiment scores are then combined as an average weighted by token count. This preserves signal from both the beginning and the end of the MD&A, which carry different types of information (historical vs. forward-looking).
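The sketch below illustrates the chunk-and-average step on already-scored chunks; the FinBERT call itself is omitted. `ChunkScore`, `chunk_tokens`, and `document_net_sentiment` are hypothetical names, and reserving two of the 512 positions for BERT's [CLS] and [SEP] tokens is an assumption about how the chunk size is counted.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_TOKENS = 512  # BERT input limit, assumed here to include [CLS] and [SEP]


@dataclass
class ChunkScore:
    positive: float  # class probabilities from one FinBERT forward pass
    negative: float
    neutral: float
    n_tokens: int    # content tokens in the chunk, used as the averaging weight

    @property
    def net(self) -> float:
        # Net sentiment = P(positive) - P(negative), bounded to [-1, 1]
        return self.positive - self.negative


def chunk_tokens(token_ids: Sequence[int], max_tokens: int = MAX_TOKENS) -> list[list[int]]:
    """Split content tokens into consecutive chunks, leaving room for [CLS] and [SEP]."""
    size = max_tokens - 2
    return [list(token_ids[i:i + size]) for i in range(0, len(token_ids), size)]


def document_net_sentiment(chunks: Sequence[ChunkScore]) -> float:
    """Token-count-weighted mean of per-chunk net sentiment (0.0 for empty input)."""
    total = sum(c.n_tokens for c in chunks)
    if total == 0:
        return 0.0
    return sum(c.net * c.n_tokens for c in chunks) / total
```

Weighting by token count keeps a short trailing chunk (for example, the last 180 tokens of a 1,200-token section) from counting as much as a full chunk.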

3.3 LLM Enrichment (Ollama / Claude)

Optional second-stage enrichment generates a structured investment thesis. The prompt asks for: signal direction, confidence, bull case, bear case, key risks, revenue tone (BEAT/MISS/IN-LINE for 8-K), and guidance tone (RAISED/LOWERED/MAINTAINED).

Production uses qwen2.5:14b via Ollama (run locally, so there are no per-request API fees). The LLM's signal override is suppressed when its confidence is below the 0.60 threshold.
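One way to validate the enrichment output and apply the 0.60 confidence floor is sketched below. The JSON field names, the `Thesis` type, and the rule that bearish overrides are discarded in long-only mode are assumptions for illustration; the label sets and the threshold come from this section and Section 5.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

CONFIDENCE_FLOOR = 0.60  # override threshold from Section 3.3
LONG_ONLY = {"BUY", "OVERWEIGHT", "NEUTRAL"}
ALL_DIRECTIONS = LONG_ONLY | {"UNDERWEIGHT", "SELL"}
REVENUE_TONES = {"BEAT", "MISS", "IN-LINE"}  # 8-K only
GUIDANCE_TONES = {"RAISED", "LOWERED", "MAINTAINED"}


@dataclass
class Thesis:
    direction: str
    confidence: float
    bull_case: str = ""
    bear_case: str = ""
    key_risks: list[str] = field(default_factory=list)
    revenue_tone: Optional[str] = None
    guidance_tone: Optional[str] = None


def _label(value: Optional[str], allowed: set[str], name: str) -> Optional[str]:
    """Normalize an optional enum-like label and reject anything unexpected."""
    if value is None:
        return None
    label = value.strip().upper()
    if label not in allowed:
        raise ValueError(f"{name}: unexpected value {value!r}")
    return label


def parse_thesis(raw_json: str) -> Thesis:
    """Validate the LLM's JSON reply. The field names are hypothetical."""
    d = json.loads(raw_json)
    direction = _label(d.get("direction"), ALL_DIRECTIONS, "direction")
    if direction is None:
        raise ValueError("direction is required")
    confidence = float(d["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return Thesis(
        direction=direction,
        confidence=confidence,
        bull_case=d.get("bull_case", ""),
        bear_case=d.get("bear_case", ""),
        key_risks=list(d.get("key_risks", [])),
        revenue_tone=_label(d.get("revenue_tone"), REVENUE_TONES, "revenue_tone"),
        guidance_tone=_label(d.get("guidance_tone"), GUIDANCE_TONES, "guidance_tone"),
    )


def apply_override(engine_signal: str, thesis: Optional[Thesis], long_only: bool = True) -> str:
    """Use the LLM direction only when its confidence clears the 0.60 floor."""
    if thesis is None or thesis.confidence < CONFIDENCE_FLOOR:
        return engine_signal
    if long_only and thesis.direction not in LONG_ONLY:
        return engine_signal  # assumption: bearish overrides are dropped in long-only mode
    return thesis.direction
```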

3.4 Signal Engine

Net sentiment    Raw signal (before trend adjustment)
> 0.30           BUY
0.10 – 0.30      OVERWEIGHT
−0.10 – 0.10     NEUTRAL
< −0.10          NEUTRAL (long-only mode)

Trend adjustment: The current net sentiment is compared against the rolling average of the prior 4 filings for the same ticker. A positive trend (management tone improving relative to that average) upgrades NEUTRAL to OVERWEIGHT. A negative trend has no effect in long-only mode.

Confidence cap at 0.75: Information coefficient (IC) analysis shows that above a stated confidence of 0.75, confidence is negatively correlated with realized 60-day return. Signals are therefore capped at 0.75 to avoid over-promising on less reliable, high-sentiment readings.
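The threshold table, trend adjustment, and confidence cap can be sketched as plain functions. Several details are assumptions rather than documented behavior: the shared table boundaries are resolved so that exactly 0.10 maps to OVERWEIGHT, a "positive trend" means the current score is strictly above the mean of the prior four filings, tickers with fewer than four prior filings get no trend adjustment, and the upgrade is applied to any NEUTRAL, reading the text literally.

```python
from statistics import mean
from typing import Sequence

CONFIDENCE_CAP = 0.75  # Section 3.4
TREND_LOOKBACK = 4     # prior filings in the rolling average


def raw_signal(net: float) -> str:
    """Map net sentiment to a raw long-only signal (boundary handling assumed)."""
    if net > 0.30:
        return "BUY"
    if net >= 0.10:
        return "OVERWEIGHT"
    return "NEUTRAL"  # covers both -0.10..0.10 and < -0.10 in long-only mode


def apply_trend(signal: str, net: float, prior_nets: Sequence[float]) -> str:
    """Upgrade NEUTRAL to OVERWEIGHT when tone beats the prior-4-filing average."""
    history = list(prior_nets)[-TREND_LOOKBACK:]  # assumes oldest-to-newest order
    if len(history) < TREND_LOOKBACK:
        return signal  # assumption: no adjustment without a full lookback
    if signal == "NEUTRAL" and net > mean(history):
        return "OVERWEIGHT"
    return signal  # a negative trend has no effect in long-only mode


def cap_confidence(confidence: float) -> float:
    return min(confidence, CONFIDENCE_CAP)


def generate_signal(net: float, prior_nets: Sequence[float]) -> str:
    return apply_trend(raw_signal(net), net, prior_nets)
```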

4. Backtest Validation Results

Methodology: Walk-forward validation with a 2-year training window and a 1-year test window, rolled forward by 1 year. Signals are evaluated on 20-, 40-, and 60-day forward returns. Benchmark: SPY total return. Universe: 25 US large-cap equities (2022–2025). Only BUY and OVERWEIGHT signals are evaluated (long-only).
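The rolling scheme above can be expressed as a small split generator. This is a sketch: calendar-year boundaries and the `Window` and `walk_forward` names are assumptions, and the methodology does not state where the first training window begins.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Window:
    train_start: date  # inclusive
    train_end: date    # exclusive; equals test_start, so train and test never overlap
    test_start: date   # inclusive
    test_end: date     # exclusive


def walk_forward(first_year: int, last_test_year: int, train_years: int = 2,
                 test_years: int = 1, step_years: int = 1) -> list[Window]:
    """Calendar-year walk-forward splits: fit on [y - train, y), test on [y, y + test)."""
    windows = []
    test_year = first_year + train_years
    while test_year + test_years - 1 <= last_test_year:
        windows.append(Window(
            train_start=date(test_year - train_years, 1, 1),
            train_end=date(test_year, 1, 1),
            test_start=date(test_year, 1, 1),
            test_end=date(test_year + test_years, 1, 1),
        ))
        test_year += step_years
    return windows
```

For example, `walk_forward(2020, 2025)` yields four windows whose test years are 2022 through 2025, which presumes training data from 2020 onward.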
Sharpe (60d): 1.20
Win Rate vs SPY: 70.6%
Avg Excess vs SPY: +2.45%
Avg Signal Return: +4.79%
Max Drawdown: −20%
Total Signals: 34

4.1 Alpha Decay by Hold Period

Hold period  Avg return  Excess vs SPY  Win rate  Sharpe  Max DD
20 days      +0.61%      −0.11%         58.8%     0.35    −27.5%
40 days      +3.63%      +1.34%         58.8%     1.13    −15.4%
60 days      +4.79%      +2.45%         70.6%     1.20    −20.0%

Key finding: The signal edge concentrates at longer holding periods. At 20 days, signals do not beat SPY (−0.11% average excess return). Of the three horizons tested, 60 days performs best, which aligns with the quarterly reporting cycle.
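The per-horizon columns with unambiguous definitions (average return, average excess return vs SPY, and win rate against SPY) can be computed as below. The Sharpe and max-drawdown columns are omitted because the methodology does not say how they are annualized or aggregated across signals. The function names, the use of simple returns, and index-aligned trading-day price series are assumptions; matching a total-return benchmark would also require dividend-adjusted closes.

```python
from statistics import mean
from typing import Sequence


def forward_return(closes: Sequence[float], entry_idx: int, horizon: int) -> float:
    """Simple return from the entry close to the close `horizon` trading days later."""
    return closes[entry_idx + horizon] / closes[entry_idx] - 1.0


def horizon_stats(signal_returns: Sequence[float], spy_returns: Sequence[float]) -> dict:
    """Average return, average excess vs SPY, and win rate vs SPY for one horizon.

    spy_returns[i] must cover the same entry date and horizon as signal_returns[i].
    """
    if len(signal_returns) != len(spy_returns) or not signal_returns:
        raise ValueError("need equal-length, non-empty return lists")
    excess = [s - b for s, b in zip(signal_returns, spy_returns)]
    return {
        "avg_return": mean(signal_returns),
        "avg_excess_vs_spy": mean(excess),
        "win_rate_vs_spy": sum(e > 0 for e in excess) / len(excess),
    }
```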

4.2 Win Rate by Confidence Band (60-day hold)

Confidence band              Signals  Avg return  Win rate
0.65 – 0.70                  10       +12.3%      90%
0.70 – 0.80                  20       +1.6%       60%
> 0.80 (pre-cap historical)  4        +2.1%       75%

Note: The confidence cap at 0.75 was applied after this backtest. The >0.80 bucket reflects historical signals generated before the cap; no live signals will exceed 0.75.

Critical finding: The 0.65–0.70 confidence band dramatically outperforms higher-confidence signals. The information coefficient (IC) between stated confidence and 60-day return is −0.36, a moderate negative correlation across 34 signals. Higher confidence coincides with more extreme FinBERT readings, which tend to occur on filings whose boilerplate or sector-specific language artificially inflates sentiment. The signal engine therefore applies a hard cap at 0.75, and clients are advised to weight the 0.65–0.70 band most heavily.
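This section does not say whether the −0.36 IC is a Pearson or a rank correlation. The sketch below assumes the rank (Spearman) form, a common convention for IC, computed as the Pearson correlation of ranks with ties given their average rank; the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def rank_ic(confidence: Sequence[float], returns: Sequence[float]) -> float:
    """Spearman rank IC between stated confidence and realized forward return."""
    return pearson(ranks(confidence), ranks(returns))
```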

4.3 By Signal Type (60-day hold)

Signal       Count  Avg return  Win rate
BUY          4      +2.1%       75%
OVERWEIGHT   30     +5.1%       70%

OVERWEIGHT signals (net sentiment 0.10–0.30, or NEUTRAL readings upgraded by a positive trend) make up the majority of actionable signals (30 of 34) and carry the higher average 60-day return (+5.1% vs +2.1% for BUY).

5. Long-Only Mode — Why SELL Signals Are Suppressed

The backtest evaluated long-short mode (including SELL and UNDERWEIGHT signals). Long-short Sharpe was −0.22 vs long-only Sharpe of +1.20. SELL signals on large-cap US equities systematically underperform due to:

SELL and UNDERWEIGHT signals are generated internally but not exposed via the API in long-only mode (default for all tiers). Long-short mode is available on the Institutional tier upon request with a separate risk disclosure agreement.

6. Live Signal Delivery

Signals are delivered via three channels simultaneously upon generation:

6.1 Latency SLA

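Target: < 10 minutes from EDGAR filing acceptance to signal delivery; measured latency is published at /status. EDGAR refreshes its RSS feeds every 10 minutes, and our watcher polls every 5 minutes, triggering the full NLP pipeline on the first match. Because these two intervals stack, detection alone can take up to 15 minutes in the worst case, so the 10-minute figure is a target rather than a guaranteed bound.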
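A minimal sketch of the watcher's deduplication step follows. It assumes the polled feed is Atom XML in which each entry carries the form type as a `category` `term` attribute and a unique `id`; that layout, the function names, and the caller-supplied `fetch_feed` and `run_pipeline` callables are assumptions. Only the 5-minute polling interval and the three watched form types come from this document.

```python
import time
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
WATCHED_FORMS = {"10-K", "10-Q", "8-K"}
POLL_SECONDS = 5 * 60  # polling interval from Section 6.1


def new_filings(feed_xml: str, seen: set[str]) -> list[tuple[str, str, str]]:
    """Return (form, entry_id, title) for unseen watched entries and mark them seen."""
    root = ET.fromstring(feed_xml)
    found = []
    for entry in root.iter(f"{ATOM}entry"):
        entry_id = (entry.findtext(f"{ATOM}id") or "").strip()
        category = entry.find(f"{ATOM}category")
        form = category.get("term", "") if category is not None else ""
        if form not in WATCHED_FORMS or not entry_id or entry_id in seen:
            continue
        seen.add(entry_id)
        found.append((form, entry_id, (entry.findtext(f"{ATOM}title") or "").strip()))
    return found


def watch(fetch_feed, run_pipeline, seen=None) -> None:
    """Poll forever; fetch_feed() -> str and run_pipeline(form, entry_id) are caller-supplied."""
    seen = set() if seen is None else seen
    while True:
        for form, entry_id, _title in new_filings(fetch_feed(), seen):
            run_pipeline(form, entry_id)  # first match triggers the NLP pipeline
        time.sleep(POLL_SECONDS)
```

The `seen` set lives in memory here; a watcher that must survive restarts without reprocessing filings would need to persist it.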

7. Known Limitations & Risks

[HIGH] Small sample size: 34 signals across 25 tickers over 3 years, so statistical significance is limited. The walk-forward methodology mitigates overfitting but does not eliminate it. More signals will accumulate as the universe expands to 85 tickers.
[HIGH] Regime dependency: The backtest covers 2022–2025, a period of rising rates, a tech correction, and recovery. Performance in other macro regimes (deflationary, crisis, low-volatility) is untested.
[MED] Language model bias: FinBERT was trained on analyst reports, not MD&A sections specifically. Sector-specific boilerplate (pharma risk disclosures, energy regulatory language) may systematically skew scores.
[MED] Filing timing: Some 10-Q filings lag the end of the quarter by 40–45 days, so the signal may already be priced in by the time the filing is public. 8-K earnings releases (due within four business days of the earnings announcement) are the primary real-time signal source.
[MED] Execution cost: Backtest returns do not include transaction costs, slippage, or market impact. For large allocations, these costs may materially reduce net returns.
[HIGH] Signal overlay, not a portfolio: NEXUS Alpha generates roughly 40 signals per year on an 85-ticker universe, so capital is not continuously deployed. The per-signal excess return (+2.45% vs SPY at 60 days) reflects the information edge on each individual trade. A fully compounded, equal-weight back-simulation returns +9.5% vs +26.2% for SPY over 2022–2025. Portfolio-level returns lag SPY because (a) only a subset of tickers generates signals in any quarter, and (b) the 2022 bear market hit entries concentrated in filing season. Institutional clients use these signals as overlays on existing book positions, not as a standalone fund.

8. Data Lineage & Audit Trail

Layer           Source               Immutability
Raw filings     SEC EDGAR (HTTPS)    Cached on ingestion; hash-verified
NLP scores      FinBERT inference    Written once, never overwritten
Signals         Signal engine        Unique constraint on (ticker, signal_date); upserts only
Paper track     Live pipeline        Append-only; generated_at is immutable; no backdating
API key usage   Every API call       Append-only audit log in api_usage table
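The immutability rules in the table can be enforced at the database layer. The sketch below uses an in-memory SQLite database for illustration (the production database is not named here); table and column names other than `signals`, `ticker`, `signal_date`, and `generated_at` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE signals (
    ticker      TEXT NOT NULL,
    signal_date TEXT NOT NULL,
    signal      TEXT NOT NULL,
    confidence  REAL NOT NULL,
    UNIQUE (ticker, signal_date)            -- one signal per ticker per day
);
CREATE TABLE paper_track (
    id           INTEGER PRIMARY KEY,
    ticker       TEXT NOT NULL,
    signal       TEXT NOT NULL,
    generated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Append-only: reject any UPDATE or DELETE on the paper track.
CREATE TRIGGER paper_track_no_update BEFORE UPDATE ON paper_track
BEGIN SELECT RAISE(ABORT, 'paper_track is append-only'); END;
CREATE TRIGGER paper_track_no_delete BEFORE DELETE ON paper_track
BEGIN SELECT RAISE(ABORT, 'paper_track is append-only'); END;
"""

# Upsert keyed on the unique constraint (requires SQLite 3.24 or later).
UPSERT_SIGNAL = """
INSERT INTO signals (ticker, signal_date, signal, confidence) VALUES (?, ?, ?, ?)
ON CONFLICT (ticker, signal_date)
DO UPDATE SET signal = excluded.signal, confidence = excluded.confidence
"""


def sha256_hex(raw: bytes) -> str:
    """Digest stored at ingestion and re-checked whenever the cached filing is read."""
    return hashlib.sha256(raw).hexdigest()


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

An upsert replaces the existing row for the same (ticker, signal_date) rather than adding a second one, so the `signals` table is deduplicated rather than append-only; the append-only guarantee applies to the paper track.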

9. Key API Endpoints

Endpoint                    Tier  Description
GET /signals/top            1+    Highest-confidence BUY/OVERWEIGHT signals
GET /signals/history        1+    Full filterable signal time series
GET /signals/watchlist      1+    NEUTRAL signals (monitoring list)
GET /analytics/performance  1+    Backtest stats + confidence bucket breakdown
GET /filings/{ticker}       2+    Raw sentiment history per ticker
GET /data/export            2+    Bulk CSV download of all signals
GET /universe               3     Full universe snapshot (latest signal per ticker)
GET /paper-track            1+    Live paper track record with SPY comparison
WS  /ws/signals             1+    Real-time signal push via WebSocket
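A minimal REST client sketch using only the standard library follows. The base URL, the `X-API-Key` header name, and the query parameter names are hypothetical placeholders, since this section lists endpoints but not authentication or parameters; the WebSocket endpoint is not covered.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.example.com"  # hypothetical placeholder host


def build_request(path: str, api_key: str, **params) -> Request:
    """Build a GET request for an endpoint such as /signals/history."""
    query = f"?{urlencode(params)}" if params else ""
    return Request(
        f"{BASE_URL}{path}{query}",
        headers={"X-API-Key": api_key, "Accept": "application/json"},  # header name assumed
        method="GET",
    )


def get_json(path: str, api_key: str, **params):
    """Send the request and decode the JSON body (network call; no retries)."""
    with urlopen(build_request(path, api_key, **params), timeout=10) as resp:
        return json.load(resp)
```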

10. Legal Disclaimer

NEXUS Alpha signals are generated by automated NLP models and are provided for informational and research purposes only. They do not constitute investment advice, a recommendation to buy or sell any security, or an offer of any investment product.

NEXUS Alpha is not a registered investment adviser under the Investment Advisers Act of 1940, nor is it registered with any equivalent regulatory body in any other jurisdiction. Past signal performance does not guarantee future results.

Users of NEXUS Alpha signals are solely responsible for their own investment decisions. NEXUS Alpha and its operators accept no liability for any losses arising from the use of these signals. By accessing the API, you confirm you have read and agree to the Terms of Service.