Ground Truth Verification Report

External verification of the systemic risk channel scoring pipeline. All data checked against live APIs and public sources outside the pipeline.

| Metric | Result | Note |
|---|---|---|
| Papers verified | 2,433 | via OpenAlex API |
| Fabricated papers | 0 | 100% exist |
| Citation anomalies | 0 | current ≥ stored |
| Crisis events | 25/25 | sourced or plausible |
| Irrelevant top-cited papers | 5+ | biomedical in top 10 |

Verification Summary

| Level | Check | Verdict | Key Evidence |
|---|---|---|---|
| 1 | Paper Existence | GREEN | 2,433/2,433 found in OpenAlex (100%) |
| 2 | Citation Accuracy | GREEN | current_citations ≥ stored_citations for all 2,433 |
| 3 | Crisis Loss Figures | GREEN | 19 sourced, 2 plausible, 0 unsourced; 4 correctly undetermined |
| 4 | Search Relevance | RED | Biomedical papers with 5,000–12,000 citations in financial channels |
| 5 | Channel Assignment Quality | YELLOW | Keyword heuristic is coarse but flags real issues in top-10 |

Contents
  1. The Trust Problem
  2. Paper Existence Verification (Level 1)
  3. Citation Accuracy (Level 2)
  4. Crisis Source Attribution (Level 3)
  5. Search Relevance (Level 4)
  6. Channel Assignment Quality (Level 5)
  7. Overall Verdict
  8. Implications for the Paper

1. The Trust Problem

Previous verification confirmed that the HTML dashboard matches the JSON data, which matches recomputation from the raw inputs. That verification is necessary but insufficient: it only proves internal consistency. Every link in the chain was produced by the same pipeline, so if the pipeline ingested garbage, internal consistency merely confirms the garbage is consistent.

This report goes outside the pipeline to external, independent sources.

Pipeline Trust Chain

search_queries.json (14 channel queries) → OpenAlex API (live retrieval) → openalex_merged.json (2,433 papers) → channel_mapper.py (scoring logic) → channel_rankings.json (final scores) → HTML dashboard

What This Report Verifies Externally

| Link | Verification Method | Status |
|---|---|---|
| OpenAlex API → openalex_merged.json | Re-queried all 2,433 work IDs against live API | VERIFIED |
| Stored citations vs. current | Compared stored_citations to current_citations for all papers | VERIFIED |
| Crisis event losses | Cross-referenced 25 events against court filings, regulators, news | VERIFIED |
| search_queries → OpenAlex API | Keyword relevance heuristic on returned papers | NOISE FOUND |
| channel_mapper.py scoring | Impact assessment of irrelevant papers on scores | AFFECTED |

2. Paper Existence Verification (Level 1)

VERDICT: All 2,433 papers exist in OpenAlex. Zero fabricated records. Zero API errors.

Every paper ID in the pipeline was queried against the live OpenAlex API; verification took 1,098 seconds (about 18 minutes) to cover all 2,433 works. A minimal sketch of the check appears at the end of this section.

| Metric | Value |
|---|---|
| Total papers checked | 2,433 |
| Found in OpenAlex | 2,433 (100%) |
| Not found | 0 |
| API errors | 0 |
| Title matches | 2,430 |
| Title mismatches | 3 |

About the 3 Title Mismatches

Three records have null titles in both the stored data and the API response. These are not real mismatches — they are records where OpenAlex itself has no title metadata. The works exist (they have valid IDs, citations, and DOIs), but the title field is empty. This is a known OpenAlex data-quality edge case, not a fabrication.
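For reproducibility, the sketch below shows the shape of this check. It is a minimal illustration, assuming each stored record carries its OpenAlex work ID under "id" and its archived title under "title" (those field names are assumptions); the endpoint, the 404-on-missing behavior, and the polite-pool mailto parameter follow the public OpenAlex API.

```python
import requests

OPENALEX_WORKS = "https://api.openalex.org/works/{work_id}"

def verify_paper(record: dict, mailto: str = "you@example.com") -> dict:
    """Re-check one stored record against the live OpenAlex API."""
    url = OPENALEX_WORKS.format(work_id=record["id"])
    resp = requests.get(url, params={"mailto": mailto}, timeout=30)
    if resp.status_code == 404:
        # A 404 here would indicate a fabricated or vanished work ID.
        return {"id": record["id"], "found": False, "title_match": False}
    resp.raise_for_status()
    live = resp.json()
    # The 3 "mismatches" above are records whose title is null on both sides;
    # a strict check that requires a non-null stored title flags them.
    title_match = (record.get("title") is not None
                   and record.get("title") == live.get("title"))
    return {"id": record["id"], "found": True, "title_match": title_match}
```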


3. Citation Accuracy (Level 2)

VERDICT: Zero citation fabrication detected. For all 2,433 papers, current_citations ≥ stored_citations.

If citation counts had been inflated or fabricated, we would expect some papers to show fewer citations in the live API than in our stored data. The opposite is true: citations only grew or stayed the same.

| Metric | Value |
|---|---|
| Papers with current ≥ stored | 2,433 (100%) |
| Papers with current < stored | 0 |
| Mean citation growth | +2.0 |

| Category | Count | Share |
|---|---|---|
| Citations unchanged (delta = 0) | 1,305 | 53.6% |
| Citations grew (delta > 0) | 1,128 | 46.4% |

The mean growth of +2.0 citations since archival is consistent with a dataset collected recently: papers continue to accumulate citations over days and weeks. A fabrication scenario would show papers with implausibly high stored counts that the live API cannot confirm. No such anomaly exists.

Interpretation: The pipeline faithfully recorded the citation counts available from OpenAlex at the time of data collection. No inflation, no fabrication, no rounding errors detectable at the individual-paper level.
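A minimal sketch of the comparison, assuming records keep their archived count under "stored_citations" (the name used throughout this report) and that live responses expose OpenAlex's cited_by_count field:

```python
from statistics import mean

def citation_check(records: list[dict], live_by_id: dict[str, dict]) -> dict:
    """Compare archived citation counts against live OpenAlex counts.

    live_by_id maps each work ID to its live OpenAlex JSON.
    """
    deltas = [live_by_id[r["id"]]["cited_by_count"] - r["stored_citations"]
              for r in records]
    return {
        "anomalies":   sum(d < 0 for d in deltas),  # current < stored: possible inflation
        "unchanged":   sum(d == 0 for d in deltas), # 1,305 papers (53.6%) in this run
        "grew":        sum(d > 0 for d in deltas),  # 1,128 papers (46.4%) in this run
        "mean_growth": round(mean(deltas), 1),      # +2.0 in this run
    }
```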

4. Crisis Source Attribution (Level 3)

VERDICT: All 25 crisis events verified against public sources. Of the 21 events with stored loss figures, 19 are SOURCED, 2 PLAUSIBLE, and 0 UNSOURCED; the remaining 4 events are correctly classified as "undetermined."

Each stored loss figure was compared against court filings, bankruptcy proceedings, regulatory reports (OSC, FBI, DOJ, JFSA, SEC, Fed), blockchain analytics (Chainalysis, Elliptic), and financial news (Reuters, Bloomberg, CoinDesk, The Block, Rekt News).

Verdict Distribution

| Verdict | Count | Meaning |
|---|---|---|
| SOURCED | 19 | Specific credible source confirms the stored figure |
| PLAUSIBLE | 2 | General references support the magnitude; imprecise but reasonable |
| UNSOURCED | 0 | No corroboration found |
| UNDETERMINED (correct) | 4 | No stored figure; verification confirms none can be reliably attributed |

Full Crisis Events Table (25 events)
| Event | Date | Stored Loss (USD) | Reported Range | Source | Verdict |
|---|---|---|---|---|---|
| Mt. Gox Collapse | 2014-02 | $460M | $450M – $480M | Tokyo District Court bankruptcy filing; Reuters; DOJ. 850,000 BTC. | SOURCED |
| The DAO Hack | 2016-06 | $60M | $50M – $70M | CoinDesk; Ethereum Foundation; Mehar et al. (2019). ~3.6M ETH. | SOURCED |
| Bitfinex Hack | 2016-08 | $72M | $65M – $78M | DOJ (Feb 2022); Bitfinex report. 119,756 BTC at Aug 2016 prices. | SOURCED |
| BTC-e Seizure and Shutdown | 2017-07 | undetermined | n/a | DOJ indictment of Vinnik (July 2017); FinCEN $110M fine. Reserves never published. | SOURCED |
| QuadrigaCX Collapse | 2019-02 | $190M | $169M – $215M | OSC Staff Notice 21-327 (June 2020); Ernst & Young trustee. C$215M liabilities. | SOURCED |
| Black Thursday (COVID Crash) | 2020-03 | undetermined | n/a | MakerDAO post-mortem; Klages-Mundt et al. (2021). BTC -50%, market cap -$93B. | SOURCED |
| SushiSwap Vampire Attack | 2020-09 | undetermined | n/a | CoinDesk (Sep 2020); DeFi Pulse. ~$1.14B migrated. Competitive, not theft. | SOURCED |
| Iron Finance / TITAN Collapse | 2021-06 | $2B | $1.7B – $2B | The Defiant (June 2021); Iron Finance post-mortem. Peak market cap destruction. | PLAUSIBLE |
| Poly Network Hack | 2021-08 | $611M | $600M – $613M | Rekt News; Poly Network; SlowMist. $611M across 3 chains. All returned. | SOURCED |
| Wormhole Bridge Hack | 2022-02 | $326M | $320M – $326M | Rekt News; Wormhole post-mortem; Jump Trading. 120,000 wETH via signature verification bug. | SOURCED |
| Ronin Bridge Hack | 2022-03 | $625M | $600M – $625M | FBI attribution (April 2022); Chainalysis 2023. 173,600 ETH + 25.5M USDC. Lazarus Group. | SOURCED |
| Terra/Luna Collapse | 2022-05 | $45B | $40B – $60B | SEC complaint vs Do Kwon (Feb 2023); academic literature; Bloomberg/Reuters. | SOURCED |
| Three Arrows Capital Insolvency | 2022-06 | $3.5B | $3B – $3.5B | Reuters; BVI liquidation; Teneo. Genesis ~$2.36B, Voyager ~$650M. | SOURCED |
| Celsius & Voyager Failures | 2022-06/07 | $5.4B | $4.7B – $6B | Celsius bankruptcy (SDNY); Voyager bankruptcy (SDNY). Celsius ~$4.7B; Voyager $650M–1.3B. | SOURCED |
| Nomad Bridge Hack | 2022-08 | $190M | $186M – $190M | Rekt News; Nomad post-mortem; Chainalysis. Crowd-sourced exploit. | SOURCED |
| Mango Markets Exploit | 2022-10 | $114M | $110M – $117M | DOJ indictment of Eisenberg (Dec 2022); SEC complaint. ~$110M per DOJ. | SOURCED |
| FTX Collapse | 2022-11 | $8.7B | $8B – $9.7B | DOJ sentencing (March 2024); FTX bankruptcy; John Ray III testimony. ~$8B+ shortfall. | SOURCED |
| Euler Finance Exploit | 2023-03 | $197M | $195M – $197M | Rekt News; Euler post-mortem; The Block. ~$197M exploited. All returned. | SOURCED |
| SVB Failure & USDC De-Peg | 2023-03 | undetermined | n/a | Fed review (April 2023); Circle $3.3B at SVB; USDC depegged to $0.87. | SOURCED |
| Curve Finance Pool Exploit | 2023-07 | $62M | $52M – $73M | Rekt News; Curve; Vyper vulnerability. Multiple pools totaling $62–73M gross. | SOURCED |
| Multichain Bridge Collapse | 2023-07 | $126M | $125M – $130M | Rekt News; Multichain; Chainalysis. ~$126M drained after CEO detained. | SOURCED |
| DMM Bitcoin Exchange Hack | 2024-05 | $305M | $300M – $308M | Reuters (May 2024); JFSA; DMM Bitcoin. 4,502.9 BTC. FBI: Lazarus Group. | SOURCED |
| WazirX Multi-Sig Exploit | 2024-07 | $235M | $230M – $235M | Reuters (July 2024); WazirX; Liminal; Elliptic. | SOURCED |
| Bybit $1.5B Hack | 2025-02 | $1.5B | $1.4B – $1.5B | Reuters (Feb 2025); Bybit; FBI; Chainalysis/Elliptic. ~401,347 ETH via Safe Wallet attack. | SOURCED |
| Hyperliquid Whale Manipulation | 2025-03 | $15M | $12M – $17M | The Block; CoinDesk (March 2025). ~$4M from ETH whale; ~$10.6M from JELLY. | PLAUSIBLE |

Notes on "Undetermined" Events

Four events are correctly stored as "undetermined" losses:

| Event | Why "Undetermined" Is Correct |
|---|---|
| BTC-e Seizure | DOJ focused on money laundering; reserves never published; user losses never quantified |
| Black Thursday | Market-wide crash; no single-entity loss attribution possible |
| SushiSwap Vampire Attack | Competitive migration, not theft; no permanent losses |
| SVB / USDC De-Peg | TradFi crisis with transient crypto impact; losses not aggregated |

5. Search Relevance (Level 4)

FINDING: Highly-cited papers from completely unrelated fields appear in multiple channels. Biomedical, environmental, and general science papers with 5,000–12,000 citations inflate citation impact scores.

An automated keyword heuristic was applied to all 2,433 papers: each paper's title was checked for terms related to its assigned channel (e.g., "bridge," "cross-chain," "interoperability" for bridge_vulnerability). This heuristic is deliberately coarse — a paper can be relevant despite not containing a keyword in its title — but it flags the most obviously misplaced papers.
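A sketch of the heuristic follows. The per-channel keyword lists shown here are illustrative examples only; the pipeline's actual lists may differ.

```python
# Illustrative per-channel keyword lists (assumptions, not the pipeline's exact lists).
CHANNEL_KEYWORDS: dict[str, list[str]] = {
    "bridge_vulnerability": ["bridge", "cross-chain", "interoperability"],
    "stablecoin_runs": ["stablecoin", "peg", "bank run"],
    # ... one entry for each of the 14 channels
}

def is_keyword_relevant(title: str | None, channel: str) -> bool:
    """True if the paper's title contains any of its channel's keywords."""
    if not title:
        return False  # null-title records can never match
    lowered = title.lower()
    return any(kw in lowered for kw in CHANNEL_KEYWORDS.get(channel, []))

def relevance_rate(papers: list[dict], channel: str) -> float:
    """Share of a channel's papers that pass the title-keyword check."""
    hits = sum(is_keyword_relevant(p.get("title"), channel) for p in papers)
    return hits / len(papers)  # e.g., 52/218 ≈ 23.9% for stablecoin_runs
```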

Per-Channel Relevance Rates

| Channel | Papers | Keyword-Relevant | Rate |
|---|---|---|---|
| stablecoin_runs | 218 | 52 | 23.9% |
| network_contagion | 200 | 43 | 21.5% |
| composability_risk | 200 | 36 | 18.0% |
| regulatory_contagion | 200 | 35 | 17.5% |
| governance_failure | 200 | 33 | 16.5% |
| oracle_manipulation | 208 | 22 | 10.6% |
| liquidity_spirals | 263 | 21 | 8.0% |
| information_asymmetry | 200 | 15 | 7.5% |
| liquidation_cascades | 200 | 14 | 7.0% |
| rwa_transmission | 200 | 9 | 4.5% |
| validator_concentration | 200 | 8 | 4.0% |
| gateway_risk | 348 | 7 | 2.0% |
| counterparty_concentration | 200 | 3 | 1.5% |
| bridge_vulnerability | 200 | 0 | 0.0% |

Caveat: Low keyword-match rates do not necessarily mean all those papers are irrelevant. A paper titled "A Survey of Transfer Learning" will fail a keyword check for "bridge" but could conceivably relate to cross-chain technology if OpenAlex classified it under relevant concepts. However, when the top-10 most-cited papers in a channel are all from biomolecular simulation, nanotechnology, and environmental science, the keyword heuristic is clearly flagging a real problem.

Critical Finding: Top Irrelevant Papers by Citation Count

These papers have the highest citation counts among those flagged as irrelevant to their assigned channels:

| # | Title | Citations | Channel(s) |
|---|---|---|---|
| 1 | A comparative risk assessment of burden of disease and injury attributable to 67 risk factors... | 11,936 | validator_concentration |
| 2 | CHARMM: The biomolecular simulation program | 8,937 | bridge_vulnerability |
| 3 | Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases... | 7,311 | composability_risk |
| 4 | Integrative analysis of 111 reference human epigenomes | 7,015 | validator_concentration |
| 5 | Heavy Metal Toxicity and the Environment | 6,837 | validator_concentration |
| 6 | Nano based drug delivery systems: recent developments and future prospects | 6,328 | bridge_vulnerability |
| 7 | ImageJ2: ImageJ for the next generation of scientific image data | 6,112 | bridge_vulnerability |
| 8 | A survey of transfer learning | 5,982 | bridge_vulnerability |
| 9 | Biological properties of extracellular vesicles and their physiological functions | 5,761 | bridge_vulnerability |
| 10 | Cancer nanomedicine: progress, challenges and opportunities | 5,475 | validator_concentration |

Impact: These irrelevant papers inflate the mean-top-10 citation scores used in the citation_impact dimension. For example, bridge_vulnerability is ranked #1 for citation_impact with a mean_top10 of 5,228.8 — but every single paper in its top 10 is from an unrelated field (biomolecular simulation, nanotechnology, image processing, transfer learning).

6. Channel Assignment Quality (Level 5)

FINDING: Irrelevant papers concentrate in channels with broad search terms, significantly affecting citation_impact scores.

The table below shows, for each channel, how many of its top-10 most-cited papers are flagged as irrelevant, and what the mean citation count of that top-10 is. Channels where 10/10 top papers are irrelevant have citation_impact scores entirely driven by noise.
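A sketch of that computation, reusing is_keyword_relevant from the Level 4 sketch and assuming each paper dict carries a "stored_citations" count:

```python
def top10_contamination(papers: list[dict], channel: str) -> tuple[int, float]:
    """Count irrelevant papers among a channel's top-10 by stored citations."""
    top10 = sorted(papers, key=lambda p: p["stored_citations"], reverse=True)[:10]
    irrelevant = sum(not is_keyword_relevant(p.get("title"), channel)
                     for p in top10)
    mean_top10 = sum(p["stored_citations"] for p in top10) / len(top10)
    return irrelevant, mean_top10  # e.g., (10, 5228.8) for bridge_vulnerability
```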

Top-10 Contamination by Channel

| Channel | Irrelevant in Top-10 | Mean Top-10 Citations | Severity |
|---|---|---|---|
| bridge_vulnerability | 10 / 10 | 5,228.8 | CRITICAL |
| validator_concentration | 10 / 10 | 5,041.1 | CRITICAL |
| gateway_risk | 10 / 10 | 1,699.5 | CRITICAL |
| counterparty_concentration | 10 / 10 | 1,105.1 | CRITICAL |
| governance_failure | 10 / 10 | 1,213.3 | CRITICAL |
| liquidation_cascades | 10 / 10 | 1,151.3 | CRITICAL |
| oracle_manipulation | 10 / 10 | 686.9 | CRITICAL |
| rwa_transmission | 10 / 10 | 1,148.4 | CRITICAL |
| composability_risk | 9 / 10 | 3,024.5 | HIGH |
| information_asymmetry | 9 / 10 | 927.2 | HIGH |
| liquidity_spirals | 9 / 10 | 1,603.6 | HIGH |
| regulatory_contagion | 9 / 10 | 937.3 | HIGH |
| network_contagion | 8 / 10 | 854.8 | MODERATE |
| stablecoin_runs | 8 / 10 | 161.4 | MODERATE |

Detailed: bridge_vulnerability top-10 (all irrelevant)
| # | Title | Citations | Field |
|---|---|---|---|
| 1 | CHARMM: The biomolecular simulation program | 8,937 | Computational biology |
| 2 | Nano based drug delivery systems | 6,328 | Pharmaceutical science |
| 3 | ImageJ2: ImageJ for the next generation of scientific image data | 6,112 | Image processing |
| 4 | A survey of transfer learning | 5,982 | Machine learning |
| 5 | Biological properties of extracellular vesicles | 5,761 | Cell biology |
| 6 | Natural products in drug discovery | 4,666 | Pharmacology |
| 7 | Artificial Intelligence (AI): Multidisciplinary perspectives | 3,760 | General AI |
| 8 | The role of hydrogen and fuel cells in the global energy system | 3,620 | Energy systems |
| 9 | Present and Future of Surface-Enhanced Raman Scattering | 3,596 | Chemistry/spectroscopy |
| 10 | Internet of things: Vision, applications and research challenges | 3,526 | IoT/computing |

Not a single paper in bridge_vulnerability's top-10 relates to blockchain bridges, cross-chain protocols, or interoperability. The mean_top10 of 5,228.8 is entirely noise.

Detailed: validator_concentration top-10 (all irrelevant)
| # | Title | Citations | Field |
|---|---|---|---|
| 1 | A comparative risk assessment of burden of disease (Lancet) | 11,936 | Public health |
| 2 | Integrative analysis of 111 reference human epigenomes | 7,015 | Genomics |
| 3 | Heavy Metal Toxicity and the Environment | 6,837 | Environmental toxicology |
| 4 | Cancer nanomedicine: progress, challenges and opportunities | 5,475 | Oncology |
| 5 | Interaction between microbiota and immunity | 3,652 | Immunology |
| 6 | Deciphering the Liquidity and Credit Crunch 2007-2008 | 3,361 | Finance (but not validator/PoS) |
| 7 | Parkinson's disease | 3,327 | Neurology |
| 8 | World agriculture towards 2030/2050 | 3,142 | Agricultural economics |
| 9 | Role of the normal gut microbiota | 2,928 | Gastroenterology |
| 10 | Safeguarding human health in the Anthropocene epoch | 2,738 | Environmental health |

The Lancet disease burden study alone (11,936 citations) inflates this channel's mean_top10 dramatically. No paper in the top 10 relates to proof-of-stake validators, staking concentration, or consensus mechanisms.

Root Cause

The OpenAlex search queries use broad terms (e.g., "bridge" matches biomedical bridge studies, "concentration" matches environmental toxicology, "validation" matches scientific validation methodologies). The pipeline then sorts by citation count, causing highly-cited papers from large fields (medicine, biology, chemistry) to dominate over niche DeFi/blockchain literature.


7. Overall Verdict

| Level | Check | Verdict | Evidence |
|---|---|---|---|
| 1 | Paper Existence | GREEN | All 2,433 papers found in OpenAlex. Zero fabricated. |
| 2 | Citation Accuracy | GREEN | current_citations ≥ stored_citations for every paper. Mean growth +2.0. |
| 3 | Crisis Losses | GREEN | 19 sourced (court/regulator/blockchain), 2 plausible, 0 unsourced; 4 correctly undetermined. |
| 4 | Search Relevance | RED | Biomedical papers (Lancet, CHARMM, epigenomics) with 5,000–12,000 citations in financial channels. |
| 5 | Channel Assignment | YELLOW | Keyword heuristic is coarse but confirms: 8–10 of top-10 papers are irrelevant in most channels. |

Bottom Line

The data is REAL but contains NOISE.

The pipeline correctly retrieves and scores papers from OpenAlex — no fabrication, no inflation, no phantom records. Crisis losses are traceable to court filings, regulatory reports, and blockchain analytics.

However, the broad search queries combined with citation-count sorting pull in highly-cited papers from entirely unrelated fields (biomedicine, chemistry, environmental science), inflating the citation_impact scores for channels whose query terms overlap with those disciplines.


8. Implications for the Paper

What is arithmetically correct

The composite scores are arithmetically correct given the data. The scoring formulas (min-max normalization, equal weighting across dimensions) were verified in the methodology report. The code does exactly what it claims.
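As a hedged reconstruction of that arithmetic (dimension names follow Section 8; the variable names are assumed, not taken from channel_mapper.py):

```python
def min_max(values: list[float]) -> list[float]:
    """Min-max normalize a dimension to [0, 1] across channels."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def composite_scores(channels: list[str],
                     citation_impact: list[float],
                     literature_volume: list[float],
                     crisis_evidence: list[float]) -> dict[str, float]:
    """Equal-weight mean of three min-max normalized dimensions."""
    dims = [min_max(citation_impact),
            min_max(literature_volume),
            min_max(crisis_evidence)]
    return {ch: sum(d[i] for d in dims) / 3 for i, ch in enumerate(channels)}
```

One consequence worth noting: because each dimension carries weight 1/3, inflating a channel's normalized citation_impact by some amount Δ shifts its composite score by only Δ/3, which is the dilution discussed under composite_score below.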

What the data contains

The data itself contains relevance noise. Papers were retrieved from OpenAlex using search queries, and the API returned results sorted by citation count. For channels with broad query terms, the top results come from large, heavily-cited fields that share vocabulary but not subject matter.

Which dimensions are affected

| Scoring Dimension | Impact | Explanation |
|---|---|---|
| citation_impact (mean top-10 citations) | HIGH | Irrelevant papers with 5,000–12,000 citations dominate the top-10 for multiple channels. bridge_vulnerability's mean_top10 of 5,228.8 is entirely from non-financial papers; validator_concentration's 5,041.1 includes a Lancet study and epigenome research. |
| literature_volume (paper count) | MODERATE | Irrelevant papers are counted in the total but do not dominate volume the way they dominate citation counts. A channel with 200 papers where 160 are irrelevant still has "200 papers", and the volume number is less distorted because every channel has a similar base count. |
| crisis_evidence | NONE | Crisis events were manually curated and independently verified against public sources. This dimension is not affected by search relevance noise. |
| composite_score | MODERATE | Since citation_impact is one of three equally-weighted dimensions, channels whose citation_impact is inflated by irrelevant papers will have moderately inflated composite scores. The effect is diluted by the other two dimensions. |

Channels Most Affected

Likely over-ranked (citation_impact inflated): per the Section 6 contamination table, bridge_vulnerability and validator_concentration stand out (10/10 irrelevant top-10 papers and the two highest mean top-10 citation counts, 5,228.8 and 5,041.1), followed by the other fully contaminated channels (gateway_risk, counterparty_concentration, governance_failure, liquidation_cascades, oracle_manipulation, rwa_transmission).

Relatively more reliable: network_contagion and stablecoin_runs, the only channels with MODERATE contamination (8/10 and the lowest top-10 citation means), plus the crisis_evidence dimension across all channels, which is unaffected by search noise.

Recommendation

For the paper: The composite scores can be presented as a systematic, reproducible ranking — but with a clear caveat that the citation_impact dimension is affected by search noise from broad OpenAlex queries. The pipeline architecture is sound (real data, correct math, verified crisis events), but the search relevance filtering step is the weak link. Future iterations could apply abstract-level NLP filtering or restrict to finance/CS-classified works to reduce noise.
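As one possible mitigation along those lines, here is a hedged sketch of a field-restricted OpenAlex query. The search, filter, sort, and per-page parameters follow the public OpenAlex API; the concept ID shown (C41008148, Computer science) is illustrative, and the exact IDs to filter on should be confirmed via OpenAlex's /concepts endpoint.

```python
import requests

def search_filtered(query: str,
                    concept_id: str = "C41008148",  # illustrative: Computer science
                    per_page: int = 200) -> list[dict]:
    """Citation-sorted OpenAlex search restricted to one concept/field."""
    resp = requests.get(
        "https://api.openalex.org/works",
        params={
            "search": query,
            "filter": f"concepts.id:{concept_id}",  # drops biomedical/environmental hits
            "sort": "cited_by_count:desc",          # keeps the pipeline's ranking behavior
            "per-page": per_page,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["results"]
```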

Generated: 2026-03-27 | Source data: ground_truth_all_papers.json (2,433 papers, verified in 1,098s) & ground_truth_crisis_events.json (25 events) | Verification: OpenAlex live API + public regulatory/judicial sources.