Ground Truth Verification Report

External verification of the systemic risk channel scoring pipeline. All data checked against live APIs and public sources outside the pipeline.

| Metric | Result | Note |
|---|---|---|
| Papers verified | 2,433 | via OpenAlex API |
| Fabricated papers | 0 | 100% exist |
| Citation anomalies | 0 | current ≥ stored |
| Crisis events | 25/25 | sourced or plausible |
| Irrelevant top-cited papers | 5+ | biomedical in top 10 |

Verification Summary

| Level | Check | Verdict | Key Evidence |
|---|---|---|---|
| 1 | Paper Existence | GREEN | 2,433/2,433 found in OpenAlex (100%) |
| 2 | Citation Accuracy | GREEN | current_citations ≥ stored_citations for all 2,433 |
| 3 | Crisis Loss Figures | GREEN | 19 sourced, 2 plausible, 0 unsourced; 4 correctly undetermined |
| 4 | Search Relevance | RED | Biomedical papers with 5,000–12,000 citations in financial channels |
| 5 | Channel Assignment Quality | YELLOW | Keyword heuristic is coarse but flags real issues in top-10 |

Contents
  1. The Trust Problem
  2. Paper Existence Verification (Level 1)
  3. Citation Accuracy (Level 2)
  4. Crisis Source Attribution (Level 3)
  5. Search Relevance (Level 4)
  6. Channel Assignment Quality (Level 5)
  7. Overall Verdict
  8. Implications for the Paper

1. The Trust Problem

Previous verification confirmed that the HTML dashboard matches the JSON data, which matches recomputation from the raw inputs. That verification is necessary but insufficient: it only proves internal consistency. Every link in the chain was produced by the same pipeline, so if the pipeline ingested garbage, internal consistency merely confirms the garbage is consistent.

This report goes outside the pipeline to external, independent sources.

Pipeline Trust Chain

search_queries.json (14 channel queries) → OpenAlex API (live retrieval) → openalex_merged.json (2,433 papers) → channel_mapper.py (scoring logic) → channel_rankings.json (final scores) → HTML dashboard

What This Report Verifies Externally

| Link | Verification Method | Status |
|---|---|---|
| OpenAlex API → openalex_merged.json | Re-queried all 2,433 work IDs against live API | VERIFIED |
| Stored citations vs. current | Compared stored_citations to current_citations for all papers | VERIFIED |
| Crisis event losses | Cross-referenced 25 events against court filings, regulators, news | VERIFIED |
| search_queries → OpenAlex API | Keyword relevance heuristic on returned papers | NOISE FOUND |
| channel_mapper.py scoring | Impact assessment of irrelevant papers on scores | AFFECTED |

2. Paper Existence Verification (Level 1)

VERDICT: All 2,433 papers exist in OpenAlex. Zero fabricated records. Zero API errors.

Every paper ID in the pipeline was queried against the live OpenAlex API; verification took 1,098 seconds (about 18 minutes) to cover all 2,433 works. A minimal sketch of the check appears at the end of this section.

| Metric | Value |
|---|---|
| Total papers checked | 2,433 |
| Found in OpenAlex | 2,433 (100%) |
| Not found | 0 |
| API errors | 0 |
| Title matches | 2,430 |
| Title mismatches | 3 |

About the 3 Title Mismatches

Three records have null titles in both the stored data and the API response. These are not real mismatches — they are records where OpenAlex itself has no title metadata. The works exist (they have valid IDs, citations, and DOIs), but the title field is empty. This is a known OpenAlex data-quality edge case, not a fabrication.
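For reproducibility, the sketch below shows the shape of this check. It is a minimal illustration, assuming each stored record carries its OpenAlex work ID under "id" and its archived title under "title" (those field names are assumptions); the endpoint, the 404-on-missing behavior, and the polite-pool mailto parameter follow the public OpenAlex API.

```python
import requests

OPENALEX_WORKS = "https://api.openalex.org/works/{work_id}"

def verify_paper(record: dict, mailto: str = "you@example.com") -> dict:
    """Re-check one stored record against the live OpenAlex API."""
    url = OPENALEX_WORKS.format(work_id=record["id"])
    resp = requests.get(url, params={"mailto": mailto}, timeout=30)
    if resp.status_code == 404:
        # A 404 here would indicate a fabricated or vanished work ID.
        return {"id": record["id"], "found": False, "title_match": False}
    resp.raise_for_status()
    live = resp.json()
    # The 3 "mismatches" above are records whose title is null on both sides;
    # a strict check that requires a non-null stored title flags them.
    title_match = (record.get("title") is not None
                   and record.get("title") == live.get("title"))
    return {"id": record["id"], "found": True, "title_match": title_match}
```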


3. Citation Accuracy (Level 2)

VERDICT: Zero citation fabrication detected. For all 2,433 papers, current_citations ≥ stored_citations.

If citation counts had been inflated or fabricated, we would expect some papers to show fewer citations in the live API than in our stored data. The opposite is true: citations only grew or stayed the same.

| Metric | Value |
|---|---|
| Papers with current ≥ stored | 2,433 (100%) |
| Papers with current < stored | 0 |
| Mean citation growth | +2.0 |

| Category | Count | Share |
|---|---|---|
| Citations unchanged (delta = 0) | 1,305 | 53.6% |
| Citations grew (delta > 0) | 1,128 | 46.4% |

The mean growth of +2.0 citations since archival is consistent with a dataset collected recently: papers continue to accumulate citations over days and weeks. A fabrication scenario would show papers with implausibly high stored counts that the live API cannot confirm. No such anomaly exists.

Interpretation: The pipeline faithfully recorded the citation counts available from OpenAlex at the time of data collection. No inflation, no fabrication, no rounding errors detectable at the individual-paper level.
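A minimal sketch of the comparison, assuming records keep their archived count under "stored_citations" (the name used throughout this report) and that live responses expose OpenAlex's cited_by_count field:

```python
from statistics import mean

def citation_check(records: list[dict], live_by_id: dict[str, dict]) -> dict:
    """Compare archived citation counts against live OpenAlex counts.

    live_by_id maps each work ID to its live OpenAlex JSON.
    """
    deltas = [live_by_id[r["id"]]["cited_by_count"] - r["stored_citations"]
              for r in records]
    return {
        "anomalies":   sum(d < 0 for d in deltas),  # current < stored: possible inflation
        "unchanged":   sum(d == 0 for d in deltas), # 1,305 papers (53.6%) in this run
        "grew":        sum(d > 0 for d in deltas),  # 1,128 papers (46.4%) in this run
        "mean_growth": round(mean(deltas), 1),      # +2.0 in this run
    }
```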

4. Crisis Source Attribution (Level 3)

VERDICT: All 25 crisis events verified against public sources. Of the 21 events with stored loss figures, 19 are SOURCED, 2 PLAUSIBLE, and 0 UNSOURCED; the remaining 4 events are correctly classified as "undetermined."

Each stored loss figure was compared against court filings, bankruptcy proceedings, regulatory reports (OSC, FBI, DOJ, JFSA, SEC, Fed), blockchain analytics (Chainalysis, Elliptic), and financial news (Reuters, Bloomberg, CoinDesk, The Block, Rekt News).

Verdict Distribution

| Verdict | Count | Meaning |
|---|---|---|
| SOURCED | 19 | Specific credible source confirms the stored figure |
| PLAUSIBLE | 2 | General references support the magnitude; imprecise but reasonable |
| UNSOURCED | 0 | No corroboration found |
| UNDETERMINED (correct) | 4 | No stored figure; verification confirms none can be reliably attributed |

Full Crisis Events Table (25 events)
| Event | Date | Stored Loss (USD) | Reported Range | Source | Verdict |
|---|---|---|---|---|---|
| Mt. Gox Collapse | 2014-02 | $460M | $450M – $480M | Tokyo District Court bankruptcy filing; Reuters; DOJ. 850,000 BTC. | SOURCED |
| The DAO Hack | 2016-06 | $60M | $50M – $70M | CoinDesk; Ethereum Foundation; Mehar et al. (2019). ~3.6M ETH. | SOURCED |
| Bitfinex Hack | 2016-08 | $72M | $65M – $78M | DOJ (Feb 2022); Bitfinex report. 119,756 BTC at Aug 2016 prices. | SOURCED |
| BTC-e Seizure and Shutdown | 2017-07 | undetermined | n/a | DOJ indictment of Vinnik (July 2017); FinCEN $110M fine. Reserves never published. | SOURCED |
| QuadrigaCX Collapse | 2019-02 | $190M | $169M – $215M | OSC Staff Notice 21-327 (June 2020); Ernst & Young trustee. C$215M liabilities. | SOURCED |
| Black Thursday (COVID Crash) | 2020-03 | undetermined | n/a | MakerDAO post-mortem; Klages-Mundt et al. (2021). BTC -50%, market cap -$93B. | SOURCED |
| SushiSwap Vampire Attack | 2020-09 | undetermined | n/a | CoinDesk (Sep 2020); DeFi Pulse. ~$1.14B migrated. Competitive, not theft. | SOURCED |
| Iron Finance / TITAN Collapse | 2021-06 | $2B | $1.7B – $2B | The Defiant (June 2021); Iron Finance post-mortem. Peak market cap destruction. | PLAUSIBLE |
| Poly Network Hack | 2021-08 | $611M | $600M – $613M | Rekt News; Poly Network; SlowMist. $611M across 3 chains. All returned. | SOURCED |
| Wormhole Bridge Hack | 2022-02 | $326M | $320M – $326M | Rekt News; Wormhole post-mortem; Jump Trading. 120,000 wETH via signature verification bug. | SOURCED |
| Ronin Bridge Hack | 2022-03 | $625M | $600M – $625M | FBI attribution (April 2022); Chainalysis 2023. 173,600 ETH + 25.5M USDC. Lazarus Group. | SOURCED |
| Terra/Luna Collapse | 2022-05 | $45B | $40B – $60B | SEC complaint vs Do Kwon (Feb 2023); academic literature; Bloomberg/Reuters. | SOURCED |
| Three Arrows Capital Insolvency | 2022-06 | $3.5B | $3B – $3.5B | Reuters; BVI liquidation; Teneo. Genesis ~$2.36B, Voyager ~$650M. | SOURCED |
| Celsius & Voyager Failures | 2022-06/07 | $5.4B | $4.7B – $6B | Celsius bankruptcy (SDNY); Voyager bankruptcy (SDNY). Celsius ~$4.7B; Voyager $650M–1.3B. | SOURCED |
| Nomad Bridge Hack | 2022-08 | $190M | $186M – $190M | Rekt News; Nomad post-mortem; Chainalysis. Crowd-sourced exploit. | SOURCED |
| Mango Markets Exploit | 2022-10 | $114M | $110M – $117M | DOJ indictment of Eisenberg (Dec 2022); SEC complaint. ~$110M per DOJ. | SOURCED |
| FTX Collapse | 2022-11 | $8.7B | $8B – $9.7B | DOJ sentencing (March 2024); FTX bankruptcy; John Ray III testimony. ~$8B+ shortfall. | SOURCED |
| Euler Finance Exploit | 2023-03 | $197M | $195M – $197M | Rekt News; Euler post-mortem; The Block. ~$197M exploited. All returned. | SOURCED |
| SVB Failure & USDC De-Peg | 2023-03 | undetermined | n/a | Fed review (April 2023); Circle $3.3B at SVB; USDC depegged to $0.87. | SOURCED |
| Curve Finance Pool Exploit | 2023-07 | $62M | $52M – $73M | Rekt News; Curve; Vyper vulnerability. Multiple pools totaling $62–73M gross. | SOURCED |
| Multichain Bridge Collapse | 2023-07 | $126M | $125M – $130M | Rekt News; Multichain; Chainalysis. ~$126M drained after CEO detained. | SOURCED |
| DMM Bitcoin Exchange Hack | 2024-05 | $305M | $300M – $308M | Reuters (May 2024); JFSA; DMM Bitcoin. 4,502.9 BTC. FBI: Lazarus Group. | SOURCED |
| WazirX Multi-Sig Exploit | 2024-07 | $235M | $230M – $235M | Reuters (July 2024); WazirX; Liminal; Elliptic. | SOURCED |
| Bybit $1.5B Hack | 2025-02 | $1.5B | $1.4B – $1.5B | Reuters (Feb 2025); Bybit; FBI; Chainalysis/Elliptic. ~401,347 ETH via Safe Wallet attack. | SOURCED |
| Hyperliquid Whale Manipulation | 2025-03 | $15M | $12M – $17M | The Block; CoinDesk (March 2025). ~$4M from ETH whale; ~$10.6M from JELLY. | PLAUSIBLE |

Notes on "Undetermined" Events

Four events are correctly stored as "undetermined" losses:

| Event | Why "Undetermined" Is Correct |
|---|---|
| BTC-e Seizure | DOJ focused on money laundering; reserves never published; user losses never quantified |
| Black Thursday | Market-wide crash; no single-entity loss attribution possible |
| SushiSwap Vampire Attack | Competitive migration, not theft; no permanent losses |
| SVB / USDC De-Peg | TradFi crisis with transient crypto impact; losses not aggregated |

5. Search Relevance (Level 4)

FINDING: Highly-cited papers from completely unrelated fields appear in multiple channels. Biomedical, environmental, and general science papers with 5,000–12,000 citations inflate citation impact scores.

An automated keyword heuristic was applied to all 2,433 papers: each paper's title was checked for terms related to its assigned channel (e.g., "bridge," "cross-chain," "interoperability" for bridge_vulnerability). This heuristic is deliberately coarse — a paper can be relevant despite not containing a keyword in its title — but it flags the most obviously misplaced papers.
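A sketch of the heuristic follows. The per-channel keyword lists shown here are illustrative examples only; the pipeline's actual lists may differ.

```python
# Illustrative per-channel keyword lists (assumptions, not the pipeline's exact lists).
CHANNEL_KEYWORDS: dict[str, list[str]] = {
    "bridge_vulnerability": ["bridge", "cross-chain", "interoperability"],
    "stablecoin_runs": ["stablecoin", "peg", "bank run"],
    # ... one entry for each of the 14 channels
}

def is_keyword_relevant(title: str | None, channel: str) -> bool:
    """True if the paper's title contains any of its channel's keywords."""
    if not title:
        return False  # null-title records can never match
    lowered = title.lower()
    return any(kw in lowered for kw in CHANNEL_KEYWORDS.get(channel, []))

def relevance_rate(papers: list[dict], channel: str) -> float:
    """Share of a channel's papers that pass the title-keyword check."""
    hits = sum(is_keyword_relevant(p.get("title"), channel) for p in papers)
    return hits / len(papers)  # e.g., 52/218 ≈ 23.9% for stablecoin_runs
```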

Per-Channel Relevance Rates

| Channel | Papers | Keyword-Relevant | Rate |
|---|---|---|---|
| stablecoin_runs | 218 | 52 | 23.9% |
| network_contagion | 200 | 43 | 21.5% |
| composability_risk | 200 | 36 | 18.0% |
| regulatory_contagion | 200 | 35 | 17.5% |
| governance_failure | 200 | 33 | 16.5% |
| oracle_manipulation | 208 | 22 | 10.6% |
| liquidity_spirals | 263 | 21 | 8.0% |
| information_asymmetry | 200 | 15 | 7.5% |
| liquidation_cascades | 200 | 14 | 7.0% |
| rwa_transmission | 200 | 9 | 4.5% |
| validator_concentration | 200 | 8 | 4.0% |
| gateway_risk | 348 | 7 | 2.0% |
| counterparty_concentration | 200 | 3 | 1.5% |
| bridge_vulnerability | 200 | 0 | 0.0% |

Caveat: Low keyword-match rates do not necessarily mean all those papers are irrelevant. A paper titled "A Survey of Transfer Learning" will fail a keyword check for "bridge" but could conceivably relate to cross-chain technology if OpenAlex classified it under relevant concepts. However, when the top-10 most-cited papers in a channel are all from biomolecular simulation, nanotechnology, and environmental science, the keyword heuristic is clearly flagging a real problem.

Critical Finding: Top Irrelevant Papers by Citation Count

These papers have the highest citation counts among those flagged as irrelevant to their assigned channels:

| # | Title | Citations | Channel(s) |
|---|---|---|---|
| 1 | A comparative risk assessment of burden of disease and injury attributable to 67 risk factors... | 11,936 | validator_concentration |
| 2 | CHARMM: The biomolecular simulation program | 8,937 | bridge_vulnerability |
| 3 | Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases... | 7,311 | composability_risk |
| 4 | Integrative analysis of 111 reference human epigenomes | 7,015 | validator_concentration |
| 5 | Heavy Metal Toxicity and the Environment | 6,837 | validator_concentration |
| 6 | Nano based drug delivery systems: recent developments and future prospects | 6,328 | bridge_vulnerability |
| 7 | ImageJ2: ImageJ for the next generation of scientific image data | 6,112 | bridge_vulnerability |
| 8 | A survey of transfer learning | 5,982 | bridge_vulnerability |
| 9 | Biological properties of extracellular vesicles and their physiological functions | 5,761 | bridge_vulnerability |
| 10 | Cancer nanomedicine: progress, challenges and opportunities | 5,475 | validator_concentration |

Impact: These irrelevant papers inflate the mean-top-10 citation scores used in the citation_impact dimension. For example, bridge_vulnerability is ranked #1 for citation_impact with a mean_top10 of 5,228.8 — but every single paper in its top 10 is from an unrelated field (biomolecular simulation, nanotechnology, image processing, transfer learning).

6. Channel Assignment Quality (Level 5)

FINDING: Irrelevant papers concentrate in channels with broad search terms, significantly affecting citation_impact scores.

The table below shows, for each channel, how many of its top-10 most-cited papers are flagged as irrelevant, and what the mean citation count of that top-10 is. Channels where 10/10 top papers are irrelevant have citation_impact scores entirely driven by noise.
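A sketch of that computation, reusing is_keyword_relevant from the Level 4 sketch and assuming each paper dict carries a "stored_citations" count:

```python
def top10_contamination(papers: list[dict], channel: str) -> tuple[int, float]:
    """Count irrelevant papers among a channel's top-10 by stored citations."""
    top10 = sorted(papers, key=lambda p: p["stored_citations"], reverse=True)[:10]
    irrelevant = sum(not is_keyword_relevant(p.get("title"), channel)
                     for p in top10)
    mean_top10 = sum(p["stored_citations"] for p in top10) / len(top10)
    return irrelevant, mean_top10  # e.g., (10, 5228.8) for bridge_vulnerability
```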

Top-10 Contamination by Channel

| Channel | Irrelevant in Top-10 | Mean Top-10 Citations | Severity |
|---|---|---|---|
| bridge_vulnerability | 10 / 10 | 5,228.8 | CRITICAL |
| validator_concentration | 10 / 10 | 5,041.1 | CRITICAL |
| gateway_risk | 10 / 10 | 1,699.5 | CRITICAL |
| counterparty_concentration | 10 / 10 | 1,105.1 | CRITICAL |
| governance_failure | 10 / 10 | 1,213.3 | CRITICAL |
| liquidation_cascades | 10 / 10 | 1,151.3 | CRITICAL |
| oracle_manipulation | 10 / 10 | 686.9 | CRITICAL |
| rwa_transmission | 10 / 10 | 1,148.4 | CRITICAL |
| composability_risk | 9 / 10 | 3,024.5 | HIGH |
| information_asymmetry | 9 / 10 | 927.2 | HIGH |
| liquidity_spirals | 9 / 10 | 1,603.6 | HIGH |
| regulatory_contagion | 9 / 10 | 937.3 | HIGH |
| network_contagion | 8 / 10 | 854.8 | MODERATE |
| stablecoin_runs | 8 / 10 | 161.4 | MODERATE |

Detailed: bridge_vulnerability top-10 (all irrelevant)
| # | Title | Citations | Field |
|---|---|---|---|
| 1 | CHARMM: The biomolecular simulation program | 8,937 | Computational biology |
| 2 | Nano based drug delivery systems | 6,328 | Pharmaceutical science |
| 3 | ImageJ2: ImageJ for the next generation of scientific image data | 6,112 | Image processing |
| 4 | A survey of transfer learning | 5,982 | Machine learning |
| 5 | Biological properties of extracellular vesicles | 5,761 | Cell biology |
| 6 | Natural products in drug discovery | 4,666 | Pharmacology |
| 7 | Artificial Intelligence (AI): Multidisciplinary perspectives | 3,760 | General AI |
| 8 | The role of hydrogen and fuel cells in the global energy system | 3,620 | Energy systems |
| 9 | Present and Future of Surface-Enhanced Raman Scattering | 3,596 | Chemistry/spectroscopy |
| 10 | Internet of things: Vision, applications and research challenges | 3,526 | IoT/computing |

Not a single paper in bridge_vulnerability's top-10 relates to blockchain bridges, cross-chain protocols, or interoperability. The mean_top10 of 5,228.8 is entirely noise.

Detailed: validator_concentration top-10 (all irrelevant)
| # | Title | Citations | Field |
|---|---|---|---|
| 1 | A comparative risk assessment of burden of disease (Lancet) | 11,936 | Public health |
| 2 | Integrative analysis of 111 reference human epigenomes | 7,015 | Genomics |
| 3 | Heavy Metal Toxicity and the Environment | 6,837 | Environmental toxicology |
| 4 | Cancer nanomedicine: progress, challenges and opportunities | 5,475 | Oncology |
| 5 | Interaction between microbiota and immunity | 3,652 | Immunology |
| 6 | Deciphering the Liquidity and Credit Crunch 2007-2008 | 3,361 | Finance (but not validator/PoS) |
| 7 | Parkinson's disease | 3,327 | Neurology |
| 8 | World agriculture towards 2030/2050 | 3,142 | Agricultural economics |
| 9 | Role of the normal gut microbiota | 2,928 | Gastroenterology |
| 10 | Safeguarding human health in the Anthropocene epoch | 2,738 | Environmental health |

The Lancet disease burden study alone (11,936 citations) inflates this channel's mean_top10 dramatically. No paper in the top 10 relates to proof-of-stake validators, staking concentration, or consensus mechanisms.

Root Cause

The OpenAlex search queries use broad terms (e.g., "bridge" matches biomedical bridge studies, "concentration" matches environmental toxicology, "validation" matches scientific validation methodologies). The pipeline then sorts by citation count, causing highly-cited papers from large fields (medicine, biology, chemistry) to dominate over niche DeFi/blockchain literature.


7. Overall Verdict

| Level | Check | Verdict | Evidence |
|---|---|---|---|
| 1 | Paper Existence | GREEN | All 2,433 papers found in OpenAlex. Zero fabricated. |
| 2 | Citation Accuracy | GREEN | current_citations ≥ stored_citations for every paper. Mean growth +2.0. |
| 3 | Crisis Losses | GREEN | 19 sourced (court/regulator/blockchain), 2 plausible, 0 unsourced; 4 correctly undetermined. |
| 4 | Search Relevance | RED | Biomedical papers (Lancet, CHARMM, epigenomics) with 5,000–12,000 citations in financial channels. |
| 5 | Channel Assignment | YELLOW | Keyword heuristic is coarse but confirms: 8–10 of top-10 papers are irrelevant in most channels. |

Bottom Line

The data is REAL but contains NOISE.

The pipeline correctly retrieves and scores papers from OpenAlex — no fabrication, no inflation, no phantom records. Crisis losses are traceable to court filings, regulatory reports, and blockchain analytics.

However, the broad search queries combined with citation-count sorting pull in highly-cited papers from entirely unrelated fields (biomedicine, chemistry, environmental science), inflating the citation_impact scores for channels whose query terms overlap with those disciplines.


8. Implications for the Paper

What is arithmetically correct

The composite scores are arithmetically correct given the data. The scoring formulas (min-max normalization, equal weighting across dimensions) were verified in the methodology report. The code does exactly what it claims.
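As a hedged reconstruction of that arithmetic (dimension names follow Section 8; the variable names are assumed, not taken from channel_mapper.py):

```python
def min_max(values: list[float]) -> list[float]:
    """Min-max normalize a dimension to [0, 1] across channels."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def composite_scores(channels: list[str],
                     citation_impact: list[float],
                     literature_volume: list[float],
                     crisis_evidence: list[float]) -> dict[str, float]:
    """Equal-weight mean of three min-max normalized dimensions."""
    dims = [min_max(citation_impact),
            min_max(literature_volume),
            min_max(crisis_evidence)]
    return {ch: sum(d[i] for d in dims) / 3 for i, ch in enumerate(channels)}
```

One consequence worth noting: because each dimension carries weight 1/3, inflating a channel's normalized citation_impact by some amount Δ shifts its composite score by only Δ/3, which is the dilution discussed under composite_score below.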

What the data contains

The data itself contains relevance noise. Papers were retrieved from OpenAlex using search queries, and the API returned results sorted by citation count. For channels with broad query terms, the top results come from large, heavily-cited fields that share vocabulary but not subject matter.

Which dimensions are affected

| Scoring Dimension | Impact | Explanation |
|---|---|---|
| citation_impact (mean top-10 citations) | HIGH | Irrelevant papers with 5,000–12,000 citations dominate the top-10 for multiple channels. bridge_vulnerability's mean_top10 of 5,228.8 is entirely from non-financial papers; validator_concentration's 5,041.1 includes a Lancet study and epigenome research. |
| literature_volume (paper count) | MODERATE | Irrelevant papers are counted in the total but do not dominate volume the way they dominate citation counts. A channel with 200 papers where 160 are irrelevant still has "200 papers", and the volume number is less distorted because every channel has a similar base count. |
| crisis_evidence | NONE | Crisis events were manually curated and independently verified against public sources. This dimension is not affected by search relevance noise. |
| composite_score | MODERATE | Since citation_impact is one of three equally-weighted dimensions, channels whose citation_impact is inflated by irrelevant papers will have moderately inflated composite scores. The effect is diluted by the other two dimensions. |

Channels Most Affected

Likely over-ranked (citation_impact inflated): per the Section 6 contamination table, bridge_vulnerability and validator_concentration stand out (10/10 irrelevant top-10 papers and the two highest mean top-10 citation counts, 5,228.8 and 5,041.1), followed by the other fully contaminated channels (gateway_risk, counterparty_concentration, governance_failure, liquidation_cascades, oracle_manipulation, rwa_transmission).

Relatively more reliable: network_contagion and stablecoin_runs, the only channels with MODERATE contamination (8/10 and the lowest top-10 citation means), plus the crisis_evidence dimension across all channels, which is unaffected by search noise.

Recommendation

For the paper: The composite scores can be presented as a systematic, reproducible ranking — but with a clear caveat that the citation_impact dimension is affected by search noise from broad OpenAlex queries. The pipeline architecture is sound (real data, correct math, verified crisis events), but the search relevance filtering step is the weak link. Future iterations could apply abstract-level NLP filtering or restrict to finance/CS-classified works to reduce noise.
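As one possible mitigation along those lines, here is a hedged sketch of a field-restricted OpenAlex query. The search, filter, sort, and per-page parameters follow the public OpenAlex API; the concept ID shown (C41008148, Computer science) is illustrative, and the exact IDs to filter on should be confirmed via OpenAlex's /concepts endpoint.

```python
import requests

def search_filtered(query: str,
                    concept_id: str = "C41008148",  # illustrative: Computer science
                    per_page: int = 200) -> list[dict]:
    """Citation-sorted OpenAlex search restricted to one concept/field."""
    resp = requests.get(
        "https://api.openalex.org/works",
        params={
            "search": query,
            "filter": f"concepts.id:{concept_id}",  # drops biomedical/environmental hits
            "sort": "cited_by_count:desc",          # keeps the pipeline's ranking behavior
            "per-page": per_page,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["results"]
```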

Generated: 2026-03-27 | Source data: ground_truth_all_papers.json (2,433 papers, verified in 1,098s) & ground_truth_crisis_events.json (25 events) | Verification: OpenAlex live API + public regulatory/judicial sources.