A Pre-Registered Pipeline
We present a pre-registered, reproducibility-first methodology stack for daily-frequency forecasting research on participant-decomposed Xetra order-flow data. The pipeline encodes a strict synthetic-first development contract; a walk-forward purge-and-embargo cross-validation design; a pre-committed estimator universe (Majority, Persistence, Momentum, Logistic, and RandomForest with RandomizedSearchCV(n_iter=50)); paired hypothesis tests (DeLong for ROC-AUC; Diebold-Mariano with HAC standard errors and the Harvey-Leybourne-Newbold small-sample correction for predictive-accuracy differentials); Benjamini-Hochberg false-discovery-rate control over a closed 36-name hypothesis family; a serial-correlation-aware block bootstrap with locked parameters; and a fixed five-basis-point half-spread cost model with a Corwin-Schultz sensitivity analysis. We define a three-tier claim-to-evidence taxonomy that ties every numerical claim in the paper to one of three sources: a deterministic-block manifest hash (Tier alpha), a bootstrap computation seeded against that hash (Tier alpha-derived), or a Docker-only figure rendering (Tier beta, prohibited from in-line text). On synthetic data the deterministic block reproduces the reference run bit-identically, with manifest SHA 7be9eca...b6fe6d. An imposed-signal power demonstration confirms that the methodology recovers daily information ratios in the literature-reported range. The contribution targets the methods gap that Harvey (2017) and Welch (2019) identify in empirical finance: when reproducibility primitives are missing, null findings are unpublishable and positive findings are unverifiable.
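The Benjamini-Hochberg step named in the abstract can be illustrated with a minimal Python sketch. This is not the repository's implementation: the function name benjamini_hochberg, the FDR level q=0.05, and the example p-values are assumptions made for illustration. The source states only that the hypothesis family is closed at 36 names.

```python
from typing import Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> list[bool]:
    """Return one reject flag per hypothesis under the BH step-up procedure.

    q is the target false-discovery rate; 0.05 is an assumed default,
    not a value taken from the paper.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank

    # Step-up rule: reject every hypothesis ranked at or below k_max,
    # even if some of them individually miss their own threshold.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Made-up p-values, chosen only to show the step-up behaviour.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9], q=0.05))
# -> [True, True, True, False]
```

In the example the two smallest p-values miss their own thresholds (0.0125 and 0.025), but the third clears its threshold (0.0375), so all three are rejected. Closing the family before any data are seen matters because the thresholds depend on the family size m.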
Full manuscript, Phase 1i refresh. Tier A figure renders, light Section 6 tone audit.
Beamer presentation, twenty slides covering motivation, methodology, and replication.
| Field | Value |
|---|---|
| Tag | pre-reg-v1 |
| Commit | 0f7741f31dea3c2555ac4632e66d3ea7072c436f |
| Manifest SHA (Tier A) | 7be9ecaadf707a0bd948b5b83574c89d729ceca4893f53a84f9d6b413ca6fe6d |
| Repository | github.com/Digital-AI-Finance/daily-order-flow-based-excess-alpha |
| SSRN | v2 staged for upload (link will appear here once posted) |
The full source repository, the synthetic reference run, and the manifest verification script together reproduce every Tier A number in the paper. To verify from a fresh checkout, run:
git clone https://github.com/Digital-AI-Finance/daily-order-flow-based-excess-alpha
cd daily-order-flow-based-excess-alpha
git checkout pre-reg-v1
uv sync --frozen --extra dev
python scripts/check_signoffs.py && uv run pytest -q
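The last line chains the two checks with &&, so an exit status of zero means that both the sign-off check and the test suite passed and the methodology is intact.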
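For readers who want to see what a manifest comparison involves, here is a minimal Python sketch. It is not the repository's verification script. It assumes the published 64-character manifest SHA is a SHA-256 digest of a single manifest file, which the source does not state, and the function names and file path are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_matches(path: Path, expected_sha: str) -> bool:
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_sha.strip().lower()


if __name__ == "__main__":
    # Hypothetical path; the repository's actual manifest location may differ.
    manifest = Path("manifest.json")
    expected = "7be9ecaadf707a0bd948b5b83574c89d729ceca4893f53a84f9d6b413ca6fe6d"
    if manifest.exists():
        print("match" if manifest_matches(manifest, expected) else "MISMATCH")
    else:
        print(f"{manifest} not found; adjust the path for your checkout")
```

A byte-level digest is a strict criterion: any change in serialization, such as key order or float formatting, changes the hash, which is why a bit-identical deterministic block is needed for the comparison to be meaningful.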
This paper (Y) is the methodology and reproducibility primitive. The companion empirical paper (X) applies this stack to the complete Xetra participant-category daily order-flow panel and is forthcoming. No methodology choice in the empirical paper differs from the choices locked at pre-reg-v1. See docs/adr/ADR-021.md in the repository for the path-to-paper plan.