Methodology and Reproducibility for Participant-Level Order Flow Forecasting

A Pre-Registered Pipeline

Joerg Osterrieder | 2026-05-09

Abstract

We present a pre-registered, reproducibility-first methodology stack for daily-frequency forecasting research with participant-decomposed Xetra order flow data. The pipeline encodes a strict synthetic-first development contract, a walk-forward purge-and-embargo cross-validation design, a pre-committed estimator universe (Majority, Persistence, Momentum, Logistic, RandomForest with RandomizedSearchCV(n_iter=50)), paired hypothesis tests (DeLong for ROC-AUC, Diebold-Mariano with HAC standard errors and the Harvey-Leybourne-Newbold small-sample correction for predictive-accuracy differentials), Benjamini-Hochberg false-discovery-rate control over a closed 36-name hypothesis family, a serial-correlation-aware block bootstrap with locked parameters, and a fixed five-basis-point half-spread cost model with a Corwin-Schultz sensitivity. We define a three-tier claim-to-evidence taxonomy that ties every numerical claim in the paper to either a deterministic-block manifest hash (Tier alpha), a bootstrap computation seeded against that hash (Tier alpha-derived), or a Docker-only figure rendering (Tier beta, prohibited from in-line text). On synthetic data the deterministic block reproduces bit-identically across the reference run with manifest SHA 7be9eca...b6fe6d. An imposed-signal power demonstration confirms the methodology recovers daily information ratios in the literature-reported range. The contribution targets the methods gap that Harvey (2017) and Welch (2019) identify in empirical finance: when reproducibility primitives are missing, null findings are unpublishable and positive findings are unverifiable.
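As an illustration of the Benjamini-Hochberg step in the abstract, the following is a minimal sketch of the standard step-up procedure applied to a family of 36 p-values. It is not the repository's implementation. The function name, the FDR level q = 0.05, and the example p-values are assumptions made for this sketch; the page itself does not state them.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up rule: sort the p-values, find the
    largest rank k with p_(k) <= q * k / m, and reject the k smallest.
    The level q = 0.05 is an assumed default, not a value from the paper.
    """
    m = len(p_values)
    # Indices of the hypotheses ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Hypothetical family of 36 p-values: two small ones and 34 unremarkable ones.
example_p = [0.0005, 0.002] + [0.2 + 0.02 * i for i in range(34)]
rejected = benjamini_hochberg(example_p, q=0.05)
print(sum(rejected), "of", len(example_p), "hypotheses rejected")
```

Because the family is closed at 36 names, m is fixed in advance, so the rejection thresholds q * k / 36 do not depend on which results turn out to look interesting.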

Downloads

paper.pdf

Full manuscript, Phase 1i refresh. Tier A figure renders, light Section 6 tone audit.

slides.pdf

Beamer presentation, twenty slides covering motivation, methodology, and replication.

Pre-registration anchor

Tag: pre-reg-v1
Commit: 0f7741f31dea3c2555ac4632e66d3ea7072c436f
Manifest SHA (Tier A): 7be9ecaadf707a0bd948b5b83574c89d729ceca4893f53a84f9d6b413ca6fe6d
Repository: github.com/Digital-AI-Finance/daily-order-flow-based-excess-alpha
SSRN: v2 staged for upload (link will appear here once posted)
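The manifest SHA above is 64 hexadecimal characters, which is consistent with a 256-bit digest such as SHA-256. This page does not describe how the repository builds or hashes its manifest, so the following is only a sketch of one common approach: hashing a canonical JSON serialization so that key order does not affect the digest. The function names, the manifest layout, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
import hmac
import json


def manifest_sha256(manifest: dict) -> str:
    """Hash a manifest in canonical form (hypothetical helper).

    Sorting keys and fixing separators makes the byte string, and hence the
    digest, independent of dictionary insertion order.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_manifest(manifest: dict, expected_hex: str) -> bool:
    """Return True if the manifest digest matches the expected hex string."""
    return hmac.compare_digest(manifest_sha256(manifest), expected_hex.lower())


# Hypothetical manifest: artifact names mapped to placeholder checksums.
example = {"metrics.json": "abc123", "folds.csv": "def456"}
digest = manifest_sha256(example)
print(len(digest), verify_manifest(example, digest))
```

Under this scheme, any change to a recorded value changes the digest, which is what allows a single published hash to anchor every number derived from the deterministic block.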

Replication

The full source repository, the synthetic reference run, and the manifest verification script reproduce every Tier A number in the paper. To verify, run the following commands in order:

git clone https://github.com/Digital-AI-Finance/daily-order-flow-based-excess-alpha
cd daily-order-flow-based-excess-alpha
git checkout pre-reg-v1
uv sync --frozen --extra dev
python scripts/check_signoffs.py && uv run pytest -q

An exit status of zero means that both the sign-off check and the test suite passed, so the methodology is intact.

Companion paper

This paper (Y) is the methodology and reproducibility primitive. The companion empirical paper (X) applies this stack to the complete Xetra participant-category daily order-flow panel and is forthcoming. No methodology choice in the empirical paper differs from the choices locked at pre-reg-v1. See docs/adr/ADR-021.md in the repository for the path-to-paper plan.