# Market Efficiency and First-Mover Advantage After U.S. Macroeconomic Releases at Eurex and CME
The repository contains six interconnected Jupyter notebooks dedicated to analyzing ultra-low-latency high-frequency trading (HFT) reactions triggered by broad macroeconomic announcements (such as NFP, FOMC, and ISM releases):

- `GIT_Distribution_difference_CME.ipynb` - Statistical Comparison on CME
- `GIT_Distribution_difference_Eurex.ipynb` - Statistical Comparison on EUREX
- `GIT_pickle_analysis.ipynb` - Raw Event Memory & Microsecond Latency Visualization
- `GIT_Final_plot_creator.ipynb` - Unified Visual Aggregations & Cross-Exchange Comparisons
- `GIT_Price_change_analysis_EUREX_volume.ipynb` - Eurex Latency Notional Analytics
- `GIT_PriceFormation_PnL_Analytics.ipynb` - PnL Correlations & Predictive Modeling
Covering both the Chicago Mercantile Exchange (CME) and Eurex, the notebooks process cached `.pickle` files and raw `.csv` trade histories to run statistical regime tests, generate aggregated visualizations, study cumulative profit-and-loss (P&L) dynamics, track notional traded volume at nanosecond resolution, and build predictive models using Random Forests.
This documentation provides a line-by-line walkthrough, mathematical breakdown, and architectural overview of each notebook, helping researchers, quantitative analysts, and quantitative developers understand how the data structures are grouped, how the nonparametric tests are applied, and what each plot is meant to highlight.
The repository consists of analytical Jupyter notebooks (.ipynb), processed tabular data files (.csv), custom caching structures (.pkl), and a dedicated Event_pickle_files subdirectory containing raw, granular event-driven memory dumps.
- **`GIT_Distribution_difference_CME.ipynb`**: Analyzes the custom `Trade` dataclass for the Chicago Mercantile Exchange (CME). Performs rigorous non-parametric statistical tests (KS test, Mann-Whitney U, bootstrap mean comparison) to compare distributions of trades and market behavior during structural macro events versus quiet periods.
- **`GIT_Distribution_difference_Eurex.ipynb`**: Mirrors the CME analysis but is tailored to Eurex market structures. Analyzes European trade distributions, iterating over fractional latency intervals and applying the same statistical tests for baseline comparison.
- **`GIT_PriceFormation_PnL_Analytics.ipynb`**: Investigates basis-point price changes and P&L dynamics based on cleaned, processed `.csv` data. Computes cross-correlations (Pearson, Spearman, Kendall) and applies machine learning (Random Forest) alongside linear regression to predict sequential price formation.
- **`GIT_Price_change_analysis_EUREX_volume.ipynb`**: Examines the relationship between price changes and traded volume exclusively on Eurex, constructing step-evolution plots of cumulative notional against extreme latency benchmarks.
- **`GIT_pickle_analysis.ipynb`**: Analyzes Eurex and CME market reactions to specific global economic events, mapping multi-axis scatter plots directly from the pre-aggregated `.pickle` transaction data.
- **`GIT_Final_plot_creator.ipynb`**: A centralized notebook that generates the final publication-ready visualizations and multi-panel plots, aggregating raw performance arrays for both Eurex and CME simultaneously.
- **`CME_processed_individual_data.csv`**: Cleaned, tabular, trade-level execution data with normalized intervals for CME derivatives. Used heavily in predictive modeling.
  Contains the price change, notional, 10 s markouts, and price sensitivity in the intervals $0-200\,\mu s$, $200\,\mu s-30\,ms$, $30\,ms-100\,ms$, and $100\,ms-10\,s$ for each individual asset after every ISM release at CME. The assets are ES, NQ, YM, RTY, TN, ZB, ZT, ZF, UB, ZN.
- **`EUREX_processed_individual_data.csv`**: The European equivalent, with normalized intervals matching Eurex latency constraints. Contains the price change, notional, 10 s markouts, and price sensitivity in the same intervals ($0-200\,\mu s$, $200\,\mu s-30\,ms$, $30\,ms-100\,ms$, $100\,ms-10\,s$) for each individual asset after every ISM release at Eurex. The assets include all futures contracts traded at Eurex.
- **`EUREX_ISM_reactions_volume_aggregated.pkl`**: Serialized Eurex volume data grouped into high- and low-liquidity classes, showing cumulative market reactions and notional sums aligned to ISM (Institute for Supply Management) economic reports. Contains a dictionary of individual high-frequency trades and their properties (price impact, notional, and markouts) for assets in different liquidity classes (top 15 vs. the rest) on Eurex, in the periods after macroeconomic data releases. This serialization retains pre-calculated state so the scripts do not repeatedly burn RAM on group-by calculations over billions of limit-book actions.

## Event_pickle_files/ Directory
This subdirectory acts as the raw persistence layer, housing `.pickle` files that serialize large Python objects (`EventReactions` mappings) representing ultra-high-frequency trades around macroeconomic events at nanosecond temporal resolution.
- **`economic_event_reactions_100ms.pickle`**: The primary baseline dataset. Captures market reactions logged within a bounded 100-millisecond window around target events.
- **`economic_event_reactions_next_1.pickle`**, **`economic_event_reactions_prev_1.pickle`**, `..._7.pickle`, and fractional spans like `..._0.000694...pickle`: Capture market conditions within strict time windows immediately before (`prev`) and after (`next`) economic events, evaluating day offsets or exact minute offsets.
- **`ism_reactions_CME.pickle`** / **`ism_reactions_CME_old.pickle`**: Serialized records specific to CME trade reactions around ISM manufacturing/services releases and related U.S. economic signals.

## Overleaf_files/ Directory
This subdirectory contains the LaTeX source files and associated assets for the academic paper summarizing the research findings.
- **`CME_EUREX_PMI_reactions_202511.tex`**: The LaTeX manuscript titled "Racing the News: Market Efficiency and First-Mover Advantage After U.S. Macroeconomic Releases at Eurex and CME". It synthesizes the analytical outputs, hypothesis tests, and theoretical frameworks developed across the Jupyter notebooks.
- **`references.bib`**: The BibTeX bibliography managing all academic citations referenced throughout the manuscript.
- **`arxiv.sty`**: The LaTeX style package used to format the paper for submission to preprint repositories such as arXiv.
- **`Plots/`**: A subdirectory storing the high-resolution, publication-ready charts and figures (generated by notebooks such as `GIT_Final_plot_creator.ipynb`) that are embedded into the LaTeX document.

Before detailing the individual notebooks, it is crucial to understand how trades are mapped in memory. Across multiple notebooks, the code defines `@dataclass` objects to handle the large volume of tick-level order book records imported from the pickle artifacts.
### Eurex Trade Dataclass
The standard Eurex `Trade` incorporates European market-structure specifics:
- `t3a`: Raw timestamp of the action at nanosecond resolution.
- `priority`: Order priority in the execution queue.
- `market_segment_id` & `security_id`: Identifiers for asset tracking.
- `last_px` & `last_qty`: The final execution price and size.
- `markout_price` & `average_price`: Used to calculate theoretical profit and loss.
- `CurrencyFactor()`: Normalizes P&L across currencies directly inside the object (e.g., converting KRW, USD, and GBP relative to EUR).

### CME Trade Dataclass
The Chicago Mercantile Exchange (CME) equivalent incorporates mappings specific to U.S. market structures:
- `ExecID`: Serves the same chronological role as Eurex's `t3a` timestamp.
- `AvgPrice` & `MarkoutPrice`: Used to calculate execution slippage.
- `Side`: Specifies buying or selling aggression (`AGGRESSOR_SIDE_BUY` / `AGGRESSOR_SIDE_SELL`).
- `MarkoutPnl()`: Checks for the Treasury symbols (`ZN`, `ZT`, `ZF`, `ZB`, `UB`, `TN`) and multiplies the basis by $10^{-2}$, since CME fixed-income derivatives are quoted in points rather than explicit price factors.

Both environments wrap their trades in an encompassing `EventReactions` dataclass holding the timestamp of the macroeconomic news, a label tag (such as "NFP" or "FOMC"), and arrays of `preactions` (before the event) and `reactions` (after the event).
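A minimal sketch of the two dataclasses described above might look like the following. Field names follow this document, but the types, the `currency` field, and the conversion rates inside `CurrencyFactor()` are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(order=True)
class Trade:
    t3a: int                  # raw action timestamp, nanosecond resolution
    priority: int             # order priority in the execution queue
    market_segment_id: int
    security_id: int
    last_px: float            # final execution price
    last_qty: float           # final execution size
    markout_price: float      # reference price used for theoretical P&L
    average_price: float
    currency: str = "EUR"     # assumed field backing CurrencyFactor()

    def CurrencyFactor(self) -> float:
        # Illustrative conversion factors relative to EUR (not real rates).
        rates = {"EUR": 1.0, "USD": 0.92, "GBP": 1.17, "KRW": 0.00068}
        return rates.get(self.currency, 1.0)


@dataclass
class EventReactions:
    timestamp: int                      # nanosecond timestamp of the release
    label: str                          # e.g. "NFP", "FOMC", "ISM MANUFACTURING"
    preactions: List[Trade] = field(default_factory=list)
    reactions: List[Trade] = field(default_factory=list)
```

`@dataclass(order=True)` makes trades sortable by field order (here, by `t3a` first), which is convenient when merging event windows chronologically.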
## GIT_Distribution_difference_CME.ipynb - Statistical Comparison on CME
This notebook statistically tests whether the distribution of trades executed in the immediate $100\,ms$ following a macro release on CME differs fundamentally from the quiet period preceding it, or differs significantly across asset classes (equity vs. fixed income).
It relies on the `scipy.stats` module for non-parametric, heavy-tail-resistant tests:
- Kolmogorov-Smirnov test (`stats.ks_2samp`): Checks whether two independent empirical samples are drawn from the same continuous distribution. Serves as the primary gauge for systemic regime changes (stochastic disruptions).
- Mann-Whitney U test (`stats.mannwhitneyu`): Tests whether the two populations of trade values are identical or shifted in location.
- Wilcoxon signed-rank test (`stats.wilcoxon`): Tests whether the calculated markout profits post-event are statistically greater than $0$ (signifying profitable execution against toxic order flow).

**Computational Validation:**
It implements randomized bootstrapping (`bootstrap_mean_comparison`) with $5000$ resamples, pooling the post-event and pre-event distributions and redrawing them to test whether the observed difference in means could arise by chance, alongside a permutation t-test that performs analogous loops as an independent verification.
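The bootstrap and permutation procedures described above can be sketched as follows. The function names match this document, but the exact resampling scheme and p-value convention are assumptions:

```python
import numpy as np

def bootstrap_mean_comparison(post, pre, n_iter=5000, seed=0):
    """Pool both samples (exchangeable under the null), resample with
    replacement, and record the resampled difference in means."""
    post, pre = np.asarray(post), np.asarray(pre)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([post, pre])
    observed = post.mean() - pre.mean()
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        sample = rng.choice(pooled, size=pooled.size, replace=True)
        diffs[i] = sample[:post.size].mean() - sample[post.size:].mean()
    # Two-sided p-value: fraction of resampled diffs at least as extreme.
    return np.mean(np.abs(diffs) >= abs(observed))

def permutation_t_test(post, pre, n_iter=5000, seed=0):
    """Companion permutation test: shuffle the pooled sample without
    replacement and recompute the mean difference each time."""
    post, pre = np.asarray(post), np.asarray(pre)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([post, pre])
    observed = post.mean() - pre.mean()
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        perm = rng.permutation(pooled)
        diffs[i] = perm[:post.size].mean() - perm[post.size:].mean()
    return np.mean(np.abs(diffs) >= abs(observed))
```

A small p-value from either procedure indicates the post-event mean is unlikely to match the pre-event mean by chance alone.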
The notebook loads the processed CME data (`CME_processed_individual_data.csv`) and runs the full statistical battery across cross-comparisons such as event vs. quiet periods and equity vs. fixed-income assets.
All test statistics and p-values are appended to a running table named `Distributional_difference.csv`.
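The `scipy.stats` battery described above can be gathered into a single helper. The name `conduct_all_tests` appears later in this document for the Eurex notebook; its exact arguments and return format here are assumptions:

```python
import numpy as np
from scipy import stats

def conduct_all_tests(post, pre):
    """Run the three non-parametric tests described above and return
    (statistic, p-value) pairs keyed by test name."""
    results = {}
    # Regime change: are the two empirical distributions the same?
    ks = stats.ks_2samp(post, pre)
    results["ks"] = (ks.statistic, ks.pvalue)
    # Location shift: is one population stochastically larger?
    mwu = stats.mannwhitneyu(post, pre, alternative="two-sided")
    results["mwu"] = (mwu.statistic, mwu.pvalue)
    # Profitability: are post-event markouts greater than zero?
    wil = stats.wilcoxon(post, alternative="greater")
    results["wilcoxon"] = (wil.statistic, wil.pvalue)
    return results
```

Each notebook run would append these (statistic, p-value) pairs as a new row of the output CSV.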
## GIT_Distribution_difference_Eurex.ipynb - Statistical Comparison on EUREX
Functionally identical in mathematical premise to the CME variant, this notebook redirects the statistical frameworks to European data, using the `.pickle` caches that map the fractional reaction periods.
- `economic_event_reactions_100ms.pickle`: the baseline window.
- `economic_event_reactions_next_1.pickle` vs. `economic_event_reactions_prev_1.pickle`: analyzing exactly $\pm 1$ day.
- `.pickle` objects ending in increments matching $0.0006944444\ldots$ ($= 1/1440$ of a day), representing normalized $\pm 1$ minute offsets in pandas datetime terms.

**Bulk Computation (`bulk_compute`):**
The script builds large arrays by parsing nested events for independent U.S. macro releases such as ISM and NFP. It accumulates individual trade PnLs (`MarkoutPnl_10s`) and notional values across thousands of reactions and pipes them through the same `conduct_all_tests()` engine defined in the CME script.

## GIT_pickle_analysis.ipynb - Raw Event Memory & Microsecond Latency Visualization
The core exploratory visualizer for the raw memory dumps, this notebook unpacks the `100ms.pickle` dictionaries and generates dense, layered scatter plots showing how many nanoseconds it took the matching engines to record trades following news publication.
It configures matplotlib `Line2D` and `PathEffects` parameters, with color schemes tied to each macroeconomic announcement type:
- ISM Manufacturing = blue
- ISM Services = light blue
- NFP (Non-Farm Payroll) = green
- FOMC = orange

**CME Plotting (`plot_cme`):**
- The reaction-time axis uses a `semilogy` scale so that latencies ranging down to single-digit nanoseconds remain legible.
- Shaded regions (`alpha=0.25`) demarcate historical latency upgrades to the CME matching engines (marked by changes around mid-2021 and 2024).

**Eurex Plotting (`plot_Eurex`):**
- Uses `bisect_left` to place trades on the x-axis across distinct years (2020 through 2025).
- Follows the same `.pickle` file framework as the CME plot.

## GIT_Final_plot_creator.ipynb - Unified Visual Aggregations & Cross-Exchange Comparisons
This notebook acts as the centralized report generator, combining trades from both Eurex and CME and processing them through synchronized binning functions to build publication-ready multi-axis plots.
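The `bisect_left` year placement used in `plot_Eurex` (previous section) amounts to a binary search over precomputed year boundaries. The boundary construction below is an assumption about how the notebook does it:

```python
from bisect import bisect_left
from datetime import datetime, timezone

# Nanosecond epoch timestamps for Jan 1 of each year, 2020 through 2025.
year_starts = [datetime(y, 1, 1, tzinfo=timezone.utc).timestamp() * 1e9
               for y in range(2020, 2026)]

def year_of(ts_ns: float) -> int:
    """Map a nanosecond timestamp onto its calendar year via binary search."""
    idx = bisect_left(year_starts, ts_ns)
    # bisect_left returns the insertion point, so the year index is idx - 1
    # unless ts_ns falls exactly on a boundary.
    if idx < len(year_starts) and year_starts[idx] == ts_ns:
        return 2020 + idx
    return 2020 + idx - 1
```

This keeps the per-trade lookup at $O(\log n)$ rather than scanning every boundary for each of millions of trades.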
**Binning Functions (`bin_trades_CME` / `bin_trades_EUREX`):**
Order execution counts and financial aggregates are grouped into fixed latency segments:
- Uniform microsecond bins: trades are grouped into $10\,\mu s$ buckets ($10{,}000$-nanosecond spans running out to $100$ milliseconds).
- Logarithmic time bins: an exponential grid `[0, 1000, 10000, 100000, 1000000, 10000000, 100000000]` (nanoseconds) mapping the raw processing-delay curve.

**Reaction Counting (`count_reactions`):**
Reaction counts are plotted against latency intervals ($0$ to $100\,ms$) using gray lines punctuated with black markers.

**Cumulative PnL Plotting (`plot_cumulative_PnL`):**
- Notional velocity and cumulative $10\,s$ markout PnL are extracted via sequential sums (`np.cumsum(array)`).
- Axis labels use (K) modifiers for PnL ($10^3$) and (M) modifiers for notional amounts ($10^6$).
- `ax.fill_between` shades the band between the 5th and 95th percentiles around the mean, tracking how toxic initial positions perform compared to delayed trailing orders on both exchanges.

## GIT_Price_change_analysis_EUREX_volume.ipynb - Eurex Latency Notional Analytics
Focusing on order book saturation and bandwidth-constraint visualizations, this script dissects cumulative European volume behavior across ultra-low-latency checkpoints using a dedicated cache (`EUREX_ISM_reactions_volume_aggregated.pkl`).
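The cumulative-PnL envelope construction in `plot_cumulative_PnL` (previous section) reduces to a `np.cumsum` per event plus percentile aggregation across events. The data below are synthetic and the event-stacking layout is an assumption:

```python
import numpy as np

rng = np.random.default_rng(42)
pnl_per_event = rng.normal(0.5, 2.0, size=(50, 200))   # 50 events x 200 trades

cum_pnl = np.cumsum(pnl_per_event, axis=1)             # running PnL per event
mean_path = cum_pnl.mean(axis=0)                       # average trajectory
lo, hi = np.percentile(cum_pnl, [5, 95], axis=0)       # 5th/95th envelopes

# In the notebook these feed matplotlib, e.g.:
#   ax.plot(x, mean_path / 1e3)                        # (K) scaling for PnL
#   ax.fill_between(x, lo / 1e3, hi / 1e3, alpha=0.25)
```

Aggregating across the event axis (rather than within one event) is what makes the shaded band an across-release dispersion measure.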
**Volume Caching Ecosystem:**
Grouping multi-gigabyte files takes considerable time, so the script operates off a serialized cache that pre-groups high-liquidity versus low-liquidity assets and stores `PriceChangeList`, `NotionalList`, and `MarkoutList`. If the cache is absent, the script falls back to an empty initialization structure awaiting hydration.
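The load-or-fall-back pattern described above might look like this; the default file name comes from this document, while the function name and surrounding code are assumptions:

```python
import pickle
from pathlib import Path

def load_event_cache(path="EUREX_ISM_reactions_volume_aggregated.pkl"):
    """Return the pre-grouped cache if it exists, else an empty dict
    awaiting hydration by the (slow) group-by pass."""
    cache = Path(path)
    if not cache.exists():
        return {}                      # empty structure awaiting hydration
    with cache.open("rb") as fh:
        return pickle.load(fh)         # e.g. {liquidity_class: trade lists}
```

Keeping the fallback an empty dict means downstream code can test `if not cache:` to decide whether to recompute and re-serialize.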
**Dashed Latency Demarcations (`plot_Notional_PnL_evolution`):**
The distinctive output of this notebook is the normalized cumulative notional trajectory, drawn as a stepwise line (`where='post'`).
- The x-axis uses a `symlog` (symmetric log) scale, allowing negative (pre-event) milliseconds to transition through zero into the positive post-event region while keeping the axis legible around the zero bound.

## GIT_PriceFormation_PnL_Analytics.ipynb - PnL Correlations & Predictive Modeling
A machine-learning and statistical analysis framework operating entirely on the pre-compiled tabular outputs. This notebook quantifies how price impacts propagate from the first microseconds into later intervals and forecasts subsequent momentum shifts.
After filtering for meaningful relationships (`abs() >= 0.25`), the script uses the core pandas `corr()` method to output tables assessing:
- `pearson` (linear), `spearman` (monotonic ranks), and `kendall` (tau) correlation matrices.
- Non-parametric shift detection via `kruskal` and variance testing via `friedmanchisquare`.

**Skew-Normal Probabilities (`ss.skewnorm`):**
Frequency histograms drawn with seaborn (`sns.histplot`, with multi-hue shaded KDE overlays) are augmented with fitted theoretical skew-normal probability density functions (PDFs). These overlays are plotted explicitly to highlight the non-normal distributions inherent in toxic HFT event trading.
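The skew-normal overlay described above follows the standard scipy fit-and-evaluate pattern; the data below are synthetic placeholders for the markout distribution:

```python
import numpy as np
from scipy import stats as ss

rng = np.random.default_rng(0)
# Synthetic right-skewed "markout" sample standing in for the real data.
markouts = ss.skewnorm.rvs(a=4, loc=0, scale=1, size=2000, random_state=rng)

# Maximum-likelihood fit returns (shape a, location, scale).
a, loc, scale = ss.skewnorm.fit(markouts)

x = np.linspace(markouts.min(), markouts.max(), 200)
pdf = ss.skewnorm.pdf(x, a, loc, scale)   # fitted density to overlay
# The notebook draws this curve over sns.histplot(..., stat="density").
```

A fitted shape parameter `a` far from zero is direct evidence that the markout distribution is not Gaussian.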
A predictive pipeline built on `sklearn` attempts to forecast the basis-point outcome in the $200\,\mu s - 100\,ms$ window purely from the absolute initial shock inside $0 - 200\,\mu s$.
- `train_test_split` divides the dataset 60-20-20 (training, validation, testing).
- Random Forest (`RandomForestRegressor`): an ensemble of 20 trees (`n_estimators=20`) with out-of-bag error tracking (`oob_score=True`) as a continuous check against overfitting.
- A custom `linreg_summary()` function reports standard errors, R², F-statistics, and the $p(B_1)$ significance of the slope to assess reliability against noise.

To recreate and analyze the outputs generated across this Jupyter ecosystem, the same environment must exist locally:
- Python 3.7+ is required for the typing arguments, dataclass features (`@dataclass(order=True)`), and related language internals.

Core packages must be configured locally:
- `pandas` / `numpy`: matrix and numerical frameworks.
- `scipy`: powers the underlying statistics (KS tests, Mann-Whitney U, Wilcoxon) alongside the skew-normal fits.
- `matplotlib` / `seaborn`: the plotting engines controlling multi-grid visuals, log transformations, and path effects with hex colors.
- `scikit-learn`: required for the Random Forest models in the PnL analytics notebook, including the data splits and regression metrics.

Because of hardcoded paths across all six notebooks, a directory named `Event_pickle_files` must exist directly inside the project root, hosting all versions of the raw memory arrays (e.g., `ism_reactions_CME.pickle`, `economic_event_reactions_100ms.pickle`).
Furthermore, processed outputs such as `CME_processed_individual_data.csv` and `EUREX_ISM_reactions_volume_aggregated.pkl` must sit adjacent to the `.ipynb` documents so that the hardcoded relative paths resolve without exceptions.
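Given the hardcoded paths, a small pre-flight check run from the repository root can save a failed notebook run. The file list mirrors this document; the snippet itself is a suggested convenience, not part of the notebooks:

```python
from pathlib import Path

REQUIRED = [
    "Event_pickle_files/economic_event_reactions_100ms.pickle",
    "Event_pickle_files/ism_reactions_CME.pickle",
    "CME_processed_individual_data.csv",
    "EUREX_processed_individual_data.csv",
    "EUREX_ISM_reactions_volume_aggregated.pkl",
]

# Collect any inputs the notebooks expect but the working directory lacks.
missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing inputs:", *missing, sep="\n  ")
```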
Together, these notebooks offer a detailed framework for dissecting how physical latency and topological advantages translate into financial outcomes, mapping event reaction times against order book saturation.
Analysts aiming to adopt this framework should:
- Begin with the `.pickle` validations in notebooks (3) and (5), which drive the high-level visualizations, to check system normalization and hardware delays against the latency benchmarks (the 907-nanosecond vs. 37 ms curves).