automated-trading-systems

US_MacroRelease_HPT_Analysis

MARKET EFFICIENCY AND FIRST-MOVER ADVANTAGE AFTER U.S. MACROECONOMIC RELEASES AT EUREX AND CME

Table of Contents

  1. Executive Summary
  2. Directory Overview
  3. Global Data Structures & Architectural Definitions
  4. 1: GIT_Distribution_difference_CME.ipynb - Statistical Comparison on CME
  5. 2: GIT_Distribution_difference_Eurex.ipynb - Statistical Comparison on EUREX
  6. 3: GIT_pickle_analysis.ipynb - Raw Event Memory & Microsecond Latency Visualization
  7. 4: GIT_Final_plot_creator.ipynb - Unified Visual Aggregations & Cross-Exchange Comparisons
  8. 5: GIT_Price_change_analysis_EUREX_volume.ipynb - Eurex Latency Notional Analytics
  9. 6: GIT_PriceFormation_PnL_Analytics.ipynb - PnL Correlations & Predictive Modeling
  10. Prerequisites and Environment Setup
  11. Conclusion and Execution Workflows

Executive Summary

The directory contains six interconnected Jupyter notebooks dedicated to analyzing ultra-low-latency High-Frequency Trading (HFT) reactions to major U.S. macroeconomic announcements (such as NFP, FOMC, and ISM releases).

Covering both the Chicago Mercantile Exchange (CME) and Eurex, this suite of notebooks processes cached .pickle memory files and raw .csv trade histories to conduct statistical regime testing, generate aggregated visual analyses, study cumulative Profit and Loss (P&L) dynamics, track notional traded volume at nanosecond precision, and perform predictive modeling with Random Forest models.

This extensive documentation provides a line-by-line understanding, mathematical breakdown, and architectural visualization of each notebook, helping researchers, quantitative analysts, and quantitative developers understand how the data structures are grouped, how the nonparametric algorithms are applied, and what metrics the graphical visuals aim to highlight.


Directory Overview

The repository consists of analytical Jupyter notebooks (.ipynb), processed tabular data files (.csv), custom caching structures (.pkl), and a dedicated Event_pickle_files subdirectory containing raw, granular event-driven memory dumps.

📓 Jupyter Notebooks

📊 Data Files (Root Directory)

📁 Event_pickle_files/ Directory

This critical subdirectory acts as the raw persistence layer, housing .pickle files that serialize large Python objects (EventReactions mappings) representing ultra-high-frequency trades around macroeconomic events, timestamped at nanosecond resolution.

📄 Overleaf_files/ Directory

This subdirectory contains the LaTeX source files and associated assets for generating the academic paper summarizing the research findings.


Global Data Structures & Architectural Definitions

Before detailing individual notebooks, it is essential to understand how trades are mapped in memory. Across multiple notebooks, the code repeatedly defines @dataclass objects to handle the large volume of tick-level order book records imported from pickle artifacts.

Eurex Trade Dataclass

The standard Eurex Trade incorporates highly specific European market structures:

CME Trade Dataclass

The standard Chicago Mercantile Exchange (CME) equivalent incorporates specific mappings for US structures:

Both environments wrap their trades in an encompassing EventReactions dataclass, which holds the timestamp of the macroeconomic news, a standard label tag (such as “NFP” or “FOMC”), and arrays of preactions (trades before the event) and reactions (trades after the event).
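The wrapper described above can be sketched roughly as follows. This is an illustrative assumption, not the repository's actual definition: the `Trade` fields and the `latencies_ns` helper are hypothetical, and only the `EventReactions` fields named in the text (timestamp, label, preactions, reactions) are grounded in the source.

```python
from dataclasses import dataclass, field

@dataclass
class Trade:
    # Hypothetical minimal trade record; real dataclasses carry many more
    # exchange-specific fields (assumed names).
    timestamp_ns: int   # exchange timestamp at nanosecond resolution
    price: float
    notional: float

@dataclass
class EventReactions:
    # Fields below follow the description in the text.
    event_time_ns: int                                       # timestamp of the news release
    label: str                                               # e.g. "NFP" or "FOMC"
    preactions: list[Trade] = field(default_factory=list)    # trades before the event
    reactions: list[Trade] = field(default_factory=list)     # trades after the event

    def latencies_ns(self) -> list[int]:
        """Nanosecond delay of each post-event trade relative to the release (assumed helper)."""
        return [t.timestamp_ns - self.event_time_ns for t in self.reactions]
```

With this shape, a notebook can unpickle a mapping of events and immediately derive per-trade latencies for plotting or binning.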


1: GIT_Distribution_difference_CME.ipynb - Statistical Comparison on CME

Overview

This notebook statistically tests whether the distribution of trades executed during the first $100\,ms$ after a macro release on CME differs fundamentally from the quiet period preceding it, and whether it differs significantly across asset classes (Equity vs. Fixed Income).

Components and Logic

  1. Non-parametric Statistical Suite: The notebook avoids standard-normal assumptions, importing the scipy.stats module for tests that are robust to heavy tails:
    • Kolmogorov-Smirnov (KS) Test (stats.ks_2samp): Checks if two independent empirical samples are governed by the exact same continuous distribution. It serves as the primary gauge for systemic regime changes (stochastic disruptions).
    • Two-Sided Mann-Whitney U Test (stats.mannwhitneyu): Tests whether the two populations of trade values are identically distributed or stochastically shifted.
    • One-Sided Mann-Whitney U Test: Tests explicit directionality (whether CME reaction volumes are strictly greater than pre-action volumes).
    • Permutation t-Test: A nonparametric method for testing if two groups differ significantly, without assuming a normal distribution. It works by shuffling data labels thousands of times to create a null distribution, calculating a t-statistic for each, and determining the p-value by comparing the original observed statistic to this distribution.
    • Bootstrap Mean Comparison: A non-parametric resampling technique used to determine if the difference between two group means is statistically significant without assuming a normal distribution. By repeatedly sampling with replacement from original data, it builds a distribution of mean differences to estimate confidence intervals and standard errors.
    • Wilcoxon Signed-Rank Test (stats.wilcoxon): Measures if the calculated Markout profits post-event are statistically greater than $0$ (signifying profitable execution against toxic order flow).
  2. Computational Validation Suite: Implements randomized bootstrapping (bootstrap_mean_comparison) over $5000$ resamples of the post-event and pre-event samples to estimate whether the observed mean difference exceeds what resampling noise alone would produce, plus a permutation t-test running the same number of iterations as an independent verification.

  3. Comparison Execution Modes: The final loop in the notebook reads the preprocessed CSV tables (CME_processed_individual_data.csv) and runs the statistical gauntlet above across cross-comparisons:
    • CME vs Eurex cross-exchange structural differences across the 30-100 millisecond window.
    • CME Equities vs Non-Equities (Fixed Income Treasury Bonds), measuring whether fast participants target specific asset classes first (comparing ranges of $0\,\mu s$ to $200\,\mu s$, $200\,\mu s$ to $30\,ms$, and $30\,ms$ to $100\,ms$).

All test statistics and generated p-values are continuously appended to a table named Distributional_difference.csv.
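A minimal sketch of the statistical gauntlet described above, assuming two 1-D samples of trade values. The function name `conduct_all_tests` appears in the source; its internals here are assumptions, and the Wilcoxon step is omitted since it applies to Markout PnL rather than pre/post comparisons:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def conduct_all_tests(pre, post, n_resamples=5000):
    """Run the nonparametric suite on pre- and post-event samples (sketch)."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    results = {
        "ks": stats.ks_2samp(pre, post),                          # same distribution?
        "mwu_two_sided": stats.mannwhitneyu(post, pre),           # shifted at all?
        "mwu_greater": stats.mannwhitneyu(post, pre, alternative="greater"),  # post > pre?
    }
    # Permutation t-test: shuffle the pooled labels to build a null t-distribution.
    obs_t, _ = stats.ttest_ind(post, pre, equal_var=False)
    pooled = np.concatenate([pre, post])
    null_t = np.empty(n_resamples)
    for i in range(n_resamples):
        perm = rng.permutation(pooled)
        null_t[i], _ = stats.ttest_ind(perm[:len(post)], perm[len(post):], equal_var=False)
    results["perm_p"] = float(np.mean(np.abs(null_t) >= abs(obs_t)))
    # Bootstrap mean comparison: resample each group with replacement,
    # building a distribution of mean differences.
    diffs = np.array([rng.choice(post, len(post)).mean() - rng.choice(pre, len(pre)).mean()
                      for _ in range(n_resamples)])
    results["boot_ci"] = np.percentile(diffs, [2.5, 97.5])        # 95% CI of mean diff
    return results
```

If the bootstrap confidence interval excludes zero and the permutation p-value is small, the regime shift is unlikely to be resampling noise.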


2: GIT_Distribution_difference_Eurex.ipynb - Statistical Comparison on EUREX

Overview

Functionally identical in mathematical premise to the CME variant, this notebook applies the same statistical framework to European data, using .pickle caches that map the fractional reaction periods.

Components and Logic

  1. Granular Time Window Shifting: While the CME notebook focuses on Equities vs. Bonds, the Eurex logic dives into sequential time shifts using explicit files representing precise windows:
    • economic_event_reactions_100ms.pickle (the baseline window).
    • economic_event_reactions_next_1.pickle vs economic_event_reactions_prev_1.pickle (analyzing exactly $\pm 1$ day).
    • .pickle objects whose filenames end in increments of $0.0006944444…$, i.e. $1/1440$: $\pm 1$ minute expressed as a fraction of a day, matching Pandas' fractional-day offsets.
  2. Bulk Evaluation Pipeline (bulk_compute): The script parses the nested events for independent US macro releases such as ISM and NFP, accumulates per-trade PnLs (MarkoutPnl_10s) and Notional values across thousands of reactions, and pipes them through the conduct_all_tests() engine defined identically to the CME script.
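The accumulation step can be sketched as below. `bulk_compute` and the `MarkoutPnl_10s` / `Notional` attribute names come from the text; the loop structure, label filter, and mapping layout are assumptions. The `1/1440` constant is the minute-as-day fraction that appears in the pickle filenames:

```python
import pickle

# 1 minute expressed as a fraction of a day = 0.0006944444..., the suffix on the pickles.
MINUTE_AS_DAY_FRACTION = 1 / (24 * 60)

def bulk_compute(pickle_path, labels=("ISM", "NFP")):
    """Accumulate per-trade Markout PnL and notional across all reactions
    whose event label matches one of the selected macro releases (sketch)."""
    with open(pickle_path, "rb") as fh:
        event_reactions = pickle.load(fh)      # assumed: mapping of event keys -> EventReactions
    pnls, notionals = [], []
    for event in event_reactions.values():
        if not any(tag in event.label for tag in labels):
            continue                           # skip non-targeted announcements
        for trade in event.reactions:
            pnls.append(trade.MarkoutPnl_10s)
            notionals.append(trade.Notional)
    return pnls, notionals
```

The resulting flat arrays are what get piped into the statistical engine shared with the CME notebook.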

3: GIT_pickle_analysis.ipynb - Raw Event Memory & Microsecond Latency Visualization

Overview

The core exploratory visualizer for the raw memory dumps, this notebook deserializes the 100ms.pickle dictionaries and generates dense, layered scatter plots mapping how many nanoseconds it took the matching engines to record trades following external news publications.

Components and Logic

  1. Announcement Color Mapping: The setup relies on manual Matplotlib Line2D and PathEffects parameters assigning a fixed color to each macroeconomic announcement type:
    • ISM MANUFACTURING = Blue
    • ISM SERVICES = Light Blue
    • NFP (Non-Farm Payroll) = Green
    • FOMC = Orange
  2. CME Array Visualization (plot_cme):
    • The plot is constructed on a dual $1 \times 2$ vertical grid mapping $t_{gateway} \longrightarrow$ event versus event $\longrightarrow t_{gateway}$.
    • The Y-axis uses a logarithmic (semilogy) scale, stretching from the $100\,ms$ window boundary down to single-digit nanoseconds.
    • Grey alpha=0.25 shaded bands demarcate latency-reducing engine upgrades implemented historically at CME (visible as regime changes around mid-2021 and 2024).
  3. Eurex Array Visualization (plot_Eurex):
    • Generates mirroring twin-charts specifically for Eurex gateways.
    • Uses bisect_left to locate year boundaries (2020 through 2025) along the X-axis.
    • Captures extreme boundary events by appending a stylized arrow pointing to the single fastest tick recorded in the set (down to $\approx 3$ nanoseconds).
  4. Product PnL Splines:
    • Includes a cumulative trailing Markout plot of cumulative Markout PnL versus latency ($\mu s$), iterating dynamically over every product subset within the .pickle framework.
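A compact sketch of the scatter construction described above. The color mapping follows the text; the function name, synthetic data, and axis details are assumptions:

```python
import matplotlib
matplotlib.use("Agg")             # headless backend so the sketch runs in scripts
import matplotlib.pyplot as plt
import numpy as np

# Colors per announcement type, as listed in the text.
EVENT_COLORS = {"ISM MANUFACTURING": "blue", "ISM SERVICES": "lightblue",
                "NFP": "green", "FOMC": "orange"}

def plot_latency_scatter(dates, latencies_ns, labels, upgrades=()):
    """Scatter trade latencies over time on a log Y-axis (hypothetical helper)."""
    fig, ax = plt.subplots()
    for d, lat, lab in zip(dates, latencies_ns, labels):
        ax.scatter(d, lat, s=8, color=EVENT_COLORS[lab])
    ax.set_yscale("log")          # semilogy: milliseconds down to single-digit ns
    for start, end in upgrades:   # grey bands marking historical engine upgrades
        ax.axvspan(start, end, color="grey", alpha=0.25)
    # Annotate the single fastest tick in the set with an arrow.
    i = int(np.argmin(latencies_ns))
    ax.annotate("fastest tick", xy=(dates[i], latencies_ns[i]),
                xytext=(dates[i], latencies_ns[i] * 100),
                arrowprops=dict(arrowstyle="->"))
    ax.set_ylabel("latency (ns)")
    return fig, ax
```

The log Y-axis is what lets a single panel span five orders of magnitude of latency without flattening the nanosecond tail.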

4: GIT_Final_plot_creator.ipynb - Unified Visual Aggregations & Cross-Exchange Comparisons

Overview

This notebook acts as the centralized report generator, combining trades from both Eurex and CME simultaneously and processing them through synchronized binning functions to build comprehensive publication-ready multi-axis plots.

Components and Logic

  1. Precision Time Binning (bin_trades_CME / bin_trades_EUREX): Order execution counts and financial calculations are forced into tight localized segments:
    • Uniform Microsecond Bins: Trades are grouped into spans of exactly $10\,\mu s$ ($10{,}000$ nanoseconds), running out to $100$ milliseconds.
    • Logarithmic Time Bins: Decade-spaced nanosecond boundaries [0, 1000, 10000, 100000, 1000000, 10000000, 100000000] (i.e. up to $100\,ms$) mapping the raw processing-delay curve.
  2. Core Plot Architecture (count_reactions):
    • Dynamically charts the summation of reactions plotted against Latency intervals (0 to 100 ms) using deep gray plots punctuated with black markers.
    • Displays Subplot (A) specific to CME reaction counts, directly merged with Subplot (B) handling Eurex counts.
    • A log-scale Y-axis captures the massive early burst of trades, which reaches the 10-15k tick range within fractions of a millisecond.
  3. Cumulative Flow Plots (plot_cumulative_PnL):
    • Generates four cohesive charts mapping total monetary Notional velocity and cumulative $10s$ Markout PnL, computed as sequential sums via np.cumsum(array).
    • Scales totals for legibility, using (K) modifiers for PnL ($10^3$) and (M) modifiers for Notional amounts ($10^6$).
  4. Trade Posture Positioning Layout:
    • Identifies the explicit order ranking (position 1st, 2nd, …, 50th) of executions triggered immediately upon event release for all assets traded at Eurex and CME, yielding a distribution of P&L and Notional at each position.
    • Uses ax.fill_between to shade the 5th-95th percentile bounds around the mean, tracking how toxic the initial positions are compared with delayed trailing orders on both exchanges.
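The dual binning scheme above can be sketched with np.histogram. The bin boundaries come from the text; the helper name is an assumption:

```python
import numpy as np

# 10 µs (10,000 ns) uniform bins out to 100 ms -> 10,000 bins (boundaries from the text).
UNIFORM_BINS_NS = np.arange(0, 100_000_001, 10_000)
# Decade-spaced logarithmic boundaries, also in nanoseconds, up to 100 ms.
LOG_BINS_NS = [0, 1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]

def bin_trades(latencies_ns, bins):
    """Count trades per latency bin (hypothetical helper name)."""
    counts, edges = np.histogram(latencies_ns, bins=bins)
    return counts, edges
```

The same latency array can be pushed through both bin sets: the uniform bins feed the reaction-count subplots, while the logarithmic bins feed the delay-funnel views.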

5: GIT_Price_change_analysis_EUREX_volume.ipynb - Eurex Latency Notional Analytics

Overview

Focusing strictly on order book saturation and bandwidth-constraint visualizations, this script dissects cumulative European volume behavior across ultra-low-latency checkpoints using a specially optimized cache (EUREX_ISM_reactions_volume_aggregated.pkl).

Components and Logic

  1. Volume Caching Ecosystem: Grouping multi-gigabyte files linearly takes immense time, so the script operates off a serialized cache that pre-groups high-liquidity versus low-liquidity assets, mapping PriceChangeList, NotionalList, and MarkoutList. If the cache is absent, the script initializes an empty structure to be populated on the next full pass.

  2. Dashed Latency Demarcations (plot_Notional_PnL_evolution): The unique value of this notebook is mapping the normalized cumulative Notional trajectory using a stepwise line (where='post').

    • A critical dashed boundary sits at exactly $907$ nanoseconds, highlighting physical co-location advantages: the fiber-optic propagation limit for direct on-server interactions.
    • A second major boundary sits at $37$ milliseconds, marking the well-established cross-Atlantic cable-transmission hurdle: the point at which US macro algorithms hit European order books.
    • The axes use a symlog (symmetric log) configuration, allowing negative (pre-event) milliseconds to traverse zero into positive post-event bounds while keeping the $X$-axis legible across the zero boundary.
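A minimal sketch of this axis setup. The two dashed boundaries and the stepwise/symlog choices come from the text; the function name, linthresh value, and synthetic series are assumptions:

```python
import matplotlib
matplotlib.use("Agg")                 # headless backend for scripted runs
import matplotlib.pyplot as plt
import numpy as np

COLOCATION_NS = 907                   # co-location fiber limit (from the text)
TRANSATLANTIC_NS = 37_000_000         # 37 ms cross-Atlantic hurdle (from the text)

def plot_notional_evolution(t_ns, cum_notional):
    """Stepwise cumulative notional on a symlog time axis (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.step(t_ns, cum_notional, where="post")    # value holds until the next trade
    ax.set_xscale("symlog", linthresh=1_000)     # linear through zero, log beyond ±1 µs
    for x in (COLOCATION_NS, TRANSATLANTIC_NS):  # the two dashed demarcations
        ax.axvline(x, linestyle="--", color="k")
    ax.set_xlabel("time since release (ns)")
    ax.set_ylabel("normalized cumulative notional")
    return fig, ax
```

The symlog scale is what lets pre-event (negative) time, the zero crossing, and the nanosecond-to-millisecond post-event tail coexist on one readable axis.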

6: GIT_PriceFormation_PnL_Analytics.ipynb - PnL Correlations & Predictive Modeling

Overview

A machine-learning and statistical-analysis framework operating entirely on pre-compiled tabular outputs. This notebook quantifies the relationships mapping price impacts from atomic microsecond shifts through to macro effects, and forecasts subsequent momentum shifts.

Components and Logic

  1. Sequential Pearson, Spearman, & Kendall Mapping: Using data subsets restricted to high-momentum changes (absolute basis-point change locked to abs() >= 0.25), this script uses core Pandas corr() functions to output tables assessing:
    • Does a disruption in $0 - 200\,\mu s$ predict an identical directional move in $200\,\mu s - 30\,ms$?
    • Applies the tests redundantly with pearson (linear), spearman (monotonic ranks), and kendall (tau rank) correlation matrices.
    • Employs distribution testing across intervals via the nonparametric kruskal test (independent groups) and repeated-measures testing via friedmanchisquare.
  2. Skew Normal Probabilities (ss.skewnorm): Overlays fitted theoretical skew-normal probability density functions (PDFs) on the frequency histograms drawn via Seaborn (sns.histplot) with multi-hue shaded KDE plots, explicitly verifying the non-normal distributions inherent in toxic HFT event trading.

  3. Predictive Analytics Ecosystem (Machine Learning): The climax of the script introduces a supervised-learning pipeline built with sklearn, predicting the resulting basis-point outcome for the $200\,\mu s - 100\,ms$ window purely from the absolute initial shock inside $0 - 200\,\mu s$.
    • Data Split: Uses train_test_split to divide datasets 60-20-20 (training, validation, testing).
    • Random Forest Framework (RandomForestRegressor): Initializes an ensemble of 20 regression trees (n_estimators=20) with Out-Of-Bag error tracking (oob_score=True) to check validity against overfitting.
    • Ordinary Least Squares (OLS) Linear Framework: Employs a custom linreg_summary() function dissecting standard errors, R², F-statistics, and $p(B_1)$ bounds, identifying algorithmic reliability against noise.
    • Finally, renders a dual multi-axis chart over the test set, overlaying the green Random Forest predictive fit against the black linear fit on the $x/y$ scatter plane.
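The split and forest configuration above can be sketched as follows, on synthetic data. Only n_estimators=20, oob_score=True, and the 60-20-20 split come from the text; the feature/target construction (a 0.8 slope with small noise) is an illustrative assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = np.abs(rng.normal(size=(500, 1)))                 # |initial shock| in 0-200 µs (bp, synthetic)
y = 0.8 * X[:, 0] + rng.normal(scale=0.1, size=500)   # later move in 200 µs - 100 ms (synthetic)

# 60-20-20: first carve off 40%, then split that evenly into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestRegressor(n_estimators=20, oob_score=True, random_state=0)
model.fit(X_train, y_train)

# OOB R² gives an overfitting check "for free": each tree is scored on the
# samples it never saw during bagging.
print(f"OOB R^2: {model.oob_score_:.3f}, validation R^2: {model.score(X_val, y_val):.3f}")
```

Comparing the OOB score against the held-out validation score is the continuous overfitting check the notebook relies on; a large gap between the two would signal an unreliable fit.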

Prerequisites and Environment Setup

To recreate and analyze the outputs generated across this Jupyter ecosystem, an identical environment must exist locally:

Language Requirements

Library Requirements

Core packages must be configured locally:
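A minimal setup sketch, assuming a standard pip environment. The package list is inferred from the imports referenced throughout this document (numpy/pandas, scipy.stats, matplotlib, seaborn, sklearn, and Jupyter itself); the source does not pin versions:

```shell
# Packages inferred from the imports referenced in this documentation (unpinned).
pip install numpy pandas scipy matplotlib seaborn scikit-learn jupyter
```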

Expected Directory Data Layers

Due to hardcoded paths across all $6$ files, a core directory structure referencing an explicit folder named /Event_pickle_files directly inside the primary domain must exist hosting all iterations of raw memory arrays (e.g. ism_reactions_CME.pickle, economic_event_reactions_100ms.pickle).

Furthermore, structured outputs such as CME_processed_individual_data.csv and EUREX_ISM_reactions_volume_aggregated.pkl must sit adjacent to the .ipynb documents, since the notebooks control paths explicitly; keeping this layout preserves the dependency tree and avoids pathing exceptions.


Conclusion and Execution Workflows

Together, these notebooks offer a detailed framework for dissecting how physical latency and topological advantages map to financial outcomes, relating event reaction times to order book saturation.

Analysts aiming to adopt this structure specifically should:

  1. Guarantee data synchronization, starting cleanly from the processed CSVs in notebook (6) to verify the standard correlations.
  2. Run the deep structural .pickle validations in notebooks (3) and (5), targeting the high-level visualizations that check system normalization and hardware delays against the latency benchmarks (the 907-nanosecond and 37 ms curves).
  3. Validate the systematic regime disruptions across notebooks (1) and (2), relying on the exhaustive non-parametric algorithms to determine stochastic-shift confidence.
  4. Render the definitive multi-layered publication plots in notebook (4), finalizing multi-exchange aggregation and cumulative evaluations merging the US arrays with the Eurex variants.