How ML Fits Into BRISMA — Rust Implementation
Project: Innocheque AI-Enhanced Implied Risk Premia | Bantleon AG / FHGR
brisma/rust/ml-pipeline). All calculations performed in Rust.
This page presents the same methodology as the Python version, re-implemented in Rust for performance validation.
26 interactive charts across 2 use cases (UC1: Factor Premia, UC2: Model Averaging).
The Big Picture (30 Seconds)
Where ML comes in: ML does NOT replace the core math. It adds a second opinion at two specific steps — and a smart way to combine opinions.
The Pipeline: Step by Step
Data In
Portfolio weights, asset prices, yield curves from Excel. Convert currencies. Calculate returns. Done. No ML needed here.
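As a rough illustration of the "convert currencies, calculate returns" step, the sketch below converts a price series with per-period FX rates and then computes simple returns. The function names and the use of simple (not log) returns are assumptions made for this example, not taken from the pipeline.

```rust
/// Convert local-currency prices to the base currency using per-period FX rates.
fn to_base_currency(prices: &[f64], fx: &[f64]) -> Vec<f64> {
    prices.iter().zip(fx).map(|(p, rate)| p * rate).collect()
}

/// Simple period returns: r_t = p_t / p_{t-1} - 1.
fn simple_returns(prices: &[f64]) -> Vec<f64> {
    prices.windows(2).map(|w| w[1] / w[0] - 1.0).collect()
}

fn main() {
    let local = [100.0, 110.0, 99.0];
    let fx = [1.0, 1.0, 1.0]; // hypothetical FX rates (base currency per local unit)
    let returns = simple_returns(&to_base_currency(&local, &fx));
    println!("{:?}", returns);
}
```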
Covariance Estimation
Estimate how assets move together. Three methods: empirical, shrinkage, GARCH. Done. No ML needed here.
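Two of the three methods can be sketched in a few lines. The example below computes the empirical (sample) covariance matrix and applies a linear shrinkage toward its diagonal. The diagonal target and the fixed intensity `delta` are assumptions for illustration; the page does not say which shrinkage estimator the pipeline uses, and GARCH is omitted here.

```rust
/// Sample covariance matrix; `returns` has one row per period, one column per asset.
fn sample_cov(returns: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let t = returns.len() as f64;
    let n = returns[0].len();
    let mean: Vec<f64> = (0..n)
        .map(|j| returns.iter().map(|row| row[j]).sum::<f64>() / t)
        .collect();
    let mut cov = vec![vec![0.0_f64; n]; n];
    for row in returns {
        for i in 0..n {
            for j in 0..n {
                cov[i][j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (t - 1.0);
            }
        }
    }
    cov
}

/// Linear shrinkage toward the diagonal: off-diagonal entries are scaled by (1 - delta).
fn shrink_to_diagonal(cov: &[Vec<f64>], delta: f64) -> Vec<Vec<f64>> {
    let n = cov.len();
    let mut out = cov.to_vec();
    for i in 0..n {
        for j in 0..n {
            if i != j {
                out[i][j] = (1.0 - delta) * cov[i][j];
            }
        }
    }
    out
}

fn main() {
    let returns = vec![vec![0.01, 0.02], vec![0.03, 0.06], vec![0.02, 0.04]];
    let cov = sample_cov(&returns);
    let shrunk = shrink_to_diagonal(&cov, 0.5); // delta = 0.5 is a hypothetical intensity
    println!("{:?}\n{:?}", cov, shrunk);
}
```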
Inverse Optimization
Work backwards from portfolio weights to find the implied returns the market "believes in." Done. No ML needed here.
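A common textbook form of this step is mean-variance reverse optimization, where implied returns are the risk-aversion coefficient times the covariance matrix times the weights. The sketch below assumes that form; the function name and the `risk_aversion` parameter are illustrative and may differ from what BRISMA actually does.

```rust
/// Implied returns under mean-variance reverse optimization:
/// pi = risk_aversion * Sigma * w (assumed form, for illustration).
fn implied_returns(cov: &[Vec<f64>], weights: &[f64], risk_aversion: f64) -> Vec<f64> {
    cov.iter()
        .map(|row| risk_aversion * row.iter().zip(weights).map(|(s, w)| s * w).sum::<f64>())
        .collect()
}

fn main() {
    let cov = vec![vec![0.04, 0.01], vec![0.01, 0.09]];
    let weights = [0.6, 0.4];
    let pi = implied_returns(&cov, &weights, 2.5); // 2.5 is a hypothetical risk aversion
    println!("{:?}", pi);
}
```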
Factor Premia Extraction ← ML goes here
Decompose asset returns into factor returns. This is where ML adds value. Traditional OLS already works. ML (Random Forest, Ridge) gives a second, potentially better estimate.
Model Averaging ← ML goes here
Combine the different estimates (OLS, RF, Ridge, different covariance methods) into one final answer. Reduces uncertainty. Like asking 3 experts instead of 1.
Portfolio Optimization & Output
Take the final implied returns, optimize the portfolio, generate reports. Done. No ML needed here.
ML Use Case 1: Better Factor Premia (Random Forest)
What's the problem?
We have implied returns for each asset (from Step 3). We want to know: how much return does each risk factor contribute? This is called a "cross-sectional regression."
The traditional way (OLS)
This draws a straight line through the data. Simple, interpretable, but assumes linear relationships.
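The cross-sectional regression can be sketched with the normal equations: regress implied returns on factor loadings and read the coefficients as factor premia. The example below is a minimal version without an intercept, using a small Gaussian-elimination solver; setting `penalty` above zero gives a Ridge estimate instead of OLS. All names here are hypothetical and the pipeline's own solver may differ.

```rust
/// Solve the square system A x = b by Gaussian elimination with partial pivoting.
fn solve(mut a: Vec<Vec<f64>>, mut b: Vec<f64>) -> Vec<f64> {
    let n = b.len();
    for col in 0..n {
        // Pick the row with the largest pivot to keep the elimination stable.
        let mut piv = col;
        for r in (col + 1)..n {
            if a[r][col].abs() > a[piv][col].abs() {
                piv = r;
            }
        }
        a.swap(col, piv);
        b.swap(col, piv);
        for r in (col + 1)..n {
            let f = a[r][col] / a[col][col];
            for c in col..n {
                let v = a[col][c];
                a[r][c] -= f * v;
            }
            let bv = b[col];
            b[r] -= f * bv;
        }
    }
    // Back substitution.
    let mut x = vec![0.0_f64; n];
    for i in (0..n).rev() {
        let mut s = b[i];
        for j in (i + 1)..n {
            s -= a[i][j] * x[j];
        }
        x[i] = s / a[i][i];
    }
    x
}

/// Factor premia from (X'X + penalty * I) beta = X'y.
/// penalty = 0.0 gives OLS; penalty > 0.0 gives Ridge.
fn fit_premia(loadings: &[Vec<f64>], returns: &[f64], penalty: f64) -> Vec<f64> {
    let k = loadings[0].len();
    let mut xtx = vec![vec![0.0_f64; k]; k];
    let mut xty = vec![0.0_f64; k];
    for (row, &y) in loadings.iter().zip(returns) {
        for i in 0..k {
            xty[i] += row[i] * y;
            for j in 0..k {
                xtx[i][j] += row[i] * row[j];
            }
        }
    }
    for i in 0..k {
        xtx[i][i] += penalty;
    }
    solve(xtx, xty)
}

fn main() {
    // Three assets, two factors; returns generated exactly by premia (2, 3).
    let loadings = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let returns = [2.0, 3.0, 5.0];
    println!("OLS:   {:?}", fit_premia(&loadings, &returns, 0.0));
    println!("Ridge: {:?}", fit_premia(&loadings, &returns, 1.0));
}
```

With a positive penalty the coefficients are pulled toward zero, which is the trade-off Ridge makes for stability.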
The ML way (Random Forest)
Random Forest builds many decision trees, each asking questions like "Is the bond factor loading above 0.5?" It can capture non-linear relationships that OLS misses.
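To make the "asking questions" idea concrete, the sketch below finds the single best threshold question on one factor loading by minimizing squared error in the two resulting groups. This is only one split of one tree; a Random Forest repeats such splits recursively and averages many trees grown on resampled data. The function name and data are hypothetical.

```rust
/// Best single split on one feature: returns (threshold, left mean, right mean)
/// minimizing the summed squared error of the two leaves, or None if no split exists.
fn best_split(x: &[f64], y: &[f64]) -> Option<(f64, f64, f64)> {
    let mut idx: Vec<usize> = (0..x.len()).collect();
    idx.sort_by(|&a, &b| x[a].partial_cmp(&x[b]).unwrap());
    // Mean and squared error of the targets for a group of row indices.
    let leaf = |ids: &[usize]| -> (f64, f64) {
        let mean = ids.iter().map(|&i| y[i]).sum::<f64>() / ids.len() as f64;
        let err = ids.iter().map(|&i| (y[i] - mean).powi(2)).sum::<f64>();
        (mean, err)
    };
    let mut best: Option<(f64, f64, f64, f64)> = None; // (error, threshold, left, right)
    for cut in 1..idx.len() {
        let (lo, hi) = (x[idx[cut - 1]], x[idx[cut]]);
        if lo == hi {
            continue; // cannot split between identical feature values
        }
        let (left_mean, left_err) = leaf(&idx[..cut]);
        let (right_mean, right_err) = leaf(&idx[cut..]);
        let total = left_err + right_err;
        if best.map_or(true, |b| total < b.0) {
            best = Some((total, (lo + hi) / 2.0, left_mean, right_mean));
        }
    }
    best.map(|(_, t, l, r)| (t, l, r))
}

fn main() {
    let loading = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9];
    let ret = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0];
    println!("{:?}", best_split(&loading, &ret));
}
```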
In-sample R²: OLS=0.769 | RF=0.940
By regime: Linear (OLS=0.805, RF=0.963) | Non-linear (OLS=0.732, RF=0.932) | Recovery (OLS=0.781, RF=0.932)
RF in-sample win rate (MAE): 100%
⚠ All charts below show in-sample results (training-data fit) — not out-of-sample performance.
Factor Return Time Series
Before estimating premia, we need to see the raw factor returns. This chart shows the true factor premia over time across three regimes: calm, volatile, and recovery.
True vs Estimated Premia (Per Factor)
For each factor, the true premium is compared against OLS and RF estimates over time. The RF line plots the Total Marginal Effect (Total AME) — the total return change attributed to that factor by the Random Forest, including non-linear effects recovered via partial dependence. OLS is, by construction, linear in factor loadings. Use the dropdown to switch between factors.
Interactive Chart: OLS vs Ridge vs Random Forest
Partial Dependence: The Non-Linearity Proof
This is the key chart. It shows the effect of the equity factor loading on predicted returns, holding other factors constant. The true relationship (dotted black) has a curve. OLS draws a straight line through it. RF follows the curve.
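Partial dependence itself is simple to compute for any fitted model: fix one feature at a grid value for every observation, predict, and average. The sketch below assumes the model is available as a closure; the stand-in model with a quadratic first feature is purely illustrative and is not the fitted Random Forest.

```rust
/// Partial dependence of `model` on one feature: for each grid value, overwrite
/// that feature in every row, predict, and average the predictions.
fn partial_dependence<F>(model: F, data: &[Vec<f64>], feature: usize, grid: &[f64]) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
{
    grid.iter()
        .map(|&value| {
            let total: f64 = data
                .iter()
                .map(|row| {
                    let mut x = row.clone();
                    x[feature] = value;
                    model(&x)
                })
                .sum();
            total / data.len() as f64
        })
        .collect()
}

fn main() {
    // Hypothetical stand-in model: curved in feature 0, linear in feature 1.
    let model = |x: &[f64]| x[0] * x[0] + 2.0 * x[1];
    let data = vec![vec![0.0, 1.0], vec![1.0, 3.0]];
    let pd = partial_dependence(model, &data, 0, &[0.0, 1.0, 2.0]);
    println!("{:?}", pd);
}
```

A curved sequence of values across the grid is what the chart would show for a non-linear effect; a linear model would give evenly spaced values.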
Why is this useful?
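Random Forest can discover that high equity beta matters differently than low equity beta (a non-linear effect). It also tends to be more robust than unregularized OLS when the number of factors is large (20+), although it can still overfit, so in-sample fit alone should not be read as evidence of better predictions.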
Both are transparent: we can inspect feature importances and partial dependence plots to see exactly why the model made its prediction.
Time Series: When Does RF Beat OLS?
This chart shows the prediction error difference (OLS error minus RF error). When the line is above zero, RF has lower prediction error. RF shows particular strength during the non-linear regime.
Time Series: Rolling Out-of-Sample R²
This chart trains each model on a rolling 24-month window and tests on the next month. In non-linear regimes (shaded), RF maintains competitive accuracy while OLS also adapts. The key difference is in periods with strong non-linearity.
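The rolling scheme described here (train on a 24-month window, test on the following month) comes down to index bookkeeping. The sketch below generates the train and test indices only; the helper name is hypothetical and model fitting is left out.

```rust
use std::ops::Range;

/// Walk-forward splits: train on periods [t - window, t), test on period t.
fn rolling_splits(n_periods: usize, window: usize) -> Vec<(Range<usize>, usize)> {
    (window..n_periods).map(|t| ((t - window)..t, t)).collect()
}

fn main() {
    // 240 months with a 24-month window, matching the figures quoted on this page.
    let splits = rolling_splits(240, 24);
    println!("number of test months: {}", splits.len());
    println!("first split: train {:?}, test {}", splits[0].0, splits[0].1);
}
```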
Time Series: How RF Adapts to Market Regimes
The stacked area shows how Random Forest automatically shifts which factors it considers most important as market conditions change. In equity-driven periods, the equity factor dominates. During an inflation shock, the inflation factor rises. OLS coefficients cannot adapt this way.
Implied Returns from OLS Premia
What do OLS-estimated factor premia imply for individual asset returns? This reconstructs implied returns using OLS coefficients. Compare against the true implied returns (dashed).
Implied Returns from RF Premia
The same reconstruction using Random Forest PDP-derived premia. Notice RF captures the non-linear effects that OLS misses, especially during the volatile regime.
OLS vs RF Implied Returns Comparison
Side-by-side comparison of OLS and RF implied returns. The gap between methods widens during volatile periods where non-linearity is strongest.
Cross-Section Scatter
Each dot is one asset at one point in time. A perfect estimator would place all dots on the 45-degree line. Both models fit the cross-section, but RF (orange) captures non-linear patterns that OLS (blue) misses. R² values are annotated on each panel.
Cumulative Advantage: RF over OLS
Running sum of (OLS error − RF error). The running sum shows where RF gains or loses ground versus OLS. Gains concentrate in the non-linear regime where RF captures quadratic effects.
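The running sum can be sketched as follows, assuming the per-period errors are absolute prediction errors (the page does not state whether absolute or squared errors are used).

```rust
/// Running sum of (OLS error - RF error); rising values mean RF is gaining ground.
fn cumulative_advantage(ols_abs_err: &[f64], rf_abs_err: &[f64]) -> Vec<f64> {
    ols_abs_err
        .iter()
        .zip(rf_abs_err)
        .scan(0.0_f64, |running, (o, r)| {
            *running += o - r;
            Some(*running)
        })
        .collect()
}

fn main() {
    let ols = [0.5, 0.4, 0.2]; // hypothetical per-period errors
    let rf = [0.3, 0.5, 0.1];
    println!("{:?}", cumulative_advantage(&ols, &rf));
}
```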
Residual Distribution
Histograms of prediction residuals for OLS and RF. Both models have similar overall spread, but RF residuals are tighter in the non-linear regime.
Factor-Level Accuracy Heatmap
Heatmap showing factor premia estimation accuracy by factor and method across regimes. Green = RF more accurate, Red = OLS more accurate. Estimates are compared against the noiseless OLS target (AME).
ML Use Case 2: Model Averaging (Combining Opinions)
What's the problem?
We get different implied returns depending on which covariance matrix we use (empirical, shrinkage, GARCH) and which regression method (OLS, Ridge, RF). Which one do we trust?
The answer: Don't pick one — average them
Simple: Equal weights
Smart: Inverse-variance weights
RMSE (root mean squared prediction error, lower is better): OLS=0.654 | Ridge=0.872 | RF=0.329 | Equal=0.521 | IVW Average=0.349
Error correlations: OLS-Ridge=0.740, OLS-RF=0.260, Ridge-RF=0.298 — low RF correlations drive the diversification benefit.
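The two weighting schemes named above can be sketched in a few lines. Inverse-variance weighting gives each model a weight proportional to one over its error variance, normalized to sum to one. The variances and estimates below are made-up inputs, and the function names are hypothetical.

```rust
/// Inverse-variance weights: w_i = (1 / var_i) / sum_j (1 / var_j).
fn inverse_variance_weights(error_variances: &[f64]) -> Vec<f64> {
    let precisions: Vec<f64> = error_variances.iter().map(|v| 1.0 / v).collect();
    let total: f64 = precisions.iter().sum();
    precisions.iter().map(|p| p / total).collect()
}

/// Weighted combination of the individual model estimates.
fn combine(estimates: &[f64], weights: &[f64]) -> f64 {
    estimates.iter().zip(weights).map(|(e, w)| e * w).sum()
}

fn main() {
    let variances = [1.0, 2.0, 4.0]; // hypothetical error variances for three models
    let estimates = [1.0, 2.0, 3.0]; // hypothetical estimates from the same models
    let ivw = inverse_variance_weights(&variances);
    let equal = vec![1.0 / 3.0; 3];
    println!("IVW weights: {:?}", ivw);
    println!("IVW average: {}", combine(&estimates, &ivw));
    println!("Equal-weight average: {}", combine(&estimates, &equal));
}
```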
See interactive charts below.
Interactive Chart: Model Averaging Uncertainty Reduction
Time Series: Cumulative Prediction Error
The green line (equal average) stays consistently below every individual model. This is the diversification effect — model errors partially cancel out because different models err in different directions at different times.
Time Series: Rolling RMSE (Who's Winning?)
This chart shows the rolling 12-month RMSE for each model. The green average line stays consistently below the individual models — this is what drives the averaging weights in the chart below.
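A rolling RMSE over a fixed window can be sketched as below. The function name is hypothetical, and `window` must be at least 1.

```rust
/// Rolling RMSE: root mean squared error over each consecutive `window` of errors.
/// `window` must be greater than zero.
fn rolling_rmse(errors: &[f64], window: usize) -> Vec<f64> {
    errors
        .windows(window)
        .map(|w| (w.iter().map(|e| e * e).sum::<f64>() / window as f64).sqrt())
        .collect()
}

fn main() {
    let errors = [3.0, 4.0, 3.0, 4.0]; // hypothetical prediction errors
    println!("{:?}", rolling_rmse(&errors, 2));
}
```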
Time Series: How Averaging Weights Shift Over Time
The system learns which model to trust. In volatile periods (shaded orange), RF gets higher weight because it handles non-linearity better. In calm periods, OLS regains weight. Ridge stays relatively stable throughout. This is the "smart" part of model averaging.
Error Correlation Between Models
Why does averaging work? Because model errors are not perfectly correlated. This chart shows the rolling correlation between OLS, Ridge, and RF prediction errors. Lower correlation = more diversification benefit.
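The correlation between two models' error series is the ordinary Pearson correlation. A minimal sketch over the full sample is shown below; a rolling version would apply the same function to each window. The function name and inputs are illustrative.

```rust
/// Pearson correlation between two equally long error series.
fn correlation(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;
    let mut cov = 0.0_f64;
    let mut var_a = 0.0_f64;
    let mut var_b = 0.0_f64;
    for (x, y) in a.iter().zip(b) {
        cov += (x - mean_a) * (y - mean_b);
        var_a += (x - mean_a).powi(2);
        var_b += (y - mean_b).powi(2);
    }
    cov / (var_a * var_b).sqrt()
}

fn main() {
    let ols_err = [0.1, -0.2, 0.3, -0.1]; // hypothetical error series
    let rf_err = [0.05, 0.1, -0.2, 0.0];
    println!("{}", correlation(&ols_err, &rf_err));
}
```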
Benefit Decomposition
The averaging benefit comes from two sources: (1) bias reduction and (2) variance reduction. This chart decomposes the total benefit into these components over time.
Weight Stability Over Time
How much do the IVW weights change month-to-month? Low turnover means the averaging is stable and not overfitting to recent noise. Spikes occur at regime transitions.
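One common way to measure month-to-month weight change is turnover, taken here as half the sum of absolute weight changes between consecutive periods. This definition is an assumption for illustration; the chart may use a different measure.

```rust
/// Turnover per step: 0.5 * sum_i |w_{t,i} - w_{t-1,i}| (assumed definition).
fn turnover(weights: &[Vec<f64>]) -> Vec<f64> {
    weights
        .windows(2)
        .map(|pair| {
            0.5 * pair[0]
                .iter()
                .zip(&pair[1])
                .map(|(a, b)| (a - b).abs())
                .sum::<f64>()
        })
        .collect()
}

fn main() {
    // Hypothetical weight history for two models over three months.
    let history = vec![vec![0.5, 0.5], vec![0.6, 0.4], vec![0.6, 0.4]];
    println!("{:?}", turnover(&history));
}
```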
IVW Average vs Best Single Model
At each point in time, compare the IVW average against whichever single model happened to be best. The average does not always win, but it avoids the worst outcomes.
Model Disagreement and Averaging Value
When models disagree strongly, the situation is uncertain — and that is precisely when averaging adds the most value. Shaded areas show high-disagreement periods.
Equal Weights vs Inverse-Variance Weights
Simple 1/3 equal weights vs learned inverse-variance weights. IVW gains an edge during volatile periods when it down-weights the least reliable model.
Model Ranking Heatmap
Which model ranks #1, #2, #3 in each rolling window? The ranking instability across time is exactly why averaging beats model selection.
Summary Dashboard
All-in-one dashboard: RMSE by model, weight evolution, cumulative error, and ranking. The key takeaway: no single model dominates, but the average is consistently near the top.
Performance: Rust vs Python
The Rust implementation reproduces the same ML pipeline (OLS, Ridge, Random Forest with 200 estimators, 50 assets, 5 factors, 240 months) with identical methodology.
| Metric | Rust | Python |
|---|---|---|
| UC1 Runtime | 82.3s | ~120s (typical) |
| UC2 Runtime | 82.3s | ~90s (typical) |
| UC1 R² (RF) | 0.9311 | ~0.93 (varies by seed) |
| UC2 IVW RMSE | 0.3814 | ~0.38 (varies by seed) |
| N Estimators (RF) | 200 | 200 |
| Rolling Window | 24 months | 24 months |
| Charts Generated | 26 (UC1: 14, UC2: 12) | 30 (UC1: 14, UC2: 12, UC3: 4) |
runtime_params.total_seconds in rust_uc[12]_stats.json). Both implementations use the same DGP (50 assets, 5 factors, 3 regimes, 240 months) and model specifications. All charts show in-sample results.
The One-Line Summary
and a smarter blend of multiple estimates (model averaging).
The Rust implementation validates these results with native performance.
Generated 2026-04-06. All numbers computed by Rust ML Pipeline (brisma/rust/ml-pipeline).
Charts rendered via Plotly JSON. Assembler: admin/build_rust_ml_page.py.