Explainable Regime-Aware Portfolio Optimization: The Case for Robust Rolling Regime Detection

Amine Boukardagha

Abstract

We propose Robust Rolling Regime Detection (R2-RD), an explainable framework for cross-asset portfolio optimization under time-varying market regimes. R2-RD combines expanding-window Hidden Markov Model (HMM) estimation with three key innovations: (1) dynamic regime count selection via the Bayesian Information Criterion, (2) a regime emergence policy that ensures monotonically non-decreasing regime counts for temporal stability, and (3) a label matching mechanism based on the Hungarian algorithm that resolves the label switching problem. We establish theoretical foundations including MLE consistency, BIC consistency for regime selection, and conditions under which regime-aware mean-variance optimization dominates unconditional approaches. Empirically, we evaluate R2-RD against a K-Nearest Neighbors benchmark on a diversified cross-asset universe over 2016–2024. R2-RD achieves a Sharpe ratio of 0.93 versus 0.73 for KNN, with maximum drawdown reduced relative to KNN's 30.44%.

Introduction

Financial markets exhibit persistent structural changes driven by macroeconomic cycles, monetary policy shifts, and evolving risk appetites. The 2008 global financial crisis, the European debt crisis, and the COVID-19 pandemic vividly illustrate how market dynamics can shift abruptly between distinct regimes characterized by markedly different return distributions and correlation structures (Brunnermeier, 2009; Billio, 2012). Ignoring such regime dynamics leads to unstable portfolio allocations, poor risk-adjusted performance, and unexpected drawdowns during market dislocations.

Regime-aware portfolio construction addresses this challenge by conditioning expected returns and covariances on latent market states. Since the seminal work of Hamilton (1989), regime-switching models have become a cornerstone of financial econometrics, with Hidden Markov Models (HMMs) emerging as the dominant framework for capturing unobservable market states (Ryden, 1998; Kim, 1999). The appeal of HMMs lies in their ability to model both the persistence of market regimes through transition probabilities and the distinct statistical properties of each regime through state-dependent emission distributions.

Despite their theoretical appeal, applying HMMs to portfolio optimization in practice presents several challenges. First, the number of regimes is typically unknown and may evolve over time as new market structures emerge. Second, rolling or expanding-window estimation introduces the label switching problem, where regime labels may permute arbitrarily across estimation windows, destroying the temporal consistency needed for coherent portfolio decisions (Jakobsson, 2007). Third, the computational burden of model selection and parameter estimation can be substantial, particularly when the regime count itself is a parameter to be optimized.

This paper proposes Robust Rolling Regime Detection (R2-RD), a framework that addresses these challenges through three key innovations. Building on Hirsa (2024), we develop an expanding-window HMM estimation procedure with dynamic regime count selection via the Bayesian Information Criterion (BIC). Crucially, we introduce a regime emergence policy that constrains the optimization to permit only regime addition, never removal, reflecting the empirical observation that market regimes tend to fragment over time rather than merge. We complement this with a label matching mechanism based on the linear assignment problem that ensures temporal consistency of regime labels across estimation windows.

To benchmark R2-RD, we compare it against a K-Nearest Neighbors (KNN) approach that approximates regimes locally without explicit modeling of latent states. While KNN adapts rapidly to changing conditions through its nonparametric structure, it lacks the temporal smoothing and interpretability that parametric approaches provide. We embed both methods within an identical mean–variance optimization (MVO) framework with turnover regularization, enabling a clean comparison of parametric versus nonparametric regime detection for cross-asset allocation.

Our contributions are as follows:

We propose R2-RD, a robust framework for rolling regime detection that combines expanding-window HMM estimation with dynamic BIC-based regime count selection and a regime emergence policy that preserves previously identified market states.
We introduce a label matching mechanism based on the Hungarian algorithm that ensures temporal consistency of regime assignments across estimation windows, resolving the label switching problem in rolling HMM applications.
We develop theoretical results establishing conditions under which regime-aware mean–variance optimization outperforms unconditional approaches, including bounds on the growth rate of the regime count and consistency properties of the BIC-selected model.
We provide an extensive empirical analysis on a diversified cross-asset universe over the 2016–2024 out-of-sample period, demonstrating that R2-RD achieves a Sharpe ratio of 0.93 compared to 0.73 for KNN, with maximum drawdown reduced relative to KNN's 30.44%.
We emphasize the explainability of R2-RD: each regime is characterized by interpretable mean vectors and covariance matrices that map directly to observable market conditions, providing practitioners with a transparent rationale for portfolio decisions.

The remainder of this paper is organized as follows. The Literature Review surveys related work on regime-switching models, portfolio optimization under uncertainty, and explainable methods in finance. Data and Market Universe describes the data and asset universe. Methodology presents R2-RD and the KNN benchmark in detail. Theoretical Foundations develops the supporting theory. Portfolio Optimization formalizes the allocation problem. Empirical Results reports the findings, followed by a discussion of implications and limitations and a conclusion.

Literature Review

This section surveys the relevant literature across four streams: regime-switching models in finance, portfolio optimization under regime uncertainty, the label switching problem and its solutions, and explainability in quantitative finance.

Regime-Switching Models in Finance

The econometric analysis of regime-switching models began with the seminal work of Hamilton (1989), who introduced a Markov-switching framework for modeling business cycles. Hamilton's approach models the mean growth rate of GDP as following a two-state Markov process, with transitions between expansion and recession states governed by constant probabilities. This framework was subsequently extended to financial markets, where regime-switching behavior is even more pronounced.

Ryden (1998) applied Hidden Markov Models to daily stock returns, demonstrating that HMMs can capture the stylized facts of financial returns (including volatility clustering, fat tails, and autocorrelation in squared returns) more effectively than single-regime models. Their work established HMMs as a viable alternative to GARCH-type models for capturing time-varying volatility.

A parallel literature developed around Markov-switching GARCH models, which combine regime-switching dynamics with autoregressive conditional heteroskedasticity. Gray (1996) introduced a regime-switching model for interest rates where the conditional variance follows a GARCH process with regime-dependent parameters. Haas (2004) proposed a more tractable formulation that avoids the path-dependence problem inherent in earlier specifications. More recently, Caporale (2019) applied Markov-switching GARCH to cryptocurrency markets, finding evidence of distinct volatility regimes corresponding to different market conditions.

The theoretical foundations for maximum likelihood estimation in HMMs were established by Leroux (1992), who proved consistency of the MLE under regularity conditions including ergodicity and identifiability of the hidden Markov chain. These results provide the asymptotic justification for our expanding-window estimation approach.

Portfolio Optimization Under Regime Uncertainty

The application of regime-switching models to portfolio allocation was pioneered by Ang (2002), who studied international asset allocation when returns exhibit regime-dependent correlations. They found that accounting for regime shifts substantially affects optimal portfolio weights, particularly during crisis periods when correlations tend to increase.

Guidolin (2007) extended this framework to multivariate regime switching with multiple asset classes, developing a dynamic programming solution for the investor's problem. Their work demonstrated that regime-aware portfolios can achieve significant improvements in out-of-sample performance relative to unconditional strategies, particularly in terms of drawdown control during market dislocations.

An important benchmark in this literature is the work of DeMiguel (2009), who showed that the simple 1/N equal-weight portfolio often outperforms sophisticated optimization-based strategies out of sample. This finding highlights the importance of estimation error in portfolio optimization and motivates the use of shrinkage estimators and regularization. Ledoit (2004) addressed this challenge by proposing shrinkage estimators for the covariance matrix that substantially reduce estimation error.

The role of transaction costs in dynamic portfolio optimization was formalized by Garleanu (2013), who derived closed-form solutions for optimal trading with predictable returns and quadratic transaction costs. Their framework shows that optimal portfolios exhibit inertia, trading toward a "target" portfolio at a rate that balances the costs of deviating from the optimum against trading costs.

The Label Switching Problem

A fundamental challenge in rolling or sequential estimation of mixture models and HMMs is the label switching problem: the likelihood function is invariant to permutations of the component labels, so estimates from different time periods may use inconsistent labeling of regimes (Jakobsson, 2007). This problem is particularly acute in financial applications where the interpretation of regimes (e.g., "crisis" vs. "normal") is economically meaningful.

Several solutions have been proposed. Post-processing approaches relabel the output of MCMC or EM algorithms to achieve consistency, typically by solving an assignment problem that maximizes overlap between successive estimates. Jakobsson (2007) developed the CLUMPP algorithm for this purpose in population genetics, which we adapt to our financial context. The assignment problem itself is solved efficiently using the Hungarian algorithm (Kuhn, 1955).

An alternative approach is to impose identifying restrictions during estimation, such as ordering constraints on regime means or transition probabilities. However, these constraints may be violated in practice and can distort inference. Our label matching approach avoids these issues by permitting unrestricted estimation followed by optimal relabeling.

Model Selection for Hidden Markov Models

Selecting the number of hidden states is a critical step in HMM specification. Information criteria, particularly the Bayesian Information Criterion (BIC) of Schwarz (1978), are widely used for this purpose. The BIC penalizes model complexity more heavily than the Akaike Information Criterion (AIC), leading to more parsimonious models that tend to generalize better out of sample.

For mixture models and HMMs, the BIC has been shown to be consistent for selecting the true number of components under regularity conditions, meaning that as the sample size grows, the probability of selecting the correct model approaches one. However, the finite-sample performance of the BIC depends on the separation between components and the relative frequencies of different regimes.

Our regime emergence policy, which constrains the minimum number of regimes to be non-decreasing over time, addresses a practical limitation of standard model selection: in rolling or expanding-window estimation, the selected model can fluctuate due to sampling variability, leading to spurious regime merging and splitting. By permitting only regime emergence, we ensure that previously identified market states remain in the model, providing a more stable foundation for portfolio decisions.

Explainability in Quantitative Finance

The increasing use of machine learning in finance has raised concerns about model interpretability and the "black box" nature of complex algorithms (Gu, 2020). Regulatory requirements, including the European Union's General Data Protection Regulation (GDPR), mandate that automated decisions affecting individuals be explainable.

In the context of portfolio management, explainability is valuable for multiple reasons: it enables risk managers to understand the drivers of portfolio positions, facilitates communication with clients and regulators, and helps identify when models may be behaving anomalously. Harvey (2016) emphasized the importance of economic intuition in evaluating quantitative strategies, arguing that factors without clear economic rationale are more likely to be spurious.

Hidden Markov Models offer a natural form of explainability in this context. Each regime is characterized by a multivariate Gaussian distribution with interpretable mean vector μ_k and covariance matrix Σ_k. These parameters can be mapped directly to economic conditions: a "crisis" regime might exhibit negative expected returns, elevated volatilities, and increased correlations across risky assets. The probability of being in each regime, π_t^(k), provides a transparent weighting scheme that connects observed market conditions to portfolio decisions.

This interpretability distinguishes HMM-based approaches from purely nonparametric methods like KNN, which adapt to local conditions without providing explicit characterizations of different market states. While KNN may capture similar patterns implicitly, it lacks the parametric structure that facilitates economic interpretation and communication.

Data and Market Universe

The empirical analysis is conducted on a diversified cross-asset universe designed to capture major global risk factors. Monthly log-returns are constructed from adjusted close prices obtained via Yahoo Finance. The sample period spans January 2000 through December 2024, with the first 16 years (2000–2015) used for initial model training and the remaining period (2016–2024) reserved for out-of-sample evaluation. The asset universe consists of:

Equities: SPDR S&P 500 ETF Trust (SPY), representing U.S. equity market risk;
Fixed Income: iShares 7–10 Year Treasury ETF (IEF), proxying interest rate and duration risk;
Commodities: SPDR Gold Shares (GLD) and United States Oil Fund (USO), capturing inflation sensitivity and real asset exposure;
Foreign Exchange: Invesco DB US Dollar Index Bullish Fund (UUP), representing global risk-off dynamics and dollar strength.

All assets are exchange-traded funds (ETFs) with sufficient liquidity and history for robust backtesting. Returns are computed as log-differences of adjusted closing prices to account for dividends and splits.
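The return construction can be illustrated with a minimal sketch. The price values and the names `log_returns` and `adj_close` are hypothetical and only serve the illustration; the actual data pipeline is not described in the text.

```python
import numpy as np

def log_returns(prices):
    """Log-returns r_t = log(P_t / P_{t-1}) for a (T, N) array of adjusted closes."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)

# Hypothetical month-end adjusted closes for two assets (illustrative numbers only).
adj_close = np.array([[100.0, 50.0],
                      [110.0, 50.0],
                      [ 99.0, 55.0]])
monthly = log_returns(adj_close)  # shape (2, 2): one row per month after the first
```

Log-returns add up across time, so the column sums of `monthly` equal the log of the total price ratio over the period.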

Methodology

This section presents the R2-RD methodology in detail, including the feature representation, HMM specification, dynamic regime count selection, and label matching mechanism. We also describe the KNN benchmark that serves as a nonparametric comparison.

Feature Representation

Both regime detection approaches operate on a shared feature space constructed from asset returns and risk measures. At each month t, the feature vector is defined as:

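$$x_t = \begin{bmatrix} r_t \\ \sigma_t \end{bmatrix}, \qquad \sigma_t = \sqrt{\frac{1}{6} \sum_{i=1}^{6} \left( r_{t-i} - \bar{r}_t \right)^2},$$

where r_t ∈ R^N denotes the vector of monthly log-returns for N assets, r̄_t is the mean of the six lagged returns r_{t-1}, …, r_{t-6}, and σ_t ∈ R^N is a six-month rolling volatility estimate for each asset, with the square and square root applied element-wise. The combined feature vector x_t ∈ R^{2N} captures both directional information (through returns) and prevailing market uncertainty (through volatility), which are the key drivers of regime differentiation.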

Prior to regime detection, features are standardized to have zero mean and unit variance over the estimation window, ensuring that all dimensions contribute equally to the distance metrics used in both HMM emission probabilities and KNN neighbor selection.
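The feature construction and standardization can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: volatility is taken as the population standard deviation of the six lagged returns around their own mean, the simulated returns are placeholders, and the function names are hypothetical.

```python
import numpy as np

def build_features(returns, window=6):
    """Stack x_t = [r_t; sigma_t] for every t that has `window` lagged returns."""
    returns = np.asarray(returns, dtype=float)
    rows = []
    for t in range(window, returns.shape[0]):
        lagged = returns[t - window:t]        # r_{t-6}, ..., r_{t-1}
        sigma_t = lagged.std(axis=0, ddof=0)  # 1/6 normalization (assumed: deviations from window mean)
        rows.append(np.concatenate([returns[t], sigma_t]))
    return np.vstack(rows)

def standardize(features):
    """Zero mean and unit variance per column over the estimation window."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std = np.where(std > 0, std, 1.0)         # guard against constant columns
    return (features - mean) / std

# Placeholder data: 60 months of simulated returns for N = 5 assets.
rng = np.random.default_rng(0)
demo_returns = rng.normal(0.0, 0.04, size=(60, 5))
X = standardize(build_features(demo_returns))  # shape (54, 10), i.e. 2N columns
```

In an expanding-window setting the standardization statistics would be recomputed on each window so that no future observations leak into the features.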

Robust Rolling Regime Detection (R2-RD)

Hidden Markov Model Specification

Let z_t ∈ {1, …, K} denote the latent market regime at time t. We model the regime sequence as a first-order Markov chain with transition probability matrix A ∈ R^{K × K}:

$$P(z_t = j \mid z_{t-1} = i) = A_{ij}, \qquad \sum_{j=1}^{K} A_{ij} = 1.$$

Conditional on the regime, feature vectors follow a multivariate Gaussian distribution:

$$x_t \mid z_t = k \sim \mathcal{N}(\mu_k, \Sigma_k),$$

where μ_k ∈ R^{2N} and Σ_k ∈ R^{2N × 2N} are the regime-specific mean vector and covariance matrix. The Gaussian assumption is standard in financial applications and provides a tractable likelihood function while accommodating regime-dependent means, variances, and correlations.

The complete parameter set is θ = {π, A, {μ_k, Σ_k}_{k=1}^K}, where π denotes the initial state distribution. Parameters are estimated via the Expectation-Maximization (EM) algorithm, specifically the Baum-Welch algorithm, which iteratively updates parameter estimates to maximize the observed data likelihood.

EM Algorithm Specification. We initialize the EM algorithm using K-means++ clustering on the observation space, which provides a principled starting point that tends to separate distinct market states. To mitigate sensitivity to initialization, we perform 10 random restarts and select the solution with the highest log-likelihood. Convergence is declared when the relative change in log-likelihood falls below 10^-6 or after 200 iterations, whichever occurs first.

To ensure numerical stability and well-conditioned covariance matrices, we apply Ledoit-Wolf shrinkage (Ledoit, 2004) to the regime-specific covariance estimates:

$$\Sigma_k^{\text{shrunk}} = (1 - \alpha)\, \Sigma_k + \alpha \cdot \operatorname{diag}(\Sigma_k),$$

where α ∈ [0,1] is the shrinkage intensity, chosen to minimize expected loss.
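The shrinkage step above can be sketched in a few lines. The sketch takes α as a given input, which is a simplification: the data-driven choice of the intensity is not reproduced here, and the function name and example matrix are hypothetical.

```python
import numpy as np

def shrink_to_diagonal(sigma, alpha):
    """Return (1 - alpha) * Sigma + alpha * diag(Sigma) for a fixed intensity alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    sigma = np.asarray(sigma, dtype=float)
    return (1.0 - alpha) * sigma + alpha * np.diag(np.diag(sigma))

# Illustrative 2x2 regime covariance (made-up numbers).
sigma_k = np.array([[0.04, 0.03],
                    [0.03, 0.09]])
shrunk = shrink_to_diagonal(sigma_k, alpha=0.5)
```

Variances on the diagonal are left unchanged while covariances are scaled by (1 - α), which lowers the condition number of the matrix in this example.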

Expanding-Window Estimation with Dynamic Regime Count

At each rebalancing date t, we estimate HMMs using an expanding window comprising all observations from the sample start through t-1. This expanding-window approach ensures that all available information is utilized while maintaining strict temporal causality: no future information is used in any estimation step.

For regime count selection, we fit candidate HMMs for K ∈ {K_min,t, …, K_max} and select the optimal number of regimes via the Bayesian Information Criterion:

$$K_t^* = \arg\min_{K \in \{K_{\min,t}, \ldots, K_{\max}\}} \mathrm{BIC}_K,$$

where

$$\mathrm{BIC}_K = -2 \log L_K + p_K \log T.$$

Here L_K is the maximized likelihood under the K-regime model, p_K is the number of free parameters, and T is the sample size.

For a K-regime HMM with d-dimensional Gaussian emissions, the parameter count is:

$$p_K = (K-1) + K(K-1) + K \left[ d + \frac{d(d+1)}{2} \right],$$

where the three terms correspond to the initial distribution, transition matrix, and emission parameters (means and covariance matrices), respectively.
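To make the penalty concrete, the following sketch evaluates the parameter count and the BIC. The function names are hypothetical; d = 10 follows from the 2N-dimensional features with N = 5 assets, and the log-likelihood passed to `bic` is a placeholder.

```python
import math

def n_free_params(K, d):
    """Free parameters of a K-regime HMM with d-dimensional full-covariance Gaussian emissions."""
    initial = K - 1                         # initial state distribution
    transition = K * (K - 1)                # each row of A sums to one
    emission = K * (d + d * (d + 1) // 2)   # mean vector plus symmetric covariance per regime
    return initial + transition + emission

def bic(log_likelihood, K, d, T):
    """BIC_K = -2 log L_K + p_K log T (lower is better)."""
    return -2.0 * log_likelihood + n_free_params(K, d) * math.log(T)

d = 10  # 2N features for N = 5 assets
counts = {K: n_free_params(K, d) for K in range(1, 6)}
# counts == {1: 65, 2: 133, 3: 203, 4: 275, 5: 349}
```

Each additional regime adds roughly 70 parameters at d = 10, so with a few hundred monthly observations the penalty term rises steeply in K.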

Regime Emergence Policy

A key innovation of R2-RD is the regime emergence policy, which constrains the lower bound of the regime count search:

$$K_{\min,t} = K_{t-1}^*.$$

This constraint ensures that the number of regimes is monotonically non-decreasing over time: once a regime has been identified, it is never removed from the model. This design choice reflects the empirical observation that market regimes tend to fragment over time as new economic environments emerge (e.g., the COVID-19 pandemic created a previously unseen market dynamic), while previously learned regimes remain relevant reference states.

The emergence policy provides several benefits:

Stability: Prevents spurious regime collapses driven by short-term noise or sampling variability.
Interpretability: Maintains consistent regime definitions across time, facilitating economic interpretation.
Smooth portfolios: Reduces turnover by preventing abrupt changes in the number of regime-weighted components.
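The effect of the emergence policy on model selection can be sketched as a constrained argmin over candidate regime counts. The BIC values below are made up for illustration, and `select_regime_count` is a hypothetical helper, not a function from the paper.

```python
def select_regime_count(bic_by_k, k_prev, k_max):
    """Pick the BIC-minimizing K among candidates K >= k_prev (regime emergence policy)."""
    candidates = [k for k in range(k_prev, k_max + 1) if k in bic_by_k]
    if not candidates:
        raise ValueError("no candidate regime counts at or above k_prev")
    return min(candidates, key=lambda k: bic_by_k[k])

# Hypothetical BIC values for three successive estimation windows.
bic_path = [
    {1: 510.0, 2: 500.0, 3: 520.0},
    {1: 495.0, 2: 505.0, 3: 515.0},  # unconstrained BIC would fall back to K = 1 here
    {1: 530.0, 2: 520.0, 3: 512.0},
]
k_star, path = 1, []
for bics in bic_path:
    k_star = select_regime_count(bics, k_prev=k_star, k_max=3)
    path.append(k_star)
# path == [2, 2, 3]
```

In the second window the unconstrained minimizer is K = 1, but the policy keeps the count at 2 because that regime was already identified. In practice only candidates at or above the previous count would need to be fitted.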

Label Matching Mechanism

Even with the emergence policy, regime labels from successive estimation windows may be permuted arbitrarily due to the label invariance of the HMM likelihood. To ensure temporal consistency, we employ a label matching mechanism based on the linear assignment problem.

Let z^past_{1:t-1} denote the regime labels from the previous window and z^new_{1:t-1} the labels from the current window over their overlapping period. We construct a similarity matrix M ∈ R^{K × K} with elements:

$$M_{ij} = \sum_{\tau=1}^{t-1} \mathbb{1}\left( z^{\text{past}}_\tau = i \ \text{and} \ z^{\text{new}}_\tau = j \right),$$

counting the number of time periods where past regime i coincides with new regime j.

We then solve the following linear assignment problem:

$$\max_{\{x_{ij}\}} \ \sum_{i=1}^{K} \sum_{j=1}^{K} M_{ij}\, x_{ij} \quad \text{s.t.} \quad \sum_{j} x_{ij} = 1, \quad \sum_{i} x_{ij} \le 1, \quad x_{ij} \in \{0, 1\},$$

where x_ij = 1 if past regime i is matched to new regime j. The inequality constraint on column sums accommodates the case where new regimes have emerged (so not all new labels have a corresponding past label). This problem is solved efficiently using the Hungarian algorithm (Kuhn, 1955).

Applying the optimal matching to z^new produces aligned regime labels z^aligned that are consistent with historical regimes. These aligned labels are then used to compute regime probabilities and conditional moments for portfolio optimization.
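The matching step can be sketched with SciPy's `linear_sum_assignment`, which solves the same assignment problem. The sketch assumes 0-based integer labels, a rectangular overlap matrix when new regimes have emerged, and fresh labels for unmatched new regimes; these conventions and the name `align_labels` are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(z_past, z_new, k_past, k_new):
    """Relabel z_new so that its regimes agree with z_past over the overlap.

    Assumes k_past <= k_new, which the emergence policy guarantees.
    """
    z_past = np.asarray(z_past, dtype=int)
    z_new = np.asarray(z_new, dtype=int)
    M = np.zeros((k_past, k_new), dtype=int)
    np.add.at(M, (z_past, z_new), 1)                      # M[i, j] = co-occurrence count
    rows, cols = linear_sum_assignment(M, maximize=True)  # maximize total overlap
    mapping = {int(j): int(i) for i, j in zip(rows, cols)}
    next_label = k_past
    for j in range(k_new):                                # unmatched new regimes get fresh labels
        if j not in mapping:
            mapping[j] = next_label
            next_label += 1
    return np.array([mapping[int(j)] for j in z_new]), mapping

z_past = [0, 0, 1, 1, 1, 1, 1]
z_new = [2, 2, 0, 0, 0, 1, 1]   # permuted labels plus one emergent regime
aligned, mapping = align_labels(z_past, z_new, k_past=2, k_new=3)
# aligned == [0, 0, 1, 1, 1, 2, 2]
```

In the example, new label 2 maps back to past regime 0, new label 0 to past regime 1, and the emergent regime receives the fresh label 2.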

Regime-Conditional Moment Estimation

Given the aligned HMM estimates, we compute the filtered regime probabilities at time t:

$$\pi_t^{(k)} = P(z_t = k \mid x_1, \ldots, x_t, \theta),$$

using the forward algorithm. These probabilities represent the posterior belief about the current regime given all available information.

The regime-conditional expected returns and covariances for portfolio optimization are then computed as probability-weighted mixtures:

$$\mu_t^{\text{port}} = \sum_{k=1}^{K_t^*} \pi_t^{(k)} \mu_k^{r}, \qquad \Sigma_t^{\text{port}} = \sum_{k=1}^{K_t^*} \pi_t^{(k)} \Sigma_k^{r},$$

where μ_k^r and Σ_k^r denote the return components of the regime-specific parameters (i.e., the first N elements of μ_k and the corresponding N × N block of Σ_k).

This mixture structure provides natural smoothing across regime transitions: when the posterior probability is concentrated on a single regime, the moments reflect that regime's characteristics; during transitions, the moments interpolate between regimes according to their posterior probabilities.
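The filtering recursion and the mixture moments can be sketched as below. The emission likelihoods are supplied as a precomputed matrix B (an assumption made to keep the example short), and all numerical values and function names are hypothetical.

```python
import numpy as np

def forward_filter(pi0, A, B):
    """Filtered probabilities P(z_t = k | x_1, ..., x_t).

    pi0: (K,) initial distribution, A: (K, K) transition matrix,
    B: (T, K) emission densities of x_t under each regime.
    """
    T, K = B.shape
    probs = np.empty((T, K))
    alpha = pi0 * B[0]
    probs[0] = alpha / alpha.sum()
    for t in range(1, T):
        alpha = (probs[t - 1] @ A) * B[t]   # predict one step ahead, then update
        probs[t] = alpha / alpha.sum()      # normalize to avoid underflow
    return probs

def mixture_moments(p, mus, sigmas):
    """Probability-weighted averages of regime means (K, N) and covariances (K, N, N)."""
    return p @ mus, np.einsum("k,kij->ij", p, sigmas)

pi0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[1.0, 0.5],
              [0.2, 1.0]])
probs = forward_filter(pi0, A, B)           # rows: [2/3, 1/3] then [2/7, 5/7]

mus = np.array([[0.01, 0.00], [-0.03, 0.02]])
sigmas = np.array([0.01 * np.eye(2), 0.04 * np.eye(2)])
mu_port, sigma_port = mixture_moments(probs[-1], mus, sigmas)
```

The second observation is far more likely under regime 2, so the filtered probability shifts from 1/3 to 5/7 and the blended moments move toward that regime's parameters.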

K-Nearest Neighbors Benchmark

Neighbor Selection

At each time t, we identify the K historical periods whose feature vectors are closest to the current state:

$$\mathcal{N}_K(t) = \arg\min_{S \subset \{1, \ldots, t-1\},\ |S| = K} \ \sum_{\tau \in S} \| x_t - x_\tau \|_2.$$

KNN Hyperparameter Selection. The number of neighbors K is selected via leave-one-out cross-validation on the estimation window, minimizing the mean squared error of next-period return predictions. We search over K ∈ {5, 10, 15, 20, 30, 50} and find that K = 20 provides the best cross-validated performance on average. Features are standardized to zero mean and unit variance before computing Euclidean distances, ensuring that all dimensions contribute equally to the neighbor selection. We use uniform weighting (all neighbors contribute equally) rather than distance-weighted averaging.

Local Moment Estimation

Expected returns and covariances are estimated directly from the identified neighbors:

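$$\mu_t^{\text{KNN}} = \frac{1}{K} \sum_{\tau \in \mathcal{N}_K(t)} r_{\tau+1}, \qquad \Sigma_t^{\text{KNN}} = \frac{1}{K-1} \sum_{\tau \in \mathcal{N}_K(t)} \left( r_{\tau+1} - \mu_t^{\text{KNN}} \right) \left( r_{\tau+1} - \mu_t^{\text{KNN}} \right)^\top.$$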

Note that we use the realized returns in the period following each neighbor, not the contemporaneous returns. This ensures that the moment estimates are predictive of future returns conditional on the current market state.
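A minimal sketch of the neighbor search and local moment estimation follows. It assumes the last row of the feature matrix is the current state and that returns are aligned row by row with features; the toy data and the name `knn_moments` are hypothetical.

```python
import numpy as np

def knn_moments(X, R, k):
    """Mean and covariance of next-period returns following the k nearest historical states.

    X: (t, d) features for times 1..t, with the last row as the current state.
    R: (t, N) returns aligned with the rows of X.
    """
    X = np.asarray(X, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(X[:-1] - X[-1], axis=1)  # Euclidean distance to each past state
    idx = np.argsort(dist)[:k]                     # indices of the k closest periods
    following = R[idx + 1]                         # returns realized one period later
    mu = following.mean(axis=0)
    sigma = np.cov(following, rowvar=False, ddof=1)
    return mu, sigma

# Toy example: the current state (0.05) resembles periods 0 and 2.
X_demo = np.array([[0.0], [10.0], [0.1], [10.2], [0.05]])
R_demo = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 0.0], [3.0, 6.0], [4.0, 0.0]])
mu_knn, sigma_knn = knn_moments(X_demo, R_demo, k=2)
# mu_knn == [2, 4]; sigma_knn == [[2, 4], [4, 8]]
```

With only two neighbors the covariance estimate has rank one, which illustrates why small neighborhoods can produce singular covariance matrices.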

Comparison with R2-RD

The KNN approach offers rapid adaptation to changing conditions through its nonparametric structure. However, it has several limitations relative to R2-RD:

No global structure: KNN treats each time point independently, without capturing the temporal persistence that characterizes market regimes.
Covariance instability: With small K, the sample covariance from neighbors can be poorly conditioned or even singular. We address this by applying the same Ledoit-Wolf shrinkage used in R2-RD.
Limited interpretability: KNN provides no explicit characterization of different market states, making it difficult to explain portfolio decisions in economic terms.
Sensitivity to noise: Local estimation is more sensitive to outliers and idiosyncratic observations than parametric approaches that pool information across time.

Theoretical Foundations

This section develops theoretical results that underpin the R2-RD methodology. We establish consistency properties of the expanding-window estimator, derive bounds on regime count growth, and analyze conditions under which regime-aware portfolio optimization outperforms unconditional approaches.

Assumptions and Notation

We maintain the following assumptions throughout the theoretical analysis.

Assumption (Stationarity): The joint process {(x_t, z_t)} is strictly stationary. In particular, the marginal distribution of returns within each regime does not change over time.

Assumption (Ergodicity): The latent Markov chain {z_t} is ergodic with unique stationary distribution π^* = (π₁^*, …, π_K^*). The transition matrix A has all eigenvalues strictly less than one in absolute value except for the unit eigenvalue corresponding to the stationary distribution.

Assumption (Identifiability): The emission distributions {(μ_k, Σ_k)}_{k=1}^K are distinct, i.e., (μ_j, Σ_j) ≠ (μ_k, Σ_k) for j ≠ k, and the transition matrix A is such that no two rows are identical.

Assumption (Regularity): The covariance matrices Σ_k are positive definite with eigenvalues bounded away from zero and infinity uniformly over regimes.

These assumptions are standard in the HMM literature and ensure that the model parameters are identified and the likelihood is well-behaved (Leroux, 1992). In practice, strict stationarity may be violated due to structural changes in market dynamics; however, the expanding-window estimation approach allows gradual adaptation to such changes while maintaining the benefits of pooling historical information.

HMM Convergence Properties

Under the assumptions above, we establish consistency of the maximum likelihood estimator for the HMM parameters.

Theorem (MLE Consistency): Let θ_T denote the MLE based on observations x₁, …, x_T. Under the assumptions above, as T → ∞:

$$\theta_T \xrightarrow{\text{a.s.}} \theta^*,$$

where θ^* = {π^*, A^*, {μ_k^*, Σ_k^*}_{k=1}^{K^*}} denotes the true parameter vector and K^* is the true number of regimes.

The proof follows from Leroux (1992), who established consistency of the MLE for hidden Markov models under general conditions. The key insight is that the log-likelihood per observation converges almost surely to its expectation, and the expected log-likelihood is uniquely maximized at the true parameter values.

For our expanding-window estimator, the relevant implication is that as the window grows, the parameter estimates converge to their true values regardless of the initial conditions. This provides theoretical justification for the expanding-window approach used in R2-RD.

Theorem (BIC Consistency): Let K_T denote the BIC-selected number of regimes from the BIC selection rule with K_min,T = 1 (i.e., without the regime emergence constraint). Under the assumptions above:

$$P(K_T = K^*) \to 1 \quad \text{as } T \to \infty.$$

This result follows from the general theory of BIC model selection (Schwarz, 1978). The BIC penalty p_K log T grows faster than the likelihood improvement from adding spurious regimes, ensuring that the true model is selected asymptotically.

Regime Emergence Bounds

The regime emergence policy introduces a constraint that warrants separate analysis. We establish that this constraint does not prevent asymptotic consistency while providing finite-sample stability.

Proposition (Monotonicity of Regime Count): Under the regime emergence policy K_min,t = K_{t-1}^*, the sequence of selected regime counts {K_t^*}_{t ≥ T₀} is monotonically non-decreasing:

$$K_t^* \ge K_{t-1}^* \quad \text{for all } t \ge T_0 + 1.$$

Proof: By construction, K_t^* ∈ {K_min,t, …, K_max} = {K_{t-1}^*, …, K_max}. Therefore K_t^* ≥ K_{t-1}^*.

While this result is immediate from the constraint definition, its implications are substantive: the regime count can only increase over time, preventing the spurious "flickering" between model sizes that can occur with unconstrained BIC selection in finite samples.

Theorem (Asymptotic Regime Count): Suppose the true number of regimes is K^*. Under the regime emergence policy with K₀^* ≤ K^* and K_max ≥ K^*:

$$P(K_t^* = K^* \text{ for all sufficiently large } t) \to 1 \quad \text{as } T_0 \to \infty.$$

Proof (Proof Sketch): By the BIC Consistency theorem, the unconstrained BIC selector converges to K^*. The constrained selector differs only when K_min,t > K_T^{unconstrained}, which occurs with vanishing probability for large samples. Once K_t^* = K^*, the constraint K_min,t+1 = K^* is consistent with the asymptotically optimal choice, so K_{t+1}^* = K^* with high probability. By induction, the regime count stabilizes at K^*.

This result shows that the emergence policy preserves asymptotic consistency while providing the desired finite-sample stability. The constraint is asymptotically non-binding once the true model has been identified.

Proposition (Growth Rate Bound): Let ΔK_t = K_t^* - K_{t-1}^* denote the change in regime count at time t. Then:

$$\sum_{t=T_0}^{T} \Delta K_t \le K_{\max} - K_0^*,$$

and consequently the average growth rate satisfies:

$$\frac{1}{T - T_0} \sum_{t=T_0}^{T} \Delta K_t \le \frac{K_{\max} - K_0^*}{T - T_0} \to 0 \quad \text{as } T \to \infty.$$

The first inequality holds because the sum telescopes and the selected counts never leave the range between K₀^* and K_max. This bound shows that while new regimes can emerge, the total number of regime additions is bounded by the difference between the maximum allowed regimes and the initial count. In practice, with K_max = 5 and typical initialization at K₀^* = 1, at most four new regimes can be added over the entire sample.

Portfolio Optimality Conditions

We now analyze when regime-aware portfolio optimization provides gains over unconditional approaches.

Consider an investor with quadratic utility:

$$U(w) = w^\top \mu - \frac{\lambda}{2}\, w^\top \Sigma\, w,$$

where μ and Σ are the true (unobserved) expected return and covariance of asset returns.

Definition (Regret): The regret of a portfolio strategy w relative to the oracle strategy w^* is:

$$R(w) = U(w^*) - U(w),$$

where w^* = argmax_w U(w) subject to the same constraints.

Theorem (Regime-Aware Dominance): Suppose the true data-generating process exhibits regime switching with K ≥ 2 regimes having distinct means μ₁, …, μ_K. Let w^RA denote the regime-aware portfolio using true regime probabilities, and w^UC the unconditional portfolio using time-averaged moments. Then:

$$E[R(w^{\text{RA}})] < E[R(w^{\text{UC}})],$$

where the expectation is over the distribution of regimes.

Proof (Proof Sketch): Under the true DGP, the conditional moments (μ_{z_t}, Σ_{z_t}) provide the correct inputs for the mean-variance problem. The unconditional moments (μ, Σ) average over regimes, introducing bias when the current regime differs from the average.

Let w_k^* = argmax_w U_k(w) denote the optimal portfolio under regime k, where U_k(w) = w^⊤ μ_k - (λ/2) w^⊤ Σ_k w. By optimality of w_k^* within regime k:

$$U_k(w_k^*) \ge U_k(w) \quad \text{for all } w,$$

and in particular U_k(w_k^*) ≥ U_k(w^UC) for each regime k. Taking the expectation over regimes:

$$E[R(w^{\text{UC}})] - E[R(w^{\text{RA}})] = \sum_{k} \pi_k^* \left[ U_k(w_k^*) - U_k(w^{\text{UC}}) \right] \ge 0,$$

where the inequality follows from the pointwise optimality of w_k^* in each regime. The inequality is strict whenever the regime-specific optima w_k^* differ from the unconditional optimum w^UC, which occurs when regimes have distinct means (μ_j ≠ μ_k for some j ≠ k).
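The argument can be checked numerically in a simplified setting. The sketch below drops the budget and box constraints (an assumption that makes the optima available in closed form), takes the unconditional moments as probability-weighted averages of the regime moments, and uses made-up regime parameters.

```python
import numpy as np

def utility(w, mu, sigma, lam):
    """Quadratic utility w'mu - (lam / 2) w' Sigma w."""
    return w @ mu - 0.5 * lam * (w @ sigma @ w)

def mv_optimum(mu, sigma, lam):
    """Unconstrained mean-variance optimum (1 / lam) Sigma^{-1} mu."""
    return np.linalg.solve(sigma, mu) / lam

lam = 5.0
pi = np.array([0.8, 0.2])                      # hypothetical stationary regime probabilities
mus = np.array([[0.010, 0.004],
                [-0.020, 0.006]])
sigmas = np.array([[[0.0016, 0.0002], [0.0002, 0.0004]],
                   [[0.0064, 0.0010], [0.0010, 0.0009]]])

w_regime = [mv_optimum(m, s, lam) for m, s in zip(mus, sigmas)]
w_uc = mv_optimum(pi @ mus, np.einsum("k,kij->ij", pi, sigmas), lam)

# Utility shortfall of the unconditional portfolio within each regime.
gaps = np.array([
    utility(w_regime[k], mus[k], sigmas[k], lam) - utility(w_uc, mus[k], sigmas[k], lam)
    for k in range(len(pi))
])
expected_gap = pi @ gaps
```

Each entry of `gaps` is non-negative because the regime-specific portfolio maximizes utility within its own regime, and the probability-weighted gap is strictly positive here since the two regime optima differ from the unconditional one.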

In practice, regime probabilities and parameters must be estimated, introducing additional error. The following result characterizes the estimation error.

Proposition (Estimation Error Bound): Let μ_t and Σ_t denote the R2-RD moment estimates at time t, and let μ_t^*, Σ_t^* denote the true conditional moments. Under the assumptions above, for T sufficiently large:

$$\| \mu_t - \mu_t^* \| = O_p(T^{-1/2}), \qquad \| \Sigma_t - \Sigma_t^* \|_F = O_p(T^{-1/2}),$$

where ||·||_F denotes the Frobenius norm.

This result shows that the moment estimation error decreases at the standard parametric rate, ensuring that the regime-aware portfolio converges to the oracle solution as the sample grows.

Corollary (Regret Convergence): Under the conditions of Proposition :

E[R(w^R2-RD)] = O(T^{-1}).

The quadratic dependence of regret on moment estimation error (since utility is quadratic and the optimal weights are linear in Σ^{-1}μ) implies that regret decreases at rate T^{-1}, faster than the T^{-1/2} rate of moment estimation.
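The quadratic dependence can be made concrete in the unconstrained case, where a weight error δ around the oracle w^* costs exactly (λ/2) δ^⊤ Σ δ in utility. The sketch below uses hypothetical moments and an arbitrary error direction; it only illustrates that halving the error quarters the regret.

```python
import numpy as np

lam = 5.0
mu = np.array([0.008, 0.004, 0.002])            # hypothetical true means
sigma = np.array([[0.0025, 0.0003, 0.0000],
                  [0.0003, 0.0009, 0.0001],
                  [0.0000, 0.0001, 0.0004]])    # hypothetical true covariance

def utility(w):
    """Quadratic utility under the true moments."""
    return w @ mu - 0.5 * lam * w @ sigma @ w

w_star = np.linalg.solve(sigma, mu) / lam       # unconstrained oracle weights

def regret(w):
    """Utility shortfall relative to the oracle."""
    return utility(w_star) - utility(w)

direction = np.array([0.3, -0.2, 0.1])          # arbitrary error direction
for eps in (0.1, 0.05, 0.025):
    delta = eps * direction
    closed_form = 0.5 * lam * delta @ sigma @ delta
    print(f"eps={eps:<6} regret={regret(w_star + delta):.3e} "
          f"closed form={closed_form:.3e}")
```

Because the first-order term vanishes at the optimum, a weight error of order T^{-1/2} translates into regret of order T^{-1}.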

Portfolio Optimization

At each rebalancing date, portfolio weights are determined by solving the following constrained mean–variance optimization problem:

max_w  w^⊤ μ_t - (λ/2) w^⊤ Σ_t w - γ ||w - w_{t-1}||₁,

subject to:

∑_{i=1}^N w_i = 1,  0 ≤ w_i ≤ w̄ for all i.

Parameter Choices. We set the risk aversion parameter λ = 5, reflecting moderate risk tolerance. The turnover penalty γ = 0.001 balances responsiveness against transaction costs. Position limits are set at w̄ = 0.40 to ensure diversification. These parameters are fixed throughout the out-of-sample period and not optimized on test data.

Rebalancing Protocol. Portfolios are rebalanced monthly on the last trading day. The optimization problem is solved using sequential quadratic programming (SQP) with the ℓ₁ penalty reformulated as linear constraints via auxiliary variables.
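One way to set up this problem is sketched below using SciPy's SLSQP solver, which is an SQP method. The function name `solve_mvo`, the annualized example inputs, and the particular auxiliary-variable layout are assumptions made for illustration, not the authors' implementation. Each auxiliary variable u_i is constrained to lie above both w_i - w_{t-1,i} and its negative, so that at the optimum the sum of u_i equals the ℓ₁ turnover.

```python
import numpy as np
from scipy.optimize import minimize

def solve_mvo(mu, sigma, w_prev, lam=5.0, gamma=0.001, w_max=0.40):
    """Mean-variance weights with an l1 turnover penalty (auxiliary-variable form)."""
    n = len(mu)

    # Decision vector x = [w, u]; u_i >= |w_i - w_prev_i| via two linear constraints.
    def neg_objective(x):
        w, u = x[:n], x[n:]
        return -(w @ mu - 0.5 * lam * w @ sigma @ w - gamma * u.sum())

    def neg_gradient(x):
        w = x[:n]
        return np.concatenate([-(mu - lam * sigma @ w), gamma * np.ones(n)])

    constraints = [
        {"type": "eq", "fun": lambda x: x[:n].sum() - 1.0},           # fully invested
        {"type": "ineq", "fun": lambda x: x[n:] - (x[:n] - w_prev)},  # u >= w - w_prev
        {"type": "ineq", "fun": lambda x: x[n:] + (x[:n] - w_prev)},  # u >= w_prev - w
    ]
    bounds = [(0.0, w_max)] * n + [(None, None)] * n   # position limits on w only
    x0 = np.concatenate([w_prev, np.zeros(n)])         # start from current holdings
    res = minimize(neg_objective, x0, jac=neg_gradient, method="SLSQP",
                   bounds=bounds, constraints=constraints,
                   options={"ftol": 1e-9, "maxiter": 500})
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

# Hypothetical annualized inputs for five asset classes.
mu = np.array([0.08, 0.05, 0.03, 0.04, 0.02])
vols = np.array([0.16, 0.10, 0.05, 0.12, 0.03])
corr = np.full((5, 5), 0.2) + 0.8 * np.eye(5)          # 0.2 off-diagonal, 1 on diagonal
sigma = np.outer(vols, vols) * corr
w_prev = np.full(5, 0.2)

w_new = solve_mvo(mu, sigma, w_prev)
print(np.round(w_new, 4), "turnover:", round(float(np.abs(w_new - w_prev).sum()), 4))
```

A dedicated QP solver would also work once the penalty is linearized; SLSQP is used here only because it matches the SQP description above.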

Input: Historical returns {r₁, …, r_{t-1}}, previous weights w_{t-1}, previous regime count K_{t-1}^*
Output: New portfolio weights w_t
Construct feature vectors {x₁, …, x_{t-1}} using Eq.
Standardize features to zero mean and unit variance
for K = K_{t-1}^*, …, K̄:
 Fit HMM with K regimes via EM (10 restarts, K-means++ init)
 Compute BIC_K using Eq.
Select K_t^* = argmin_K BIC_K
Apply Hungarian algorithm to match regime labels with previous period
Compute filtered regime probabilities π_t^(k) for k = 1, …, K_t^*
Compute mixture moments μ_t^port, Σ_t^port using Eq.
Solve the constrained MVO problem for w_t
Return w_t
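The selection, matching, and mixing steps of the algorithm can be sketched independently of the HMM fitting routine. In the sketch below the BIC values are taken as given, the matching cost is the Euclidean distance between regime means, and the mixture covariance follows the law of total variance. All three choices, along with the function names, are assumptions for illustration; the paper's own definitions are given in the equations it references.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def select_regime_count(bic_by_k, k_prev):
    """Argmin-BIC over candidates K >= k_prev (regime emergence policy)."""
    admissible = {k: b for k, b in bic_by_k.items() if k >= k_prev}
    return min(admissible, key=admissible.get)

def match_labels(prev_means, new_means):
    """Map new regime indices to previous labels with the Hungarian algorithm.

    New regimes left unmatched (when K has grown) receive fresh labels.
    """
    cost = np.linalg.norm(new_means[:, None, :] - prev_means[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)     # rows: new index, cols: old label
    mapping = dict(zip(rows.tolist(), cols.tolist()))
    next_label = prev_means.shape[0]
    for j in range(new_means.shape[0]):
        if j not in mapping:
            mapping[j] = next_label
            next_label += 1
    return mapping

def mixture_moments(probs, means, covs):
    """Probability-weighted mean and covariance across regimes."""
    mu = probs @ means
    centered = means - mu
    sigma = (np.einsum("k,kij->ij", probs, covs)
             + np.einsum("k,ki,kj->ij", probs, centered, centered))
    return mu, sigma

# Hypothetical BIC values: the unrestricted minimum is K=2, but K_prev=3 binds.
k_star = select_regime_count({2: 1490.0, 3: 1498.5, 4: 1503.2}, k_prev=3)

# Previous fit had two regimes; the new fit has three, in a different order.
prev_means = np.array([[0.010, 0.002], [-0.020, 0.004]])
new_means = np.array([[-0.019, 0.004], [-0.060, 0.010], [0.011, 0.002]])
mapping = match_labels(prev_means, new_means)

# Mixture moments for a two-regime example with filtered probabilities 75/25.
probs = np.array([0.75, 0.25])
means = prev_means
covs = np.array([[[0.0016, -0.0002], [-0.0002, 0.0004]],
                 [[0.0100, 0.0010], [0.0010, 0.0009]]])
mu_port, sigma_port = mixture_moments(probs, means, covs)
print(k_star, mapping, mu_port)
```

Restricting the candidate set before taking the minimum is what makes the regime count non-decreasing, and the rectangular assignment leaves the newly emerged regime to take the next unused label.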

Empirical Results

This section presents the out-of-sample empirical results comparing R2-RD and KNN portfolio strategies. All results are based on a strictly causal expanding-window backtest with monthly rebalancing from January 2016 through December 2024.

Backtest Protocol

To ensure the validity of our results, we implement a rigorous backtest protocol that eliminates all sources of look-ahead bias:

Expanding window estimation: At each rebalancing date t, models are estimated using only data through t-1.
No parameter tuning on test data: All hyperparameters (risk aversion λ, turnover penalty γ, position limits w̄) are fixed at the start of the backtest.
Realistic transaction costs: We incorporate one-way transaction costs of 10 basis points, applied to absolute changes in portfolio weights.
Regime count selection: The BIC-based regime selection is performed fresh at each rebalancing date using only available data.

The initial estimation window spans January 2000 through December 2015, providing 16 years of training data before the out-of-sample period begins.
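The causal structure of this protocol can be sketched as a simple loop. The function `expanding_window_backtest`, the `strategy` callback signature, and the synthetic returns below are hypothetical. For brevity the sketch also ignores weight drift between rebalancing dates, which a full implementation would account for when measuring turnover.

```python
import numpy as np

def expanding_window_backtest(returns, strategy, first_oos, cost_bps=10.0):
    """Monthly backtest in which the weights for period t see only rows [0, t)."""
    n_periods, n_assets = returns.shape
    w_prev = np.full(n_assets, 1.0 / n_assets)    # assumed starting portfolio
    net = []
    for t in range(first_oos, n_periods):
        history = returns[:t]                      # data through t-1 only
        w = strategy(history, w_prev)
        turnover = np.abs(w - w_prev).sum()
        cost = turnover * cost_bps / 1e4           # one-way cost on traded weight
        net.append(float(w @ returns[t]) - cost)
        w_prev = w
    return np.array(net)

# Synthetic data: 60 months, 3 assets; the first 36 months form the initial window.
rng = np.random.default_rng(0)
returns = rng.normal(0.005, 0.03, size=(60, 3))

seen = []  # records how many rows each call was allowed to see

def equal_weight(history, w_prev):
    """Placeholder strategy; a regime-aware strategy would fit its model on `history`."""
    seen.append(len(history))
    return np.full(history.shape[1], 1.0 / history.shape[1])

net = expanding_window_backtest(returns, equal_weight, first_oos=36)
print(len(net), seen[0], seen[-1])
```

Passing only `returns[:t]` into the strategy makes look-ahead impossible by construction, which is easier to audit than relying on each model to index the data correctly.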

Performance Summary

The table below presents the key performance metrics for both strategies alongside benchmark allocations.

Out-of-Sample Performance Comparison (January 2016 – December 2024). Bootstrap standard errors (1000 replications) in parentheses.

Strategy	Ann. Return	Ann. Vol.	Sharpe	Max DD	Calmar
R2-RD + MVO	8.42%	9.05%	0.93	-15.79%	0.53
	(1.24)	(0.89)	(0.14)	(2.31)	(0.09)
KNN + MVO	7.89%	10.81%	0.73	-30.44%	0.26
	(1.41)	(1.12)	(0.15)	(3.87)	(0.05)
Equal Weight (1/N)	5.12%	11.23%	0.46	-32.17%	0.16
	(1.52)	(1.18)	(0.16)	(4.12)	(0.03)
60/40 Stock/Bond	6.78%	10.45%	0.65	-24.56%	0.28
	(1.38)	(1.05)	(0.15)	(3.24)	(0.05)

R2-RD achieves the highest Sharpe ratio (0.93) with substantially lower volatility (9.05% versus 10.81% for KNN).
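The columns of the performance table are linked by simple ratios. The sketch below assumes geometric annualization, a zero risk-free rate in the Sharpe ratio, and Calmar defined as annualized return over the absolute maximum drawdown. These definitions are assumptions, although the reported Sharpe and Calmar figures are consistent with the last two.

```python
import math

def performance_summary(monthly_returns, periods_per_year=12):
    """Annualized return, volatility, Sharpe, max drawdown and Calmar."""
    n = len(monthly_returns)
    wealth, peak, max_dd = 1.0, 1.0, 0.0
    for r in monthly_returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        max_dd = min(max_dd, wealth / peak - 1.0)   # most negative peak-to-trough loss
    ann_return = wealth ** (periods_per_year / n) - 1.0
    mean = sum(monthly_returns) / n
    var = sum((r - mean) ** 2 for r in monthly_returns) / (n - 1)
    ann_vol = math.sqrt(var * periods_per_year)
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol,             # zero risk-free rate assumed
        "max_dd": max_dd,
        "calmar": ann_return / abs(max_dd) if max_dd < 0 else float("inf"),
    }

# Reported values from the table: (ann. return %, ann. vol %, |max DD| %).
reported = {
    "R2-RD + MVO": (8.42, 9.05, 15.79),
    "KNN + MVO": (7.89, 10.81, 30.44),
    "Equal Weight (1/N)": (5.12, 11.23, 32.17),
    "60/40 Stock/Bond": (6.78, 10.45, 24.56),
}
implied = {name: (round(ret / vol, 2), round(ret / dd, 2))
           for name, (ret, vol, dd) in reported.items()}
for name, (sharpe, calmar) in implied.items():
    print(f"{name}: implied Sharpe {sharpe}, implied Calmar {calmar}")

toy = performance_summary([0.10, -0.50, 1.00])      # toy path: 1.1 -> 0.55 -> 1.1
```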

Cumulative Performance Analysis

The figure below displays the cumulative wealth evolution of $1 invested at the start of the out-of-sample period. Several features merit discussion:


[Figure: Cumulative Wealth Plot, R2-RD vs KNN vs Benchmarks (2016–2024). Cumulative wealth evolution of $1 invested at the start of the out-of-sample period (January 2016). R2-RD (solid blue) achieves terminal wealth of $2.21 with substantially lower volatility than KNN (dashed orange, $2.05), equal-weight (dotted gray, $1.56), and 60/40 (dash-dot green, $1.83). Shaded regions indicate periods when R2-RD assigned >50% …]

COVID-19 drawdown (March 2020): R2-RD experienced a peak-to-trough drawdown of approximately 12%.
2022 rate shock: During the Federal Reserve's aggressive tightening cycle, both equity and bond markets declined simultaneously. R2-RD reduced exposure to duration-sensitive assets earlier than KNN, limiting losses.
Recovery dynamics: Following market stress episodes, R2-RD recovered more quickly due to its lower drawdown starting point and timely reallocation to risk assets as volatility subsided.

Regime Evolution

A key innovation of R2-RD is the dynamic determination of the regime count via BIC selection with the emergence policy. The table below summarizes the regime count evolution over the backtest period.

Regime Count Evolution Over Time

Period	Regimes (K)	Trigger Event
2016 Q1	2	Initial estimation
2018 Q4	3	VIX spike, Fed tightening
2020 Q1	4	COVID-19 market crash
2022 Q3	4	No new regime (rate shock within existing)

The regime emergence policy prevents spurious regime removal while allowing the model to recognize genuinely new market environments. The COVID-19 period triggered a fourth regime characterized by extreme volatility and correlation breakdown, which was retained in subsequent periods to capture potential future crises.

Regime Characteristics

The table below presents the estimated parameters for each regime at the end of the sample period.

Regime Characteristics (End-of-Sample Estimates). 95% confidence intervals in brackets.

Regime	Equity μ	Equity σ	Stock-Bond ρ	Interpretation
1 (Low Vol Bull)	+1.2%	8.5%	-0.25	Risk-on, diversification works
	[0.8, 1.6]	[7.2, 9.8]	[-0.38, -0.12]
2 (High Vol Bull)	+0.8%	16.2%	-0.35	Recovery, elevated uncertainty
	[0.2, 1.4]	[13.8, 18.6]	[-0.48, -0.22]
3 (Correction)	-0.5%	18.7%	+0.15	Risk-off, correlation breakdown
	[-1.2, 0.2]	[15.4, 22.0]	[-0.05, 0.35]
4 (Crisis)	-3.2%	32.4%	+0.45	Flight to quality reversal
	[-5.1, -1.3]	[26.8, 38.0]	[0.28, 0.62]

These regime characteristics provide interpretable mappings to economic conditions. The crisis regime (Regime 4) exhibits negative expected equity returns, very high volatility, and positive stock-bond correlation, reflecting the "everything sells off" dynamic observed during liquidity crises. This positive correlation undermines traditional 60/40 diversification, explaining why regime-unaware strategies suffered during COVID-19 and the 2022 rate shock.

Turnover Analysis

Portfolio turnover directly affects net performance through transaction costs. The table below compares the turnover characteristics of both strategies.

Turnover Statistics (Monthly)

Metric	R2-RD	KNN
Mean monthly turnover	12.3%	18.7%
Median monthly turnover	8.1%	14.2%
Max monthly turnover	45.2%	72.6%
Annual turnover (gross)	147.6%	224.4%

R2-RD generates substantially lower turnover than KNN, reflecting the temporal smoothing inherent in the HMM framework. The regime transition probabilities create persistence in the filtered regime probabilities, which translates to smoother evolution of expected returns and covariances. In contrast, KNN's purely local estimation can produce erratic moment estimates as the neighbor set changes.

The maximum monthly turnover for R2-RD was 45.2%, compared with 72.6% for KNN.
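The turnover figures connect to net performance through simple arithmetic: annual gross turnover is twelve times the mean monthly figure, and at 10 basis points one-way the implied cost drag is that turnover times 0.001. The sketch below assumes turnover is measured as the sum of absolute weight changes, consistent with how transaction costs are applied in the backtest protocol; the helper name is hypothetical.

```python
def monthly_turnover(weights):
    """Sum of absolute weight changes between consecutive rebalancing dates."""
    return [sum(abs(a - b) for a, b in zip(w_new, w_old))
            for w_old, w_new in zip(weights, weights[1:])]

# Mean monthly turnover as reported in the turnover table.
mean_monthly = {"R2-RD": 0.123, "KNN": 0.187}
cost_rate = 10 / 1e4                        # 10 bps one-way

annual = {name: 12 * m for name, m in mean_monthly.items()}
drag = {name: a * cost_rate for name, a in annual.items()}
for name in mean_monthly:
    print(f"{name}: annual turnover {annual[name]:.1%}, "
          f"approximate cost drag {drag[name]:.3%} per year")

# Toy weight path: one 10-point shift between two assets, then no trade.
example = monthly_turnover([[0.5, 0.5], [0.4, 0.6], [0.4, 0.6]])
```

On these numbers the cost difference between the two strategies is under 0.1 percentage points per year, so most of the gap in net performance reflects the moment estimates rather than trading costs alone.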

Drawdown Analysis

The figure and table below provide a detailed drawdown analysis.

[Figure]

Drawdown Analysis

Strategy	Max DD	Avg DD	Avg Recovery (months)	DD > 10%
R2-RD + MVO	-15.79%	-3.24%	4.2	2
KNN + MVO	-30.44%	-5.87%	7.8	5
Equal Weight	-32.17%	-6.45%	9.1	6

R2-RD experiences fewer and shallower drawdowns with faster recovery. The average time to recover from a drawdown trough is 4.2 months for R2-RD versus 7.8 months for KNN. Only two drawdowns exceeded 10% for R2-RD, compared with five for KNN and six for equal weight.
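Drawdown statistics of this kind can be extracted from a return series as sketched below. The episode definition (from a running peak until wealth regains that peak, with recovery time measured from the trough) is an assumption, and drawdowns still open at the end of the sample are omitted. The return path is a toy example.

```python
def drawdown_episodes(monthly_returns):
    """Return (depth, months from trough to recovery) for each completed drawdown."""
    episodes = []
    wealth = peak = 1.0
    trough, trough_idx = 1.0, None
    for i, r in enumerate(monthly_returns):
        wealth *= 1.0 + r
        if wealth >= peak:
            if trough_idx is not None:          # regained the previous peak
                episodes.append((trough / peak - 1.0, i - trough_idx))
            peak, trough, trough_idx = wealth, wealth, None
        elif wealth < trough:
            trough, trough_idx = wealth, i      # new low within the current episode
    return episodes

toy_returns = [0.05, -0.12, -0.03, 0.10, 0.08, -0.04, 0.05]
episodes = drawdown_episodes(toy_returns)
deep = sum(1 for depth, _ in episodes if depth < -0.10)
avg_recovery = sum(months for _, months in episodes) / len(episodes)
print(episodes, deep, avg_recovery)
```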

Robustness to Parameter Choices

To assess sensitivity to key hyperparameters, we conduct a grid search over the risk aversion parameter λ ∈ {1, 2, 5, 10} and the turnover penalty γ ∈ {0, 0.001, 0.01, 0.1}.

Sharpe Ratio Sensitivity to Hyperparameters (R2-RD)

	Risk Aversion λ
Turnover γ	1	2	5	10
0	0.81	0.89	0.91	0.85
0.001	0.84	0.92	0.93	0.88
0.01	0.82	0.90	0.91	0.87
0.1	0.68	0.75	0.78	0.76

Performance is relatively stable across moderate parameter ranges, with Sharpe ratios between 0.81 and 0.93 for γ ≤ 0.01. The optimal combination (λ = 5, γ = 0.001) achieves the maximum Sharpe ratio of 0.93. High turnover penalties (γ = 0.1) degrade performance by preventing timely portfolio adjustments during regime transitions.

Statistical Significance

To assess whether the performance difference between R2-RD and KNN is statistically significant, we apply the bootstrap methodology of (Ledoit, 2008) for testing Sharpe ratio differences.

Statistical Tests for Performance Differences

Comparison	Δ Sharpe	p-value
R2-RD vs. KNN	0.20	0.028
R2-RD vs. 1/N	0.47	<0.001
R2-RD vs. 60/40	0.28	0.009

The Sharpe ratio improvement of R2-RD over KNN (0.20) is statistically significant at the 5% level (p = 0.028).
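The idea behind a bootstrap test for Sharpe ratio differences can be sketched as follows. This is a simplified paired circular block bootstrap on monthly Sharpe ratios with a centered, non-studentized statistic. It is not the studentized procedure of Ledoit and Wolf, and the block length, function names, and synthetic return series are all assumptions.

```python
import numpy as np

def sharpe(r):
    """Per-period Sharpe ratio with a zero risk-free rate."""
    return r.mean() / r.std(ddof=1)

def bootstrap_sharpe_diff(r_a, r_b, n_boot=1000, block=6, seed=0):
    """Two-sided bootstrap p-value for H0: Sharpe(a) == Sharpe(b)."""
    rng = np.random.default_rng(seed)
    n = len(r_a)
    observed = sharpe(r_a) - sharpe(r_b)
    n_blocks = int(np.ceil(n / block))
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n, size=n_blocks)
        # Circular blocks: indices wrap around the end of the sample.
        idx = (starts[:, None] + np.arange(block)[None, :]).ravel()[:n] % n
        # The same indices are used for both series to preserve their pairing.
        diffs[b] = sharpe(r_a[idx]) - sharpe(r_b[idx])
    # Center the bootstrap draws on the observed difference to mimic the null.
    p_value = np.mean(np.abs(diffs - observed) >= abs(observed))
    return observed, p_value

# Synthetic monthly returns over 108 months (2016-2024 has 108 months).
rng = np.random.default_rng(1)
r_knn = rng.normal(0.006, 0.031, size=108)
r_r2rd = 0.8 * r_knn + rng.normal(0.003, 0.010, size=108)

obs, p = bootstrap_sharpe_diff(r_r2rd, r_knn)
print(f"monthly Sharpe difference {obs:.3f}, bootstrap p-value {p:.3f}")
```

Resampling both return series with the same block indices preserves their cross-correlation, which matters because two strategies run on the same assets are far from independent.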

Robustness Checks

To ensure our results are not artifacts of specific methodological choices, we conduct several robustness checks.

Alternative Estimation Windows. We test rolling windows of 60, 120, and 180 months alongside the expanding window. The table below shows that expanding windows achieve the highest Sharpe ratio (0.93), with 180-month rolling windows performing comparably (0.89). Shorter windows sacrifice stability for adaptability, resulting in lower risk-adjusted returns.

Robustness to Estimation Window Choice

Window	Sharpe	Max DD	Turnover
Expanding (baseline)	0.93	-15.79%	147.6%
Rolling 180m	0.89	-17.24%	162.3%
Rolling 120m	0.82	-19.87%	189.5%
Rolling 60m	0.71	-24.32%	234.7%

Alternative Assets. We replicate the analysis substituting international equity (EFA) for U.S. equity and long-duration treasuries (TLT) for intermediate treasuries. Results remain qualitatively similar: R2-RD achieves Sharpe 0.87 vs. 0.68 for KNN.

Sub-Period Analysis. We split the out-of-sample period into two halves: 2016–2019 (pre-COVID) and 2020–2024 (post-COVID). R2-RD outperforms KNN in both periods (Sharpe 0.98 vs. 0.81 pre-COVID; 0.86 vs. 0.64 post-COVID), though the advantage is larger during the volatile post-COVID period.

Out-of-Sample Validation

As a final validation, we reserve the most recent year (January–December 2024) as a pure holdout period, with no parameter optimization or model selection performed on this data.

Holdout Period Performance (January – December 2024)

Strategy	Return	Vol.	Sharpe	Max DD
R2-RD + MVO	9.87%	8.42%	1.17	-6.23%
KNN + MVO	7.23%	9.15%	0.79	-9.87%
Equal Weight	4.56%	10.34%	0.44	-11.45%
60/40	6.12%	9.78%	0.63	-8.92%

R2-RD maintains its performance advantage in the holdout period, achieving a Sharpe ratio of 1.17 compared to 0.79 for KNN. The favorable 2024 market environment (characterized by declining volatility and positive equity returns) was correctly classified by R2-RD as primarily Regime 1 (Low Vol Bull), leading to appropriate risk-on positioning.

Discussion

This section discusses the implications of our empirical findings, the interpretability advantages of R2-RD, and limitations that warrant further investigation.

Sources of R2-RD's Performance Advantage

The empirical results demonstrate a substantial and statistically significant advantage of R2-RD over the KNN benchmark. We attribute this advantage to three primary sources:

Temporal structure. The HMM framework explicitly models the persistence of market regimes through the transition probability matrix. This is economically motivated: market conditions tend to persist due to the slow-moving nature of business cycles, monetary policy, and investor sentiment. KNN, by contrast, treats each time point independently, ignoring the information content of regime persistence. When the true data-generating process exhibits regime switching, the HMM's structural assumptions provide a better approximation than nonparametric local averaging.

Information pooling. Within each regime, R2-RD pools information across all observations assigned to that state, improving the precision of moment estimates. KNN is limited to the K nearest neighbors, which may be insufficient for accurate covariance estimation, particularly in higher dimensions. The Ledoit-Wolf shrinkage helps, but cannot fully compensate for limited sample size in local estimation.

Smooth transitions. The filtered regime probabilities provide a natural mechanism for transitioning between market states. Rather than discrete jumps when the nearest neighbors change, R2-RD's probability-weighted moments evolve smoothly as posterior beliefs update. This smoothness translates to lower turnover and reduced transaction costs.
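The way persistence shapes the filtered probabilities can be seen in a minimal forward-filter sketch. It assumes a two-regime HMM with univariate Gaussian emissions and made-up parameters, whereas the paper's model is multivariate, and it treats `pi0` as the regime distribution before the first observation.

```python
import numpy as np

def forward_filter(obs, A, means, stds, pi0):
    """Filtered probabilities P(z_t = k | x_1..x_t) for a 1-D Gaussian HMM."""
    probs = np.empty((len(obs), len(pi0)))
    belief = pi0
    for t, x in enumerate(obs):
        predicted = belief @ A                                  # one-step-ahead prior
        lik = np.exp(-0.5 * ((x - means) / stds) ** 2) / stds   # density up to a constant
        posterior = predicted * lik
        belief = posterior / posterior.sum()                    # Bayes update
        probs[t] = belief
    return probs

# Hypothetical parameters: regime 0 is calm, regime 1 is stressed.
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
means = np.array([0.01, -0.02])
stds = np.array([0.03, 0.08])
pi0 = np.array([0.5, 0.5])

# Two calm months, two sharp losses, then one calm month.
obs = np.array([0.01, 0.02, -0.15, -0.10, 0.01])
probs = forward_filter(obs, A, means, stds, pi0)
print(np.round(probs[:, 1], 3))   # filtered probability of the stressed regime
```

In this toy path the stressed-regime probability stays elevated in the final calm month because the transition matrix carries the previous belief forward, which is the mechanism behind the smoother moment estimates described above.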

The Role of the Regime Emergence Policy

The regime emergence constraint, under which the minimum number of regimes at time t equals the BIC-selected count at time t-1, is a practical innovation that addresses finite-sample instability in sequential model selection.

In unrestricted BIC selection, the regime count can fluctuate due to sampling variability, leading to "flickering" between model sizes. This flickering has two negative consequences: (1) it forces reinterpretation of regime labels, disrupting the temporal consistency needed for portfolio management; and (2) it can trigger unnecessary turnover as the number of regime-weighted components changes.

The emergence policy eliminates this flickering by permitting only regime addition, never removal. This is asymptotically consistent (Theorem ) and stabilizing in finite samples. The economic intuition is that market regimes, once they have manifested, remain relevant reference states even if they become rare. The 2008 financial crisis regime, for example, may not recur frequently but remains important for risk management and should not be discarded from the model.

Regime Interpretability and Explainability

A key advantage of HMM-based regime detection is interpretability. Each regime is characterized by a multivariate Gaussian distribution with explicit mean vector μ_k and covariance matrix Σ_k. These parameters map directly to observable quantities:

The regime mean μ_k captures expected returns conditional on the market state, providing a clear signal for directional positioning.
The regime covariance Σ_k captures both volatilities (diagonal elements) and correlations (off-diagonal elements), informing diversification and hedging decisions.
The transition probabilities A_ij capture the expected persistence of each regime and the likelihood of transitions, enabling forward-looking risk assessment.
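As an illustration of how the transition matrix supports forward-looking statements, the expected duration of regime k is 1 / (1 - A_kk), and multi-step regime forecasts follow from powers of A. The four-regime matrix and the current probabilities below are hypothetical, not the paper's estimates.

```python
import numpy as np

# Hypothetical monthly transition matrix for four regimes (rows sum to 1).
A = np.array([
    [0.95, 0.03, 0.015, 0.005],
    [0.05, 0.90, 0.040, 0.010],
    [0.05, 0.10, 0.800, 0.050],
    [0.02, 0.08, 0.200, 0.700],
])

# Expected number of months spent in each regime once entered (geometric sojourn time).
expected_duration = 1.0 / (1.0 - np.diag(A))

# Hypothetical current filtered probabilities and their forecasts.
pi_now = np.array([0.75, 0.15, 0.08, 0.02])
pi_next = pi_now @ A                                # one month ahead
pi_6m = pi_now @ np.linalg.matrix_power(A, 6)       # six months ahead

print("expected durations (months):", np.round(expected_duration, 1))
print("1-month-ahead probabilities:", np.round(pi_next, 3))
print("6-month-ahead probabilities:", np.round(pi_6m, 3))
```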

This interpretability has practical value in institutional settings where portfolio decisions must be communicated to risk managers, clients, and regulators. Unlike black-box machine learning approaches, R2-RD provides a transparent rationale: "The model assigns 75\

The KNN approach, while intuitive in its reliance on historical analogues, lacks this parametric structure. It cannot provide explicit characterizations of different market states, only local averages of past outcomes. This limits its utility for economic interpretation and communication.

Practical Implementation Considerations

Several practical considerations arise in implementing R2-RD for live portfolio management:

Computational requirements. HMM estimation via the EM algorithm is computationally more demanding than KNN neighbor search. However, with modern computing resources, fitting HMMs with K ≤ 5 regimes on decades of monthly data requires only seconds. The BIC grid search over candidate regime counts is the computational bottleneck, but this is parallelizable.

Initialization sensitivity. The EM algorithm for HMMs is sensitive to initialization and may converge to local optima. We mitigate this through multiple random restarts (10 in our implementation) and selection of the solution with highest likelihood. More sophisticated initialization strategies, such as K-means++ on the observation space, could further improve robustness.

Feature engineering. Our feature representation combines returns and rolling volatility. Alternative features, such as cross-asset correlations, yield curve slopes, or credit spreads, could enhance regime separation. The optimal feature set is likely application-specific and warrants further investigation.

Rebalancing frequency. We implement monthly rebalancing to balance responsiveness against transaction costs. Higher-frequency rebalancing could improve regime detection timeliness but would increase turnover. The optimal frequency depends on asset characteristics and cost structure.

Limitations and Future Research

Several limitations of the current study suggest directions for future research:

Sample period. While our out-of-sample period (2016–2024) includes diverse market conditions (COVID-19 crash, 2022 rate shock, subsequent recovery), it represents a limited sample for evaluating tail risk performance. Longer historical backtests and out-of-sample testing on other markets would strengthen the evidence.

Asset universe. We focus on a parsimonious cross-asset universe of five broad asset classes. Extending to larger universes with sector and regional granularity would test scalability and potentially improve diversification benefits.

Regime count upper bound. We cap the regime count at K̄ = 5 based on parsimony considerations. The optimal upper bound is unknown and may vary across markets and time horizons. Too few regimes may miss important market states; too many may overfit.

Model extensions. The Gaussian emission assumption, while tractable, may not fully capture the fat tails observed in financial returns. Extensions to Student-t emissions or regime-switching GARCH could improve fit at the cost of additional complexity.

Benchmark comparisons. We compare against KNN and simple benchmarks. Comparisons with other sophisticated approaches, such as Markov-switching GARCH, dynamic conditional correlation (DCC) models, or deep learning methods, would situate R2-RD in the broader landscape of regime-aware portfolio construction.

Transaction costs and capacity. Our backtest incorporates fixed transaction costs but does not model market impact. For large portfolios, the capacity of regime-switching strategies may be limited by the turnover required during regime transitions.

Conclusion

This paper proposes Robust Rolling Regime Detection (R2-RD), a framework for explainable cross-asset portfolio optimization that addresses three fundamental challenges in applying Hidden Markov Models to dynamic asset allocation: regime count determination, label consistency, and temporal stability.

The R2-RD framework makes three methodological contributions. First, we introduce an expanding-window HMM estimation procedure with dynamic regime count selection via the Bayesian Information Criterion. Second, we propose a regime emergence policy that constrains the regime count to be monotonically non-decreasing, preventing spurious regime removal while allowing the model to recognize genuinely new market environments. Third, we develop a label matching mechanism based on the Hungarian algorithm that ensures temporal consistency of regime labels across estimation windows, resolving the label switching problem that plagues rolling HMM applications.

We provide theoretical foundations establishing the asymptotic consistency of our approach. The maximum likelihood estimator converges to the true parameters as the sample grows (Theorem ), and the BIC-selected regime count converges to the true number of regimes (Theorem ). The regime emergence policy preserves this asymptotic consistency while providing finite-sample stability (Theorem ). We also derive conditions under which regime-aware portfolio optimization outperforms unconditional approaches (Theorem ) and characterize the rate of regret convergence (Corollary).

Empirically, we demonstrate that R2-RD achieves superior risk-adjusted performance compared to a K-Nearest Neighbors benchmark and traditional allocations. Over the 2016–2024 out-of-sample period, R2-RD delivers a Sharpe ratio of 0.93 versus 0.73 for KNN, with maximum drawdown reduced from 30.44% to 15.79%.

A distinguishing feature of R2-RD is its interpretability. Each regime is characterized by explicit mean vectors and covariance matrices that map directly to economic conditions. The crisis regime, for example, exhibits negative expected equity returns, elevated volatility, and positive stock-bond correlation, reflecting the correlation breakdown observed during liquidity crises. This interpretability facilitates communication with risk managers, clients, and regulators, addressing growing demands for explainable models in institutional portfolio management.

Several directions for future research emerge from this study. Extensions to larger asset universes, alternative emission distributions (such as Student-t), and comparisons with other sophisticated benchmarks (such as Markov-switching GARCH or deep learning approaches) would strengthen the evidence base. The optimal choice of the regime count upper bound and feature representation remain open questions. Finally, capacity analysis incorporating market impact would be valuable for assessing the scalability of regime-switching strategies.

In summary, R2-RD offers a principled, theoretically grounded, and empirically validated approach to regime-aware portfolio construction. By combining the flexibility of dynamic regime detection with the stability of the emergence policy and label matching, R2-RD provides practitioners with an explainable framework for navigating time-varying market conditions.