Methodology
Detailed description of the modeling approach.
Problem Formulation
Given a portfolio of loans with features X, we aim to generate:
- Macro paths M_{1:T}: Correlated macroeconomic time series
- State sequences S_{1:T}: Loan state trajectories
- Continuous values V_{1:T}: Payments, balances, losses
- Portfolio outcomes L: Loss distribution, tranche returns
Hierarchical Generative Model
Level 1: Macro Scenario Generation
We use a Conditional Variational Autoencoder (CVAE) to generate macro paths.
Encoder:
q_phi(z | M, s) = N(mu_phi(M, s), sigma_phi(M, s))
Decoder:
p_theta(M | z, s) = Prod_t N(m_t | g_theta(z, s, m_{<t}))
Loss:
L_macro = E_q[||M - M_hat||^2] + beta * KL(q_phi || p(z))
The scenario label s enables conditional generation under different economic regimes.
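The loss above has a closed form when the prior p(z) is a standard normal and the encoder outputs a diagonal Gaussian. The following is a minimal sketch under those assumptions; the source states neither the prior nor whether sigma_phi is a standard deviation, so both are assumed here. The function names are illustrative, and the expectation over q is approximated by a single reconstruction M_hat.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions.

    Assumes sigma holds standard deviations (an assumption, not from the source).
    """
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s) for m, s in zip(mu, sigma))


def macro_loss(M, M_hat, mu, sigma, beta=1.0):
    """Squared reconstruction error plus beta-weighted KL, mirroring L_macro.

    M and M_hat are lists of per-month rows of macro variables.
    """
    recon = sum(
        (a - b) ** 2
        for row, row_hat in zip(M, M_hat)
        for a, b in zip(row, row_hat)
    )
    return recon + beta * kl_to_standard_normal(mu, sigma)
```

Raising beta pulls the posterior toward the prior at the cost of reconstruction accuracy, which is the usual trade-off when tuning this term.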
Level 2: Cohort Transitions
A Transformer encoder predicts time-varying transition matrices:
P(t) = f_psi(cohort_features, M_{1:t})
Cross-attention allows the macro path to modulate transition dynamics:
Attention(Q_cohort, K_macro, V_macro)
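To make the cross-attention step concrete, here is a minimal single-head sketch in plain Python. It assumes standard scaled dot-product attention with no learned projections and no masking; the source does not specify the head count or projection layers, so treat this as an illustration of the mechanism rather than the model's actual layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(Q, K, V):
    """Each cohort query attends over the macro keys and mixes the macro values.

    Q: list of query vectors (one per cohort position)
    K, V: lists of key and value vectors (one per macro time step)
    """
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return out
```

Each output row is a convex combination of macro value vectors, so the cohort representation is modulated only by macro information it attends to.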
Loss:
L_trans = -sum_{t,i,j} n_ij(t) * log(P_ij(t))
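The transition loss is a count-weighted negative log-likelihood. A minimal sketch follows, assuming `counts[t][i][j]` holds n_ij(t) and `probs[t][i][j]` holds P_ij(t); the `eps` floor is an added safeguard against log(0) and is not part of the formula above.

```python
import math


def transition_nll(counts, probs, eps=1e-12):
    """Negative log-likelihood of observed transition counts.

    counts[t][i][j]: number of loans moving from state i to state j in month t
    probs[t][i][j]:  predicted transition probability for the same cell
    """
    total = 0.0
    for n_t, p_t in zip(counts, probs):
        for n_row, p_row in zip(n_t, p_t):
            for n_ij, p_ij in zip(n_row, p_row):
                if n_ij > 0:  # cells with no observed transitions contribute nothing
                    total -= n_ij * math.log(max(p_ij, eps))
    return total
```

Because each term is weighted by its count, heavily populated transitions (for example current to current) dominate the loss unless the data are reweighted.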
Level 3: Loan Trajectories
An autoregressive Transformer generates individual loan paths:
State Generation:
s_t ~ Cat(softmax(h_t * W_s + P_cohort(s_{t-1}, :)))
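The sampling step can be sketched as follows. The code follows the formula literally, adding the cohort transition row for the previous state to the loan-level logits before the softmax. Here `logits` stands in for h_t * W_s and `cohort_row` for P_cohort(s_{t-1}, :); both names are illustrative.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_state(logits, cohort_row, rng):
    """Draw s_t from Cat(softmax(logits + cohort_row)).

    Returns the sampled state index and the categorical probabilities.
    """
    probs = softmax([l + p for l, p in zip(logits, cohort_row)])
    u = rng.random()
    cumulative = 0.0
    for state, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return state, probs
    return len(probs) - 1, probs  # guard against floating-point round-off


rng = random.Random(0)  # fixed seed so the draw is reproducible
state, probs = sample_next_state([0.0, 0.0, 50.0], [0.9, 0.1, 0.0], rng)
```

Passing an explicit `random.Random` instance keeps simulated trajectories reproducible across runs.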
Continuous Values (Diffusion):
v_t = Denoise(epsilon_t, s_t, x_loan)
The diffusion head captures complex distributions of payments and recoveries.
Level 4: Portfolio Aggregation
Cashflows are aggregated and distributed via waterfall rules:
Collections = sum_i (payment_i)
Losses = sum_i (loss_i)
Tranche_CF = Waterfall(Collections, Losses, Rules)
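The source does not spell out the waterfall rules, so the sketch below assumes the simplest common case: sequential pay, with losses written down from the most junior tranche upward. The tranche names and balances in the usage lines are hypothetical.

```python
def sequential_waterfall(collections, losses, tranches):
    """Hard (non-differentiable) sequential-pay waterfall.

    tranches: list of (name, balance) ordered from senior to junior.
    Returns (payments, post_loss_balances, residual_cash).
    """
    # Allocate losses in reverse order of seniority.
    remaining_loss = losses
    balances = {}
    for name, balance in reversed(tranches):
        hit = min(balance, remaining_loss)
        balances[name] = balance - hit
        remaining_loss -= hit

    # Pay collections from senior to junior until cash runs out.
    cash = collections
    payments = {}
    for name, _ in tranches:
        paid = min(balances[name], cash)
        payments[name] = paid
        cash -= paid
    return payments, balances, cash


# Hypothetical three-tranche structure.
payments, balances, residual = sequential_waterfall(
    collections=80.0,
    losses=15.0,
    tranches=[("A", 70.0), ("B", 20.0), ("C", 10.0)],
)
```

The `min` calls are the hard thresholds that block gradient flow in a version like this one.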
The waterfall is made differentiable by replacing hard thresholds with soft approximations:
gate(x, threshold) = sigmoid((x - threshold) / temperature)
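A quick numerical check of the gate: it equals 0.5 at the threshold and approaches a hard step as the temperature falls. The branch on the sign of z is an implementation choice to avoid overflow in `math.exp` at small temperatures; it is not part of the source formula.

```python
import math


def gate(x, threshold, temperature):
    """Soft step: sigmoid((x - threshold) / temperature)."""
    z = (x - threshold) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


# Evaluate the gate just above the threshold at several temperatures.
for temperature in (10.0, 1.0, 0.1):
    print(temperature, gate(101.0, 100.0, temperature))
```

Lower temperatures track the hard rule more closely but produce gradients that vanish away from the threshold, so the temperature is a tuning parameter.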
Correlation Structure
Correlation is induced at multiple levels:
| Source | Mechanism | Magnitude |
|---|---|---|
| Macro | Shared macro path | ~60% of total |
| Cohort | Vintage/asset class grouping | ~25% of total |
| Factor | Latent industry/geography | ~10% of total |
| Idiosyncratic | Diffusion noise | ~5% of total |
The hierarchical structure naturally captures:
- Systematic risk: All loans are exposed to the same macro path
- Concentrated risk: Cohort-level clustering
- Diversification: Residual idiosyncratic variation
Training Strategy
Stage 1: Component Pre-training
Train each component separately:
| Component | Objective | Data |
|---|---|---|
| Macro VAE | Reconstruction + KL | Historical macro series |
| Transition Transformer | Cross-entropy on transitions | Cohort transition counts |
| Loan Trajectory | State + diffusion loss | Loan-month panel |
Stage 2: End-to-End Fine-tuning
Joint training with portfolio objectives:
L_total = L_macro + L_trans + L_traj + lambda * L_portfolio
Where L_portfolio penalizes deviations from:
- Historical loss rates
- Tranche return distributions
- Tail risk measures
Stage 3: Calibration
Final calibration to match:
- Observed default rates by cohort
- Historical macro correlations
- Recovery rate distributions
Scenario Conditioning
Standard Scenarios
| Scenario | GDP Shift | Unemployment Shift | Spread Multiplier |
|---|---|---|---|
| Baseline | 0% | 0% | 1.0x |
| Adverse | -3% | +3% | 2.0x |
| Severely Adverse | -6% | +8% | 4.0x |
| Stagflation | -2% | +3% | 2.5x |
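One simple way to read the table is as additive shifts to GDP growth and unemployment and a multiplicative factor on spreads, applied to a baseline path. The sketch below encodes that reading. It is an assumption for illustration only: the model itself conditions on the scenario label s, and the dictionary keys and `apply_scenario` helper are hypothetical.

```python
# Scenario parameters copied from the table, with percentages as decimal fractions.
SCENARIOS = {
    "Baseline": {"gdp_shift": 0.00, "unemp_shift": 0.00, "spread_mult": 1.0},
    "Adverse": {"gdp_shift": -0.03, "unemp_shift": 0.03, "spread_mult": 2.0},
    "Severely Adverse": {"gdp_shift": -0.06, "unemp_shift": 0.08, "spread_mult": 4.0},
    "Stagflation": {"gdp_shift": -0.02, "unemp_shift": 0.03, "spread_mult": 2.5},
}


def apply_scenario(baseline, name):
    """Shift a baseline macro path by the named scenario's parameters.

    baseline: dict with 'gdp_growth', 'unemployment', and 'spread' lists,
    one value per month (assumed layout).
    """
    s = SCENARIOS[name]
    return {
        "gdp_growth": [g + s["gdp_shift"] for g in baseline["gdp_growth"]],
        "unemployment": [u + s["unemp_shift"] for u in baseline["unemployment"]],
        "spread": [x * s["spread_mult"] for x in baseline["spread"]],
    }


baseline = {"gdp_growth": [0.02], "unemployment": [0.04], "spread": [0.01]}
adverse = apply_scenario(baseline, "Adverse")
```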
Custom Conditioning
Condition on specific outcomes:
model.generate_conditional({
    # GDP growth of -4% year over year at month 12
    'gdp_growth_yoy': {'month': 12, 'value': -0.04},
    # Unemployment rate of 10% at month 24
    'unemployment_rate': {'month': 24, 'value': 0.10},
})
Evaluation Metrics
Generation Quality
| Metric | Target |
|---|---|
| Macro reconstruction RMSE | < 0.5% |
| Transition accuracy | > 85% |
| State sequence accuracy | > 80% |
| Payment RMSE | < $100 |
Portfolio Metrics
| Metric | Validation |
|---|---|
| Expected Loss | Within 10% of historical |
| VaR 99% | Conservative vs historical |
| Scenario ordering | Losses: Severely Adverse > Adverse > Baseline |
| Tranche attachment | Consistent with ratings |
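The first three checks in the table can be computed directly from simulated portfolio losses. The following is a minimal sketch, assuming VaR is taken as the empirical quantile of the simulated loss distribution; the function names and the default 10% tolerance argument are illustrative.

```python
import math


def expected_loss(losses):
    """Mean simulated portfolio loss."""
    return sum(losses) / len(losses)


def value_at_risk(losses, level=0.99):
    """Empirical quantile: smallest loss with at least `level` of paths at or below it."""
    ordered = sorted(losses)
    k = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[k]


def within_tolerance(model_el, historical_el, tol=0.10):
    """True if the modeled expected loss is within `tol` of the historical value."""
    return abs(model_el - historical_el) <= tol * abs(historical_el)


def scenarios_ordered(el_baseline, el_adverse, el_severe):
    """True if expected losses rise with scenario severity."""
    return el_severe > el_adverse > el_baseline
```

Other quantile conventions (for example linear interpolation between order statistics) give slightly different VaR values on small samples, so the convention should be fixed before comparing against historical figures.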
Computational Considerations
Memory
| Component | Memory (10k loans, 60 months) |
|---|---|
| Macro VAE | ~100 MB |
| Transition Transformer | ~500 MB |
| Loan Trajectory | ~2 GB |
| Full simulation (10k sims) | ~8 GB |
Runtime
| Operation | Time (GPU) |
|---|---|
| Train Macro VAE (200 epochs) | ~10 min |
| Train Transitions (100 epochs) | ~30 min |
| Train Trajectories (50 epochs) | ~2 hours |
| Monte Carlo (10k sims) | ~5 min |
Limitations
- Data requirements: Needs historical loan-level data
- Stationarity: Assumes a stable regime; retraining may be needed after a regime shift
- Tail estimation: Limited by simulation sample size
- Model risk: Deep learning opacity
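The tail-estimation limit can be made concrete with simple arithmetic: with N simulated paths, the expected number falling beyond quantile level q is N(1 - q). For the 10k-simulation runs mentioned above, that leaves about 100 paths beyond the 99% level and about 10 beyond 99.9%. A small helper (the name is illustrative) makes the calculation reusable:

```python
def expected_tail_count(n_sims, level):
    """Expected number of simulated paths beyond the given quantile level."""
    return n_sims * (1.0 - level)


# Tail sample sizes for a 10,000-path Monte Carlo run.
for level in (0.99, 0.999, 0.9999):
    print(level, round(expected_tail_count(10_000, level), 2))
```

Estimates that rest on only a handful of tail paths are noisy, so deeper quantiles call for proportionally more simulations.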
Future Directions
- Continuous-time models: Replace discrete monthly steps
- Graph neural networks: Explicit loan relationship modeling
- Online learning: Adapt to new data without full retraining
- Explainability: Attribution of losses to factors