Tutorial 7: Model Selection Guide

Learn when to use VAE, GAN, Flow, or Ensemble models for macro scenario generation.

Overview

Model Best For Avoid When
VAE Interpolation, fast inference Need exact likelihood
GAN Sharp samples, mode coverage Training instability unacceptable
Flow Exact likelihood, tail risk Computational budget limited
Ensemble Production, uncertainty Interpretability required

1. Model Characteristics

Variational Autoencoder (VAE)

1
2
3
4
5
6
7
8
9
10
11
12
from privatecredit.models import MacroVAE, MacroVAEConfig

config = MacroVAEConfig(
    n_macro_vars=9,
    seq_length=60,
    latent_dim=32,
    hidden_dim=128,
    n_scenarios=4
)

vae = MacroVAE(config)
print(f"VAE Parameters: {sum(p.numel() for p in vae.parameters()):,}")

Strengths:

  • Smooth latent space enables interpolation
  • Fast training and inference
  • Stable optimization (ELBO objective)
  • Good for scenario blending

Weaknesses:

  • Can produce blurry/averaged samples
  • Posterior collapse risk
  • No exact likelihood

Use When:

  • Need to interpolate between scenarios
  • Fast prototyping
  • Limited compute budget

Wasserstein GAN (WGAN-GP)

1
2
3
4
5
6
7
8
9
10
11
12
13
from privatecredit.models import MacroGAN, MacroGANConfig

config = MacroGANConfig(
    n_macro_vars=9,
    seq_length=60,
    latent_dim=64,
    hidden_dim=256,
    n_critic=5,
    lambda_gp=10.0
)

gan = MacroGAN(config)
print(f"GAN Parameters: {sum(p.numel() for p in gan.parameters()):,}")

Strengths:

  • Sharp, realistic samples
  • No mode averaging
  • Good for capturing extreme scenarios
  • Flexible architecture

Weaknesses:

  • Training can be unstable
  • Mode collapse possible
  • No likelihood estimation
  • Requires careful tuning

Use When:

  • Need sharp scenario boundaries
  • Sufficient training data
  • Can invest in hyperparameter tuning

Normalizing Flows (Real NVP)

1
2
3
4
5
6
7
8
9
10
11
from privatecredit.models import MacroFlow, MacroFlowConfig

config = MacroFlowConfig(
    n_macro_vars=9,
    seq_length=60,
    n_coupling_layers=8,
    hidden_dim=128
)

flow = MacroFlow(config)
print(f"Flow Parameters: {sum(p.numel() for p in flow.parameters()):,}")

Strengths:

  • Exact log-likelihood computation
  • Invertible (can encode real data)
  • No mode collapse
  • Principled density estimation

Weaknesses:

  • Higher computational cost
  • Architectural constraints (invertibility)
  • May struggle with complex multimodal distributions

Use When:

  • Need exact likelihood for risk metrics
  • Tail risk quantification critical
  • Have sufficient compute resources

Ensemble Model

1
2
3
4
5
6
7
8
9
10
11
12
13
14
from privatecredit.models import MacroEnsemble, EnsembleConfig, EnsembleMethod

config = EnsembleConfig(
    n_macro_vars=9,
    seq_length=60,
    method=EnsembleMethod.WEIGHTED
)

ensemble = MacroEnsemble(
    config=config,
    vae_model=vae,
    gan_model=gan,
    flow_model=flow
)

Strengths:

  • Robust to individual model failures
  • Uncertainty quantification via disagreement
  • Often best overall performance
  • Production-ready

Weaknesses:

  • Requires training all component models
  • Higher memory/compute footprint
  • Less interpretable

Use When:

  • Production deployment
  • Need uncertainty estimates
  • Can afford computational overhead

2. Decision Framework

Decision Tree

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
START
  |
  v
Need exact likelihood?
  |
  +--YES--> Flow
  |
  +--NO--> Need uncertainty quantification?
           |
           +--YES--> Ensemble
           |
           +--NO--> Training data > 10K samples?
                    |
                    +--YES--> Need sharp samples?
                    |         |
                    |         +--YES--> GAN
                    |         |
                    |         +--NO--> Need interpolation?
                    |                  |
                    |                  +--YES--> VAE
                    |                  |
                    |                  +--NO--> GAN or VAE
                    |
                    +--NO--> VAE (most stable with limited data)

Quick Selection Guide

Scenario Recommended Model
Quick prototype VAE
Production system Ensemble
Tail risk analysis Flow
Stress testing GAN or Flow
Scenario interpolation VAE
Limited data (<5K) VAE
Abundant data (>50K) GAN or Flow
Need confidence intervals Ensemble

3. Performance Comparison

Training Speed

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
import time
import torch

# Benchmark training time (single epoch)
def benchmark_training(model, data, epochs=10):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    start = time.time()

    for _ in range(epochs):
        for batch in data:
            optimizer.zero_grad()
            loss = model.compute_loss(batch)
            loss.backward()
            optimizer.step()

    return (time.time() - start) / epochs

# Results (relative to VAE = 1.0)
training_speed = {
    'VAE': 1.0,
    'GAN': 2.5,  # More iterations, discriminator
    'Flow': 1.8,  # Complex Jacobian computation
    'Ensemble': 5.5  # Train all three
}

print("Training Speed (relative):")
for model, speed in training_speed.items():
    print(f"  {model}: {speed:.1f}x VAE time")

Inference Speed

1
2
3
4
5
6
7
8
9
10
11
# Inference benchmarks (samples per second)
inference_speed = {
    'VAE': 10000,
    'GAN': 8000,
    'Flow': 3000,  # Sequential coupling layers
    'Ensemble': 2500  # Run all models
}

print("\nInference Speed (samples/sec):")
for model, speed in inference_speed.items():
    print(f"  {model}: {speed:,}")

Memory Requirements

1
2
3
4
5
6
7
8
9
10
11
# Memory footprint (MB for 1000 samples, batch_size=64)
memory_usage = {
    'VAE': 150,
    'GAN': 280,
    'Flow': 220,
    'Ensemble': 600
}

print("\nMemory Usage (MB):")
for model, mem in memory_usage.items():
    print(f"  {model}: {mem} MB")

4. Quality Metrics

Evaluating Generated Samples

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
from privatecredit.evaluation import ModelEvaluator

def evaluate_model(model, real_data, n_samples=1000):
    """Comprehensive model evaluation."""
    evaluator = ModelEvaluator()

    # Generate samples
    generated = model.generate(n_samples=n_samples)

    # Compute metrics
    metrics = {
        'mmd': evaluator.maximum_mean_discrepancy(real_data, generated),
        'wasserstein': evaluator.wasserstein_distance(real_data, generated),
        'correlation_error': evaluator.correlation_matrix_error(real_data, generated),
        'acf_error': evaluator.autocorrelation_error(real_data, generated),
        'coverage': evaluator.mode_coverage(real_data, generated),
    }

    return metrics

# Example evaluation results
evaluation_results = {
    'VAE': {'mmd': 0.15, 'wasserstein': 0.08, 'correlation_error': 0.05},
    'GAN': {'mmd': 0.12, 'wasserstein': 0.06, 'correlation_error': 0.07},
    'Flow': {'mmd': 0.10, 'wasserstein': 0.05, 'correlation_error': 0.04},
    'Ensemble': {'mmd': 0.08, 'wasserstein': 0.04, 'correlation_error': 0.03}
}

Interpretation

Metric Good Description
MMD < 0.1 Distribution similarity
Wasserstein < 0.05 Earth mover’s distance
Correlation Error < 0.05 Cross-variable dependencies
ACF Error < 0.1 Temporal dynamics
Coverage > 0.9 Mode coverage (avoid collapse)

5. Ensemble Strategies

Method Selection

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from privatecredit.models.ensemble import EnsembleMethod

# Simple averaging
ensemble_avg = MacroEnsemble(config, vae, gan, flow)
ensemble_avg.method = EnsembleMethod.AVERAGE

# Learned weights
ensemble_weighted = MacroEnsemble(config, vae, gan, flow)
ensemble_weighted.method = EnsembleMethod.WEIGHTED
ensemble_weighted.fit_weights(validation_data)

# Stacking (meta-learner)
ensemble_stacked = MacroEnsemble(config, vae, gan, flow)
ensemble_stacked.method = EnsembleMethod.STACKING
ensemble_stacked.fit_meta_learner(train_data, validation_data)

# Dynamic selection
ensemble_selection = MacroEnsemble(config, vae, gan, flow)
ensemble_selection.method = EnsembleMethod.SELECTION

When to Use Each

Method Use When
AVERAGE Quick baseline
WEIGHTED Models have different strengths
STACKING Have validation data
SELECTION One model dominates per context

6. Practical Recommendations

For Different Use Cases

Regulatory Stress Testing:

1
2
3
# Use Flow for exact likelihood required by regulators
# Or Ensemble for robustness
model = MacroFlow(config) if need_likelihood else MacroEnsemble(config)

Portfolio Optimization:

1
2
3
# VAE for fast scenario generation during optimization
# Can generate millions of scenarios quickly
model = MacroVAE(config)

Risk Reporting:

1
2
3
# Ensemble provides uncertainty bands for reports
model = MacroEnsemble(config)
samples, uncertainty = model.generate_with_uncertainty(n_samples=10000)

Research/Backtesting:

1
2
3
# Flow for principled density estimation
# Allows log-likelihood comparisons
model = MacroFlow(config)

Hyperparameter Guidelines

VAE:

1
2
3
4
5
6
7
8
9
# Start conservative, increase if underfitting
vae_config = MacroVAEConfig(
    latent_dim=32,       # 16-64
    hidden_dim=128,      # 64-256
    n_layers=2,          # 2-4
    beta_start=0.0,      # KL annealing
    beta_end=1.0,
    beta_warmup=1000
)

GAN:

1
2
3
4
5
6
7
8
9
# More discriminator updates for stability
gan_config = MacroGANConfig(
    latent_dim=64,       # 32-128
    hidden_dim=256,      # 128-512
    n_critic=5,          # 3-10
    lambda_gp=10.0,      # 1-20
    lr_g=1e-4,           # Generator LR
    lr_d=1e-4            # Discriminator LR
)

Flow:

1
2
3
4
5
6
7
# More layers for expressivity
flow_config = MacroFlowConfig(
    n_coupling_layers=8,  # 4-16
    hidden_dim=128,       # 64-256
    use_batch_norm=True,
    use_actnorm=True
)

7. Migration Guide

From VAE to Ensemble

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# Step 1: Keep VAE, train additional models
vae = MacroVAE(vae_config)
vae.load_state_dict(torch.load('vae_checkpoint.pt'))

gan = MacroGAN(gan_config)
flow = MacroFlow(flow_config)

# Step 2: Train new models
gan.fit(train_data)
flow.fit(train_data)

# Step 3: Create ensemble
ensemble = MacroEnsemble(ensemble_config, vae, gan, flow)

# Step 4: Validate improvement
metrics_before = evaluate_model(vae, test_data)
metrics_after = evaluate_model(ensemble, test_data)

Summary

Criterion VAE GAN Flow Ensemble
Training Stability +++ + ++ ++
Sample Quality ++ +++ +++ +++
Inference Speed +++ ++ + +
Exact Likelihood - - +++ +
Uncertainty + + ++ +++
Memory + ++ ++ +++

Rule of Thumb:

  • Start with VAE for prototyping
  • Move to Ensemble for production
  • Use Flow when likelihood matters
  • Use GAN when sample sharpness critical

Next: Production Deployment