Fama-French Data Pipeline

Academic Primer: "Implied Risk Premia for Factors: Theory, Estimation, and Applications"

Author: Joerg Osterrieder

Last Updated: January 2026

Pipeline Purpose

This pipeline downloads real Fama-French factor data from Kenneth French's Data Library, computes all statistics needed for the academic primer's tables and figures, and generates publication-quality visualizations.

Data Flow Diagram

+---------------------------+
|   Kenneth French Data     |
|        Library            |
|   (mba.tuck.dartmouth)    |
+------------+--------------+
             |
             v
+---------------------------+
|   pandas_datareader       |
|   load_ff_factors()       |
|   load_25_portfolios()    |
+------------+--------------+
             |
             v
+---------------------------+
|   Raw Factor Returns      |
|   - Mkt-RF, SMB, HML      |
|   - RMW, CMA, Mom, RF     |
|   - 25 Size/BM Portfolios |
|   T = 726 months          |
+------------+--------------+
             |
      +------+------+
      |             |
      v             v
+-------------+  +---------------+
| Statistics  |  | Chart Scripts |
| Generator   |  | (6 charts)    |
| (JSON)      |  | chart.py      |
+------+------+  +-------+-------+
       |                 |
       v                 v
+-------------+  +---------------+
| LaTeX       |  | PDF Figures   |
| Tables      |  | (chart.pdf)   |
| (9 tables)  |  | (6 figures)   |
+-------------+  +---------------+

Data Source

Kenneth French Data Library

URL: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

Datasets: F-F_Research_Data_5_Factors_2x3, F-F_Momentum_Factor, 25_Portfolios_5x5

Period: July 1963 - December 2023 (T = 726 months)

Access Date: January 2026

Why This Data Source?

  • Academic Standard: The Fama-French factors are the benchmark in empirical asset pricing research
  • Free Access: Data freely available for academic and commercial use
  • Long History: 60+ years of monthly data enables robust statistical inference
  • Consistency: Methodology documented and updated by Ken French

Data License

The data is provided by Kenneth R. French for academic research. When using this data, cite:

Fama, E.F. and French, K.R. (1993). "Common risk factors in the returns on stocks and bonds." Journal of Financial Economics, 33(1), 3-56.

Input Datasets

Dataset 1: F-F_Research_Data_5_Factors_2x3

Variable | Description                   | Construction
---------|-------------------------------|-------------
Mkt-RF   | Market excess return          | Value-weighted return on all NYSE/AMEX/NASDAQ stocks minus 1-month T-bill rate
SMB      | Small Minus Big               | Return on small-cap minus large-cap portfolios (size breakpoint: NYSE median)
HML      | High Minus Low                | Return on high B/M minus low B/M portfolios (B/M breakpoints: NYSE 30/70)
RMW      | Robust Minus Weak             | Return on high OP minus low OP portfolios (operating profitability)
CMA      | Conservative Minus Aggressive | Return on low investment minus high investment portfolios
RF       | Risk-free rate                | 1-month U.S. Treasury bill rate

Dataset 2: F-F_Momentum_Factor

Variable | Description         | Construction
---------|---------------------|-------------
Mom      | Up Minus Down (UMD) | Return on high prior return (winners) minus low prior return (losers) portfolios. Prior returns: months t-12 to t-2.
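The "months t-12 to t-2" convention (the most recent month is skipped to avoid short-term reversal) can be sketched in pandas. `momentum_signal` is a hypothetical helper operating on a monthly price series, not part of the pipeline:

```python
import pandas as pd

def momentum_signal(prices):
    """Prior-return momentum signal at month t: the compounded return over
    months t-12 through t-2 (11 monthly returns, skipping the latest month).
    Illustrative sketch of the standard 12-2 convention."""
    rets = prices.pct_change()
    # Compound 11 monthly returns ending at t-2, then lag by 2 months so the
    # value at index t covers months t-12 .. t-2
    return (1 + rets).rolling(11).apply(lambda x: x.prod(), raw=True).shift(2) - 1
```

For a price series that compounds at a constant 1% per month, the signal settles at \(1.01^{11} - 1\) once enough history is available.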

Dataset 3: 25_Portfolios_5x5

25 portfolios formed on the intersection of 5 size quintiles and 5 book-to-market quintiles, using NYSE breakpoints. These portfolios serve as test assets for cross-sectional regressions.

Variable Definitions

Factor Return Notation

Let \(f_{k,t}\) denote the return on factor \(k\) in month \(t\). For \(T\) months of data:

\[\bar{f}_k = \frac{1}{T}\sum_{t=1}^{T} f_{k,t} \quad \text{(sample mean)}\]
\[\hat{\sigma}_k = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(f_{k,t} - \bar{f}_k)^2} \quad \text{(sample std dev)}\]
\[t_k = \frac{\bar{f}_k}{\hat{\sigma}_k / \sqrt{T}} \quad \text{(t-statistic)}\]
\[\text{SR}_k = \frac{\bar{f}_k \times 12}{\hat{\sigma}_k \times \sqrt{12}} \quad \text{(annualized Sharpe ratio)}\]

Annualization

  • Mean: Multiply monthly mean by 12
  • Volatility: Multiply monthly std dev by \(\sqrt{12}\)
  • Sharpe Ratio: Annualized mean divided by annualized volatility
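These conventions fit in a few lines of NumPy. A self-contained illustration; `summarize_factor` is a hypothetical helper, not part of the pipeline:

```python
import numpy as np

def summarize_factor(returns):
    """Monthly-to-annual summary statistics per the formulas above.
    `returns` is a sequence of monthly returns in decimals."""
    r = np.asarray(returns, dtype=float)
    T = r.size
    mean_m = r.mean()
    std_m = r.std(ddof=1)  # sample std dev (T-1 denominator)
    return {
        "mean_ann": mean_m * 12,                           # annualized mean
        "std_ann": std_m * np.sqrt(12),                    # annualized volatility
        "t_stat": mean_m / (std_m / np.sqrt(T)),           # computed on monthly data
        "sharpe_ann": (mean_m * 12) / (std_m * np.sqrt(12)),
    }
```

Note that the annualized Sharpe ratio equals \(\sqrt{12}\) times the monthly Sharpe ratio, since the factor 12 in the mean and \(\sqrt{12}\) in the volatility partially cancel.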

1. Data Loader Module

File: figures/_shared/data_loader.py

Purpose

Central module for downloading, caching, and processing Fama-French factor data. Provides fallback data generation if pandas_datareader is unavailable.

Core Functions

load_ff_factors(start, end, use_cache)
def load_ff_factors(start='1963-07', end='2023-12', use_cache=True):
    """
    Load Fama-French 5 factors + momentum from Kenneth French library.

    Parameters
    ----------
    start : str
        Start date in YYYY-MM format (default: '1963-07')
    end : str
        End date in YYYY-MM format (default: '2023-12')
    use_cache : bool
        Whether to use cached data if available

    Returns
    -------
    pd.DataFrame
        DataFrame with columns: Mkt-RF, SMB, HML, RMW, CMA, Mom, RF
        Index: DatetimeIndex (monthly)
        Values: decimal returns (not percentages)
    """
    cache_file = CACHE_DIR / "ff_factors.parquet"

    if use_cache and cache_file.exists():
        factors = pd.read_parquet(cache_file)
        factors = factors.loc[start:end]
        return factors

    try:
        import pandas_datareader.data as web

        # Download FF5 factors
        ff5 = web.DataReader('F-F_Research_Data_5_Factors_2x3',
                             'famafrench', start=start)[0]

        # Download Momentum factor
        mom = web.DataReader('F-F_Momentum_Factor',
                             'famafrench', start=start)[0]

        # Combine and convert to decimals
        factors = ff5.join(mom)
        factors = factors / 100  # Convert percentages to decimals

        # Standardize column names
        factors.columns = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF', 'Mom']

        # Cache the data
        CACHE_DIR.mkdir(exist_ok=True)
        factors.to_parquet(cache_file)

        return factors.loc[start:end]

    except ImportError:
        print("pandas_datareader not installed. Using fallback data.")
        return _load_fallback_factors(start, end)

load_25_portfolios(start, end, use_cache)
def load_25_portfolios(start='1963-07', end='2023-12', use_cache=True):
    """
    Load 25 Size/Book-to-Market portfolios for cross-sectional tests.

    Returns
    -------
    pd.DataFrame
        DataFrame with 25 portfolio returns
        Index: DatetimeIndex (monthly)
        Values: decimal returns
    """
    cache_file = CACHE_DIR / "portfolios_25.parquet"

    if use_cache and cache_file.exists():
        portfolios = pd.read_parquet(cache_file)
        return portfolios.loc[start:end]

    try:
        import pandas_datareader.data as web

        portfolios = web.DataReader('25_Portfolios_5x5',
                                    'famafrench', start=start)[0]
        portfolios = portfolios / 100  # Convert to decimals

        # Cache
        CACHE_DIR.mkdir(exist_ok=True)
        portfolios.to_parquet(cache_file)

        return portfolios.loc[start:end]

    except ImportError:
        return _load_fallback_portfolios(start, end)

compute_factor_statistics(factors, annualize)
def compute_factor_statistics(factors, annualize=True):
    """
    Compute comprehensive factor statistics.

    Returns
    -------
    pd.DataFrame
        Statistics: Mean, Std, t-stat, Sharpe, Skew, Kurt, Min, Max
    """
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'Mom']
    factors = factors[factor_cols]
    T = len(factors)

    stats = pd.DataFrame(index=factor_cols)
    stats['Mean'] = factors.mean()
    stats['Std'] = factors.std()
    stats['t-stat'] = stats['Mean'] / (stats['Std'] / np.sqrt(T))
    stats['Sharpe'] = stats['Mean'] / stats['Std']
    stats['Skew'] = factors.skew()
    stats['Kurt'] = factors.kurtosis()
    stats['Min'] = factors.min()
    stats['Max'] = factors.max()

    if annualize:
        stats['Mean'] = stats['Mean'] * 12
        stats['Std'] = stats['Std'] * np.sqrt(12)
        stats['Sharpe'] = stats['Mean'] / stats['Std']

    return stats

2. Statistics Generator

File: figures/generate_all_data.py

Purpose

Master script that computes ALL statistics needed for paper tables. Outputs JSON for charts and prints LaTeX-formatted tables.

Computation Steps

  1. Load Fama-French factors (726 months)
  2. Load 25 Size/BM portfolios (25 test assets)
  3. Compute historical factor statistics (Table 4, C.19)
  4. Compute correlation matrix (Table C.20)
  5. Run Fama-MacBeth regressions (Table 9)
  6. Compute implied premia via reverse optimization (Table 11)
  7. Compute subsample premia (Table 12)
  8. Run GRS model comparison tests (Table 14)
  9. Run factor timing backtest (Table 15)
  10. Save all results to JSON

Run Command

cd academic-primer-framework/figures
python generate_all_data.py

Output

======================================================================
COMPUTING ALL STATISTICS FROM FAMA-FRENCH DATA
======================================================================

Data Source: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
Period: July 1963 - December 2023

Loading Fama-French factors...
  Loaded 726 months (1963-07 to 2023-12)
Loading 25 Size/BM portfolios...
  Loaded 726 months, 25 portfolios

----------------------------------------------------------------------
TABLE 4 / TABLE C.19: Historical Factor Premia
----------------------------------------------------------------------
[LaTeX table output...]

----------------------------------------------------------------------
TABLE C.20: Factor Correlation Matrix
----------------------------------------------------------------------
[LaTeX table output...]

...

======================================================================
Saving results to figures/_shared/computed_data.json
======================================================================
Results saved to: figures\_shared\computed_data.json

3. Chart Generation

Each figure has its own folder with a standalone chart.py script.

Figure   | Folder                    | Description
---------|---------------------------|------------
Figure 1 | 01_factor_premia_history/ | Historical factor premia bar chart with 95% CI
Figure 2 | 02_factor_premia_rolling/ | Rolling 60-month premia with NBER recessions
Figure 3 | 03_implied_vs_realized/   | Implied vs realized premia grouped bars
Figure 4 | 04_hj_bound/              | Hansen-Jagannathan bound visualization
Figure 5 | 05_factor_correlations/   | Factor correlation matrix heatmap
Figure 6 | 06_timing_backtest/       | Timing strategy cumulative returns

Generate All Charts

cd academic-primer-framework/figures

# Generate each chart
python 01_factor_premia_history/chart.py
python 02_factor_premia_rolling/chart.py
python 03_implied_vs_realized/chart.py
python 04_hj_bound/chart.py
python 05_factor_correlations/chart.py
python 06_timing_backtest/chart.py

# Or use justfile
just figures

Step 1: Download Data

Data Download

Input: URLs to Kenneth French Data Library

Output: Raw factor returns DataFrame (726 x 7)

Code

import pandas_datareader.data as web

# Download FF5 factors
ff5 = web.DataReader('F-F_Research_Data_5_Factors_2x3', 'famafrench',
                     start='1963-07')[0]
# Returns DataFrame with columns: Mkt-RF, SMB, HML, RMW, CMA, RF
# Values in percentage points

# Download Momentum factor
mom = web.DataReader('F-F_Momentum_Factor', 'famafrench',
                     start='1963-07')[0]
# Returns DataFrame with column: Mom

# Combine and convert to decimals
factors = ff5.join(mom)
factors = factors / 100  # Convert from % to decimals

Sample Output

            Mkt-RF     SMB     HML     RMW     CMA      RF     Mom
1963-07    -0.0039 -0.0085  0.0204  0.0023 -0.0081  0.0027  0.0045
1963-08     0.0507 -0.0224 -0.0202  0.0132 -0.0040  0.0025  0.0114
1963-09    -0.0167 -0.0021  0.0081  0.0034  0.0126  0.0027 -0.0163
...
2023-10    -0.0246 -0.0291  0.0281 -0.0076  0.0196  0.0044 -0.0426
2023-11     0.0913  0.0127 -0.0021  0.0066 -0.0174  0.0044  0.0622
2023-12     0.0495  0.0656  0.0295 -0.0169  0.0008  0.0043  0.0193

[726 rows x 7 columns]

Step 2: Compute Statistics

Statistics Computation

Input: Factor returns DataFrame (726 x 7)

Output: Statistics for each factor (mean, std, t-stat, Sharpe, etc.)

Code

factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'Mom']
T = len(factors)  # 726

for f in factor_cols:
    col = factors[f]
    mean_monthly = col.mean()
    std_monthly = col.std()

    # Annualize
    mean_annual = mean_monthly * 12 * 100  # in %
    std_annual = std_monthly * np.sqrt(12) * 100  # in %

    # t-statistic (tests H0: mean = 0)
    t_stat = mean_monthly / (std_monthly / np.sqrt(T))

    # Sharpe ratio (annualized)
    sharpe = mean_annual / std_annual

    # Higher moments
    skew = col.skew()
    kurt = col.kurtosis()  # excess kurtosis

    # Max drawdown
    cum_ret = (1 + col).cumprod()
    running_max = cum_ret.cummax()
    drawdown = (cum_ret - running_max) / running_max
    max_dd = drawdown.min() * 100

Output: Factor Statistics

Factor | Mean (%) | Std (%) | t-stat | Sharpe | Max DD
-------|----------|---------|--------|--------|-------
MKT-RF |   6.9    |  15.6   |  3.43  |  0.44  | -55.8%
SMB    |   2.5    |  10.5   |  1.88  |  0.24  | -56.4%
HML    |   3.6    |  10.3   |  2.69  |  0.35  | -57.8%
RMW    |   3.4    |   7.7   |  3.39  |  0.44  | -41.8%
CMA    |   3.2    |   7.2   |  3.48  |  0.45  | -25.0%
UMD    |   7.1    |  14.6   |  3.78  |  0.49  | -57.8%

Step 3: Fama-MacBeth Regressions

Fama-MacBeth Two-Pass Procedure

Input: Factor returns (726 x K), Portfolio returns (726 x 25)

Output: Factor risk premia estimates with standard errors

Fama-MacBeth (1973) Methodology

Pass 1 (Time Series): For each portfolio \(i\), estimate factor betas:

\[R_{i,t} - r_{f,t} = \alpha_i + \sum_{k=1}^{K} \beta_{ik} f_{k,t} + \varepsilon_{i,t}\]

Pass 2 (Cross-Section): For each month \(t\), run:

\[R_{i,t} - r_{f,t} = \gamma_{0,t} + \sum_{k=1}^{K} \gamma_{k,t} \hat{\beta}_{ik} + \eta_{i,t}\]

Risk Premium: Average of monthly cross-sectional estimates:

\[\hat{\lambda}_k = \frac{1}{T}\sum_{t=1}^{T} \hat{\gamma}_{k,t}\]
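The two passes can be sketched in plain NumPy. This is an illustrative implementation with monthly second-pass regressions and textbook Fama-MacBeth standard errors, distinct from the pipeline's own `compute_fama_macbeth` shown below:

```python
import numpy as np

def fama_macbeth_monthly(F, R):
    """Two-pass Fama-MacBeth with monthly cross-sections.

    F : (T, K) factor returns; R : (T, N) portfolio excess returns.
    Returns (lambda_hat, se): time-series means of the monthly
    cross-sectional slopes and their Fama-MacBeth standard errors.
    """
    T, K = F.shape
    N = R.shape[1]

    # Pass 1: full-sample time-series betas for each portfolio
    X = np.column_stack([np.ones(T), F])
    betas = np.linalg.lstsq(X, R, rcond=None)[0][1:].T   # (N, K)

    # Pass 2: one cross-sectional regression per month
    Xcs = np.column_stack([np.ones(N), betas])
    gammas = np.array([np.linalg.lstsq(Xcs, R[t], rcond=None)[0][1:]
                       for t in range(T)])               # (T, K)

    lam = gammas.mean(axis=0)                 # average of monthly slopes
    se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
    return lam, se
```

On data with an exact factor structure and zero alphas, the estimated premia recover the factors' sample means, as the formula for \(\hat{\lambda}_k\) implies.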

Code

def compute_fama_macbeth(factors, portfolios):
    """Two-pass estimation: full-sample betas, then a single
    cross-sectional regression of average returns on betas
    (the time-averaged variant of the Fama-MacBeth procedure)."""
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']
    T = len(factors)

    # Step 1: Estimate betas for each portfolio (full sample)
    betas = {}
    for col in portfolios.columns:
        X = factors[factor_cols].values
        X = np.column_stack([np.ones(T), X])  # Add intercept
        y = portfolios[col].values
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        betas[col] = beta[1:]  # Exclude intercept

    # Step 2: Single cross-sectional regression of annualized average
    # returns on betas (no intercept column, so gamma_0 is restricted to 0)
    avg_ret = portfolios.mean() * 12 * 100  # Annualized %
    B = np.array([betas[c] for c in portfolios.columns])
    gamma = np.linalg.lstsq(B, avg_ret.values, rcond=None)[0]

    # Compute R-squared
    resid = avg_ret.values - B @ gamma
    r2 = 1 - np.var(resid) / np.var(avg_ret.values)

    return gamma, r2

Output: Risk Premia Estimates

             |      Three-Factor Model    |      Five-Factor Model
Factor       | \(\hat{\lambda}\) |  SE  |   t   | \(\hat{\lambda}\) |  SE  |   t
-------------|-------------------|------|-------|-------------------|------|------
MKT          |       10.60       | 0.36 | 29.06 |       10.34       | 0.27 | 38.05
SMB          |        2.71       | 0.36 |  7.44 |        3.76       | 0.27 | 13.84
HML          |        4.40       | 0.36 | 12.05 |        3.30       | 0.27 | 12.14
RMW          |        --         |  --  |  --   |        5.88       | 0.27 | 21.63
CMA          |        --         |  --  |  --   |        2.16       | 0.27 |  7.96
\(R^2_{CS}\) |              0.33                |              0.63

Step 4: Implied Premia via Reverse Optimization

Reverse Optimization

Input: Factor covariance matrix \(\bm{\Omega}_f\), factor exposures \(\bm{w}^f\), risk aversion \(\gamma\)

Output: Implied factor premia \(\bm{\lambda}^{impl}\)

Key Formula: Implied Factor Premium \[\bm{\lambda}^{impl} = \gamma \cdot \bm{\Omega}_f \cdot \bm{w}^f\]

where \(\gamma\) is the risk aversion coefficient (typically 2-4).

Code

def compute_implied_premia(factors):
    """Compute implied factor premia using reverse optimization."""
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']

    # Factor covariance (annualized)
    cov = factors[factor_cols].cov() * 12

    # Market portfolio factor exposures (approximate)
    # Based on typical institutional allocation
    w_factor = np.array([1.0, 0.15, 0.10, 0.15, 0.12])

    implied = {}
    for gamma in [2, 3, 4]:
        lambda_impl = gamma * cov.values @ w_factor * 100  # Percentage
        implied[f'gamma_{gamma}'] = {
            'MKT': lambda_impl[0] / 12,  # Monthly
            'SMB': lambda_impl[1] / 12,
            'HML': lambda_impl[2] / 12,
            'RMW': lambda_impl[3] / 12,
            'CMA': lambda_impl[4] / 12
        }

    return implied

Output: Implied vs Realized Premia (Monthly %)

Factor | Realized | \(\gamma=2\) | \(\gamma=3\) | \(\gamma=4\)
-------|----------|--------------|--------------|-------------
MKT    |   0.57   |     0.40     |     0.59     |     0.79
SMB    |   0.21   |     0.09     |     0.14     |     0.19
HML    |   0.30   |    -0.03     |    -0.04     |    -0.05
RMW    |   0.28   |    -0.03     |    -0.04     |    -0.06
CMA    |   0.27   |    -0.05     |    -0.07     |    -0.10

Note: Implied premia are lower than realized for most factors, consistent with post-publication decay.

Step 5: GRS Model Comparison Tests

GRS Test

Input: Factor returns, portfolio returns, model specification

Output: GRS F-statistic, p-value, HJ distance

Gibbons-Ross-Shanken (1989) Test

Tests whether all pricing errors (alphas) are jointly zero:

\[H_0: \bm{\alpha} = \bm{0}\]

Test statistic:

\[\text{GRS} = \frac{T - N - K}{N} \cdot \frac{\hat{\bm{\alpha}}'\hat{\bm{\Sigma}}_\varepsilon^{-1}\hat{\bm{\alpha}}}{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f} \sim F_{N, T-N-K}\]
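The statistic can be computed directly from this formula. A minimal sketch using maximum-likelihood (divide-by-T) covariance estimates; `grs_test` is an illustrative helper, not part of the pipeline:

```python
import numpy as np
from scipy import stats

def grs_test(F, R):
    """GRS test of H0: all alphas are jointly zero.

    F : (T, K) factor returns; R : (T, N) test-asset excess returns.
    Returns (grs, p_value) using the F_{N, T-N-K} distribution above.
    """
    T, K = F.shape
    N = R.shape[1]

    # Time-series regressions of each asset on the factors
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    alpha = coef[0]                          # (N,) intercepts
    resid = R - X @ coef

    Sigma_e = resid.T @ resid / T            # ML residual covariance
    mu_f = F.mean(axis=0)
    Sigma_f = (F - mu_f).T @ (F - mu_f) / T  # ML factor covariance

    quad = alpha @ np.linalg.solve(Sigma_e, alpha)
    denom = 1.0 + mu_f @ np.linalg.solve(Sigma_f, mu_f)
    grs = (T - N - K) / N * quad / denom
    return grs, stats.f.sf(grs, N, T - N - K)
```

Injecting a common nonzero alpha into the test assets should sharply increase the statistic, which is a quick sanity check on any implementation.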

Output: Model Comparison

Model   | Factors | GRS F | p-value | HJ Distance
--------|---------|-------|---------|------------
CAPM    |    1    | 23.63 | <0.001  |    0.03
FF3     |    3    | 27.10 | <0.001  |    0.04
Carhart |    4    | 26.39 | <0.001  |    0.04
FF5     |    5    | 24.97 | <0.001  |    0.04
FF5+Mom |    6    | 24.68 | <0.001  |    0.04

Interpretation: All models are rejected at the 1% level, but model fit improves with additional factors.

Step 6: Factor Timing Backtest

Timing Strategy Backtest

Input: Factor returns (1980-2023), timing signals

Output: Strategy returns, Sharpe ratios, turnover

Strategy Definition

Timing signal for factor \(k\) at time \(t\):

\[z_{k,t} = \frac{\bar{\lambda}_{k,t}^{(12)} - \bar{\lambda}_{k,t}^{(expand)}}{\hat{\sigma}_{k,t}^{(60)}}\]

Dynamic weight:

\[w_{k,t+1} = \bar{w}_k \cdot (1 + \kappa \cdot z_{k,t})\]
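For a single factor, the signal and weight can be sketched in pandas. The window lengths follow the formulas above; `timing_weights`, the default `w_bar`, `kappa`, and the z-score clip are illustrative assumptions, not values from the paper:

```python
import numpy as np
import pandas as pd

def timing_weights(f, w_bar=1.0, kappa=0.5, clip=2.0):
    """Timing weight path for one factor per the formulas above.

    f : pd.Series of monthly factor returns. Returns the weight applied
    in month t+1, i.e. w_bar * (1 + kappa * z_t) lagged one month.
    """
    mean_12 = f.rolling(12).mean()   # 12-month premium estimate
    mean_exp = f.expanding().mean()  # expanding-window estimate
    vol_60 = f.rolling(60).std()     # 60-month volatility
    z = ((mean_12 - mean_exp) / vol_60).clip(-clip, clip)
    return (w_bar * (1 + kappa * z)).shift(1)

# Illustrative use on a synthetic monthly return series
rng = np.random.default_rng(2)
f = pd.Series(rng.normal(0.004, 0.03, size=240))
w = timing_weights(f)
```

Clipping the z-score bounds the weight between \(\bar{w}(1 - \kappa \cdot \text{clip})\) and \(\bar{w}(1 + \kappa \cdot \text{clip})\), which keeps turnover and leverage under control.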

Output: Backtest Results (1980-2023)

Strategy          | Ann. Return | Ann. Vol | Sharpe | Max DD | Turnover
------------------|-------------|----------|--------|--------|---------
Static FF3        |    4.1%     |   7.1%   |  0.57  | -29.4% |    0%
Timing (implied)  |    4.9%     |   7.2%   |  0.68  | -26.2% |   24%
Timing (realized) |    4.9%     |   7.2%   |  0.68  | -26.5% |   24%

Key Finding: Timing strategies achieve higher Sharpe ratios (0.68 vs 0.57) with lower maximum drawdowns.

Output Tables

Table      | Label                   | Location           | Content
-----------|-------------------------|--------------------|--------
Table 4    | tab:historical_premia   | 02_theory.tex      | Historical factor premia (annual)
Table 8    | tab:factor_stats        | 05_empirical.tex   | Monthly statistics with moments
Table 9    | tab:fm_results          | 05_empirical.tex   | Fama-MacBeth estimates
Table 11   | tab:implied_vs_realized | 05_empirical.tex   | Implied vs realized premia
Table 12   | tab:subsamples          | 05_empirical.tex   | Subsample analysis
Table 14   | tab:model_tests         | 05_empirical.tex   | GRS test statistics
Table 15   | tab:timing_results      | 06_validation.tex  | Timing backtest results
Table C.19 | tab:ff_stats_detail     | C_data_catalog.tex | Detailed factor statistics
Table C.20 | tab:factor_corr         | C_data_catalog.tex | Factor correlation matrix

Output Figures

Figure   | Label                     | Description
---------|---------------------------|------------
Figure 1 | fig:factor_premia_history | Historical factor premia bar chart with 95% confidence intervals
Figure 2 | fig:rolling_premia        | Rolling 60-month premia with NBER recession shading
Figure 3 | fig:implied_vs_realized   | Implied vs realized premia comparison (grouped bars)
Figure 4 | fig:hj_bound              | Hansen-Jagannathan bound with max Sharpe ratios by model
Figure 5 | fig:factor_correlations   | Factor correlation matrix heatmap
Figure 6 | fig:timing_backtest       | Cumulative returns from factor timing strategies

Max Sharpe Ratios (HJ Bound Visualization)

Model               | Max Sharpe Ratio
--------------------|-----------------
CAPM (1 factor)     | 0.44
FF3 (3 factors)     | 0.63
FF5 (5 factors)     | 1.03
FF5+Mom (6 factors) | 1.19

Output JSON

File: figures/_shared/computed_data.json

JSON Structure
{
  "data_source": {
    "url": "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html",
    "datasets": ["F-F_Research_Data_5_Factors_2x3", "F-F_Momentum_Factor", "25_Portfolios_5x5"],
    "period": "July 1963 - December 2023",
    "access_date": "January 2026"
  },
  "factor_names": ["MKT-RF", "SMB", "HML", "RMW", "CMA", "UMD"],
  "table_historical_premia": {
    "Mkt-RF": {"mean": 6.86, "std": 15.57, "sharpe": 0.44, "t_stat": 3.43},
    "SMB": {"mean": 2.53, "std": 10.46, "sharpe": 0.24, "t_stat": 1.88},
    ...
  },
  "correlation_matrix": {...},
  "fama_macbeth": {...},
  "implied_vs_realized": {...},
  "subsamples": {...},
  "model_tests": {...},
  "timing_backtest": {...}
}

Key Formulas

1. Factor Risk Premium (Definition) \[\lambda_k = \mathbb{E}[f_k]\] where \(f_k\) is an excess or long-short factor return (for a raw market return, \(\lambda_M = \mathbb{E}[R_M - r_f]\))
2. CAPM Pricing \[\mathbb{E}[R_i] - r_f = \beta_i (\mathbb{E}[R_M] - r_f)\]
3. APT Pricing \[\mathbb{E}[R_i] - r_f = \sum_{k=1}^{K} \beta_{ik} \lambda_k\]
4. Implied Factor Premium (Reverse Optimization) \[\bm{\lambda}^{impl} = \gamma \cdot \bm{\Omega}_f \cdot \bm{w}^f\]
5. Hansen-Jagannathan Bound \[\frac{\sigma(m)}{\mathbb{E}[m]} \geq \sqrt{\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}}\]
6. GRS Test Statistic \[\text{GRS} = \frac{T - N - K}{N} \cdot \frac{\hat{\bm{\alpha}}'\hat{\bm{\Sigma}}_\varepsilon^{-1}\hat{\bm{\alpha}}}{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f}\]
7. Fama-MacBeth t-statistic with Shanken Correction \[\text{SE}_{Shanken} = \text{SE}_{FM} \times \sqrt{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f}\]
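The right-hand side of the Hansen-Jagannathan bound, \(\sqrt{\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}}\), is the maximum Sharpe ratio attainable from a set of factors, the quantity tabulated in the HJ bound section above. A minimal sketch; `max_sharpe` is an illustrative helper assuming monthly returns in decimals:

```python
import numpy as np

def max_sharpe(F):
    """Annualized maximum Sharpe ratio sqrt(mu' Sigma^{-1} mu)
    from monthly factor (excess) returns F of shape (T, K)."""
    mu = F.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(F, rowvar=False, ddof=1))
    sr_monthly = np.sqrt(mu @ np.linalg.solve(Sigma, mu))
    return sr_monthly * np.sqrt(12)  # annualize
```

Because \(\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}\) is weakly increasing as factors are added, the in-sample max Sharpe ratio can only rise with larger factor sets, matching the pattern from CAPM to FF5+Mom in the table above.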

File Structure

academic-primer-framework/
|-- figures/
|   |-- _shared/
|   |   |-- data_loader.py       # Data download and caching
|   |   |-- computed_data.json   # Cached statistics
|   |   |-- colors.py            # Color palette
|   |   |-- styles.py            # Matplotlib styles
|   |   |-- cache/               # Parquet cache files
|   |
|   |-- generate_all_data.py     # Master statistics script
|   |
|   |-- 01_factor_premia_history/
|   |   |-- chart.py             # Historical premia bar chart
|   |   |-- chart.pdf            # Generated figure
|   |
|   |-- 02_factor_premia_rolling/
|   |   |-- chart.py             # Rolling premia time series
|   |   |-- chart.pdf
|   |
|   |-- 03_implied_vs_realized/
|   |   |-- chart.py             # Comparison grouped bars
|   |   |-- chart.pdf
|   |
|   |-- 04_hj_bound/
|   |   |-- chart.py             # HJ bound visualization
|   |   |-- chart.pdf
|   |
|   |-- 05_factor_correlations/
|   |   |-- chart.py             # Correlation heatmap
|   |   |-- chart.pdf
|   |
|   |-- 06_timing_backtest/
|       |-- chart.py             # Timing strategy backtest
|       |-- chart.pdf
|
|-- paper/
|   |-- main.tex                 # Master document
|   |-- sections/
|   |   |-- 02_theory.tex        # Table 4
|   |   |-- 05_empirical.tex     # Tables 8, 9, 11, 12, 14
|   |   |-- 06_validation.tex    # Table 15
|   |-- appendices/
|       |-- C_data_catalog.tex   # Tables C.19, C.20
|
|-- docs/
    |-- fama_french_pipeline.html  # This document

Reproduction Guide

Requirements

pip install pandas numpy scipy matplotlib pandas-datareader

Step-by-Step Reproduction

Step 1: Download and Compute Statistics
cd academic-primer-framework/figures
python generate_all_data.py

This downloads Fama-French data, computes all statistics, and saves to _shared/computed_data.json.

Step 2: Generate All Figures
python 01_factor_premia_history/chart.py
python 02_factor_premia_rolling/chart.py
python 03_implied_vs_realized/chart.py
python 04_hj_bound/chart.py
python 05_factor_correlations/chart.py
python 06_timing_backtest/chart.py

Each script generates a chart.pdf in its folder.

Step 3: Compile LaTeX Document
cd ../paper
pdflatex main.tex
biber main
pdflatex main.tex
pdflatex main.tex

Produces main.pdf (69 pages).

Using justfile

cd academic-primer-framework
just figures    # Generate all charts
just build      # Compile LaTeX
just all        # Full build