Fama-French Data Pipeline

Academic Primer: "Implied Risk Premia for Factors: Theory, Estimation, and Applications"

Author: Joerg Osterrieder

Last Updated: January 2026

Pipeline Purpose

This pipeline downloads real Fama-French factor data from Kenneth French's Data Library, computes all statistics needed for the academic primer's tables and figures, and generates publication-quality visualizations.

Data Flow Diagram

+---------------------------+
|   Kenneth French Data     |
|        Library            |
|   (mba.tuck.dartmouth)    |
+------------+--------------+
             |
             v
+---------------------------+
|   pandas_datareader       |
|   load_ff_factors()       |
|   load_25_portfolios()    |
+------------+--------------+
             |
             v
+---------------------------+
|   Raw Factor Returns      |
|   - Mkt-RF, SMB, HML      |
|   - RMW, CMA, Mom, RF     |
|   - 25 Size/BM Portfolios |
|   T = 726 months          |
+------------+--------------+
             |
      +------+------+
      |             |
      v             v
+-------------+  +---------------+
| Statistics  |  | Chart Scripts |
| Generator   |  | (6 charts)    |
| (JSON)      |  | chart.py      |
+------+------+  +-------+-------+
       |                 |
       v                 v
+-------------+  +---------------+
| LaTeX       |  | PDF Figures   |
| Tables      |  | (chart.pdf)   |
| (9 tables)  |  | (6 figures)   |
+-------------+  +---------------+

Data Source

Kenneth French Data Library

URL: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

Datasets: F-F_Research_Data_5_Factors_2x3, F-F_Momentum_Factor, 25_Portfolios_5x5

Period: July 1963 - December 2023 (T = 726 months)

Access Date: January 2026

Why This Data Source?

  • Academic Standard: The Fama-French factors are the benchmark in empirical asset pricing research
  • Free Access: Data freely available for academic and commercial use
  • Long History: 60+ years of monthly data enables robust statistical inference
  • Consistency: Methodology documented and updated by Ken French

Data License

The data is provided by Kenneth R. French for academic research. When using this data, cite:

Fama, E.F. and French, K.R. (1993). "Common risk factors in the returns on stocks and bonds." Journal of Financial Economics, 33(1), 3-56.

Input Datasets

Dataset 1: F-F_Research_Data_5_Factors_2x3

Variable | Description                   | Construction
---------|-------------------------------|-------------
Mkt-RF   | Market excess return          | Value-weighted return on all NYSE/AMEX/NASDAQ stocks minus 1-month T-bill rate
SMB      | Small Minus Big               | Return on small-cap minus large-cap portfolios (size breakpoint: NYSE median)
HML      | High Minus Low                | Return on high B/M minus low B/M portfolios (B/M breakpoints: NYSE 30/70)
RMW      | Robust Minus Weak             | Return on high OP minus low OP portfolios (operating profitability)
CMA      | Conservative Minus Aggressive | Return on low investment minus high investment portfolios
RF       | Risk-free rate                | 1-month U.S. Treasury bill rate

Dataset 2: F-F_Momentum_Factor

Variable | Description         | Construction
---------|---------------------|-------------
Mom      | Up Minus Down (UMD) | Return on high prior return (winners) minus low prior return (losers) portfolios. Prior returns: months t-12 to t-2.
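The "months t-12 to t-2" convention (the most recent month is skipped to avoid short-term reversal) can be sketched in pandas. `momentum_signal` is a hypothetical helper operating on a monthly price series, not part of the pipeline:

```python
import pandas as pd

def momentum_signal(prices):
    """Prior-return momentum signal at month t: the compounded return over
    months t-12 through t-2 (11 monthly returns, skipping the latest month).
    Illustrative sketch of the standard 12-2 convention."""
    rets = prices.pct_change()
    # Compound 11 monthly returns ending at t-2, then lag by 2 months so the
    # value at index t covers months t-12 .. t-2
    return (1 + rets).rolling(11).apply(lambda x: x.prod(), raw=True).shift(2) - 1
```

For a price series that compounds at a constant 1% per month, the signal settles at \(1.01^{11} - 1\) once enough history is available.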

Dataset 3: 25_Portfolios_5x5

25 portfolios formed on the intersection of 5 size quintiles and 5 book-to-market quintiles, using NYSE breakpoints. These portfolios serve as test assets for cross-sectional regressions.

Variable Definitions

Factor Return Notation

Let \(f_{k,t}\) denote the return on factor \(k\) in month \(t\). For \(T\) months of data:

\[\bar{f}_k = \frac{1}{T}\sum_{t=1}^{T} f_{k,t} \quad \text{(sample mean)}\]
\[\hat{\sigma}_k = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T}(f_{k,t} - \bar{f}_k)^2} \quad \text{(sample std dev)}\]
\[t_k = \frac{\bar{f}_k}{\hat{\sigma}_k / \sqrt{T}} \quad \text{(t-statistic)}\]
\[\text{SR}_k = \frac{\bar{f}_k \times 12}{\hat{\sigma}_k \times \sqrt{12}} \quad \text{(annualized Sharpe ratio)}\]

Annualization

  • Mean: Multiply monthly mean by 12
  • Volatility: Multiply monthly std dev by \(\sqrt{12}\)
  • Sharpe Ratio: Annualized mean divided by annualized volatility
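These conventions fit in a few lines of NumPy. A self-contained illustration; `summarize_factor` is a hypothetical helper, not part of the pipeline:

```python
import numpy as np

def summarize_factor(returns):
    """Monthly-to-annual summary statistics per the formulas above.
    `returns` is a sequence of monthly returns in decimals."""
    r = np.asarray(returns, dtype=float)
    T = r.size
    mean_m = r.mean()
    std_m = r.std(ddof=1)  # sample std dev (T-1 denominator)
    return {
        "mean_ann": mean_m * 12,                           # annualized mean
        "std_ann": std_m * np.sqrt(12),                    # annualized volatility
        "t_stat": mean_m / (std_m / np.sqrt(T)),           # computed on monthly data
        "sharpe_ann": (mean_m * 12) / (std_m * np.sqrt(12)),
    }
```

Note that the annualized Sharpe ratio equals \(\sqrt{12}\) times the monthly Sharpe ratio, since the factor 12 in the mean and \(\sqrt{12}\) in the volatility partially cancel.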

1. Data Loader Module

File: figures/_shared/data_loader.py

Purpose

Central module for downloading, caching, and processing Fama-French factor data. Provides fallback data generation if pandas_datareader is unavailable.

Core Functions

load_ff_factors(start, end, use_cache)
def load_ff_factors(start='1963-07', end='2023-12', use_cache=True):
    """
    Load Fama-French 5 factors + momentum from Kenneth French library.

    Parameters
    ----------
    start : str
        Start date in YYYY-MM format (default: '1963-07')
    end : str
        End date in YYYY-MM format (default: '2023-12')
    use_cache : bool
        Whether to use cached data if available

    Returns
    -------
    pd.DataFrame
        DataFrame with columns: Mkt-RF, SMB, HML, RMW, CMA, Mom, RF
        Index: DatetimeIndex (monthly)
        Values: decimal returns (not percentages)
    """
    cache_file = CACHE_DIR / "ff_factors.parquet"

    if use_cache and cache_file.exists():
        factors = pd.read_parquet(cache_file)
        factors = factors.loc[start:end]
        return factors

    try:
        import pandas_datareader.data as web

        # Download FF5 factors
        ff5 = web.DataReader('F-F_Research_Data_5_Factors_2x3',
                             'famafrench', start=start)[0]

        # Download Momentum factor
        mom = web.DataReader('F-F_Momentum_Factor',
                             'famafrench', start=start)[0]

        # Combine and convert to decimals
        factors = ff5.join(mom)
        factors = factors / 100  # Convert percentages to decimals

        # Standardize column names
        factors.columns = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'RF', 'Mom']

        # Cache the data
        CACHE_DIR.mkdir(exist_ok=True)
        factors.to_parquet(cache_file)

        return factors.loc[start:end]

    except ImportError:
        print("pandas_datareader not installed. Using fallback data.")
        return _load_fallback_factors(start, end)

load_25_portfolios(start, end, use_cache)
def load_25_portfolios(start='1963-07', end='2023-12', use_cache=True):
    """
    Load 25 Size/Book-to-Market portfolios for cross-sectional tests.

    Returns
    -------
    pd.DataFrame
        DataFrame with 25 portfolio returns
        Index: DatetimeIndex (monthly)
        Values: decimal returns
    """
    cache_file = CACHE_DIR / "portfolios_25.parquet"

    if use_cache and cache_file.exists():
        portfolios = pd.read_parquet(cache_file)
        return portfolios.loc[start:end]

    try:
        import pandas_datareader.data as web

        portfolios = web.DataReader('25_Portfolios_5x5',
                                    'famafrench', start=start)[0]
        portfolios = portfolios / 100  # Convert to decimals

        # Cache
        CACHE_DIR.mkdir(exist_ok=True)
        portfolios.to_parquet(cache_file)

        return portfolios.loc[start:end]

    except ImportError:
        return _load_fallback_portfolios(start, end)

compute_factor_statistics(factors, annualize)
def compute_factor_statistics(factors, annualize=True):
    """
    Compute comprehensive factor statistics.

    Returns
    -------
    pd.DataFrame
        Statistics: Mean, Std, t-stat, Sharpe, Skew, Kurt, Min, Max
    """
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'Mom']
    factors = factors[factor_cols]
    T = len(factors)

    stats = pd.DataFrame(index=factor_cols)
    stats['Mean'] = factors.mean()
    stats['Std'] = factors.std()
    stats['t-stat'] = stats['Mean'] / (stats['Std'] / np.sqrt(T))
    stats['Sharpe'] = stats['Mean'] / stats['Std']
    stats['Skew'] = factors.skew()
    stats['Kurt'] = factors.kurtosis()
    stats['Min'] = factors.min()
    stats['Max'] = factors.max()

    if annualize:
        stats['Mean'] = stats['Mean'] * 12
        stats['Std'] = stats['Std'] * np.sqrt(12)
        stats['Sharpe'] = stats['Mean'] / stats['Std']

    return stats

2. Statistics Generator

File: figures/generate_all_data.py

Purpose

Master script that computes ALL statistics needed for paper tables. Outputs JSON for charts and prints LaTeX-formatted tables.

Computation Steps

  1. Load Fama-French factors (726 months)
  2. Load 25 Size/BM portfolios (25 test assets)
  3. Compute historical factor statistics (Table 4, C.19)
  4. Compute correlation matrix (Table C.20)
  5. Run Fama-MacBeth regressions (Table 9)
  6. Compute implied premia via reverse optimization (Table 11)
  7. Compute subsample premia (Table 12)
  8. Run GRS model comparison tests (Table 14)
  9. Run factor timing backtest (Table 15)
  10. Save all results to JSON

Run Command

cd academic-primer-framework/figures
python generate_all_data.py

Output

======================================================================
COMPUTING ALL STATISTICS FROM FAMA-FRENCH DATA
======================================================================

Data Source: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
Period: July 1963 - December 2023

Loading Fama-French factors...
  Loaded 726 months (1963-07 to 2023-12)
Loading 25 Size/BM portfolios...
  Loaded 726 months, 25 portfolios

----------------------------------------------------------------------
TABLE 4 / TABLE C.19: Historical Factor Premia
----------------------------------------------------------------------
[LaTeX table output...]

----------------------------------------------------------------------
TABLE C.20: Factor Correlation Matrix
----------------------------------------------------------------------
[LaTeX table output...]

...

======================================================================
Saving results to figures/_shared/computed_data.json
======================================================================
Results saved to: figures\_shared\computed_data.json

3. Chart Generation

Each figure has its own folder with a standalone chart.py script.

Figure   | Folder                    | Description
---------|---------------------------|------------
Figure 1 | 01_factor_premia_history/ | Historical factor premia bar chart with 95% CI
Figure 2 | 02_factor_premia_rolling/ | Rolling 60-month premia with NBER recessions
Figure 3 | 03_implied_vs_realized/   | Implied vs realized premia grouped bars
Figure 4 | 04_hj_bound/              | Hansen-Jagannathan bound visualization
Figure 5 | 05_factor_correlations/   | Factor correlation matrix heatmap
Figure 6 | 06_timing_backtest/       | Timing strategy cumulative returns

Generate All Charts

cd academic-primer-framework/figures

# Generate each chart
python 01_factor_premia_history/chart.py
python 02_factor_premia_rolling/chart.py
python 03_implied_vs_realized/chart.py
python 04_hj_bound/chart.py
python 05_factor_correlations/chart.py
python 06_timing_backtest/chart.py

# Or use justfile
just figures

Step 1: Download Data

Data Download

Input: URLs to Kenneth French Data Library

Output: Raw factor returns DataFrame (726 x 7)

Code

import pandas_datareader.data as web

# Download FF5 factors
ff5 = web.DataReader('F-F_Research_Data_5_Factors_2x3', 'famafrench',
                     start='1963-07')[0]
# Returns DataFrame with columns: Mkt-RF, SMB, HML, RMW, CMA, RF
# Values in percentage points

# Download Momentum factor
mom = web.DataReader('F-F_Momentum_Factor', 'famafrench',
                     start='1963-07')[0]
# Returns DataFrame with column: Mom

# Combine and convert to decimals
factors = ff5.join(mom)
factors = factors / 100  # Convert from % to decimals

Sample Output

            Mkt-RF     SMB     HML     RMW     CMA      RF     Mom
1963-07    -0.0039 -0.0085  0.0204  0.0023 -0.0081  0.0027  0.0045
1963-08     0.0507 -0.0224 -0.0202  0.0132 -0.0040  0.0025  0.0114
1963-09    -0.0167 -0.0021  0.0081  0.0034  0.0126  0.0027 -0.0163
...
2023-10    -0.0246 -0.0291  0.0281 -0.0076  0.0196  0.0044 -0.0426
2023-11     0.0913  0.0127 -0.0021  0.0066 -0.0174  0.0044  0.0622
2023-12     0.0495  0.0656  0.0295 -0.0169  0.0008  0.0043  0.0193

[726 rows x 7 columns]

Step 2: Compute Statistics

Statistics Computation

Input: Factor returns DataFrame (726 x 7)

Output: Statistics for each factor (mean, std, t-stat, Sharpe, etc.)

Code

factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA', 'Mom']
T = len(factors)  # 726

for f in factor_cols:
    col = factors[f]
    mean_monthly = col.mean()
    std_monthly = col.std()

    # Annualize
    mean_annual = mean_monthly * 12 * 100  # in %
    std_annual = std_monthly * np.sqrt(12) * 100  # in %

    # t-statistic (tests H0: mean = 0)
    t_stat = mean_monthly / (std_monthly / np.sqrt(T))

    # Sharpe ratio (annualized)
    sharpe = mean_annual / std_annual

    # Higher moments
    skew = col.skew()
    kurt = col.kurtosis()  # excess kurtosis

    # Max drawdown
    cum_ret = (1 + col).cumprod()
    running_max = cum_ret.cummax()
    drawdown = (cum_ret - running_max) / running_max
    max_dd = drawdown.min() * 100

Output: Factor Statistics

Factor | Mean (%) | Std (%) | t-stat | Sharpe | Max DD
-------|----------|---------|--------|--------|-------
MKT-RF |   6.9    |  15.6   |  3.43  |  0.44  | -55.8%
SMB    |   2.5    |  10.5   |  1.88  |  0.24  | -56.4%
HML    |   3.6    |  10.3   |  2.69  |  0.35  | -57.8%
RMW    |   3.4    |   7.7   |  3.39  |  0.44  | -41.8%
CMA    |   3.2    |   7.2   |  3.48  |  0.45  | -25.0%
UMD    |   7.1    |  14.6   |  3.78  |  0.49  | -57.8%

Step 3: Fama-MacBeth Regressions

Fama-MacBeth Two-Pass Procedure

Input: Factor returns (726 x K), Portfolio returns (726 x 25)

Output: Factor risk premia estimates with standard errors

Fama-MacBeth (1973) Methodology

Pass 1 (Time Series): For each portfolio \(i\), estimate factor betas:

\[R_{i,t} - r_{f,t} = \alpha_i + \sum_{k=1}^{K} \beta_{ik} f_{k,t} + \varepsilon_{i,t}\]

Pass 2 (Cross-Section): For each month \(t\), run:

\[R_{i,t} - r_{f,t} = \gamma_{0,t} + \sum_{k=1}^{K} \gamma_{k,t} \hat{\beta}_{ik} + \eta_{i,t}\]

Risk Premium: Average of monthly cross-sectional estimates:

\[\hat{\lambda}_k = \frac{1}{T}\sum_{t=1}^{T} \hat{\gamma}_{k,t}\]
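The two passes can be sketched in plain NumPy. This is an illustrative implementation with monthly second-pass regressions and textbook Fama-MacBeth standard errors, distinct from the pipeline's own `compute_fama_macbeth` shown below:

```python
import numpy as np

def fama_macbeth_monthly(F, R):
    """Two-pass Fama-MacBeth with monthly cross-sections.

    F : (T, K) factor returns; R : (T, N) portfolio excess returns.
    Returns (lambda_hat, se): time-series means of the monthly
    cross-sectional slopes and their Fama-MacBeth standard errors.
    """
    T, K = F.shape
    N = R.shape[1]

    # Pass 1: full-sample time-series betas for each portfolio
    X = np.column_stack([np.ones(T), F])
    betas = np.linalg.lstsq(X, R, rcond=None)[0][1:].T   # (N, K)

    # Pass 2: one cross-sectional regression per month
    Xcs = np.column_stack([np.ones(N), betas])
    gammas = np.array([np.linalg.lstsq(Xcs, R[t], rcond=None)[0][1:]
                       for t in range(T)])               # (T, K)

    lam = gammas.mean(axis=0)                 # average of monthly slopes
    se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
    return lam, se
```

On data with an exact factor structure and zero alphas, the estimated premia recover the factors' sample means, as the formula for \(\hat{\lambda}_k\) implies.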

Code

def compute_fama_macbeth(factors, portfolios):
    """Two-pass estimation: full-sample betas, then a single
    cross-sectional regression of average returns on betas
    (the time-averaged variant of the Fama-MacBeth procedure)."""
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']
    T = len(factors)

    # Step 1: Estimate betas for each portfolio (full sample)
    betas = {}
    for col in portfolios.columns:
        X = factors[factor_cols].values
        X = np.column_stack([np.ones(T), X])  # Add intercept
        y = portfolios[col].values
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        betas[col] = beta[1:]  # Exclude intercept

    # Step 2: Single cross-sectional regression of annualized average
    # returns on betas (no intercept column, so gamma_0 is restricted to 0)
    avg_ret = portfolios.mean() * 12 * 100  # Annualized %
    B = np.array([betas[c] for c in portfolios.columns])
    gamma = np.linalg.lstsq(B, avg_ret.values, rcond=None)[0]

    # Compute R-squared
    resid = avg_ret.values - B @ gamma
    r2 = 1 - np.var(resid) / np.var(avg_ret.values)

    return gamma, r2

Output: Risk Premia Estimates

             |      Three-Factor Model    |      Five-Factor Model
Factor       | \(\hat{\lambda}\) |  SE  |   t   | \(\hat{\lambda}\) |  SE  |   t
-------------|-------------------|------|-------|-------------------|------|------
MKT          |       10.60       | 0.36 | 29.06 |       10.34       | 0.27 | 38.05
SMB          |        2.71       | 0.36 |  7.44 |        3.76       | 0.27 | 13.84
HML          |        4.40       | 0.36 | 12.05 |        3.30       | 0.27 | 12.14
RMW          |        --         |  --  |  --   |        5.88       | 0.27 | 21.63
CMA          |        --         |  --  |  --   |        2.16       | 0.27 |  7.96
\(R^2_{CS}\) |              0.33                |              0.63

Step 4: Implied Premia via Reverse Optimization

Reverse Optimization

Input: Factor covariance matrix \(\bm{\Omega}_f\), factor exposures \(\bm{w}^f\), risk aversion \(\gamma\)

Output: Implied factor premia \(\bm{\lambda}^{impl}\)

Key Formula: Implied Factor Premium \[\bm{\lambda}^{impl} = \gamma \cdot \bm{\Omega}_f \cdot \bm{w}^f\]

where \(\gamma\) is the risk aversion coefficient (typically 2-4).

Code

def compute_implied_premia(factors):
    """Compute implied factor premia using reverse optimization."""
    factor_cols = ['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']

    # Factor covariance (annualized)
    cov = factors[factor_cols].cov() * 12

    # Market portfolio factor exposures (approximate)
    # Based on typical institutional allocation
    w_factor = np.array([1.0, 0.15, 0.10, 0.15, 0.12])

    implied = {}
    for gamma in [2, 3, 4]:
        lambda_impl = gamma * cov.values @ w_factor * 100  # Percentage
        implied[f'gamma_{gamma}'] = {
            'MKT': lambda_impl[0] / 12,  # Monthly
            'SMB': lambda_impl[1] / 12,
            'HML': lambda_impl[2] / 12,
            'RMW': lambda_impl[3] / 12,
            'CMA': lambda_impl[4] / 12
        }

    return implied

Output: Implied vs Realized Premia (Monthly %)

Factor | Realized | \(\gamma=2\) | \(\gamma=3\) | \(\gamma=4\)
-------|----------|--------------|--------------|-------------
MKT    |   0.57   |     0.40     |     0.59     |     0.79
SMB    |   0.21   |     0.09     |     0.14     |     0.19
HML    |   0.30   |    -0.03     |    -0.04     |    -0.05
RMW    |   0.28   |    -0.03     |    -0.04     |    -0.06
CMA    |   0.27   |    -0.05     |    -0.07     |    -0.10

Note: Implied premia are lower than realized for most factors, consistent with post-publication decay.

Step 5: GRS Model Comparison Tests

GRS Test

Input: Factor returns, portfolio returns, model specification

Output: GRS F-statistic, p-value, HJ distance

Gibbons-Ross-Shanken (1989) Test

Tests whether all pricing errors (alphas) are jointly zero:

\[H_0: \bm{\alpha} = \bm{0}\]

Test statistic:

\[\text{GRS} = \frac{T - N - K}{N} \cdot \frac{\hat{\bm{\alpha}}'\hat{\bm{\Sigma}}_\varepsilon^{-1}\hat{\bm{\alpha}}}{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f} \sim F_{N, T-N-K}\]
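The statistic can be computed directly from this formula. A minimal sketch using maximum-likelihood (divide-by-T) covariance estimates; `grs_test` is an illustrative helper, not part of the pipeline:

```python
import numpy as np
from scipy import stats

def grs_test(F, R):
    """GRS test of H0: all alphas are jointly zero.

    F : (T, K) factor returns; R : (T, N) test-asset excess returns.
    Returns (grs, p_value) using the F_{N, T-N-K} distribution above.
    """
    T, K = F.shape
    N = R.shape[1]

    # Time-series regressions of each asset on the factors
    X = np.column_stack([np.ones(T), F])
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    alpha = coef[0]                          # (N,) intercepts
    resid = R - X @ coef

    Sigma_e = resid.T @ resid / T            # ML residual covariance
    mu_f = F.mean(axis=0)
    Sigma_f = (F - mu_f).T @ (F - mu_f) / T  # ML factor covariance

    quad = alpha @ np.linalg.solve(Sigma_e, alpha)
    denom = 1.0 + mu_f @ np.linalg.solve(Sigma_f, mu_f)
    grs = (T - N - K) / N * quad / denom
    return grs, stats.f.sf(grs, N, T - N - K)
```

Injecting a common nonzero alpha into the test assets should sharply increase the statistic, which is a quick sanity check on any implementation.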

Output: Model Comparison

Model   | Factors | GRS F | p-value | HJ Distance
--------|---------|-------|---------|------------
CAPM    |    1    | 23.63 | <0.001  |    0.03
FF3     |    3    | 27.10 | <0.001  |    0.04
Carhart |    4    | 26.39 | <0.001  |    0.04
FF5     |    5    | 24.97 | <0.001  |    0.04
FF5+Mom |    6    | 24.68 | <0.001  |    0.04

Interpretation: All models are rejected at the 1% level, but model fit improves with additional factors.

Step 6: Factor Timing Backtest

Timing Strategy Backtest

Input: Factor returns (1980-2023), timing signals

Output: Strategy returns, Sharpe ratios, turnover

Strategy Definition

Timing signal for factor \(k\) at time \(t\):

\[z_{k,t} = \frac{\bar{\lambda}_{k,t}^{(12)} - \bar{\lambda}_{k,t}^{(expand)}}{\hat{\sigma}_{k,t}^{(60)}}\]

Dynamic weight:

\[w_{k,t+1} = \bar{w}_k \cdot (1 + \kappa \cdot z_{k,t})\]
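For a single factor, the signal and weight can be sketched in pandas. The window lengths follow the formulas above; `timing_weights`, the default `w_bar`, `kappa`, and the z-score clip are illustrative assumptions, not values from the paper:

```python
import numpy as np
import pandas as pd

def timing_weights(f, w_bar=1.0, kappa=0.5, clip=2.0):
    """Timing weight path for one factor per the formulas above.

    f : pd.Series of monthly factor returns. Returns the weight applied
    in month t+1, i.e. w_bar * (1 + kappa * z_t) lagged one month.
    """
    mean_12 = f.rolling(12).mean()   # 12-month premium estimate
    mean_exp = f.expanding().mean()  # expanding-window estimate
    vol_60 = f.rolling(60).std()     # 60-month volatility
    z = ((mean_12 - mean_exp) / vol_60).clip(-clip, clip)
    return (w_bar * (1 + kappa * z)).shift(1)

# Illustrative use on a synthetic monthly return series
rng = np.random.default_rng(2)
f = pd.Series(rng.normal(0.004, 0.03, size=240))
w = timing_weights(f)
```

Clipping the z-score bounds the weight between \(\bar{w}(1 - \kappa \cdot \text{clip})\) and \(\bar{w}(1 + \kappa \cdot \text{clip})\), which keeps turnover and leverage under control.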

Output: Backtest Results (1980-2023)

Strategy          | Ann. Return | Ann. Vol | Sharpe | Max DD | Turnover
------------------|-------------|----------|--------|--------|---------
Static FF3        |    4.1%     |   7.1%   |  0.57  | -29.4% |    0%
Timing (implied)  |    4.9%     |   7.2%   |  0.68  | -26.2% |   24%
Timing (realized) |    4.9%     |   7.2%   |  0.68  | -26.5% |   24%

Key Finding: Timing strategies achieve higher Sharpe ratios (0.68 vs 0.57) with lower maximum drawdowns.

Output Tables

Table      | Label                   | Location           | Content
-----------|-------------------------|--------------------|--------
Table 4    | tab:historical_premia   | 02_theory.tex      | Historical factor premia (annual)
Table 8    | tab:factor_stats        | 05_empirical.tex   | Monthly statistics with moments
Table 9    | tab:fm_results          | 05_empirical.tex   | Fama-MacBeth estimates
Table 11   | tab:implied_vs_realized | 05_empirical.tex   | Implied vs realized premia
Table 12   | tab:subsamples          | 05_empirical.tex   | Subsample analysis
Table 14   | tab:model_tests         | 05_empirical.tex   | GRS test statistics
Table 15   | tab:timing_results      | 06_validation.tex  | Timing backtest results
Table C.19 | tab:ff_stats_detail     | C_data_catalog.tex | Detailed factor statistics
Table C.20 | tab:factor_corr         | C_data_catalog.tex | Factor correlation matrix

Output Figures

Figure   | Label                     | Description
---------|---------------------------|------------
Figure 1 | fig:factor_premia_history | Historical factor premia bar chart with 95% confidence intervals
Figure 2 | fig:rolling_premia        | Rolling 60-month premia with NBER recession shading
Figure 3 | fig:implied_vs_realized   | Implied vs realized premia comparison (grouped bars)
Figure 4 | fig:hj_bound              | Hansen-Jagannathan bound with max Sharpe ratios by model
Figure 5 | fig:factor_correlations   | Factor correlation matrix heatmap
Figure 6 | fig:timing_backtest       | Cumulative returns from factor timing strategies

Max Sharpe Ratios (HJ Bound Visualization)

Model               | Max Sharpe Ratio
--------------------|-----------------
CAPM (1 factor)     | 0.44
FF3 (3 factors)     | 0.63
FF5 (5 factors)     | 1.03
FF5+Mom (6 factors) | 1.19

Output JSON

File: figures/_shared/computed_data.json

JSON Structure
{
  "data_source": {
    "url": "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html",
    "datasets": ["F-F_Research_Data_5_Factors_2x3", "F-F_Momentum_Factor", "25_Portfolios_5x5"],
    "period": "July 1963 - December 2023",
    "access_date": "January 2026"
  },
  "factor_names": ["MKT-RF", "SMB", "HML", "RMW", "CMA", "UMD"],
  "table_historical_premia": {
    "Mkt-RF": {"mean": 6.86, "std": 15.57, "sharpe": 0.44, "t_stat": 3.43},
    "SMB": {"mean": 2.53, "std": 10.46, "sharpe": 0.24, "t_stat": 1.88},
    ...
  },
  "correlation_matrix": {...},
  "fama_macbeth": {...},
  "implied_vs_realized": {...},
  "subsamples": {...},
  "model_tests": {...},
  "timing_backtest": {...}
}

Key Formulas

1. Factor Risk Premium (Definition) \[\lambda_k = \mathbb{E}[f_k]\] where \(f_k\) is an excess or long-short factor return (for a raw market return, \(\lambda_M = \mathbb{E}[R_M - r_f]\))
2. CAPM Pricing \[\mathbb{E}[R_i] - r_f = \beta_i (\mathbb{E}[R_M] - r_f)\]
3. APT Pricing \[\mathbb{E}[R_i] - r_f = \sum_{k=1}^{K} \beta_{ik} \lambda_k\]
4. Implied Factor Premium (Reverse Optimization) \[\bm{\lambda}^{impl} = \gamma \cdot \bm{\Omega}_f \cdot \bm{w}^f\]
5. Hansen-Jagannathan Bound \[\frac{\sigma(m)}{\mathbb{E}[m]} \geq \sqrt{\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}}\]
6. GRS Test Statistic \[\text{GRS} = \frac{T - N - K}{N} \cdot \frac{\hat{\bm{\alpha}}'\hat{\bm{\Sigma}}_\varepsilon^{-1}\hat{\bm{\alpha}}}{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f}\]
7. Fama-MacBeth t-statistic with Shanken Correction \[\text{SE}_{Shanken} = \text{SE}_{FM} \times \sqrt{1 + \hat{\bm{\mu}}_f'\hat{\bm{\Sigma}}_f^{-1}\hat{\bm{\mu}}_f}\]
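The right-hand side of the Hansen-Jagannathan bound, \(\sqrt{\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}}\), is the maximum Sharpe ratio attainable from a set of factors, the quantity tabulated in the HJ bound section above. A minimal sketch; `max_sharpe` is an illustrative helper assuming monthly returns in decimals:

```python
import numpy as np

def max_sharpe(F):
    """Annualized maximum Sharpe ratio sqrt(mu' Sigma^{-1} mu)
    from monthly factor (excess) returns F of shape (T, K)."""
    mu = F.mean(axis=0)
    Sigma = np.atleast_2d(np.cov(F, rowvar=False, ddof=1))
    sr_monthly = np.sqrt(mu @ np.linalg.solve(Sigma, mu))
    return sr_monthly * np.sqrt(12)  # annualize
```

Because \(\bm{\mu}'\bm{\Sigma}^{-1}\bm{\mu}\) is weakly increasing as factors are added, the in-sample max Sharpe ratio can only rise with larger factor sets, matching the pattern from CAPM to FF5+Mom in the table above.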

File Structure

academic-primer-framework/
|-- figures/
|   |-- _shared/
|   |   |-- data_loader.py       # Data download and caching
|   |   |-- computed_data.json   # Cached statistics
|   |   |-- colors.py            # Color palette
|   |   |-- styles.py            # Matplotlib styles
|   |   |-- cache/               # Parquet cache files
|   |
|   |-- generate_all_data.py     # Master statistics script
|   |
|   |-- 01_factor_premia_history/
|   |   |-- chart.py             # Historical premia bar chart
|   |   |-- chart.pdf            # Generated figure
|   |
|   |-- 02_factor_premia_rolling/
|   |   |-- chart.py             # Rolling premia time series
|   |   |-- chart.pdf
|   |
|   |-- 03_implied_vs_realized/
|   |   |-- chart.py             # Comparison grouped bars
|   |   |-- chart.pdf
|   |
|   |-- 04_hj_bound/
|   |   |-- chart.py             # HJ bound visualization
|   |   |-- chart.pdf
|   |
|   |-- 05_factor_correlations/
|   |   |-- chart.py             # Correlation heatmap
|   |   |-- chart.pdf
|   |
|   |-- 06_timing_backtest/
|       |-- chart.py             # Timing strategy backtest
|       |-- chart.pdf
|
|-- paper/
|   |-- main.tex                 # Master document
|   |-- sections/
|   |   |-- 02_theory.tex        # Table 4
|   |   |-- 05_empirical.tex     # Tables 8, 9, 11, 12, 14
|   |   |-- 06_validation.tex    # Table 15
|   |-- appendices/
|       |-- C_data_catalog.tex   # Tables C.19, C.20
|
|-- docs/
    |-- fama_french_pipeline.html  # This document

Reproduction Guide

Requirements

pip install pandas numpy scipy matplotlib pandas-datareader

Step-by-Step Reproduction

Step 1: Download and Compute Statistics
cd academic-primer-framework/figures
python generate_all_data.py

This downloads Fama-French data, computes all statistics, and saves to _shared/computed_data.json.

Step 2: Generate All Figures
python 01_factor_premia_history/chart.py
python 02_factor_premia_rolling/chart.py
python 03_implied_vs_realized/chart.py
python 04_hj_bound/chart.py
python 05_factor_correlations/chart.py
python 06_timing_backtest/chart.py

Each script generates a chart.pdf in its folder.

Step 3: Compile LaTeX Document
cd ../paper
pdflatex main.tex
biber main
pdflatex main.tex
pdflatex main.tex

Produces main.pdf (69 pages).

Using justfile

cd academic-primer-framework
just figures    # Generate all charts
just build      # Compile LaTeX
just all        # Full build