Complete Replication Guide | Version 1.0 | 2025-12-28
This guide documents the complete methodology for constructing a corpus of 65 peer-reviewed papers on mutual fund style drift. The corpus was built through a two-stage process: (1) parallel independent searches using different strategies, and (2) unified keyword-based relevance classification.
Rather than relying solely on journal prestige or citation counts, we applied domain-specific keyword filtering that mirrors how experts identify relevant literature. Papers must contain established terminology from the style drift literature (e.g., "style drift", "active share", "closet indexing") to be classified as relevant.

    Phase 1: 58-Corpus  Phase 2: 53-Corpus  Phase 3: Merge
           |                   |                   |
           v                   v                   v
    +-------------+     +-------------+     +-------------+
    | 19 OpenAlex |     | Broad Search|     | Load Both   |
    | Queries     |     | Top Journals|     | Corpora     |
    +-------------+     +-------------+     +-------------+
           |                   |                   |
           v                   v                   v
    +-------------+     +-------------+     +-------------+
    | 5400+ Raw   |     | 2600+ Raw   |     | Deduplicate |
    | Papers      |     | Papers      |     | by DOI      |
    +-------------+     +-------------+     +-------------+
           |                   |                   |
           v                   v                   v
    +-------------+     +-------------+     +-------------+
    | Stage 1-2   |     | Journal +   |     | 111 Unique  |
    | Filters     |     | Citation    |     | Papers      |
    +-------------+     +-------------+     +-------------+
           |                   |                   |
           v                   v                   v
    +-------------+     +-------------+     +-------------+
    | Stage 3-4   |     | Weak Kw     |     | Keyword     |
    | Journals+Kw |     | Filter      |     | Classify    |
    +-------------+     +-------------+     +-------------+
           |                   |                   |
           v                   v                   v
    +-------------+     +-------------+     +-------------+
    | 58 Papers   |     | 53 Papers   |     | 65 RELEVANT |
    | (100%)      |     | (13.2%)     |     | 46 REVIEW   |
    +-------------+     +-------------+     +-------------+

| Requirement | Value |
|---|---|
| Python Version | 3.8+ |
| Required Packages | requests, matplotlib |
| Optional Packages | pandas (for CSV manipulation), numpy (for statistics) |
| API | Endpoint | Rate Limit | Auth |
|---|---|---|---|
| OpenAlex | https://api.openalex.org | 10 requests/second with polite pool (mailto header) | None required, but mailto header recommended |
| Semantic Scholar | https://api.semanticscholar.org/graph/v1 | 1 request/3 seconds without API key | Optional API key for higher limits |
Include a mailto header in OpenAlex requests to access the polite pool (10 req/sec). Without it, rate limits are lower.
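The rate limits above can be respected with a small client-side throttle. The sketch below is illustrative only: it assumes the contact address is sent as a `mailto` query parameter on the works endpoint, and the names `build_works_url` and `Throttle`, the example address, and the `per-page` value are hypothetical rather than taken from the project scripts.

```python
import time
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_works_url(query: str, mailto: str, per_page: int = 200) -> str:
    """Build a works-search URL that carries the polite-pool contact address."""
    params = {"search": query, "per-page": per_page, "mailto": mailto}
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


class Throttle:
    """Sleep as needed so calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


# Intervals derived from the table: 10 requests/second for OpenAlex,
# 1 request per 3 seconds for Semantic Scholar without an API key.
openalex_throttle = Throttle(min_interval=0.1)
semantic_scholar_throttle = Throttle(min_interval=3.0)

url = build_works_url('"closet indexing"', mailto="you@example.org")
```

Calling `openalex_throttle.wait()` immediately before each `requests.get(url)` keeps a sequential loop under the stated limit.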
Location: literature/sections/introduction/
Execute 19 targeted queries against the OpenAlex API
Organized into 5 tiers by specificity:
"mutual fund style drift""fund style misclassification""style consistency" AND "mutual fund""investment style" AND "fund" AND "deviation""closet indexing""active share" AND "mutual fund""benchmark mismatch" AND "fund""style timing" AND "fund""window dressing" AND "mutual fund""return-based style analysis""holdings-based style analysis""Sharpe" AND "style analysis""style box" AND "Morningstar""fund manager incentives" AND "risk""tournament" AND "mutual fund""flow-performance" AND "mutual fund""fund misrepresentation""alpha" AND "style" AND "fund""fund classification"| Filter | Value |
|---|---|
| year_range | 1990-2025 |
| type | journal-article |
| language | English |
| has_doi | True |
| has_abstract | True |
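
The filter table above can be read as a single predicate over a paper record. A minimal sketch, assuming each record is a dictionary with the hypothetical keys `year`, `type`, `language`, `doi`, and `abstract`, and that English is coded as `en`; the real field names depend on how the search script stores OpenAlex results.

```python
from typing import Any, Dict

YEAR_RANGE = (1990, 2025)


def passes_basic_filters(paper: Dict[str, Any]) -> bool:
    """Return True if a record meets every criterion in the filter table."""
    year = paper.get("year")
    return (
        year is not None
        and YEAR_RANGE[0] <= year <= YEAR_RANGE[1]  # year_range
        and paper.get("type") == "journal-article"  # type
        and paper.get("language") == "en"           # language (assumed code)
        and bool(paper.get("doi"))                  # has_doi
        and bool(paper.get("abstract"))             # has_abstract
    )
```
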
| Filter | Value |
|---|---|
| citations_min | 10 |
| or_top_journal | True |
| or_recent_with_3_cites | 2020+ with 3+ citations |
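
Read as a rule, the table above passes a paper when any one of the three conditions holds. A minimal sketch, assuming hypothetical record keys `citations`, `year`, and `journal`; the `TOP_JOURNALS` set is a two-item placeholder, since this guide does not list the journals the scripts actually use.

```python
from typing import Any, Dict, Set

# Placeholder for illustration only; not the project's actual journal list.
TOP_JOURNALS: Set[str] = {
    "Journal of Financial Economics",
    "Review of Financial Studies",
}


def passes_quality_filter(paper: Dict[str, Any]) -> bool:
    """Pass if well cited, OR in a top journal, OR recent with a few citations."""
    citations = paper.get("citations") or 0
    year = paper.get("year") or 0
    if citations >= 10:                       # citations_min
        return True
    if paper.get("journal") in TOP_JOURNALS:  # or_top_journal
        return True
    return year >= 2020 and citations >= 3    # or_recent_with_3_cites
```
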

- raw_results.json (5400+ papers)
- stage1_results.json (after relevance filters)
- stage2_results.json (1594 papers after quality filters)

Apply finance journal and keyword relevance filters
Finance/Economics journals only
Exception: SSRN papers with 50+ citations pass
Papers must contain at least one of these terms in title or abstract:
Context-dependent: investment style - "fund" must also appear in text
Fetch full metadata from OpenAlex and Semantic Scholar
Location: literature/scripts/
Broad search filtered by publication in top finance journals:
| Script | Output |
|---|---|
| 01_openalex_search.py | raw_papers.csv |
| 02_supplementary_search.py | merged_corpus.csv |
| 05_relevance_filter.py | final_corpus.csv (53 papers) |
Location: literature/scripts/
| Source | File |
|---|---|
| 58-Corpus | literature/sections/introduction/openalex_output/enriched_corpus.json |
| 53-Corpus | literature/data/final_corpus.csv |
Deduplication: DOI-based matching (case-insensitive)
Result: 111 unique papers (0 duplicates found)
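
Case-insensitive DOI matching can be sketched as follows. The function names and the `doi` record key are hypothetical. Stripping a resolver prefix such as `https://doi.org/` is an added assumption; the guide itself only states that matching ignores case.

```python
from typing import Dict, Iterable, List, Tuple


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if present."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def merge_by_doi(*corpora: Iterable[Dict[str, str]]) -> Tuple[List[Dict[str, str]], int]:
    """Merge corpora, keeping the first record per DOI; also count duplicates."""
    seen: Dict[str, Dict[str, str]] = {}
    duplicates = 0
    for corpus in corpora:
        for paper in corpus:
            key = normalize_doi(paper["doi"])
            if key in seen:
                duplicates += 1
            else:
                seen[key] = paper
    return list(seen.values()), duplicates


merged, duplicates = merge_by_doi(
    [{"doi": "10.1000/ABC"}],
    [{"doi": "https://doi.org/10.1000/abc"}, {"doi": "10.1000/xyz"}],
)
```

In this toy input the two spellings of the same DOI collapse to one record, so `merged` holds two papers and `duplicates` is 1.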
Apply Stage 4 keywords to all merged papers
def classify_paper(title: str, abstract: str) -> str:
    text = (title + " " + abstract).lower()
    # Check direct keywords
    for keyword in RELEVANCE_KEYWORDS:
        if keyword in text:
            return "RELEVANT"
    # Check context-dependent keywords
    if ("fund" in text or "mutual" in text):
        for keyword in CONTEXT_KEYWORDS:
            if keyword in text:
                return "RELEVANT"
    return "NEEDS_REVIEW"
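
A variant of the classifier that also reports which keywords matched may make the rule easier to audit. This is a sketch, not the code in `21_relevance_classifier.py`: the two keyword lists are an illustrative subset drawn from the keyword table in this guide, and `matched_keywords` and `classify` are hypothetical names.

```python
from typing import List

# Illustrative subset taken from the keyword table in this guide; the lists
# used by the actual classifier script may be longer.
RELEVANCE_KEYWORDS = [
    "style drift", "style analysis", "active share", "fund style",
    "window dressing", "closet index", "return-based style",
    "misclassification", "style timing",
]
CONTEXT_KEYWORDS = ["investment style"]


def matched_keywords(title: str, abstract: str) -> List[str]:
    """List every keyword found in the title or abstract (case-insensitive)."""
    text = f"{title} {abstract or ''}".lower()
    hits = [kw for kw in RELEVANCE_KEYWORDS if kw in text]
    # Context-dependent keywords count only when "fund" or "mutual" is present.
    if "fund" in text or "mutual" in text:
        hits += [f"{kw} [+fund]" for kw in CONTEXT_KEYWORDS if kw in text]
    return hits


def classify(title: str, abstract: str) -> str:
    """Same decision rule as above: any match is RELEVANT."""
    return "RELEVANT" if matched_keywords(title, abstract) else "NEEDS_REVIEW"


print(matched_keywords("On Mutual Fund Investment Styles", ""))
# ['investment style [+fund]']
```

Matching is plain substring search, so "investment styles" in a title satisfies the "investment style" keyword.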

- classified_papers.json (111 papers with classification)
- final_relevant_corpus.json (65 papers)
- final_relevant_corpus.bib
- final_corpus_apa.html
- corpus_statistics.json
- corpus_statistics.html
- charts/ (5 PNG visualizations)

Match triggers RELEVANT classification immediately:
Only match if "fund" or "mutual" also appears:
| Keyword | Papers Matched |
|---|---|
| investment style [+fund] | 20 |
| style analysis | 12 |
| active share | 11 |
| fund style | 6 |
| window dressing | 6 |
| style drift | 5 |
| closet index/indexing | 9 |
| return-based style | 4 |
| misclassification | 3 |
| style timing | 3 |
| Source | Relevant | Total | Rate |
|---|---|---|---|
| 58-Corpus (targeted search) | 58 | 58 | 100% |
| 53-Corpus (journal prestige) | 7 | 53 | 13.2% |
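
The rates and totals reported in this guide follow directly from the two rows above. A short check, using only figures stated in the table (variable names are illustrative):

```python
# (relevant, total) per source corpus, copied from the table above.
corpora = {"58-Corpus": (58, 58), "53-Corpus": (7, 53)}

rates = {name: round(100 * rel / total, 1) for name, (rel, total) in corpora.items()}
relevant = sum(rel for rel, _ in corpora.values())
total = sum(tot for _, tot in corpora.values())
needs_review = total - relevant

print(rates)                          # {'58-Corpus': 100.0, '53-Corpus': 13.2}
print(relevant, total, needs_review)  # 65 111 46
```
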
These papers from the 53-corpus matched relevance keywords and were added to the final corpus:
| # | Title | Journal | Cites | Matched Keyword |
|---|---|---|---|---|
| 1 | Mutual Fund Styles | Journal of Financial Economics | 490 | fund style |
| 2 | On Mutual Fund Investment Styles | Review of Financial Studies | 422 | investment style [+fund] |
| 3 | Liquidity, Investment Style, and the Relation between Fund Size and Fund Performance | Journal of Financial and Quantitative Analysis | 332 | investment style [+fund] |
| 4 | Active Share and Mutual Fund Performance | Financial Analysts Journal | 296 | active share |
| 5 | Mutual Fund Misclassification: Evidence Based on Style Analysis | Financial Analysts Journal | 162 | style analysis, misclassification |
| 6 | Equity Style Timing | Financial Analysts Journal | 69 | style timing |
| 7 | Diseconomies of Scale in Quantitative and Fundamental Investment Styles | Journal of Financial and Quantitative Analysis | 13 | investment style [+fund] |
Execute these commands in order to fully reproduce the 65-paper corpus. Each cd path is relative to the project root:
cd literature/sections/introduction/
python 15_openalex_literature_search.py
python 16_corpus_reduction.py
python 17_enrich_corpus_metadata.py
python 18_generate_apa_html.py
cd literature/scripts/
python 01_openalex_search.py
python 02_supplementary_search.py
python 05_relevance_filter.py
cd literature/scripts/
python 21_relevance_classifier.py
python 22_final_corpus_apa_html.py
python 23_corpus_statistics.py
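
The same sequence can be driven from one Python script, which avoids the directory changes and stops at the first failing step. This is a hypothetical wrapper, not part of the project: `PIPELINE`, `build_steps`, and `run_pipeline` are illustrative names, and the project root is assumed to be the directory that contains `literature/`.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Tuple

# Script order copied from the commands above, grouped by working directory.
PIPELINE = [
    ("literature/sections/introduction", [
        "15_openalex_literature_search.py",
        "16_corpus_reduction.py",
        "17_enrich_corpus_metadata.py",
        "18_generate_apa_html.py",
    ]),
    ("literature/scripts", [
        "01_openalex_search.py",
        "02_supplementary_search.py",
        "05_relevance_filter.py",
        "21_relevance_classifier.py",
        "22_final_corpus_apa_html.py",
        "23_corpus_statistics.py",
    ]),
]


def build_steps(root: Path) -> List[Tuple[Path, List[str]]]:
    """Expand PIPELINE into (working directory, argv) pairs in run order."""
    return [
        (root / directory, [sys.executable, script])
        for directory, scripts in PIPELINE
        for script in scripts
    ]


def run_pipeline(root: Path) -> None:
    """Run each script in its own directory; raise on the first failure."""
    for cwd, argv in build_steps(root):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_pipeline(Path("."))` from the project root would execute all ten scripts in order.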

- classification_output/final_relevant_corpus.json - 65 papers
- classification_output/final_corpus_apa.html - APA citations
- classification_output/corpus_statistics.html - Statistics dashboard
- classification_output/charts/ - 5 PNG visualizations

Generated: 2025-12-28 18:38:25 | Script: 24_replication_guide.py | Mutual Fund Style Drift SLR