Prediction-Small-Cap-Biotech-Anomalies
Information
| Property | Value |
|---|---|
| Language | Python |
| Stars | 1 |
| Forks | 0 |
| Watchers | 1 |
| Open Issues | 0 |
| License | No License |
| Created | 2026-03-02 |
| Last Updated | 2026-05-19 |
| Last Push | 2026-04-08 |
| Contributors | 2 |
| Default Branch | main |
| Visibility | private |
Datasets
This repository includes 43 dataset(s):
| Dataset | Format | Size |
|---|---|---|
| data | | 0.0 KB |
| AGENTS.md | .md | 0.97 KB |
| processed | | 0.0 KB |
| figures | | 0.0 KB |
| fig1_timeline.pdf | .pdf | 27.42 KB |
| fig1_timeline.png | .png | 136.48 KB |
| fig2_rebound_curves.pdf | .pdf | 27.12 KB |
| fig2_rebound_curves.png | .png | 203.22 KB |
| fig3_announced_vs_silent.pdf | .pdf | 40.06 KB |
| fig3_announced_vs_silent.png | .png | 175.43 KB |
| fig4_crl_comparison.pdf | .pdf | 20.77 KB |
| fig4_crl_comparison.png | .png | 261.6 KB |
| paper_references.bib | .bib | 8.24 KB |
| table1_sample_summary.csv | .csv | 0.46 KB |
| table2_event_study.csv | .csv | 2.8 KB |
| table2_mar.csv | .csv | 2.3 KB |
| table3_announced_vs_silent.csv | .csv | 0.4 KB |
| table4_crl_vs_p3.csv | .csv | 1.45 KB |
| table4_mar.csv | .csv | 1.51 KB |
| table5_crl_by_outcome.csv | .csv | 0.87 KB |
| table6_cross_section.csv | .csv | 3.6 KB |
| table_classification_audit.csv | .csv | 0.19 KB |
| table_event_validation.csv | .csv | 67.64 KB |
| table_silent_validation.csv | .csv | 40.17 KB |
| samples | | 0.0 KB |
| .gitkeep | | 0.0 KB |
| 01_universe.json | .json | 443.54 KB |
| 02_all_trials.json | .json | 14669.29 KB |
| 03_p3_analysis_report.json | .json | 357.03 KB |
| 13_batch_triangulation_v3_results.json | .json | 1233.79 KB |
| 17_crl_analysis_results.json | .json | 105.18 KB |
| README_samples.md | .md | 9.93 KB |
| data | | 0.0 KB |
| AGENTS.md | .md | 2.24 KB |
| `__init__.py` | .py | 0.01 KB |
| clinical_trials.py | .py | 13.45 KB |
| delisting.py | .py | 11.35 KB |
| event_dates.py | .py | 12.67 KB |
| fama_french.py | .py | 5.85 KB |
| fundamentals.py | .py | 17.17 KB |
| openalex.py | .py | 11.08 KB |
| stock_prices.py | .py | 8.91 KB |
| universe.py | .py | 5.6 KB |
Reproducibility
This repository includes reproducibility tools:
- Python requirements.txt
Status
- Issues: Enabled
- Wiki: Enabled
- Pages: Enabled
README
Algorithmic Rebound Trading in Small-Cap Biotech
MSc Thesis: An algorithmic trading strategy that identifies profitable rebound opportunities in small/mid-cap biotechs after Phase 3 clinical trial failures while systematically filtering out firms at risk of insolvency.
Website: https://digital-ai-finance.github.io/Prediction-Small-Cap-Biotech-Anomalies/
Overview
This repository contains the full codebase and thesis for an MSc research project investigating whether small- and mid-cap biotech stocks (market capitalization of $50M to $10B) exhibit exploitable abnormal returns after Phase 3 clinical trial failure announcements, and whether a composite insolvency filter that excludes firms at high risk of bankruptcy improves trading strategy performance.
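The insolvency filter is described here only through its inputs (Altman's Z'' score, cash position, and pipeline, per the `insolvency.py` entry below). The following is a minimal sketch of how such a composite screen could be wired up. It assumes the standard Altman Z''-score weights and his 1.1 distress cutoff. The 12-month runway floor, the "remaining late-stage programs" count, the two-of-three voting rule, and all field names are hypothetical choices for illustration, not the thesis's calibration.

```python
from dataclasses import dataclass

# Standard Altman Z''-score weights (non-manufacturing variant).
Z2_WEIGHTS = (6.56, 3.26, 6.72, 1.05)
Z2_DISTRESS_CUTOFF = 1.1  # Altman's lower bound of the "grey zone"


@dataclass(frozen=True)
class FirmSnapshot:
    """Balance-sheet inputs; field names are hypothetical."""
    working_capital: float
    retained_earnings: float
    ebit: float
    book_equity: float
    total_assets: float
    total_liabilities: float
    cash: float
    quarterly_burn: float     # cash consumed per quarter (positive = burning)
    late_stage_programs: int  # hypothetical: Phase 2/3 assets left after the failure


def altman_z2(f: FirmSnapshot) -> float:
    """Z'' = 6.56*X1 + 3.26*X2 + 6.72*X3 + 1.05*X4."""
    x1 = f.working_capital / f.total_assets
    x2 = f.retained_earnings / f.total_assets
    x3 = f.ebit / f.total_assets
    x4 = f.book_equity / f.total_liabilities
    w1, w2, w3, w4 = Z2_WEIGHTS
    return w1 * x1 + w2 * x2 + w3 * x3 + w4 * x4


def cash_runway_months(f: FirmSnapshot) -> float:
    """Months of cash left at the current burn rate (infinite if not burning)."""
    if f.quarterly_burn <= 0:
        return float("inf")
    return 3.0 * f.cash / f.quarterly_burn


def insolvency_flags(f: FirmSnapshot, min_runway_months: float = 12.0,
                     min_programs: int = 1) -> int:
    """Count how many of the three red flags fire (0 to 3)."""
    return sum((
        altman_z2(f) < Z2_DISTRESS_CUTOFF,
        cash_runway_months(f) < min_runway_months,
        f.late_stage_programs < min_programs,
    ))


def passes_filter(f: FirmSnapshot, max_flags: int = 1) -> bool:
    """Hypothetical rule: keep the firm unless two or more flags fire."""
    return insolvency_flags(f) <= max_flags
```

A voting rule is used in this sketch because pre-revenue biotechs tend to carry large accumulated deficits and negative EBIT, so Z'' on its own would flag most of them. Requiring agreement with the runway and pipeline signals keeps the screen from discarding nearly the whole universe.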
Research Questions
- Do small/mid-cap biotechs exhibit statistically significant abnormal returns after Phase 3 failure announcements?
- Can a composite insolvency filter effectively separate viable rebound candidates from insolvency-bound firms?
- Does an insolvency-filtered trading strategy generate positive risk-adjusted returns net of transaction costs?
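The first research question depends on measuring abnormal returns. Below is a minimal sketch of the market-model approach listed for `event_study.py`. It assumes daily simple returns already aligned by trading day and an OLS fit over a pre-event estimation window. The function name and the window bounds are illustrative parameters, not the thesis's settings.

```python
import numpy as np


def market_model_car(stock, market, est_window, event_window):
    """Abnormal returns and CAR under an OLS market model.

    stock, market: 1-D arrays of daily simple returns on the same trading days.
    est_window, event_window: (start, end) index pairs, end exclusive.
    """
    stock = np.asarray(stock, dtype=float)
    market = np.asarray(market, dtype=float)
    s0, s1 = est_window
    e0, e1 = event_window
    if s1 > e0:
        raise ValueError("estimation window must end before the event window")

    # Fit R_stock = alpha + beta * R_market on the estimation window only.
    beta, alpha = np.polyfit(market[s0:s1], stock[s0:s1], deg=1)

    # Abnormal return = realized minus model-implied return in the event window.
    expected = alpha + beta * market[e0:e1]
    abnormal = stock[e0:e1] - expected
    return abnormal, float(abnormal.sum())
```

Summing the abnormal returns gives the CAR. BHAR, also listed for `event_study.py`, compounds returns over the window instead and is not shown here.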
Project Structure
src/biotech_rebound/
    config.py               # Global constants and parameters
    data/
        clinical_trials.py  # ClinicalTrials.gov API v2 client
        event_dates.py      # SEC EDGAR 8-K event date identification
        stock_prices.py     # Yahoo Finance price data + delisting detection
        delisting.py        # Shumway (1997) delisting return adjustments
        fundamentals.py     # Financial metrics (yfinance + SEC EDGAR XBRL)
        fama_french.py      # Fama-French 3- and 5-factor loading
        universe.py         # Universe construction and filtering
        openalex.py         # OpenAlex literature search
    analysis/
        event_study.py      # Market model, CAR, BHAR computation
        statistics.py       # Patell, BMP, Corrado, bootstrap, MHC
        insolvency.py       # Composite insolvency scoring (Z'', cash, pipeline)
        spreads.py          # Corwin-Schultz (2012) bid-ask spread estimator
        robustness.py       # Sensitivity, subsample, permutation, calendar-time
    strategy/
        signals.py          # Entry/exit signal generation
        backtest.py         # Walk-forward and expanding-window backtesting
        risk.py             # Position sizing (equal-weight, Kelly, risk parity)
        costs.py            # Transaction cost model (spread + impact + commission)
    visualization/
        plots.py            # Publication-quality matplotlib figures
        tables.py           # Summary and results tables
scripts/                    # End-to-end pipeline scripts (00-09 + run_all.py)
tests/                      # 122 unit tests (pytest)
thesis/                     # Quarto chapters (.qmd) + LaTeX template
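The `costs.py` entry names three cost components: spread, impact, and commission. Below is a minimal per-trade sketch. It assumes proportional costs: a round trip pays half the quoted spread on entry and again on exit (one full spread in total), commission is charged per side, and market impact follows a square-root law in trade size relative to average daily dollar volume. The function names, the impact coefficient, and the example inputs are hypothetical, not the thesis's calibration.

```python
import math


def impact_per_side(trade_value: float, adv_value: float, daily_vol: float,
                    impact_coef: float = 0.1) -> float:
    """Square-root impact, as a return fraction: coef * sigma * sqrt(trade / ADV)."""
    if trade_value < 0 or adv_value <= 0 or daily_vol < 0:
        raise ValueError("invalid cost inputs")
    return impact_coef * daily_vol * math.sqrt(trade_value / adv_value)


def round_trip_cost(spread: float, commission_per_side: float, trade_value: float,
                    adv_value: float, daily_vol: float,
                    impact_coef: float = 0.1) -> float:
    """Total proportional cost of entering and exiting one position."""
    impact = impact_per_side(trade_value, adv_value, daily_vol, impact_coef)
    # Half-spread on each side -> one full spread per round trip.
    return spread + 2.0 * commission_per_side + 2.0 * impact


def net_return(gross_return: float, **cost_kwargs) -> float:
    """Gross trade return minus the round-trip cost."""
    return gross_return - round_trip_cost(**cost_kwargs)
```

With a 2% spread, 5 bp commission per side, a $10,000 order against $1M average daily volume, and 5% daily volatility, impact works out to 5 bp per side and the round trip costs 2.2%. The square-root form makes costs grow with position size, which matters for thinly traded small caps.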
Installation
Requires Python >= 3.10. No WRDS/CRSP access is needed; all data sources are freely available.
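Because no WRDS-hosted quote data is used, bid-ask spreads have to be estimated from freely available daily prices. The Corwin-Schultz (2012) high-low estimator listed for `spreads.py` does this. Below is a minimal two-day sketch of the published formula, assuming clean daily high/low series. The paper's overnight-return adjustment and the usual monthly averaging are omitted. Setting negative estimates to zero is a common convention, not necessarily the repository's. The function name is hypothetical.

```python
import numpy as np


def corwin_schultz_spread(high, low, clip_negative=True):
    """Two-day Corwin-Schultz spread estimates (one per consecutive day pair).

    beta  = ln(H_t/L_t)^2 + ln(H_{t+1}/L_{t+1})^2
    gamma = ln(max(H_t, H_{t+1}) / min(L_t, L_{t+1}))^2
    alpha = (sqrt(2*beta) - sqrt(beta)) / k - sqrt(gamma / k),  k = 3 - 2*sqrt(2)
    S     = 2 * (exp(alpha) - 1) / (1 + exp(alpha))
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    k = 3.0 - 2.0 * np.sqrt(2.0)

    hl_sq = np.log(high / low) ** 2
    beta = hl_sq[:-1] + hl_sq[1:]

    # High/low range over the combined two-day period.
    two_day_high = np.maximum(high[:-1], high[1:])
    two_day_low = np.minimum(low[:-1], low[1:])
    gamma = np.log(two_day_high / two_day_low) ** 2

    alpha = (np.sqrt(2.0 * beta) - np.sqrt(beta)) / k - np.sqrt(gamma / k)
    spread = 2.0 * (np.exp(alpha) - 1.0) / (1.0 + np.exp(alpha))
    if clip_negative:
        spread = np.maximum(spread, 0.0)  # common convention for negative estimates
    return spread
```

Large price moves between days inflate `gamma` relative to `beta` and can push the raw estimate below zero. This is why event windows around trial readouts, which contain exactly such jumps, may need averaging over longer periods.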
Running the Pipeline
Individual steps (scripts 01-09) can be run separately or skipped with command-line flags.
Data Sources
| Source | Purpose |
|---|---|
| ClinicalTrials.gov (v2 API) | Phase 3 trial identification |
| SEC EDGAR (EFTS + XBRL) | 8-K event dates, financial fundamentals |
| Yahoo Finance (yfinance) | Stock prices, market cap, delisting detection |
| Kenneth French Data Library | Fama-French 3- and 5-factor returns |
| OpenAlex | Academic literature for systematic review |
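For the ClinicalTrials.gov source, here is a minimal sketch of paginated Phase 3 retrieval. It assumes the v2 REST endpoint `https://clinicaltrials.gov/api/v2/studies` with its `query.spons`, `filter.advanced`, `pageSize`, and `pageToken` parameters, plus a `nextPageToken` field in each JSON page. This reflects our reading of the public API documentation and should be verified before use. The HTTP call is injected as `fetch_json` so the paging logic can be exercised offline. All function names are hypothetical and need not match `clinical_trials.py`.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_phase3_url(sponsor: str, page_size: int = 100,
                     page_token: Optional[str] = None) -> str:
    """Build one page request for Phase 3 studies of a given sponsor."""
    params = {
        "query.spons": sponsor,
        "filter.advanced": "AREA[Phase]PHASE3",  # Essie expression syntax
        "pageSize": page_size,
        "format": "json",
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{BASE_URL}?{urlencode(params)}"


def http_fetch_json(url: str) -> dict:
    """Default fetcher: plain GET request, JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_phase3_studies(
    sponsor: str, fetch_json: Callable[[str], dict] = http_fetch_json
) -> Iterator[dict]:
    """Yield study records page by page until no nextPageToken is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_json(build_phase3_url(sponsor, page_token=token))
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            return
```

Injecting the fetcher also leaves room to add rate limiting or on-disk caching in one place, without touching the paging loop.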
Testing
122 unit tests (pytest) cover all modules. Lint with `python -m ruff check src/ scripts/ tests/`.
Thesis
The thesis is rendered with Quarto to both HTML (GitHub Pages) and PDF (LaTeX).
quarto render thesis/ --to html # Academic website
quarto render thesis/ --to pdf # Formal thesis PDF
Disclaimer
This research is for academic purposes only and does not constitute financial advice. Past performance does not guarantee future results. The trading strategy described herein is a theoretical exercise and should not be used for actual investment decisions without extensive additional validation.
License
MIT