bank-contacts
FINMA-regulated banks and asset managers contact database for academic research
Information
| Property | Value |
|---|---|
| Language | Python |
| Stars | 0 |
| Forks | 0 |
| Watchers | 0 |
| Open Issues | 0 |
| License | No License |
| Created | 2026-01-20 |
| Last Updated | 2026-01-21 |
| Last Push | 2026-01-21 |
| Contributors | 1 |
| Default Branch | master |
| Visibility | private |
Datasets
This repository includes 11 dataset(s):
| Dataset | Format | Size |
|---|---|---|
| data | | 0.0 KB |
| asset_managers.json | .json | 458.82 KB |
| banks.json | .json | 62.31 KB |
| banks_scraped.json | .json | 41.46 KB |
| combined.csv | .csv | 208.56 KB |
| combined.json | .json | 522.19 KB |
| combined_final.json | .json | 532.89 KB |
| final.json | .json | 1079.72 KB |
| fintech.json | .json | 1.06 KB |
| scraping_report.html | .html | 2.63 KB |
| data.json | .json | 1079.72 KB |
Reproducibility
This repository includes reproducibility tools:
- Python requirements.txt
Status
- Issues: Enabled
- Wiki: Disabled
- Pages: Enabled
README
FINMA Regulated Entities Contact Database
A comprehensive database of FINMA-regulated Swiss financial institutions with executive contacts, board members, and quantitative role identification for academic research outreach.
Features
- 2,171 institutions (270 banks, 1,897 asset managers, 4 fintech)
- Executive contacts from ZEFIX, company websites, and annual reports
- Quantitative role identification (Head of Quant, Data Science, Risk Analytics)
- Email inference with validation using DNS/SMTP verification
- Stealth scraping with anti-detection measures
Data Sources
| Source | Description | URL |
|---|---|---|
| FINMA | Official list of authorized institutions | finma.ch |
| ZEFIX | Swiss Commercial Register (board members) | zefix.admin.ch |
| Websites | Company team/management pages | Deep scraping |
| PDFs | Annual report org charts | Text extraction |
Data Fields
| Field | Availability | Source |
|---|---|---|
| Institution Name | 100% | FINMA |
| City/Canton | 100% | FINMA |
| License Type | 100% | FINMA |
| Website | ~90% | Scraped |
| Board Members | ~92% | ZEFIX |
| Executives | ~69% | Website/PDF |
| Quant Contacts | ~14-23% | Website/PDF |
| High-Confidence Emails | ~37-55% | Inferred+Validated |
Quick Start
Prerequisites
# Create Python 3.11 environment (required for undetected-chromedriver)
conda create -n selenium_scraper python=3.11
conda activate selenium_scraper
# Install dependencies
pip install -r requirements.txt
Run Full Pipeline
This runs all stages:
1. ZEFIX scraping (~6 hours)
2. Website deep scraping (~12 hours)
3. PDF extraction (~4 hours)
4. Email validation (~2 hours)
Run Individual Stages
# Stage 1: ZEFIX (board members)
python zefix_scraper.py --input combined.json --output zefix_enriched.json
# Stage 2: Website scraping (executives, quant roles)
python website_deep_scraper.py --input zefix_enriched.json --output website_enriched.json
# Stage 3: PDF extraction (annual reports)
python pdf_extractor.py --input website_enriched.json --output pdf_enriched.json
# Stage 4: Email validation
python email_validator.py --input pdf_enriched.json --output final.json
# Generate exports
python export_quants.py --input final.json
python generate_site.py
Test with Small Sample
Output Files
| File | Description |
|---|---|
| `data/final.json` | Complete enriched data |
| `data/quant_contacts.csv` | Quantitative role contacts for outreach |
| `data/quant_contacts_high_conf.csv` | High-confidence emails only |
| `data/scraping_report.html` | Visual progress report |
| `docs/index.html` | Interactive GitHub Pages site |
Pipeline Architecture
combined.json
      |
      v
[ZEFIX Scraper] --> zefix_enriched.json
      |             (board members, UIDs)
      v
[Website Scraper] --> website_enriched.json
      |               (executives, quant roles, LinkedIn)
      v
[PDF Extractor] --> pdf_enriched.json
      |             (annual report extraction)
      v
[Email Validator] --> final.json
      |               (DNS/SMTP validation)
      v
[Exports]
  ├── quant_contacts.csv
  ├── scraping_report.html
  └── GitHub Pages site
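The chain of stages can also be written down as data, which makes it easy to check that each stage reads the file the previous one wrote. This is a minimal sketch built only from the commands listed under "Run Individual Stages"; it is not the code in scraping_orchestrator.py, and the names `STAGES`, `build_commands`, `chain_is_consistent`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# (script, input file, output file), copied from the documented stage commands.
STAGES = [
    ("zefix_scraper.py", "combined.json", "zefix_enriched.json"),
    ("website_deep_scraper.py", "zefix_enriched.json", "website_enriched.json"),
    ("pdf_extractor.py", "website_enriched.json", "pdf_enriched.json"),
    ("email_validator.py", "pdf_enriched.json", "final.json"),
]


def build_commands(stages=STAGES, python=sys.executable):
    """Return one argv list per stage, in pipeline order."""
    return [
        [python, script, "--input", src, "--output", dst]
        for script, src, dst in stages
    ]


def chain_is_consistent(stages=STAGES):
    """True if every stage consumes the output of the stage before it."""
    return all(
        stages[i][2] == stages[i + 1][1] for i in range(len(stages) - 1)
    )


def run_pipeline(dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for cmd in build_commands():
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` with the default `dry_run=True` only prints the four commands, which is a cheap way to review the chain before committing to a long run.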
Script Descriptions
| Script | Purpose |
|---|---|
| `selenium_core.py` | Stealth browser driver with anti-detection |
| `zefix_scraper.py` | ZEFIX web interface scraper (bypasses API) |
| `website_deep_scraper.py` | Deep website scraping for team pages |
| `pdf_extractor.py` | Annual report org chart extraction |
| `email_validator.py` | DNS/SMTP email validation |
| `scraping_orchestrator.py` | Pipeline coordinator |
| `export_quants.py` | CSV export for quant contacts |
| `generate_site.py` | GitHub Pages site generator |
Configuration
Edit config/scraping_config.yaml:
zefix:
  min_delay: 3.0  # Seconds between requests
  max_delay: 6.0
  max_requests_per_session: 25
website:
  min_delay: 2.0
  max_delay: 5.0
  max_pages_per_site: 5
general:
  headless: true  # Set false to see browser
  checkpoint_interval: 10
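Before a long run it can help to sanity-check edited values. The sketch below assumes the YAML file has already been parsed into a plain dictionary with the keys shown above; `EXAMPLE_CONFIG` and `validate_config` are hypothetical names, and the rules are illustrative assumptions rather than checks the pipeline is known to perform.

```python
# Mirrors the documented config/scraping_config.yaml values as a plain dict.
EXAMPLE_CONFIG = {
    "zefix": {"min_delay": 3.0, "max_delay": 6.0, "max_requests_per_session": 25},
    "website": {"min_delay": 2.0, "max_delay": 5.0, "max_pages_per_site": 5},
    "general": {"headless": True, "checkpoint_interval": 10},
}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section in ("zefix", "website"):
        values = config.get(section, {})
        low, high = values.get("min_delay"), values.get("max_delay")
        if low is None or high is None:
            problems.append(f"{section}: min_delay and max_delay are required")
        elif low < 0 or low > high:
            problems.append(f"{section}: need 0 <= min_delay <= max_delay")
    interval = config.get("general", {}).get("checkpoint_interval")
    if not isinstance(interval, int) or interval < 1:
        problems.append("general: checkpoint_interval must be a positive integer")
    return problems
```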
Checkpoint Recovery
The pipeline saves progress every 10 institutions. If interrupted:
# Resume from checkpoint (default)
python scraping_orchestrator.py
# Start fresh (ignore checkpoint)
python scraping_orchestrator.py --no-resume
Checkpoints are stored in data/checkpoints/.
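The save-and-resume behavior can be illustrated with a small loop. This is a sketch under assumptions, not the orchestrator's actual implementation: the checkpoint layout (`next_index` plus accumulated `results`), the function name `run_with_checkpoints`, and the file path passed in are all hypothetical.

```python
import json
from pathlib import Path


def run_with_checkpoints(items, process, checkpoint_path, interval=10, resume=True):
    """Process items in order, persisting progress every `interval` items."""
    path = Path(checkpoint_path)
    state = {"next_index": 0, "results": []}
    if resume and path.exists():
        # Continue from the last saved position instead of starting over.
        state = json.loads(path.read_text(encoding="utf-8"))

    for index in range(state["next_index"], len(items)):
        state["results"].append(process(items[index]))
        state["next_index"] = index + 1
        if state["next_index"] % interval == 0:
            path.parent.mkdir(parents=True, exist_ok=True)
            # Write to a temporary file first so a crash mid-write
            # cannot leave a truncated checkpoint behind.
            tmp = path.with_suffix(path.suffix + ".tmp")
            tmp.write_text(json.dumps(state), encoding="utf-8")
            tmp.replace(path)
    return state["results"]
```

With `interval=10`, an interruption loses at most nine institutions of work; passing `resume=False` corresponds to the `--no-resume` behavior of ignoring any existing checkpoint.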
Quantitative Role Keywords
The system searches for these roles (EN/DE/FR):
- Head of Quantitative Research
- Quant Analyst / Researcher
- Chief Data Officer / Head of Data Science
- Head of Risk Analytics / Risk Modeling
- Quantitative Strategist
- Machine Learning / AI Lead
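As a rough illustration of how a job title might be matched against the roles above, the sketch below uses case-insensitive regular expressions. It covers only the English titles listed here; the German and French keywords are not given in this README, so they are left out. The patterns and the name `is_quant_role` are assumptions, not the project's actual keyword list.

```python
import re

# Illustrative English-only patterns derived from the role list above.
_QUANT_PATTERNS = [
    r"\bquant(itative)?\b",
    r"\bchief data officer\b",
    r"\bdata science\b",
    r"\brisk (analytics|modell?ing)\b",
    r"\bmachine learning\b",
    r"\bai lead\b",
]
_QUANT_RE = re.compile("|".join(_QUANT_PATTERNS), re.IGNORECASE)


def is_quant_role(title):
    """Return True if the job title matches any quantitative-role pattern."""
    return bool(_QUANT_RE.search(title or ""))
```

Word boundaries keep "quant" from matching unrelated words such as "quantity", at the cost of missing abbreviations or titles phrased differently from the list.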
Rate Limiting
To avoid detection and blocking:
- ZEFIX: 1 request per 3-6 seconds, max 25/session
- Websites: 1 request per 2-5 seconds, max 40/session
- PDF downloads: 1 per second
- Total runtime: ~24 hours for full scrape
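The pacing rules above amount to a randomized delay between requests plus a per-session request budget. A minimal sketch of that idea follows; the `Throttle` class and its method names are hypothetical, and the real scrapers may pace requests differently.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Throttle:
    """Randomized delay between requests with a per-session budget."""

    min_delay: float
    max_delay: float
    max_requests_per_session: int
    requests_made: int = 0

    def session_exhausted(self):
        """True once the session budget has been used up."""
        return self.requests_made >= self.max_requests_per_session

    def wait(self, sleep=time.sleep, rng=random):
        """Sleep for a random delay, count the request, return the delay."""
        if self.session_exhausted():
            raise RuntimeError("session budget used up; start a new session")
        delay = rng.uniform(self.min_delay, self.max_delay)
        sleep(delay)
        self.requests_made += 1
        return delay
```

For the documented ZEFIX settings this would be `Throttle(3.0, 6.0, 25)`, with `wait()` called before each request. The `sleep` parameter is injectable so the logic can be exercised without actually pausing.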
View Results
Online
Visit GitHub Pages site
Locally
Command Line
# Check ZEFIX coverage
python -c "import json; d=json.load(open('data/final.json')); print(f'Board members: {sum(1 for i in d if i.get(\"board_members\"))}/{len(d)}')"
# Check quant contacts
python -c "import json; d=json.load(open('data/final.json')); q=sum(len(i.get('quant_contacts',[])) for i in d); print(f'Quant contacts: {q}')"
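The one-liners above can be folded into a small reusable helper. This sketch assumes, as those commands do, that data/final.json holds a list of institution records with optional `board_members` and `quant_contacts` lists; the function names are hypothetical.

```python
import json
from pathlib import Path


def load_records(path="data/final.json"):
    """Load the list of institution records from the final JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def coverage(records, field):
    """Fraction of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    return sum(1 for record in records if record.get(field)) / len(records)


def summarize(records, fields=("board_members", "quant_contacts")):
    """Map each field name to its coverage as a percentage (one decimal)."""
    return {field: round(100 * coverage(records, field), 1) for field in fields}
```

Running `summarize(load_records())` after a pipeline run gives per-field percentages that can be compared with the availability figures in the Data Fields table.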
Data Protection
- Only publicly available information is collected
- Data sourced from official Swiss government registers and public company websites
- No personal data beyond publicly listed executive roles
- For data removal requests, please open an issue
License
Data is sourced from public Swiss government registers. This repository is for academic research purposes.
Disclaimer
This database is provided for academic research purposes only. While we strive for accuracy:
- Executive data may change as people move between roles
- Email addresses are inferred and may not be accurate
- Always verify contact information before use
Last updated: January 2026