SNSF Grant IZCOZ0_213370
Narrative Digital Finance: a tale of structural breaks, bubbles & market narratives
Principal Investigator: Prof. Dr. Joerg Osterrieder
Bern University of Applied Sciences | University of Twente
August 2023 - August 2026
We use publicly available data from financial markets (accessed via original source URLs) and commercial databases (accessed via provider APIs). Public data is not re-stored by us. Commercial data remains on provider infrastructure per license agreements. Project-generated code, datasets, and research outputs are archived on Zenodo with DOIs.
Commercial Data Sources (Licensed):
| Source | Description | Format | Volume | Time Range |
|---|---|---|---|---|
| RavenPack | News headlines with sentiment scores | CSV, Parquet | ~3 GB | 2000-2025 |
| Deutsche Borse T7 | Nanosecond-level trading data for FESX and DAX futures | Binary, CSV | ~50 GB | Jan 2021 - Sep 2024 |
Public Data Sources:
| Source | Description | Format | URL |
|---|---|---|---|
| BIS gingado | Central bank speeches worldwide (1996-2025) | Text, PDF | bis.org/cbspeeches/ |
| St. Louis Fed FRED | Macroeconomic indicators (CPI, PPI, GDP, etc.) | CSV via API | fred.stlouisfed.org/ |
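
The "CSV via API" access listed for FRED can be illustrated with a minimal sketch. It assumes the public `fredgraph.csv` download endpoint and a two-column layout (date, value). The function names and the handling of missing observations are illustrative choices, not the project's documented pipeline.

```python
import csv
import io
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Assumption: FRED's public graph endpoint serves a single series as CSV.
FRED_CSV_ENDPOINT = "https://fred.stlouisfed.org/graph/fredgraph.csv"


def fred_csv_url(series_id: str) -> str:
    """Build the download URL for one FRED series, e.g. "CPIAUCSL"."""
    return f"{FRED_CSV_ENDPOINT}?{urlencode({'id': series_id})}"


def parse_fred_csv(text: str) -> List[Tuple[str, Optional[float]]]:
    """Parse two-column CSV text into (date, value) pairs.

    The header row is skipped without relying on its column names.
    Empty cells and "." are treated as missing observations (None).
    """
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the header row
    observations = []
    for row in rows:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        date, raw = row[0], row[1].strip()
        value = None if raw in ("", ".") else float(raw)
        observations.append((date, value))
    return observations
```

Because the data is fetched from the original source on demand, nothing needs to be re-stored: `fred_csv_url("CPIAUCSL")` yields the download link, and the response body can be passed to `parse_fred_csv` by any HTTP client.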
Generated Research Outputs:
| Output | Type | Format | Repository |
|---|---|---|---|
| Analysis code and methods | Software | Python | Zenodo |
| CB Speech Transcripts Dataset | Dataset | CSV | Zenodo |
| Daily Evergreen Narrative Sentiment | Dataset | CSV | Zenodo |
| Macro Regime Detection Notebooks | Software | Jupyter | Zenodo |
| Research posters and preprints | Publication | PDF | Zenodo, SSRN, arXiv |
Data quality is checked during the first work package and is an integral part of the research. We apply statistical methods to address data shortcomings. Accompanying documents describe each dataset, its quality, and the methods used to check its consistency.
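
As an illustration of what such consistency checks might involve, the sketch below screens a dated series for missing values, duplicated dates, out-of-order dates, and outliers. The choice of checks, the function name `quality_report`, and the median-absolute-deviation threshold are assumptions made for this example, not the project's documented procedure.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def quality_report(
    series: List[Tuple[str, Optional[float]]],
    mad_threshold: float = 5.0,  # hypothetical cut-off, tune per dataset
) -> Dict[str, List[str]]:
    """Run basic consistency checks on (ISO date, value) pairs.

    Returns the dates affected by each kind of problem.
    """
    dates = [d for d, _ in series]
    missing = [d for d, v in series if v is None]

    # A date is a duplicate if it has already been seen earlier.
    seen, duplicates = set(), []
    for d in dates:
        if d in seen:
            duplicates.append(d)
        seen.add(d)

    # ISO 8601 dates sort chronologically as plain strings.
    out_of_order = [b for a, b in zip(dates, dates[1:]) if b < a]

    # Flag values far from the median, measured in units of the
    # median absolute deviation (MAD), which is robust to the outliers.
    values = [v for _, v in series if v is not None]
    outliers: List[str] = []
    if values:
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad > 0:
            outliers = [
                d for d, v in series
                if v is not None and abs(v - med) / mad > mad_threshold
            ]

    return {
        "missing": missing,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
        "outliers": outliers,
    }
```

A report with four empty lists means the series passed these particular checks. It does not replace the dataset-specific validation described in the accompanying documents.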
Data Collection Methods:
Quality Assurance:
Versioning: Code and databases are versioned with Git. Data releases receive DOIs via Zenodo integration.
The data, the data sources, and the collection processes are documented in detail.
Metadata Standards:
Documentation Provided:
No personal data or other sensitive data is used in the project. The conditions have been discussed with the data providers. No special security standards are required by the data providers for this data.
Ethical Considerations:
NLP Bias Assessment Framework:
Data Provider Agreements:
| Provider | Agreement Type | Terms |
|---|---|---|
| RavenPack | Research subscription | Academic use only |
| Deutsche Borse | Research collaboration | Project-specific usage |
Ethics Approval: Not required (non-human-subjects research per SNSF guidelines)
We use private cloud solutions. No sensitive data or personal data is collected in the project.
Access Control:
| Data Category | Access Level | Storage Location |
|---|---|---|
| Public data (FRED, BIS) | Open access | Original source URLs (not re-stored by us) |
| Commercial data | Provider access only | Provider cloud infrastructure (not redistributable) |
| Project code | Open access | GitHub and Zenodo (public) |
| Project datasets | Open access | Zenodo (CC-BY 4.0) |
Security Measures: Commercial data remains on provider infrastructure per license agreements. Relevant project code is public on GitHub and Zenodo.
The project is largely based on publicly available data. Raw data records, in particular those licensed from commercial providers, may not be published without restriction.
Copyright Framework:
| Data | Copyright Holder | Our Rights | License |
|---|---|---|---|
| RavenPack news | RavenPack Inc. | Academic use only | Proprietary |
| Deutsche Borse data | Deutsche Borse AG | Research collaboration | Agreement |
| BIS speeches | BIS/Speakers | Full reuse (cite) | Public |
| Our code | Project team | Open source | MIT License |
| Our datasets | Project team | Open access | CC-BY 4.0 |
Publication Rights:
Storage capacity is ample, and the project's data volume remains limited. Relevant data is managed as databases on GitHub and Zenodo and in private cloud solutions. Backups and versions of the data are created continuously.
Primary Storage:
| Repository | Content | Capacity | Backup |
|---|---|---|---|
| Zenodo | Datasets, code, posters, preprints | 50 GB per record | CERN Data Centre |
| GitHub | Code, documentation | Unlimited | Git version control |
| Private cloud | Working copies | As needed | Regular backup |
Most relevant data is stored on Zenodo for long-term preservation. There is no obligation to destroy the data.
10-Year Retention Strategy:
DOI Versioning Policy:
The relevant code developed during the project together with all necessary accompanying documentation will be stored on Zenodo. Zenodo offers safe storage for all data and research outputs in CERN's Data Centre.
Published Research Outputs:
| Output | Repository | DOI | Status |
|---|---|---|---|
| World Central Banker Speech Transcripts (1996-2025) | Zenodo | 10.5281/zenodo.18034730 | Published |
| Daily Evergreen Narrative Sentiment (2004-2025) | Zenodo | 10.5281/zenodo.18036051 | Published |
| Macroeconomic Regime Detection Notebooks | Zenodo | 10.5281/zenodo.18157708 | Published |
| HFT Market Quality Poster (QuantMinds 2024) | Zenodo | 10.5281/zenodo.18167476 | Published |
| CB Communications AI Framework Poster (Freiburg 2025) | Zenodo | 10.5281/zenodo.18167572 | Published |
| HFT Impact on Market Liquidity (Preprint) | SSRN | Pending | Submitted |
| Systematic Literature Review (Financial Innovation) | arXiv | Pending | Under Review |
| Project Website | GitHub Pages | N/A | Live |
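
The DOIs in the table above resolve through the standard doi.org resolver. The short sketch below turns the published Zenodo DOIs into citable links and rejects placeholder entries such as "Pending". The helper name `doi_url` and the validation pattern are illustrative assumptions.

```python
import re

# Zenodo DOIs use the 10.5281 prefix followed by "zenodo.<record number>".
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def doi_url(doi: str) -> str:
    """Return the doi.org resolver link for a Zenodo DOI."""
    if not ZENODO_DOI.match(doi):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"https://doi.org/{doi}"


# DOIs copied from the table of published research outputs.
PUBLISHED = {
    "World Central Banker Speech Transcripts (1996-2025)": "10.5281/zenodo.18034730",
    "Daily Evergreen Narrative Sentiment (2004-2025)": "10.5281/zenodo.18036051",
    "Macroeconomic Regime Detection Notebooks": "10.5281/zenodo.18157708",
    "HFT Market Quality Poster (QuantMinds 2024)": "10.5281/zenodo.18167476",
    "CB Communications AI Framework Poster (Freiburg 2025)": "10.5281/zenodo.18167572",
}

for title, doi in PUBLISHED.items():
    print(f"{title}: {doi_url(doi)}")
```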
Repository URLs:
Sharing by Data Type:
| Data Type | Shareable | Location/Reason |
|---|---|---|
| Public data (FRED, BIS) | Yes | Available at original sources (not re-distributed) |
| Commercial data (RavenPack, Deutsche Borse) | No | License restrictions (remains on provider servers) |
| Project code | Yes | Zenodo and GitHub (MIT License) |
| Project datasets | Yes | Zenodo (CC-BY 4.0) |
| Research posters | Yes | Zenodo (CC-BY 4.0) |
| Preprints | Yes | SSRN, arXiv (Open Access) |
We do not use sensitive data in the project. The data come from conventional data providers and are originally collected from public sources.
Commercial Data Post-Project Access:
Yes
FAIR Data Principles Implementation:
| Principle | GitHub | Zenodo |
|---|---|---|
| F1 - Findable | Public repository, searchable | DOIs for all deposits |
| A1 - Accessible | HTTPS access | HTTPS, no authentication required |
| I1 - Interoperable | Standard Python/Jupyter formats | Standard formats (CSV, JSON, PDF) |
| R1 - Reusable | MIT License, README files | CC-BY 4.0, DataCite metadata |
Yes
Repository Operators: