Data Management Plan (DMP)

SNSF Grant IZCOZ0_213370

Narrative Digital Finance: a tale of structural breaks, bubbles & market narratives

Principal Investigator: Prof. Dr. Joerg Osterrieder

Bern University of Applied Sciences | University of Twente

August 2023 - August 2026

1 Data collection and documentation

1.1 What data will you collect, observe, generate or reuse?

We use publicly available data from financial markets (accessed via original source URLs) and commercial databases (accessed via provider APIs). Public data is not re-stored by us. Commercial data remains on provider infrastructure per license agreements. Project-generated code, datasets, and research outputs are archived on Zenodo with DOIs.

Commercial Data Sources (Licensed):

Source | Description | Format | Volume | Time Range
RavenPack | News headlines with sentiment scores | CSV, Parquet | ~3 GB | 2000-2025
Deutsche Borse T7 | Nanosecond-level trading data for FESX and DAX futures | Binary, CSV | ~50 GB | Jan 2021 - Sep 2024

Public Data Sources:

Source | Description | Format | URL
BIS Gigando | Central bank speeches worldwide (1996-2025) | Text, PDF | bis.org/cbspeeches/
St. Louis FED FRED | Macroeconomic indicators (CPI, PPI, GDP, etc.) | CSV via API | fred.stlouisfed.org/
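
As a rough illustration of the "CSV via API" access pattern listed for FRED, the sketch below builds a download URL and parses CSV text of the kind FRED returns. The fredgraph.csv endpoint, the series ID CPIAUCSL, the helper names, and the sample rows (placeholder values, not real observations) are assumptions made for illustration; this is not the project's actual pipeline.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed public CSV endpoint; the project may use a different FRED interface.
FRED_CSV_ENDPOINT = "https://fred.stlouisfed.org/graph/fredgraph.csv"


def fred_csv_url(series_id: str) -> str:
    """Build the CSV download URL for one FRED series."""
    return f"{FRED_CSV_ENDPOINT}?{urlencode({'id': series_id})}"


def parse_fred_csv(text: str) -> list[tuple[str, float]]:
    """Parse FRED-style CSV text into (date, value) pairs, skipping missing values."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # header row: first column is the date, second the series value
    observations = []
    for row in rows:
        if len(row) < 2:
            continue
        date, raw = row[0], row[1]
        if raw in ("", "."):  # missing observations appear as "." or empty
            continue
        observations.append((date, float(raw)))
    return observations


# Placeholder rows for illustration only (not real observations).
sample = (
    "observation_date,CPIAUCSL\n"
    "2020-01-01,100.0\n"
    "2020-02-01,.\n"
    "2020-03-01,101.5\n"
)
url = fred_csv_url("CPIAUCSL")
observations = parse_fred_csv(sample)
```

Because public data is not re-stored, a script like this would be rerun against the original source whenever the series is needed, rather than reading from a local copy.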

Generated Research Outputs:

Output | Type | Format | Repository
Analysis code and methods | Software | Python | Zenodo
CB Speech Transcripts Dataset | Dataset | CSV | Zenodo
Daily Evergreen Narrative Sentiment | Dataset | CSV | Zenodo
Macro Regime Detection Notebooks | Software | Jupyter | Zenodo
Research posters and preprints | Publication | PDF | Zenodo, SSRN, arXiv

Note on data volume: Total data volume increased from the original estimate of ~10 GB to ~55 GB due to the addition of Deutsche Borse T7 high-frequency trading data (~50 GB) through a research collaboration established after project start.

1.2 How will the data be collected, observed or generated?

Data quality is checked during the first work package and is an integral part of the research. We apply statistical methods to address data shortcomings. Accompanying documents describe each dataset, its quality, and the methods used to check its consistency.

Data Collection Methods:

Quality Assurance:

Versioning: Code and databases are versioned with Git. Data releases receive DOIs via Zenodo integration.
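
One way to make a versioned data release verifiable is to ship a checksum manifest with it. The sketch below is a minimal example under assumed conventions: the file name sentiment.csv, the helper names, and the use of SHA-256 are illustrative choices, not the project's documented procedure.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Demonstration on a temporary directory with a hypothetical release file.
with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp)
    (release / "sentiment.csv").write_text("date,score\n", encoding="utf-8")
    before = build_manifest(release)
    (release / "sentiment.csv").write_text("date,score\n2020-01-01,0.0\n", encoding="utf-8")
    after = build_manifest(release)
    modified = changed_files(before, after)
```

Comparing the manifest of a new release against the previous one shows which files actually changed, which can help decide whether a new DOI version is warranted.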

1.3 What documentation and metadata will you provide with the data?

The data, the data sources, and the collection processes are documented in detail.

Metadata Standards:

Documentation Provided:

2 Ethics, legal and security issues

2.1 How will ethical issues be addressed and handled?

No personal data or other sensitive data is used in the project. The conditions have been discussed with the data providers. No special security standards are required by the data providers for this data.

Ethical Considerations:

NLP Bias Assessment Framework:

Data Provider Agreements:

Provider | Agreement Type | Terms
RavenPack | Research subscription | Academic use only
Deutsche Borse | Research collaboration | Project-specific usage

Ethics Approval: Not required (non-human-subjects research per SNSF guidelines)

2.2 How will data access and security be managed?

We use private cloud solutions. No sensitive data or personal data is collected in the project.

Access Control:

Data Category | Access Level | Storage Location
Public data (FRED, BIS) | Open access | Original source URLs (not re-stored by us)
Commercial data | Provider access only | Provider cloud infrastructure (not redistributable)
Project code | Open access | GitHub and Zenodo (public)
Project datasets | Open access | Zenodo (CC-BY 4.0)

Security Measures: Commercial data remains on provider infrastructure per license agreements. Relevant project code is public on GitHub and Zenodo.

2.3 How will you handle copyright and Intellectual Property Rights issues?

The project is based on data that are largely publicly available. The raw data records may not be published without restriction.

Copyright Framework:

Data | Copyright Holder | Our Rights | License
RavenPack news | RavenPack Inc. | Academic use only | Proprietary
Deutsche Borse data | Deutsche Borse AG | Research collaboration | Agreement
BIS speeches | BIS/Speakers | Full reuse (cite) | Public
Our code | Project team | Open source | MIT License
Our datasets | Project team | Open access | CC-BY 4.0

Publication Rights:

3 Data storage and preservation

3.1 How will your data be stored and backed-up during the research?

Storage capacity is ample relative to the project's limited data volume. Relevant data is managed as databases on GitHub and Zenodo and in private cloud solutions. Backups and versions of the data are created continuously.

Primary Storage:

Repository | Content | Capacity | Backup
Zenodo | Datasets, code, posters, preprints | 50 GB per record | CERN Data Centre
GitHub | Code, documentation | Unlimited | Git version control
Private cloud | Working copies | As needed | Regular backup

3.2 What is your data preservation plan?

Most relevant data is stored on Zenodo for long-term preservation. There is no obligation to destroy the data.

10-Year Retention Strategy:

DOI Versioning Policy:

Deutsche Borse T7 Data: Due to the research collaboration agreement, raw HFT data cannot be archived publicly. Analysis methodology and derived features are documented to enable reproducibility.

4 Data sharing and reuse

4.1 How and where will the data be shared?

The relevant code developed during the project together with all necessary accompanying documentation will be stored on Zenodo. Zenodo offers safe storage for all data and research outputs in CERN's Data Centre.

Published Research Outputs:

Output | Repository | DOI | Status
World Central Banker Speech Transcripts (1996-2025) | Zenodo | 10.5281/zenodo.18034730 | Published
Daily Evergreen Narrative Sentiment (2004-2025) | Zenodo | 10.5281/zenodo.18036051 | Published
Macroeconomic Regime Detection Notebooks | Zenodo | 10.5281/zenodo.18157708 | Published
HFT Market Quality Poster (QuantMinds 2024) | Zenodo | 10.5281/zenodo.18167476 | Published
CB Communications AI Framework Poster (Freiburg 2025) | Zenodo | 10.5281/zenodo.18167572 | Published
HFT Impact on Market Liquidity (Preprint) | SSRN | Pending | Submitted
Systematic Literature Review (Financial Innovation) | arXiv | Pending | Under Review
Project Website | GitHub Pages | N/A | Live
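
The DOIs listed above can be turned into persistent links through the doi.org resolver. The following sketch shows one possible way to do that and to extract the numeric Zenodo record ID; the helper names and the regular expression are illustrative assumptions, while the DOIs are taken from the table.

```python
import re

# Zenodo DOIs use the 10.5281 prefix followed by "zenodo.<record id>".
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def doi_to_url(doi: str) -> str:
    """Return the persistent resolver link for a DOI."""
    return f"https://doi.org/{doi}"


def zenodo_record_id(doi: str) -> str:
    """Extract the numeric record ID from a Zenodo DOI, or raise ValueError."""
    match = ZENODO_DOI.match(doi)
    if match is None:
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return match.group(1)


# Published outputs with DOIs, as listed in the table.
published = {
    "World Central Banker Speech Transcripts (1996-2025)": "10.5281/zenodo.18034730",
    "Daily Evergreen Narrative Sentiment (2004-2025)": "10.5281/zenodo.18036051",
    "Macroeconomic Regime Detection Notebooks": "10.5281/zenodo.18157708",
    "HFT Market Quality Poster (QuantMinds 2024)": "10.5281/zenodo.18167476",
    "CB Communications AI Framework Poster (Freiburg 2025)": "10.5281/zenodo.18167572",
}
links = {title: doi_to_url(doi) for title, doi in published.items()}
record_ids = [zenodo_record_id(doi) for doi in published.values()]
```

Entries marked "Pending" or "N/A" are left out, since they have no DOI to resolve yet.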

Repository URLs:

Sharing by Data Type:

Data Type | Shareable | Location/Reason
Public data (FRED, BIS) | Yes | Available at original sources (not re-distributed)
Commercial data (RavenPack, Deutsche Borse) | No | License restrictions (remains on provider servers)
Project code | Yes | Zenodo and GitHub (MIT License)
Project datasets | Yes | Zenodo (CC-BY 4.0)
Research posters | Yes | Zenodo (CC-BY 4.0)
Preprints | Yes | SSRN, arXiv (Open Access)

4.2 Are there any necessary limitations to protect sensitive data?

We do not use sensitive data in the project. The data come from conventional data providers and are originally collected from public sources.

Commercial Data Post-Project Access:

4.3 All digital repositories I will choose conform to the FAIR Data Principles.

Yes

FAIR Data Principles Implementation:

Principle | GitHub | Zenodo
F1 - Findable | Public repository, searchable | DOIs for all deposits
A1 - Accessible | HTTPS access | HTTPS, no authentication required
I1 - Interoperable | Standard Python/Jupyter formats | Standard formats (CSV, JSON, PDF)
R1 - Reusable | MIT License, README files | CC-BY 4.0, DataCite metadata
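
Since reusability here relies on DataCite metadata, a pre-deposit check for the mandatory DataCite properties (identifier, creator, title, publisher, publication year, resource type) may be useful. The sketch below is only an illustration: the dictionary keys and the function name are hypothetical, and the sample record is deliberately incomplete rather than a description of an actual deposit.

```python
# Hypothetical key names standing in for the mandatory DataCite properties.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the mandatory fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Deliberately incomplete record; title and DOI are taken from section 4.1.
record = {
    "identifier": "10.5281/zenodo.18034730",
    "title": "World Central Banker Speech Transcripts (1996-2025)",
    "publisher": "Zenodo",
}
gaps = missing_fields(record)
```

An empty result would indicate that all mandatory fields are filled in; here the check reports the three fields the sample record omits.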

4.4 I will choose digital repositories maintained by a non-profit organisation.

Yes

Repository Operators:
