DMP Comparison: Original vs Enhanced
SNSF Grant IZCOZ0_213370 | Data Management Plan Review
We will use publicly available financial market data as well as data from commercial databases that we purchase, e.g. from Refinitiv. The data will be in CSV format, about 10 GB in total, and consist of financial data stored as integers and character strings.
| Source | Format | Volume |
|---|---|---|
| RavenPack | CSV, Parquet | ~3 GB |
| Deutsche Borse T7 | Binary, CSV | ~50 GB |
| Source | Format | URL |
|---|---|---|
| BIS Gigando | Text, PDF | bis.org/cbspeeches/ |
| St. Louis FED FRED | CSV via API | fred.stlouisfed.org/ |
- Analysis code (Python) - Zenodo
- CB Speech Transcripts - Zenodo DOI
- Daily Evergreen Sentiment - Zenodo DOI
- Macro Regime Notebooks - Zenodo DOI
- Research posters/preprints - Zenodo/SSRN/arXiv
Quality assurance
- The data are checked for consistency and completeness using statistical methods.
Versioning
- The code and the database are versioned with the help of the ZHAW internal tools.
The quality and consistency of the collected data will be checked during the first work package; these checks are an integral part of the research. Where the data have shortcomings, we will apply statistical methods to mitigate them. Several documents will describe the dataset, its quality, and the methods used to check its consistency.
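The plan does not specify which statistical checks will be run. As a minimal sketch, assuming each series arrives as a list of (date, value) rows with missing values encoded as `None`, the following flags missing values, duplicate and out-of-order dates, and z-score outliers. The function names and the 3-standard-deviation threshold are illustrative assumptions, not the project's documented procedure.

```python
from datetime import date
from statistics import mean, pstdev


def check_series(rows):
    """Completeness and consistency findings for a list of (date, value) rows."""
    dates = [d for d, _ in rows]
    missing = [d for d, v in rows if v is None]

    # Dates that occur more than once (reported once per extra occurrence).
    seen, duplicates = set(), []
    for d in dates:
        if d in seen:
            duplicates.append(d)
        seen.add(d)

    # Dates that are earlier than the row immediately before them.
    out_of_order = [later for earlier, later in zip(dates, dates[1:]) if later < earlier]

    return {
        "n_rows": len(rows),
        "missing": missing,
        "duplicates": duplicates,
        "out_of_order": out_of_order,
    }


def zscore_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` population SDs from the mean."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sd > threshold]


if __name__ == "__main__":
    rows = [
        (date(2024, 1, 2), 101.5),
        (date(2024, 1, 3), None),
        (date(2024, 1, 3), 102.0),
        (date(2024, 1, 1), 100.0),
    ]
    print(check_series(rows))
    print(zscore_outliers([10.0] * 20 + [100.0]))
```

The findings would feed into the quality documentation described above. For heavy-tailed financial series, a median-absolute-deviation rule is a more robust alternative to the z-score.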
- API Access: FRED API, BIS API for central bank speeches
- Commercial: RavenPack SQL, Deutsche Borse SFTP
- NLP: HuggingFace transformers, FinBERT, BERTopic, NER tagging
- LLM: OpenAI gpt-4o-mini for narrative sentiment
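For the FRED API listed above, a request for one series might be sketched as follows. It assumes the `fred/series/observations` endpoint with the `series_id`, `api_key`, `file_type`, `observation_start` and `observation_end` parameters, and treats FRED's `"."` placeholder as a missing value. The helper names are hypothetical, a personal API key is required, and the parameters should be checked against the current FRED documentation. BIS, RavenPack and Deutsche Borse access is not covered here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start=None, end=None):
    """Assemble the observations request URL (no network access)."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # YYYY-MM-DD
    if end:
        params["observation_end"] = end
    return FRED_OBS_URL + "?" + urlencode(params)


def parse_observations(payload):
    """Turn a decoded JSON response into (date, value) pairs; '.' means missing."""
    out = []
    for obs in payload.get("observations", []):
        raw = obs.get("value")
        value = None if raw in (None, ".") else float(raw)
        out.append((obs["date"], value))
    return out


def fetch_series(series_id, api_key, **kwargs):
    """Download and parse one series (requires network access and a valid key)."""
    with urlopen(build_fred_url(series_id, api_key, **kwargs), timeout=30) as resp:
        return parse_observations(json.load(resp))
```

Keeping URL construction and parsing separate from the download makes both testable offline and lets the raw JSON be cached alongside the derived data.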
Statistical consistency checks, change point detection.
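The plan does not name a change-point method. As an illustration only, the sketch below estimates a single shift in the mean by least squares: it tries every admissible split and keeps the one that minimises the within-segment sum of squared errors (SSE). The function names and the `min_size` parameter are hypothetical.

```python
def _sse(xs):
    """Sum of squared deviations from the segment mean."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_mean_shift(xs, min_size=2):
    """Least-squares estimate of a single change in the mean.

    Returns (k, gain): the series is split into xs[:k] and xs[k:], and
    gain is the reduction in total SSE versus no split. k is None when
    no admissible split lowers the SSE.
    """
    n = len(xs)
    total = _sse(xs)
    best_k, best_cost = None, total
    # Each segment must contain at least `min_size` observations.
    for k in range(min_size, n - min_size + 1):
        cost = _sse(xs[:k]) + _sse(xs[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost


if __name__ == "__main__":
    series = [0.0] * 10 + [5.0] * 10
    print(best_mean_shift(series))
```

Applying the split recursively to each segment while the gain exceeds a chosen penalty gives binary segmentation. This naive version is quadratic in the series length; dedicated libraries such as `ruptures` implement this and other methods more efficiently.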
Git version control. Data releases receive DOIs via Zenodo.
The data, the data sources, and the data collection processes are documented in detail. Information on the project and the data will be made available to staff at our university so that further projects in this area can build on it.
- Zenodo metadata (DataCite schema)
- README documentation with variable descriptions
- Data dictionaries
- README.md files in all repositories
- Jupyter notebooks with methodology
- CC-BY 4.0 licensing for all non-code outputs (code: MIT License)
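Parts of a data dictionary can be generated from the data itself. The sketch below, assuming records are available as a list of dicts, infers a coarse type and counts missing values per variable, then emits a Markdown table suitable for a README.md; variable descriptions still have to be written by hand. The function names and the example column names are hypothetical.

```python
def infer_type(values):
    """Coarse type label for a column, ignoring missing (None) entries."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "unknown"
    # bool is a subclass of int, so test it first.
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "float"
    return "string"


def data_dictionary(records, descriptions):
    """Markdown data dictionary; columns are taken from the first record."""
    columns = list(records[0].keys()) if records else []
    lines = ["| Variable | Type | Missing | Description |", "|---|---|---|---|"]
    for col in columns:
        values = [r.get(col) for r in records]
        missing = sum(v is None for v in values)
        desc = descriptions.get(col, "TODO")
        lines.append(f"| {col} | {infer_type(values)} | {missing} | {desc} |")
    return "\n".join(lines)


if __name__ == "__main__":
    records = [
        {"date": "2024-01-02", "sentiment": 0.3, "n_articles": 12},
        {"date": "2024-01-03", "sentiment": None, "n_articles": 8},
    ]
    print(data_dictionary(records, {"sentiment": "Daily mean sentiment score"}))
```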
No personal or otherwise sensitive data is used in the project. Accordingly, the university's standard internal security measures apply.
The terms of use have already been discussed with the data providers, who require no special security standards for this data.
All financial data is aggregated market data that does not identify individuals. Central bank speeches are official public communications.
- Sentiment distribution analysis by sector/geography
- Temporal drift monitoring
- English-language dominance acknowledged
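The distribution analysis and drift monitoring listed above could start from simple summaries. This sketch assumes each sentiment score is tagged with a sector or country label; the group labels are hypothetical, and the drift measure (difference of window means in units of the reference window's sample standard deviation) is an illustrative choice, not the project's stated method.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_group(rows):
    """rows: iterable of (group, score). Returns {group: (count, mean score)}."""
    groups = defaultdict(list)
    for group, score in rows:
        groups[group].append(score)
    return {g: (len(s), mean(s)) for g, s in sorted(groups.items())}


def drift_score(reference, recent):
    """Shift of the recent mean, in units of the reference sample SD."""
    diff = mean(recent) - mean(reference)
    sd = stdev(reference)  # needs at least two reference observations
    if sd == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sd


if __name__ == "__main__":
    scores = [("banks", 0.5), ("tech", -1.0), ("banks", 1.5)]
    print(summarise_by_group(scores))
    print(drift_score([1, 2, 3, 4, 5], [6, 7, 8]))
```

A score beyond an agreed threshold would prompt a review of the sentiment model or its inputs; the threshold itself would have to be set by the team.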
| Provider | Terms |
|---|---|
| RavenPack | Research subscription, academic use only |
| Deutsche Borse | Research collaboration, project-specific |
Access to the data is granted only to team members. The university IT services ensure the security of data and processes. No sensitive or personal data is collected in the project.
We use private cloud solutions. No sensitive data or personal data is collected.
| Data Category | Access | Storage |
|---|---|---|
| Public data (FRED, BIS) | Open | Original sources |
| Commercial data | Provider only | Provider infrastructure |
| Project code/datasets | Open | GitHub/Zenodo |
The project is based on data that are largely publicly available. However, the raw data records may not be published without restriction.
| Data | Copyright | License |
|---|---|---|
| Commercial data | Providers | Proprietary |
| BIS data | Public | Public domain |
| Our code | Project team | MIT License |
| Our datasets | Project team | CC-BY 4.0 |
Storage capacity is ample, and the volume of data in the project remains limited.
The data is managed as databases on the ZHAW-internal GitHub. Backups and versions of the data are created continuously.
Relevant data is managed on GitHub, on Zenodo, and in private cloud storage.
| Repository | Content | Backup |
|---|---|---|
| Zenodo | Datasets, code, posters | CERN Data Centre |
| GitHub | Code, docs | Git version control |
| Private cloud | Working copies | Regular backup |
The data is stored long-term on the ZHAW-internal GitHub and managed by ZHAW using existing tools. There is no obligation to destroy the data.
Most relevant data is stored on Zenodo for long-term preservation.
- All outputs archived on Zenodo with DOIs
- Zenodo at CERN with 10+ year retention (SNSF compliant)
- Public data at original sources
- Concept DOI + version DOI for each Zenodo deposit
- CHANGELOG.md in each deposit
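Zenodo mints a version DOI for each release and a concept DOI that resolves to the latest version. Before a new version is deposited, a small pre-release check could confirm that CHANGELOG.md documents that version and assemble the deposit metadata. The sketch assumes a Keep a Changelog style heading (`## [1.1.0] - date`) and field names modeled on Zenodo's legacy deposition REST API (`upload_type`, `creators`, `access_right`, `license`, `version`); both, and the licence identifier, should be verified against current Zenodo documentation. The function names are hypothetical, and nothing is uploaded.

```python
import re


def changelog_has_version(changelog_text, version):
    """True if a '## [version]' or '## version' heading exists in the changelog."""
    pattern = r"^##\s*\[?" + re.escape(version) + r"\]?(\s|$)"
    return re.search(pattern, changelog_text, flags=re.MULTILINE) is not None


def build_metadata(title, version, creators, description):
    """Deposit metadata dict; creators are 'Family, Given' strings."""
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            "creators": [{"name": c} for c in creators],
            "description": description,
            "version": version,
            "access_right": "open",
            # Assumed licence identifier; check Zenodo's licence vocabulary.
            "license": "cc-by-4.0",
        }
    }


if __name__ == "__main__":
    changelog = "# Changelog\n\n## [1.1.0] - 2024-05-01\n- Added series\n"
    if changelog_has_version(changelog, "1.1.0"):
        print(build_metadata("Example dataset", "1.1.0", ["Surname, Given"], "Example."))
```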
The code developed during the project, together with all necessary accompanying documentation, will be stored in a GitHub repository, while the data will be archived on Zenodo. Zenodo offers secure storage for all data and research outputs in CERN's Data Centre and integrates easily with GitHub.
The relevant code developed during the project will be stored on Zenodo.
| Output | Repository | Status |
|---|---|---|
| CB Speech Transcripts | Zenodo | Published |
| Evergreen Narrative Sentiment | Zenodo | Published |
| Macro Regime Detection | Zenodo | Published |
| HFT Poster (QuantMinds) | Zenodo | Published |
| CB AI Framework Poster | Zenodo | Published |
| HFT Preprint | SSRN | Pending |
| SLR Preprint | arXiv | Under Review |
| Data | Shareable | Location |
|---|---|---|
| Public data | Yes | Original sources |
| Commercial data | No | Provider servers |
| Project code | Yes | Zenodo/GitHub (MIT) |
| Project datasets | Yes | Zenodo (CC-BY 4.0) |
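The sharing rules in the table above could be enforced mechanically before any upload. This is a minimal sketch, assuming each file in a release is listed in a manifest with one of the hypothetical category labels below; unknown or unlabelled files are blocked by default so that commercial raw data cannot slip through.

```python
# Mirrors the "Shareable" column above; category labels are hypothetical.
SHAREABLE = {
    "public": True,
    "commercial": False,
    "project_code": True,
    "project_dataset": True,
}


def blocked_files(manifest):
    """manifest: list of (path, category). Returns paths that must not be uploaded."""
    return [path for path, category in manifest if not SHAREABLE.get(category, False)]


if __name__ == "__main__":
    manifest = [
        ("data/sentiment_daily.csv", "project_dataset"),
        ("raw/vendor_extract.parquet", "commercial"),
        ("notes.txt", "unlabelled"),
    ]
    print(blocked_files(manifest))
```

A non-empty result would stop the release until the flagged files are removed or correctly labelled.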
We do not use sensitive data in the project. The data come from established data providers and were originally collected from public sources.
- Agreements are project-duration specific
- Methodology documented for reproducibility
- Derived features archived
Yes
Answer: Yes
| Principle | GitHub | Zenodo |
|---|---|---|
| F1 - Findable | Public, searchable | DOIs for all deposits |
| A1 - Accessible | HTTPS access | HTTPS, no auth |
| I1 - Interoperable | Python/Jupyter | CSV, JSON, PDF |
| R1 - Reusable | MIT License | CC-BY 4.0, DataCite |
Yes
Answer: Yes
- Zenodo: CERN (non-profit)
- arXiv: Cornell University (non-profit)
- GitHub: Microsoft (commercial, free for public)