DMP Comparison: Original vs Enhanced

SNSF Grant IZCOZ0_213370 | Data Management Plan Review

Legend: Original (from PDF) | Enhanced (SNSF-compliant) | New content added | Improved detail
1. Data collection and documentation
1.1 What data will you collect, observe, generate or reuse?
Original

We will use publicly available data on financial markets as well as data from databases which we buy, e.g. from Refinitiv. The data will be in csv format, about 10 GB, the data type are financial data, stored as integers and characters.

Enhanced
Commercial Data Sources (Licensed):
Source | Format | Volume
RavenPack | CSV, Parquet | ~3 GB
Deutsche Borse T7 | Binary, CSV | ~50 GB
Public Data Sources:
Source | Format | URL
BIS Gigando | Text, PDF | bis.org/cbspeeches/
St. Louis FED FRED | CSV via API | fred.stlouisfed.org/
Generated Research Outputs:
  • Analysis code (Python) - Zenodo
  • CB Speech Transcripts - Zenodo DOI
  • Daily Evergreen Sentiment - Zenodo DOI
  • Macro Regime Notebooks - Zenodo DOI
  • Research posters/preprints - Zenodo/SSRN/arXiv
1.2 How will the data be collected, observed or generated?
Original

Quality assurance

- The data are checked for consistency and completeness using statistical methods.

Versioning

- The code and the database are versioned with the help of the ZHAW internal tools.

The quality of the collected data will be checked during the first working package and is an integral part of the research. The consistency as well. In addition, due to shortcomings of the data, we will apply statistical methods to overcome this. We will have various documents describing the dataset, its quality and the methods used to check its consistency.

Enhanced
Data Collection Methods:
  • API Access: FRED API, BIS API for central bank speeches
  • Commercial: RavenPack SQL, Deutsche Borse SFTP
  • NLP: Hugging Face Transformers, FinBERT, BERTopic, NER tagging
  • LLM: OpenAI gpt-4o-mini for narrative sentiment
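As an illustration of the API access route, the following minimal sketch builds a request URL for the FRED `series/observations` endpoint without sending it. The series ID `DGS10`, the placeholder key, and the helper name `build_fred_request` are assumptions for the example; the plan does not state which series are retrieved or how requests are issued.

```python
from urllib.parse import urlencode

# Public FRED endpoint for time-series observations.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_request(series_id, api_key, start=None, end=None):
    """Return the URL of a FRED observations request (no network call is made)."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start:
        params["observation_start"] = start  # ISO date, e.g. "2020-01-01"
    if end:
        params["observation_end"] = end
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


# Hypothetical series and placeholder key, for illustration only.
url = build_fred_request("DGS10", "YOUR_API_KEY", start="2020-01-01")
print(url)
```

Keeping URL construction separate from the download step makes the request parameters easy to log alongside each data release.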
Quality Assurance:

Statistical consistency checks, change point detection.
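The plan does not specify which checks or which change point method are used. As one possible sketch, the code below reports basic completeness statistics for a dated series and locates a single mean shift by least squares. The function names and the toy series are hypothetical.

```python
from statistics import fmean


def completeness_report(rows):
    """Summarise a list of (iso_date, value_or_None) pairs."""
    dates = [d for d, _ in rows]
    return {
        "n_rows": len(rows),
        "n_missing": sum(v is None for _, v in rows),
        "n_duplicate_dates": len(dates) - len(set(dates)),
        "sorted": dates == sorted(dates),  # ISO dates sort lexicographically
    }


def single_change_point(values, min_size=2):
    """Return the split index that minimises the within-segment squared error.

    Always returns the best split when one is possible, so a real pipeline
    would add a significance threshold before flagging a change.
    """
    best_k, best_cost = None, float("inf")
    for k in range(min_size, len(values) - min_size + 1):
        left, right = values[:k], values[k:]
        mean_left, mean_right = fmean(left), fmean(right)
        cost = sum((x - mean_left) ** 2 for x in left)
        cost += sum((x - mean_right) ** 2 for x in right)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy series with a level shift after the fifth observation.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 3.0, 3.1, 2.9, 3.05, 3.0]
change_index = single_change_point(series)
report = completeness_report(
    [("2024-01-01", 1.0), ("2024-01-02", None), ("2024-01-02", 2.0)]
)
```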

Versioning:

Git version control. Data releases receive DOIs via Zenodo.

1.3 What documentation and metadata will you provide with the data?
Original

The information on the data as well as data sources and survey processes are documented in detail. The information on the project and the data will be made available to our university employees so that further projects can be developed in this area.

Enhanced
Metadata Standards:
  • Zenodo metadata (DataCite schema)
  • README documentation with variable descriptions
  • Data dictionaries
Documentation Provided:
  • README.md files in all repositories
  • Jupyter notebooks with methodology
  • CC-BY 4.0 licensing for all outputs
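A data dictionary of the kind listed above could be drafted automatically from a CSV file and then completed by hand. The sketch below assumes hypothetical column names (`date`, `sentiment`, `n_speeches`) and a simple three-way type inference; it is not the project's actual tooling.

```python
import csv
import io


def infer_type(values):
    """Classify a column as 'integer', 'float', 'string', or 'empty'."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue  # try the next, more permissive type
    return "string"


def build_data_dictionary(csv_text, descriptions):
    """Return one entry per column: name, inferred type, missing count, description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        column = [row[name] for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(column),
            "n_missing": sum(v == "" for v in column),
            "description": descriptions.get(name, "TODO: describe"),
        })
    return entries


# Hypothetical sample file and descriptions.
sample = "date,sentiment,n_speeches\n2024-01-01,0.12,3\n2024-01-02,,5\n"
dictionary = build_data_dictionary(sample, {"date": "Observation date (ISO 8601)"})
for entry in dictionary:
    print(entry)
```

Columns without a supplied description are marked `TODO: describe`, which makes gaps in the documentation easy to spot before a deposit is published.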
2. Ethics, legal and security issues
2.1 How will ethical issues be addressed and handled?
Original

No personal data or other sensitive data is used in the project. In this respect, the university internal security standards are applied.

The conditions have already been discussed with the data providers. No special security standards are required by the data providers for this data.

Enhanced
Ethical Considerations:

All financial data is aggregated market data with no individual identification. Central bank speeches are official public communications.

NLP Bias Assessment Framework:
  • Sentiment distribution analysis by sector/geography
  • Temporal drift monitoring
  • English-language dominance acknowledged
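The first two items of the framework can be sketched as simple group statistics. The example below compares mean sentiment across groups and measures the shift in mean sentiment before and after a cut-off year. The record fields (`region`, `year`, `score`), the cut-off, and the toy values are assumptions; the plan does not state which metrics are used.

```python
from collections import defaultdict
from statistics import fmean


def mean_sentiment_by_group(records, key):
    """Return the mean sentiment score for each value of `key` (e.g. sector or region)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["score"])
    return {group: fmean(scores) for group, scores in groups.items()}


def temporal_drift(records, split_year):
    """Return mean score from `split_year` onward minus the mean score before it.

    Raises statistics.StatisticsError if either period has no records.
    """
    early = [r["score"] for r in records if r["year"] < split_year]
    late = [r["score"] for r in records if r["year"] >= split_year]
    return fmean(late) - fmean(early)


# Toy records, for illustration only.
records = [
    {"region": "EU", "year": 2019, "score": 0.2},
    {"region": "EU", "year": 2023, "score": 0.4},
    {"region": "US", "year": 2019, "score": -0.1},
    {"region": "US", "year": 2023, "score": 0.1},
]
by_region = mean_sentiment_by_group(records, "region")
drift = temporal_drift(records, split_year=2021)
```

Large gaps between groups or a large drift value would not prove bias on their own, but they indicate where a closer manual review is warranted.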
Data Provider Agreements:
Provider | Terms
RavenPack | Research subscription, academic use only
Deutsche Borse | Research collaboration, project-specific
2.2 How will data access and security be managed?
Original

Access to the data is only granted to team members. The university IT service guarantees the security of data and processes. No sensitive data or personal data is collected in the project.

Enhanced

We use private cloud solutions. No sensitive data or personal data is collected.

Access Control Matrix:
Data Category | Access | Storage
Public data (FRED, BIS) | Open | Original sources
Commercial data | Provider only | Provider infrastructure
Project code/datasets | Open | GitHub/Zenodo
2.3 How will you handle copyright and Intellectual Property Rights issues?
Original

The project is based on data that are largely publicly available. The raw data records may not be published without restriction.

Enhanced
Copyright Framework:
Data | Copyright | License
Commercial data | Providers | Proprietary
BIS data | Public | Public domain
Our code | Project team | MIT License
Our datasets | Project team | CC-BY 4.0
3. Data storage and preservation
3.1 How will your data be stored and backed-up during the research?
Original

The storage capacities are very large, the amount of data remains very limited in the project.

The data is managed in the form of databases on the ZHAW internal Github. Backups and versions of the data are continuously created.

Enhanced

Relevant data is managed on GitHub, Zenodo, and private cloud solutions.

Primary Storage:
Repository | Content | Backup
Zenodo | Datasets, code, posters | CERN Data Centre
GitHub | Code, docs | Git version control
Private cloud | Working copies | Regular backup
3.2 What is your data preservation plan?
Original

The data is stored on the ZHAW internal github for a long time and managed by the ZHAW using the existing tools. There is no obligation to destroy the data.

Enhanced

The most relevant data is archived on Zenodo for long-term preservation.

10-Year Retention Strategy:
  • All outputs archived on Zenodo with DOIs
  • Zenodo at CERN with 10+ year retention (SNSF compliant)
  • Public data at original sources
DOI Versioning Policy:
  • One concept DOI per deposit plus a separate DOI for each version, following Zenodo's versioning scheme
  • CHANGELOG.md in each deposit
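To show how a concept DOI and per-version DOIs might be recorded together, the sketch below renders a CHANGELOG.md from a small in-memory record of releases. The class names, the changelog layout, and the placeholder DOIs are hypothetical; the plan does not prescribe a changelog format.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    version: str
    date: str          # ISO date of the release
    version_doi: str   # DOI that identifies this specific version
    changes: list


@dataclass
class Deposit:
    title: str
    concept_doi: str   # DOI that represents all versions of the deposit
    releases: list = field(default_factory=list)

    def changelog(self):
        """Render a CHANGELOG.md body with the newest release first."""
        lines = [
            f"# Changelog: {self.title}",
            "",
            f"Concept DOI (all versions): {self.concept_doi}",
            "",
        ]
        for rel in sorted(self.releases, key=lambda r: r.date, reverse=True):
            lines.append(f"## {rel.version} ({rel.date})")
            lines.append(f"Version DOI: {rel.version_doi}")
            lines.extend(f"- {change}" for change in rel.changes)
            lines.append("")
        return "\n".join(lines)


# Placeholder DOIs and dates, not real identifiers.
deposit = Deposit("CB Speech Transcripts", "10.5281/zenodo.0000000")
deposit.releases.append(
    Release("1.0.0", "2024-01-15", "10.5281/zenodo.0000001", ["Initial release"])
)
deposit.releases.append(
    Release("1.1.0", "2024-06-01", "10.5281/zenodo.0000002", ["Added new speeches"])
)
text = deposit.changelog()
print(text)
```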
Deutsche Borse Note: Raw HFT data cannot be archived publicly. Methodology documented for reproducibility.
Compliance: Quarterly self-audit, PI responsible.
4. Data sharing and reuse
4.1 How and where will the data be shared?
Original

The code developed during the project together with all necessary accompanying documentation will be stored on a GitHub channel. On the other hand, the data archiving will be done through Zenodo. Zenodo offers safe storage for all data and research outputs in CERN's Data Centre and it provides easy integration with GitHub.

Enhanced

The relevant code developed during the project will be stored on Zenodo.

Published Research Outputs (7 items):
Output | Repository | Status
CB Speech Transcripts | Zenodo | Published
Evergreen Narrative Sentiment | Zenodo | Published
Macro Regime Detection | Zenodo | Published
HFT Poster (QuantMinds) | Zenodo | Published
CB AI Framework Poster | Zenodo | Published
HFT Preprint | SSRN | Pending
SLR Preprint | arXiv | Under Review
Sharing by Data Type:
Data | Shareable | Location
Public data | Yes | Original sources
Commercial data | No | Provider servers
Project code | Yes | Zenodo/GitHub (MIT)
Project datasets | Yes | Zenodo (CC-BY 4.0)
4.2 Are there any necessary limitations to protect sensitive data?
Original

We do not use sensitive data in the project. The data come from conventional data providers and are originally collected from public sources.

Enhanced
Commercial Data Post-Project:
  • Agreements are project-duration specific
  • Methodology documented for reproducibility
  • Derived features archived
4.3 All digital repositories I will choose are conform to the FAIR Data Principles.
Original

Yes

Enhanced

Answer: Yes

FAIR Compliance Matrix:
Principle | GitHub | Zenodo
F1 - Findable | Public, searchable | DOIs for all deposits
A1 - Accessible | HTTPS access | HTTPS, no auth
I1 - Interoperable | Python/Jupyter | CSV, JSON, PDF
R1 - Reusable | MIT License | CC-BY 4.0, DataCite
4.4 I will choose digital repositories maintained by a non-profit organisation.
Original

Yes

Enhanced

Answer: Yes

Repository Operators:
  • Zenodo: CERN (non-profit)
  • arXiv: Cornell University (non-profit)
  • GitHub: Microsoft (commercial, free for public)
Summary of Enhancements
  • 7 Research Outputs with DOIs
  • 5 Zenodo Deposits
  • 55 GB Total Data Volume
  • 8 Gap Fixes Applied
  • 10+ Year Retention (Zenodo)
  • 100% SNSF Compliant
