Definition of digital preservation
Digital preservation is the act of ensuring that digital material remains findable and accessible, and of keeping it independently understandable and reusable by a designated community, with evidence supporting its authenticity, for as long as necessary. Preservation actions include:
- Data cleaning
- Data validation
- Data documentation with descriptive metadata and preservation metadata (e.g. administrative and technical metadata)
- Assigning representational information (e.g. original file or derivatives)
- Ensuring acceptable data structures or file formats
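One common way to support the validation and authenticity goals above is to record a checksum for each file and re-verify it later. The following is a minimal sketch rather than a prescribed method; the function names (`sha256_of`, `build_manifest`, `changed_files`) and the idea of keeping the manifest as a plain dictionary are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or whose checksum differs."""
    current = build_manifest(root)
    return [name for name, digest in manifest.items() if current.get(name) != digest]
```

A manifest built at deposit time can be stored alongside the data; running `changed_files` later against the same folder returns an empty list if nothing recorded in the manifest has been altered or lost.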
Selection of data to keep
To decide what data to keep, we recommend reading this how-to guide by Angus Whyte. The steps suggested for making this decision are as follows:
- Step 1: identify reuse purposes that the data could fulfil. It is important to note that researchers depositing data are most likely not the users of the data (neither now nor in the future). Data, especially in dark archives, are used only if they are no longer available in other ways. That means the intended user is probably part of a future designated community. Accordingly, the person deciding what data to keep should take the needs of this future designated community into account as far as possible.
- Step 2: identify data that must be kept considering legal or policy compliance risks, as well as funder requirements.
- Step 3: identify data that should be kept as it may have long-term value.
- Step 4: weigh up the costs and identify any need for external advice in case of a budget shortfall.
- Step 5: complete the data appraisal, i.e. list what data must, should or could be kept to fulfil which potential reuse purposes and summarise any actions needed to prepare the data for deposit, or justification for not keeping it.
Digitally preserving research data is usually considered the task of institutions and infrastructures. That being said, researchers can prepare their data in a way that facilitates digital preservation:
- Documenting data with metadata and context information to ensure reusability over both the short and long term.
- Using well-known open formats with published specifications during the project phase.
- Following the 3-2-1 rule:
  - Keeping 3 copies of any important file.
  - Storing files on 2 different media types.
  - Keeping at least 1 copy off site.
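The 3-2-1 rule above can be expressed as a simple check over an inventory of copies. This is only an illustrative sketch: the `Copy` record, its field names, and the example labels such as "lab-nas" are hypothetical, and in practice the inventory would have to be kept up to date by hand or by a backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str    # hypothetical label, e.g. "lab-nas"
    media_type: str  # e.g. "ssd", "hdd", "tape", "cloud"
    off_site: bool   # True if stored away from the primary site


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 off site."""
    media_types = {c.media_type for c in copies}
    return (
        len(copies) >= 3
        and len(media_types) >= 2
        and any(c.off_site for c in copies)
    )


inventory = [
    Copy("laptop", "ssd", off_site=False),
    Copy("lab-nas", "hdd", off_site=False),
    Copy("institutional-cloud", "cloud", off_site=True),
]
print(satisfies_3_2_1(inventory))
```

The check treats every listed entry as a separate copy, so it assumes the inventory does not list the same physical copy twice.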
Recommended preservation formats for research data
For digital preservation, it is recommended to save files in the original software format and in an additional recommended file format, i.e. a format that is:
- Open rather than proprietary
- Exportable to / unpackable into an open format (e.g. xlsx, docx, etc. can be unpacked into folders of xml files)
- In widespread use
- Simple (e.g. CSV rather than xlsx)
- Text-based (i.e. any file you can open with a text editor and read) rather than binary (e.g. txt files rather than doc files)
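To illustrate the point about unpacking: xlsx and docx files are ZIP containers, so their XML parts can be extracted with standard tooling. The sketch below uses Python's standard `zipfile` module; the function name `unpack_office_file` and the target folder are assumptions for illustration, and the result is the raw container contents, not a cleaned-up preservation copy.

```python
import zipfile
from pathlib import Path


def unpack_office_file(source: Path, target_dir: Path) -> list[str]:
    """Extract an xlsx/docx container and return the names of its .xml parts."""
    with zipfile.ZipFile(source) as archive:
        # Extract every member (XML parts, relationship files, embedded media).
        archive.extractall(target_dir)
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))
```

Calling `unpack_office_file(Path("results.xlsx"), Path("results_unpacked"))` (both paths hypothetical) writes the container contents into the target folder, where the XML files can be opened in any text editor.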
For biomaterial data, recommended formats are CSV, TXT and XML.