Data Reuse

Benefits and drawbacks

Making data reusable benefits researchers who publish their data, researchers who reuse data, and society.

Researchers who publish their data see an increase in their scientific reputation, citations and collaborations (Rehwald et al., 2022; Pauls et al., 2023). In addition, they not only comply with the FAIR Data Principles, but also help to avoid bias in the body of evidence (Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine, 2015) and increase transparency and thus trust in research (Engelhardt et al., 2022; Rehwald et al., 2022; Pauls et al., 2023). Finally, by sharing their resources and perspectives, they enable other researchers to build on their work, accelerating scientific discovery (Engelhardt et al., 2022; Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine, 2015; Rehwald et al., 2022).

Researchers can recycle unique data by performing secondary analyses that answer new research questions and/or apply new methods (Rehwald et al., 2022; Pauls et al., 2023). Reusing data in this way saves resources such as time, energy and money (Engelhardt et al., 2022; Why Are Research Data Managed and Reused?, n.d.; Rehwald et al., 2022; Pauls et al., 2023). Data reuse also increases collaboration and, over time, enables the comparison of different samples (Rehwald et al., 2022; Pauls et al., 2023). Indeed, data reuse is essential for interdisciplinary experiments and cross-cutting research approaches (Data Reuse Stories. Some Concrete Cases Involving Several Institutions and Consortia in Europe, n.d.).

Making data reusable can also benefit society. It reduces unnecessary experimentation (Rehwald et al., 2022), avoids duplication of data collection and minimises collection from hard-to-reach, vulnerable or over-researched populations (Why Are Research Data Managed and Reused?, n.d.; Rehwald et al., 2022). It also enables replication and thus promotes reproducibility. Finally, it benefits teaching and improves the link between academia and industry (Rehwald et al., 2022).

As Sielemann et al. (2020) point out, data reuse also comes with challenges, limitations and risks.

For researchers who publish their data, preparing datasets for reuse is time-consuming.

For researchers reusing data, there are risks such as unknown quality and denormalisation (i.e. “the same data is stored multiple times in the same database under different names/identifiers”). There is also the challenge of comparing and integrating datasets from different sources (Sielemann et al., 2020).
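The denormalisation risk described above can be checked for before analysis. The following is a minimal sketch, assuming the records are nucleotide sequences keyed by identifier. The function names, identifiers and toy sequences are hypothetical, and grouping by a hash of the normalised content is only one possible way to spot the same data stored under different names.

```python
import hashlib
from collections import defaultdict


def normalise_sequence(seq: str) -> str:
    """Strip whitespace and upper-case so trivially different copies compare equal."""
    return "".join(seq.split()).upper()


def find_duplicate_records(records: dict) -> list:
    """Group identifiers whose sequence content is identical.

    `records` maps an identifier to its sequence string.
    Returns the groups (sorted lists of identifiers) that have more than one member.
    """
    by_digest = defaultdict(list)
    for identifier, seq in records.items():
        digest = hashlib.sha256(normalise_sequence(seq).encode("utf-8")).hexdigest()
        by_digest[digest].append(identifier)
    return sorted(sorted(ids) for ids in by_digest.values() if len(ids) > 1)


# Hypothetical identifiers and toy sequences, for illustration only.
example = {
    "phage_A_v1": "ATGCGT ACGT",
    "isolate_17": "atgcgtacgt",
    "phage_B": "TTGACA",
}
print(find_duplicate_records(example))  # [['isolate_17', 'phage_A_v1']]
```

A check like this only finds exact duplicates after normalisation. Near-identical entries, such as sequences differing by a few bases, would need a similarity-based comparison instead.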

Resources to facilitate data reuse in microbiology

Listed below are widely used resources in microbiology which facilitate the reuse of raw data found in the data repositories (see section above). These so-called “secondary databases” provide added value through additional data types, for example from data integration or from the processing of raw data. For each resource, the FAIRsharing and re3data pages are linked when available. On the FAIRsharing page, you will find information such as which journals endorse the resource (under “Collections & Recommendations” and then “In Policies”). On the re3data page, you will find information such as the above-mentioned criteria to select a trusted resource. DB = database.

Domain, Data Type	Data repository	FAIRsharing	re3data
Viruses, Knowledge resources	ViralZone	FAIRsharing	re3data
	International Committee for the Taxonomy of Viruses ICTV	-	-
Viruses, Virus-host databases	Virus-HostDB	-	-
	Viral Host-Range DB VHRDB	FAIRsharing	-
Viruses, Sequence analysis platforms	NCBI Virus	FAIRsharing	-
	Bacterial and Viral Bioinformatics Resource Center (BV-BRC)	FAIRsharing	re3data
Viruses, Nucleic acid sequence downloads	RVDB	-	-
	(inphared)	-	-
Viruses, Macromolecular structures	VIPERdb	FAIRsharing	re3data
Viruses, Protein sequences	Virus Orthologous Groups (VOGdb)	-	-
	Phage Orthologous Groups (PHROGs)	-	-
Viruses, -omics datasets	IMG/VR	FAIRsharing	-
	Multi-Omics Portal of Virus Infection (MVIP)	-	-
All, Protein sequence search	InterPro	FAIRsharing	re3data

Relevant licenses and terms of use

See Licenses.

Criteria for selecting trustworthy datasets

Below is a list of criteria for selecting trustworthy datasets (Bres et al., 2022; Sielemann et al., 2020). As in Sielemann et al. (2020), several questions to consider are listed for each criterion.

Integrity of the source
- Is the source/submitter associated with data fabrication/plagiarism?
- Is the handling of missing values documented?
Biases
- How was the data generated?
- Is the data generation clearly and precisely documented?
Missing metainformation (sparsity)
- Do you have all relevant information?
- Is the information understandable and consistent?
Integration of datasets from different sources
- Is the data comparable?
- Are the methods used for data generation and analysis well documented and comparable?
Quality issues
- Is the quality high enough to reach your goals?
- Are there any scores/hints available to check the quality of the dataset?
Copyright/Legal issues
- Are there any restrictions for reuse and publication of the data, especially due to the Nagoya protocol?
Further documentation
- Is the research purpose/(hypo-)thesis well documented?
- Is it documented whether the data are raw or processed?
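Some of the questions above, in particular those on missing metainformation, can be partly screened for automatically. The sketch below is one possible approach, not a standard: the list of required fields, the placeholder values and the example record are all assumptions chosen for illustration and should be adapted to the metadata standard of your field.

```python
# Hypothetical list of fields a reuser might need; adapt to your own standard.
REQUIRED_FIELDS = [
    "organism",
    "collection_date",
    "sequencing_method",
    "licence",
    "processing_level",  # e.g. "raw" or "processed"
]

# Values that formally fill a field but carry no information (assumed list).
PLACEHOLDERS = {"", "n/a", "na", "unknown", "not provided", "missing"}


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent, empty or placeholder-only."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip().lower() in PLACEHOLDERS:
            missing.append(field)
    return missing


def completeness(record: dict) -> float:
    """Fraction of required fields that carry usable information."""
    present = len(REQUIRED_FIELDS) - len(missing_metadata(record))
    return present / len(REQUIRED_FIELDS)


# Hypothetical record, for illustration only.
example_record = {
    "organism": "Escherichia phage T4",
    "collection_date": "",
    "sequencing_method": "Illumina",
    "licence": "CC BY 4.0",
    "processing_level": "unknown",
}
print(missing_metadata(example_record))  # ['collection_date', 'processing_level']
print(completeness(example_record))
```

A score like this only shows whether information is present. Whether it is understandable, consistent and trustworthy still has to be judged by reading the documentation.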

Data discovery

Services to search for data

Registries of data repositories

Registry of Research Data Repositories (re3data.org)
OpenAIRE Explore
OpenDOAR
FAIRsharing.org
Master Data Repository List

Search engines

NCBI Datasets
Google
- Dataset Search
- Keyword + “dataset”
Library search engines
- Bielefeld Academic Search Engine (BASE)
- LIVIVO – The Search Portal for Life Sciences
Discipline-specific search engines
- Bacterial and Viral Bioinformatics Resource Center (BV-BRC)
- NFDI4Chem Search
- Study Hub NFDI4Health COVID-19
- TerrestrialMetagenomeDB
Mendeley Data

(Meta)data aggregators

Services where data can be published

Interdisciplinary and discipline-specific repositories
Data reports
Data journals (see e.g. here)

Strategies to search for data

The Consortium of European Social Science Data Archives (CESSDA) (CESSDA Data Management Expert Guide, n.d.) has produced a list of steps in data discovery. The main ones are outlined below, and you can look at their website for the sub-steps.

Develop a clear picture of the research data you need
Locate appropriate data resources
Set up a search query and search the data resource
Select data candidates
Evaluate data quality

CESSDA also suggests three steps to adjust your search strategy (CESSDA Data Management Expert Guide, n.d.):

Use appropriate words in appropriate fields
Broaden your scope
Narrow your scope
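The three adjustments above can be illustrated with a generic Boolean query. This is a minimal sketch under stated assumptions: the `field:"value"` filter syntax and the search terms are hypothetical, and each data resource has its own query language, so check its documentation before reusing the pattern.

```python
from typing import Optional


def build_query(concepts: list, filters: Optional[dict] = None) -> str:
    """Build a Boolean query string.

    `concepts` is a list of synonym lists: synonyms are joined with OR
    (broadening), concepts are joined with AND (narrowing).
    `filters` maps a field name to a required value (further narrowing).
    """
    groups = []
    for synonyms in concepts:
        # Quote multi-word terms so they are searched as phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    query = " AND ".join(groups)
    for field, value in (filters or {}).items():
        query += f' AND {field}:"{value}"'
    return query


# Starting query: one word per concept.
base = build_query([["bacteriophage"], ["metagenome"]])

# Broaden the scope: add synonyms to each concept.
concepts = [["bacteriophage", "phage", "bacterial virus"], ["metagenome", "virome"]]
broad = build_query(concepts)

# Narrow the scope: restrict a (hypothetical) field.
narrow = build_query(concepts, {"year": "2022"})

for q in (base, broad, narrow):
    print(q)
```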

Data citation

Common standards for data citation

Interdisciplinary

DataCite 2019: Creator (PublicationYear): Title. Version. Publisher. (resourceTypeGeneral). Identifier
FORCE 11: Author(s), Year, Data set title, Data repository or archive, Version, Global persistent identifier (preferably as link)
BibGuru
DOI Citation Formatter
How to Cite Datasets and Link to Publications
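The DataCite pattern listed above can be assembled from a metadata record as follows. This is a sketch, not an official formatter: the class and function names are hypothetical, the example values are invented, and treating Version and resourceTypeGeneral as optional is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    creators: list
    publication_year: int
    title: str
    publisher: str
    identifier: str
    version: Optional[str] = None
    resource_type_general: Optional[str] = None


def format_datacite(record: DatasetRecord) -> str:
    """Follow the pattern:
    Creator (PublicationYear): Title. Version. Publisher. (resourceTypeGeneral). Identifier
    """
    parts = [f"{'; '.join(record.creators)} ({record.publication_year}): {record.title}."]
    if record.version:
        parts.append(f"{record.version}.")
    parts.append(f"{record.publisher}.")
    if record.resource_type_general:
        parts.append(f"({record.resource_type_general}).")
    parts.append(record.identifier)
    return " ".join(parts)


# Invented example values, for illustration only.
record = DatasetRecord(
    creators=["Doe, J.", "Roe, R."],
    publication_year=2024,
    title="Example phage genome collection",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    version="1.0",
    resource_type_general="Dataset",
)
print(format_datacite(record))
```

For real citations, the tools listed above (for example the DOI Citation Formatter) are preferable, because they draw the metadata directly from the DOI record.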

For nucleic acid sequences and functional genomics

Code citation

Code citation allows for greater recognition of research software. Some major platforms and tools offer code citation: GitHub, GitLab, JabRef, Zenodo and Zotero (Code Citation Was Made Possible by Research Software Engineers in Germany and the Netherlands, n.d.).
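On GitHub, code citation relies on a CITATION.cff file (Citation File Format) placed in the root of the repository. The sketch below generates a minimal file of this kind. The helper name and all example values are hypothetical, only a handful of common keys are shown, and the other platforms may handle the file differently, so consult the Citation File Format documentation for the full schema.

```python
import json


def render_citation_cff(title, authors, version, date_released, doi=None) -> str:
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of (family_name, given_name) tuples and
    `date_released` is an ISO date string (YYYY-MM-DD).
    Strings are written as JSON strings, which are valid YAML scalars.
    """
    def quote(text):
        return json.dumps(text, ensure_ascii=False)

    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {quote(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {quote(family)}")
        lines.append(f"    given-names: {quote(given)}")
    lines.append(f"version: {quote(version)}")
    lines.append(f"date-released: {quote(date_released)}")
    if doi:
        lines.append(f"doi: {quote(doi)}")
    return "\n".join(lines) + "\n"


# Invented example values, for illustration only.
cff_text = render_citation_cff(
    title="example-phage-tools",
    authors=[("Doe", "Jane"), ("Roe", "Richard")],
    version="0.1.0",
    date_released="2024-01-15",
    doi="10.1234/example",
)
print(cff_text)
```

Saving the returned text as CITATION.cff in the repository root is what makes the citation information visible to tools that read this format.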

How-tos

How to make your data reusable?

Properly document your data with metadata (Data Reuse Stories. Some Concrete Cases Involving Several Institutions and Consortia in Europe, n.d.).
Use common metadata standards and terminologies (Data Reuse Stories. Some Concrete Cases Involving Several Institutions and Consortia in Europe, n.d.).
Standardise your data.
Share your raw data with an open licence.

How to make the most of existing data?

See Wood-Charlson et al. (2022).

References

Rehwald, S., Leimer, S., Lindstädt, B., Shutsko, A., & Vandendorpe, J. (2022). Workshop on Research Data Management in Medical and Biomedical Sciences.
Pauls, C., Feeken, C., Steen, E.-E., Lindstädt, B., Vandendorpe, J., & Markus, K. (2023). Workshop on Research Data Management.
Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine. (2015). Guiding Principles for Sharing Clinical Trial Data. In Sharing Clinical Trial Data. National Academies Press (US).
Engelhardt, C., Biernacka, K., Coffey, A., Cornet, R., Danciu, A., Demchenko, Y., Downes, S., Erdmann, C., Garbuglia, F., Germer, K., Helbig, K., Hellström, M., Hettne, K., Hibbert, D., Jetten, M., Karimova, Y., Kryger Hansen, K., Kuusniemi, M. E., Letizia, V., … Zhou, B. (2022). D7.4 How to be FAIR with your data. A teaching and training handbook for higher education institutions. https://doi.org/10.5281/ZENODO.6674301
Why are research data managed and reused? (n.d.). https://www.fsd.tuni.fi/en/services/data-management-guidelines/why-are-research-data-managed-and-reused/
Data Reuse Stories. Some concrete cases involving several institutions and consortia in Europe. (n.d.). https://www.openaire.eu/blogs/data-reuse-stories-some-concrete-cases-involving-several-institutions-and-consortia-in-europe
Sielemann, K., Hafner, A., & Pucker, B. (2020). The reuse of public datasets in the life sciences: potential risks and rewards. PeerJ. https://doi.org/10.7717/peerj.9954
Bres, E., Rudolf, D., Lindstädt, B., & Shutsko, A. (2022). Research Data Management in Medical and Biomedical Sciences.
CESSDA Data Management Expert Guide. (n.d.). https://dmeg.cessda.eu/
Code citation was made possible by research software engineers in Germany and the Netherlands. (n.d.). https://www.esciencecenter.nl/news/code-citation-was-made-possible-by-research-software-engineers-in-germany-and-the-netherlands/
Wood-Charlson, E. M., Crockett, Z., Erdmann, C., Arkin, A. P., & Robinson, C. B. (2022). Ten simple rules for getting and giving credit for data. PLOS Computational Biology, 18(9), 1–11. https://doi.org/10.1371/journal.pcbi.1010476