
Data Reuse


Benefits and drawbacks

Making data reusable benefits researchers who publish their data, researchers who reuse data, and society.

Researchers who publish their data see an increase in their scientific reputation, citations and collaborations (Rehwald et al., 2022; Pauls et al., 2023). They also comply with the FAIR Data Principles, help avoid bias in the body of evidence (Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine, 2015), and increase transparency and thus trust in research (Engelhardt et al., 2022; Rehwald et al., 2022; Pauls et al., 2023). Finally, by sharing their resources and perspectives, they enable other researchers to build on their work, accelerating scientific discovery (Engelhardt et al., 2022; Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine, 2015; Rehwald et al., 2022).

Researchers can reuse unique data in secondary analyses that address new research questions, apply new methods, or both (Rehwald et al., 2022; Pauls et al., 2023). Reusing data in this way saves resources such as time, energy and money (Engelhardt et al., 2022; Why Are Research Data Managed and Reused?, n.d.; Rehwald et al., 2022; Pauls et al., 2023). Data reuse also increases collaboration and, over time, enables the comparison of different samples (Rehwald et al., 2022; Pauls et al., 2023). Indeed, data reuse is essential for interdisciplinary experiments and cross-cutting research approaches (Data Reuse Stories. Some Concrete Cases Involving Several Institutions and Consortia in Europe, n.d.).

Making data reusable can also benefit society. It reduces unnecessary experimentation (Rehwald et al., 2022), avoids duplication of data collection and minimises collection from hard-to-reach, vulnerable or over-researched populations (Why Are Research Data Managed and Reused?, n.d.; Rehwald et al., 2022). It also enables replication and thus promotes reproducibility. Finally, it benefits teaching and improves the link between academia and industry (Rehwald et al., 2022).

As Sielemann et al. (2020) point out, data reuse also comes with challenges, limitations and risks.

For researchers who publish their data, preparing datasets for reuse is time-consuming.

For researchers reusing data, there are risks such as unknown quality and denormalisation (i.e. “the same data is stored multiple times in the same database under different names/identifiers”). There is also the challenge of comparing and integrating datasets from different sources (Sielemann et al., 2020).
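
A minimal Python sketch of how the denormalisation risk described above could be checked before reuse. It assumes the dataset can be loaded as a mapping from identifier to sequence string; the function name, the normalisation rule and the example records are hypothetical and are not taken from Sielemann et al. (2020).

```python
import hashlib
from collections import defaultdict


def find_duplicated_content(records):
    """Group identifiers whose normalised content is identical.

    `records` maps an identifier to a sequence string. Both the mapping
    and the normalisation rule are assumptions made for this sketch.
    """
    ids_by_digest = defaultdict(list)
    for identifier, sequence in records.items():
        # Normalise so that case and whitespace differences do not hide duplicates.
        normalised = "".join(sequence.split()).upper()
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        ids_by_digest[digest].append(identifier)
    # Keep only content that appears under more than one identifier.
    return sorted(sorted(ids) for ids in ids_by_digest.values() if len(ids) > 1)


# Hypothetical records: seqA and seqB hold the same sequence under two names.
example_records = {
    "seqA": "ATGCGT",
    "seqB": "atgcgt\n",
    "seqC": "TTGACA",
}
duplicates = find_duplicated_content(example_records)
print(duplicates)
```

Hashing the normalised content keeps the comparison cheap for large collections. Exact matching will not catch near-duplicates, which would need a similarity-based approach.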

Resources to facilitate data reuse in microbiology

Listed below are widely used microbiology resources that facilitate the reuse of raw data held in data repositories (see section above). These so-called “secondary databases” provide added value through additional data types, for example from data integration or from the processing of raw data. Where available, the FAIRsharing and re3data pages are linked for each resource. On the FAIRsharing page, you will find information such as which journals endorse the resource (under “Collections & Recommendations” and then “In Policies”). On the re3data page, you will find information such as the above-mentioned criteria to select a trusted resource. DB = database.

| Domain, Data Type | Data repository | FAIRsharing | re3data |
|---|---|---|---|
| Viruses, Knowledge resources | ViralZone | FAIRsharing | re3data |
| Viruses, Knowledge resources | International Committee on Taxonomy of Viruses (ICTV) | - | - |
| Viruses, Virus-host databases | Virus-HostDB | - | - |
| Viruses, Virus-host databases | Viral Host-Range DB (VHRDB) | FAIRsharing | - |
| Viruses, Sequence analysis platforms | NCBI Virus | FAIRsharing | - |
| Viruses, Sequence analysis platforms | BV-BRC | FAIRsharing | re3data |
| Viruses, Nucleic acid sequence downloads | RVDB | - | - |
| Viruses, Nucleic acid sequence downloads | inphared | - | - |
| Viruses, Macromolecular structures | VIPERdb | FAIRsharing | re3data |
| Viruses, Protein sequences | Virus Orthologous Groups (VOGdb) | - | - |
| Viruses, Protein sequences | Phage Orthologous Groups (PHROGs) | - | - |
| Viruses, -omics datasets | IMG/VR | FAIRsharing | - |
| Viruses, -omics datasets | Multi-Omics Portal of Virus Infection (MVIP) | - | - |
| All, Protein sequence search | InterPro | FAIRsharing | re3data |

Relevant licenses and terms of use

See Licenses.

Criteria for selecting trustworthy datasets

Below is a list of criteria for selecting trustworthy datasets (Bres et al., 2022; Sielemann et al., 2020). Following Sielemann et al. (2020), several questions to consider are listed for each criterion.

  • Integrity of the source
    • Is the source/submitter associated with data fabrication/plagiarism?
    • Is the handling of missing values documented?
  • Biases
    • How was the data generated?
    • Is the data generation clearly and precisely documented?
  • Missing metainformation (sparsity)
    • Do you have all relevant information?
    • Is the information understandable and consistent?
  • Integration of datasets from different sources
    • Is the data comparable?
    • Are the methods used for data generation and analysis well documented and comparable?
  • Quality issues
    • Is the quality high enough to reach your goals?
    • Are there any scores/hints available to check the quality of the dataset?
  • Copyright/Legal issues
    • Are there any restrictions on reuse and publication of the data, especially due to the Nagoya Protocol?
  • Further documentation
    • Is the research purpose/(hypo-)thesis well documented?
    • Is it documented whether the data are raw or processed?
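
Some of these questions can be partly automated. The Python sketch below addresses the "missing metainformation (sparsity)" criterion by reporting which required metadata fields are absent or empty. The list of required fields and the example records are hypothetical and should be adapted to the dataset at hand; the other criteria still call for manual judgement.

```python
# Hypothetical set of fields a reuser might require; adapt to your own needs.
REQUIRED_FIELDS = ("organism", "collection_date", "method", "license")


def missing_metadata(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in `record`."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def completeness(records, required=REQUIRED_FIELDS):
    """Fraction of required fields that are filled in across all records."""
    total = len(records) * len(required)
    if total == 0:
        return 1.0
    filled = sum(len(required) - len(missing_metadata(r, required)) for r in records)
    return filled / total


# Hypothetical metadata records for two datasets.
example_records = [
    {
        "organism": "Escherichia virus T4",
        "collection_date": "2021-05-03",
        "method": "Illumina",
        "license": "CC-BY-4.0",
    },
    {"organism": "", "method": "Nanopore"},
]
for record in example_records:
    print(missing_metadata(record))
print(completeness(example_records))
```

A low completeness value does not by itself disqualify a dataset, but it flags where the documentation needs to be checked before reuse.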

Data discovery

Services to search for data

Registries of data repositories

Search engines

(Meta)data aggregators

Services where data can be published

Strategies to search for data

The Consortium of European Social Science Data Archives (CESSDA) (CESSDA Data Management Expert Guide, n.d.) has produced a list of steps in data discovery. The main ones are outlined below, and you can look at their website for the sub-steps.

  1. Develop a clear picture of the research data you need
  2. Locate appropriate data resources
  3. Set up a search query and search the data resource
  4. Select data candidates
  5. Evaluate data quality

CESSDA also suggests three steps to adjust your search strategy (CESSDA Data Management Expert Guide, n.d.):

  1. Use appropriate words in appropriate fields
  2. Broaden your scope
  3. Narrow your scope
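
The three adjustment steps above can be sketched in Python as a small query builder. The query syntax (quoted terms joined with OR, `field:"value"` filters joined with AND) is a generic illustration and an assumption of this sketch; each data resource has its own search syntax, so check its documentation before reusing the pattern.

```python
def build_query(terms, filters=None):
    """Combine synonyms with OR and field filters with AND.

    The output syntax is hypothetical and not tied to any specific repository.
    """
    any_term = " OR ".join(f'"{t}"' for t in terms)
    clauses = [f"({any_term})"]
    # Sort the filters so that the generated query is deterministic.
    for field, value in sorted((filters or {}).items()):
        clauses.append(f'{field}:"{value}"')
    return " AND ".join(clauses)


# Start with one term, broaden with a synonym, then narrow with a field filter.
base = build_query(["bacteriophage"])
broader = build_query(["bacteriophage", "phage"])
narrower = build_query(["bacteriophage", "phage"], {"year": "2022"})
print(base)
print(broader)
print(narrower)
```

Adding synonyms widens the result set, while each field filter restricts it, which mirrors the broaden and narrow steps.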

Data citation

Common standards for data citation

Interdisciplinary

For nucleic acid sequences and functional genomics

Code citation

Code citation allows for greater recognition of research software. Some major platforms and tools offer code citation: GitHub, GitLab, JabRef, Zenodo and Zotero (Code Citation Was Made Possible by Research Software Engineers in Germany and the Netherlands, n.d.).
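
One common way to make software citable is to add a `CITATION.cff` file (Citation File Format) to the repository. The text above does not name this mechanism, so treating it as the one meant here is an assumption. The Python sketch below renders a minimal file with the required keys of version 1.2.0 of the format; the helper name and the example title, author and version are hypothetical.

```python
import json


def render_citation_cff(title, authors, version=None, doi=None):
    """Render a minimal CITATION.cff document as a string.

    `authors` is a list of (family name, given name) tuples.
    json.dumps is used to produce safely quoted YAML scalars.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {json.dumps(title)}",
        "authors:",
    ]
    for family, given in authors:
        lines.append(f"  - family-names: {json.dumps(family)}")
        lines.append(f"    given-names: {json.dumps(given)}")
    # Optional keys are written only when a value is supplied.
    if version:
        lines.append(f"version: {json.dumps(version)}")
    if doi:
        lines.append(f"doi: {json.dumps(doi)}")
    return "\n".join(lines) + "\n"


# Hypothetical software metadata, for illustration only.
cff_text = render_citation_cff("example-tool", [("Doe", "Jane")], version="0.1.0")
print(cff_text)
```

The resulting text would be saved as `CITATION.cff` in the root of the repository.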

How-tos

How to make your data reusable?

How to maximise the use of existing data?

See Wood-Charlson et al. (2022).

References

  1. Rehwald, S., Leimer, S., Lindstädt, B., Shutsko, A., & Vandendorpe, J. (2022). Workshop on Research Data Management in Medical and Biomedical Sciences.
  2. Pauls, C., Feeken, C., Steen, E.-E., Lindstädt, B., Vandendorpe, J., & Markus, K. (2023). Workshop on Research Data Management.
  3. Committee on Strategies for Responsible Sharing of Clinical Trial Data; Board on Health Sciences Policy; Institute of Medicine. (2015). Guiding Principles for Sharing Clinical Trial Data. In Sharing Clinical Trial Data. National Academies Press (US).
  4. Engelhardt, C., Biernacka, K., Coffey, A., Cornet, R., Danciu, A., Demchenko, Y., Downes, S., Erdmann, C., Garbuglia, F., Germer, K., Helbig, K., Hellström, M., Hettne, K., Hibbert, D., Jetten, M., Karimova, Y., Kryger Hansen, K., Kuusniemi, M. E., Letizia, V., … Zhou, B. (2022). D7.4 How to be FAIR with your data. A teaching and training handbook for higher education institutions. https://doi.org/10.5281/ZENODO.6674301
  5. Why are research data managed and reused? (n.d.). https://www.fsd.tuni.fi/en/services/data-management-guidelines/why-are-research-data-managed-and-reused/
  6. Data Reuse Stories. Some concrete cases involving several institutions and consortia in Europe. (n.d.). https://www.openaire.eu/blogs/data-reuse-stories-some-concrete-cases-involving-several-institutions-and-consortia-in-europe
  7. Sielemann, K., Hafner, A., & Pucker, B. (2020). The reuse of public datasets in the life sciences: potential risks and rewards. PeerJ. https://doi.org/10.7717/peerj.9954
  8. Bres, E., Rudolf, D., Lindstädt, B., & Shutsko, A. (2022). Research Data Management in Medical and Biomedical Sciences.
  9. CESSDA Data Management Expert Guide. (n.d.). https://dmeg.cessda.eu/
  10. Code citation was made possible by research software engineers in Germany and the Netherlands. (n.d.). https://www.esciencecenter.nl/news/code-citation-was-made-possible-by-research-software-engineers-in-germany-and-the-netherlands/
  11. Wood-Charlson, E. M., Crockett, Z., Erdmann, C., Arkin, A. P., & Robinson, C. B. (2022). Ten simple rules for getting and giving credit for data. PLOS Computational Biology, 18(9), 1–11. https://doi.org/10.1371/journal.pcbi.1010476