Research Data Management (RDM)

Definition of Research Data Management (RDM)

Research Data Management (RDM) is a series of measures that need to be taken during a research project in order to (1) obtain high-quality data (whether produced or reused), (2) make data available and usable over the long term and (3) make research findings reproducible beyond the research project (Research Data, n.d.; Bres et al., 2022; Voight et al., 2022).

Research Data Management in microbiology

Research Data Management (RDM) is crucial in microbiology to ensure the integrity and accessibility of data throughout the research process. One essential aspect of RDM is establishing clear protocols for data collection, storage, and analysis. For instance, researchers studying bacterial evolution should document their sampling procedures meticulously, including information on sampling sites, environmental conditions, and sampling techniques, to ensure reproducibility. Additionally, adopting standardized data formats, such as FASTA or GenBank, facilitates data sharing and interoperability across different studies, enhancing collaboration and knowledge exchange within the microbiology community. Proper metadata annotation is also paramount, as it provides essential context for interpreting the data.

Researchers in microbiology should develop comprehensive data management plans (DMPs) outlining how data will be collected, processed, and shared throughout the research lifecycle. DMPs serve as roadmaps for RDM, ensuring that data handling procedures adhere to ethical, legal, and funder requirements.

Moreover, adopting electronic lab notebooks (ELNs) can streamline data organization and collaboration by digitizing research notes, protocols, and experimental results. ELNs enable real-time data capture, version control, and collaboration among team members, and they integrate well with RDM workflows. For example, researchers investigating microbial communities could use ELNs to record observations, generate graphs, and annotate findings collaboratively, ensuring transparency and reproducibility.

Researchers working with sensitive information, such as patient data in clinical microbiology studies, must implement data security measures to safeguard this information.

Embracing open science practices by depositing data in public repositories like NCBI’s GenBank or the European Nucleotide Archive fosters transparency and long-term preservation of microbiological data, ensuring its availability for future research endeavors. Therefore, microbiology researchers should integrate robust RDM practices into their workflows from the outset to maximize the impact and reproducibility of their findings while contributing to the advancement of the field.
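
As an illustration of metadata annotation for sequence files, the following Python sketch writes a small JSON "sidecar" file next to a FASTA file. It is only one possible approach, not a community standard: the sidecar naming scheme (`.meta.json`), the function names, and the metadata fields such as `sampling_site` are assumptions chosen for this example. The sketch combines descriptive information supplied by the researcher with facts derived from the file itself (a SHA-256 checksum and the number of records).

```python
import hashlib
import json
from pathlib import Path


def fasta_record_count(text: str) -> int:
    """Count FASTA records; each record starts with a '>' header line."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def build_metadata(fasta_path: Path, sample_info: dict) -> dict:
    """Combine researcher-supplied metadata with facts derived from the file."""
    data = fasta_path.read_bytes()
    return {
        "file": fasta_path.name,
        # The checksum lets others verify they hold the identical file.
        "sha256": hashlib.sha256(data).hexdigest(),
        "n_sequences": fasta_record_count(data.decode("utf-8", errors="replace")),
        # Hypothetical descriptive fields, e.g. sampling site or technique.
        **sample_info,
    }


def write_sidecar(fasta_path: Path, sample_info: dict) -> Path:
    """Write '<name>.meta.json' next to the FASTA file and return its path."""
    sidecar = fasta_path.parent / (fasta_path.name + ".meta.json")
    sidecar.write_text(json.dumps(build_metadata(fasta_path, sample_info), indent=2))
    return sidecar
```

A call such as `write_sidecar(Path("isolate.fasta"), {"sampling_site": "...", "sampling_technique": "..."})` would then produce `isolate.fasta.meta.json`. In practice, the field names should follow whatever metadata standard the target repository or the project's DMP prescribes.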

Additionally, researchers in microbiology should address the management of software tools, including small analysis scripts and machine learning models, within their RDM framework. These tools are integral for processing, analyzing, and interpreting complex microbiological data sets. Therefore, documenting the software environment, version numbers, and dependencies used in data analysis workflows is crucial for ensuring reproducibility and transparency. For instance, a study investigating the taxonomic composition of gut microbiota may rely on custom Python scripts for data preprocessing and statistical analysis. By documenting these scripts along with their parameters and input data, researchers enable others to replicate their analyses and validate their findings. Moreover, utilizing version control systems like Git and hosting repositories on platforms like GitHub or GitLab ensures the traceability and accessibility of software artifacts. By incorporating software management practices into their RDM strategies, microbiology researchers can enhance the reproducibility, transparency, and rigor of their computational analyses, thereby advancing scientific knowledge in the field.
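
One lightweight way to document the software environment of a Python-based analysis is to record the interpreter version, the operating system, and the installed versions of the packages the script depends on. The sketch below uses only the standard library (`importlib.metadata` is available from Python 3.8). The function name `capture_environment` and the package list in the usage example are assumptions made for illustration.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Return interpreter, OS, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical dependency list; replace with the packages a script imports.
    print(json.dumps(capture_environment(["numpy", "pandas"]), indent=2))
```

Saving this output alongside the analysis results, and committing it to the same Git repository as the scripts, ties each result to the environment that produced it. Dedicated tools such as lock files or container images capture more detail; this sketch only records a minimal snapshot.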

With the growing application of machine learning in microbiology, such as predicting antibiotic resistance or classifying microbial species, it becomes imperative to manage the underlying models transparently. Researchers should document model architectures, training data, and performance metrics to facilitate model validation and comparison across studies.
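
The model documentation described above could, for example, be kept as a small machine-readable record stored next to the trained model. The following Python sketch shows one possible layout. The class `ModelRecord`, its field names, and the helper `fingerprint` are assumptions for illustration rather than an established schema, and all values in the usage example are placeholders, not real results.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelRecord:
    """Minimal description of a trained model for later validation."""
    name: str
    task: str
    architecture: str
    training_data_sha256: str  # identifies the exact training data used
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def fingerprint(data: bytes) -> str:
    """SHA-256 checksum of the training data, as a hex string."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    record = ModelRecord(
        name="example-model",                      # placeholder
        task="antibiotic resistance prediction",
        architecture="random forest",              # placeholder
        training_data_sha256=fingerprint(b"placeholder training data"),
        hyperparameters={"n_estimators": 100},     # placeholder
        metrics={"accuracy": None},                # fill in after evaluation
    )
    print(record.to_json())
```

Keeping such a record under version control together with the training script makes it easier to compare models across studies, because architecture, data, and performance metrics are reported in the same structure each time.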

Research data life cycle

The research data life cycle is a model that illustrates the steps of RDM and describes how data should ideally flow through a research project to ensure successful data curation and preservation (Research Data Lifecycle, n.d.; Bobrov et al., 2021). The research data life cycle can be illustrated as follows:

[Figure: Research data life cycle]

Benefits of RDM

The benefits of RDM are numerous; some of them are listed below (Assmann et al., 2022; Bobrov et al., 2021; Bres et al., 2022; Engelhardt et al., 2022; Jacob et al., 2022; Lindstädt et al., 2019; Voight et al., 2022):

  • For researchers
    • Visibility
    • Reputation (ensures research quality)
    • Data ownership
    • Eligibility for funding
    • Saves time, money and resources
    • Prevents data loss
  • Additional benefits
    • Helps keep track of the project
    • Helps meet formal and legal requirements
    • Enhances teamwork and collaborations
    • Guarantees transparency, verifiability and reproducibility

Consequences of poor RDM

Consequences of poor RDM include paper retraction (e.g. González Amorós & de Puit).

Further resources

References

  1. Research Data. https://rfii.de/en/topics/#forschungsdaten
  2. Bres, E., Rudolf, D., Lindstädt, B., & Shutsko, A. (2022). Research Data Management in Medical and Biomedical Sciences.
  3. Voight, P., Frericks, S., Lindstädt, B., Shutsko, A., & Vandendorpe, J. (2022). Workshop on Research Data.
  4. Research data lifecycle. https://libguides.ntu.edu.sg/rdm/researchdatalifecycle
  5. Bobrov, E., Adam, L.-S., Söring, S., Jäckel, D., Herwig, A., Lindstädt, B., Vandendorpe, J., & Shutsko, A. (2021). Workshop on Research Data.
  6. Assmann, C., Gadelha, L., Markus, K., & Vandendorpe, J. (2022). Workshop on Research Data Management.
  7. Engelhardt, C., Biernacka, K., Coffey, A., Cornet, R., Danciu, A., Demchenko, Y., Downes, S., Erdmann, C., Garbuglia, F., Germer, K., Helbig, K., Hellström, M., Hettne, K., Hibbert, D., Jetten, M., Karimova, Y., Kryger Hansen, K., Kuusniemi, M. E., Letizia, V., … Zhou, B. (2022). D7.4 How to be FAIR with your data. A teaching and training handbook for higher education institutions. https://doi.org/10.5281/ZENODO.6674301
  8. Jacob, B., Kroehling, M. A., Mertzen, D., Straka, J., Lindstädt, B., Shutsko, A., & Vandendorpe, J. (2022). Workshop on Research Data.
  9. Lindstädt, B., Vandendorpe, J., & von der Ropp, S. (2019). Research Data Management.