Data Organization

Introduction

For data organization, we suggest using the 5S methodology, which is built around five words (Assmann et al., 2022):

  1. Sort: delete unnecessary files.
  2. Set in order: develop and document naming conventions and folder structures.
  3. Shine:
    • Comply with conventions.
    • Develop routines.
  4. Standardize:
    • Document rules and responsibilities.
    • Develop best practices and Standard Operating Procedures (SOPs).
  5. Sustain:
    • Regularly check whether rules are followed.
    • Implement improvements if necessary.

File naming

File names should ideally make it possible to link a file to a specific experiment or data collection (Bobrov et al., 2021). Within your research group, it is recommended (Bobrov et al., 2021; Bres et al., 2022) to:

  1. Choose a file and folder naming convention.
  2. Document your convention, for instance in Standard Operating Procedures (SOPs).
  3. Make the documentation available to all research group members.
  4. Stay consistent.

Recommendations for naming conventions

If you need to choose a file and folder naming convention, it is recommended (Assmann et al., 2022; Bobrov et al., 2021; Bres et al., 2022) to follow these guidelines:

  • Favor alphabetically sortable names (e.g. starting with the date: YYYY-MM-DD).
  • Limit file names to a maximum of 32 characters (32CharactersLooksExactlyLikeThis.txt). Short names are easier to find and result in shorter paths, whereas long names can cause technical problems. Select a name that is as short as possible and as long as necessary.
  • Favor names that reflect, and are unique to, the content (e.g. person, project ID/part, sample ID, experiment ID, status, data, version number and/or software name).
  • Use periods only before file extensions.
  • Do not use special characters or whitespace, which can confuse both machines and humans.
  • Use leading zeros when using sequential numbering:
    • For a sequence of 1-10: 01, 02, ..., 10
    • For a sequence of 1-100: 001, ..., 010, ..., 100
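
The conventions above can be combined in a small helper that assembles names. The following is a minimal sketch, not a prescribed implementation: the function name build_file_name and its parameters are hypothetical, and it assumes that the 32-character limit applies to the name without its extension, as in the example above.

```python
from datetime import date


def build_file_name(day, parts, index, total, extension):
    """Assemble a sortable file name (hypothetical helper).

    day:       a datetime.date, written as YYYY-MM-DD at the start of the name
    parts:     content labels such as project ID or experiment ID
    index:     position of this file in a numbered sequence
    total:     size of the sequence, used to choose the number of leading zeros
    extension: file extension without the period
    """
    width = len(str(total))  # total=10 -> 2 digits, total=100 -> 3 digits
    stem = "_".join([day.isoformat(), *parts, str(index).zfill(width)])
    # Assumption: the 32-character limit is checked on the stem only.
    if len(stem) > 32:
        raise ValueError(f"name is longer than 32 characters: {stem}")
    return f"{stem}.{extension}"


print(build_file_name(date(2016, 1, 4), ["ProjA", "Ex1"], 7, 100, "xlsx"))
```

Because the date comes first and the sequence number is zero-padded, an alphabetical listing of such names is also a chronological and numerical one.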

Examples of file names

  • Good structure: YYYY-MM-DD_JV_ProjectID_ExperimentID with IDs being linked to a table with data documentation such as metadata (Bobrov et al., 2021).
  • Good names (Bres et al., 2022):
    • 2016-01-04_ProjectA_Ex1Test1_SmithE_v1.0.xlsx
    • 2000_USNM_379221_01.tiff
    • USNM_379221_01.tiff
  • Bad names (Bres et al., 2022):
    • Test data 2016.xlsx
    • Meeting notes Jan 17
    • Notes Eric.txt
    • Final FINAL last version.docx
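
Names can also be checked against the conventions automatically. The sketch below is one possible way to do this. The function name check_file_name, the allowed character set (ASCII letters, digits, underscores and hyphens) and the choice to measure the 32-character limit without the extension are assumptions made for illustration.

```python
import re

# Assumption: anything other than ASCII letters, digits, "_" and "-" counts as
# a special character. Whitespace and periods are reported separately.
SPECIAL = re.compile(r"[^A-Za-z0-9_\-\s.]")
MAX_STEM_LENGTH = 32  # assumption: measured without the file extension


def check_file_name(name):
    """Return a list of convention violations; an empty list means none found."""
    stem, dot, _extension = name.rpartition(".")
    if not dot:
        stem = name  # no period at all: the whole name is the stem
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if "." in stem:
        problems.append("period outside the file extension")
    if SPECIAL.search(stem):
        problems.append("contains special characters")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append("stem longer than 32 characters")
    return problems


for candidate in ["USNM_379221_01.tiff", "Test data 2016.xlsx"]:
    print(candidate, check_file_name(candidate))
```

Note that a strict check like this also flags names with a version suffix such as v1.0, because of the period rule, so a research group would need to decide which exceptions its convention allows.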

Tools for simultaneous renaming of files

Multiple OS

Linux

Mac

Unix

  • mv command

Windows
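
Independently of the operating system, simultaneous renaming can also be scripted. The following is a minimal sketch using Python's standard pathlib module. The function name batch_rename, the prefix argument and the numbering scheme are hypothetical choices, not part of any tool listed in this section.

```python
from pathlib import Path


def batch_rename(folder, prefix, dry_run=True):
    """Rename every file in `folder` to <prefix>_<NN><extension>.

    Files are numbered in alphabetical order of their current names.
    With dry_run=True (the default) nothing is changed on disk and only
    the planned (old name, new name) pairs are returned.
    """
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    width = max(2, len(str(len(files))))  # at least two digits: 01, 02, ...
    plan = [
        (old, old.with_name(f"{prefix}_{str(i).zfill(width)}{old.suffix.lower()}"))
        for i, old in enumerate(files, start=1)
    ]
    # Refuse to continue if a target name is already taken by another file.
    for old, new in plan:
        if new != old and new.exists():
            raise FileExistsError(f"target already exists: {new}")
    if not dry_run:
        for old, new in plan:
            old.rename(new)
    return [(old.name, new.name) for old, new in plan]
```

Running the function with dry_run=True first and reviewing the returned pairs is a cautious way to avoid unintended renames. Working on a backup copy of the folder is advisable as well.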

File versioning

If you decide to version your files, keep the following in mind (Bres et al., 2022):

  • Decide how to version files with project partners.
  • Write down how a version change is to be defined.
  • Document version changes.

Options for file versioning include (Bres et al., 2022):

  • In file names
  • Within data (e.g. header, comment field)
  • In text files (e.g. README file)
  • Within a Version Control System (VCS) (e.g. git, Apache Subversion)

Manual file versioning

If you decide to version your files manually, it is recommended to:

  • Use a version control table.
  • Define responsibilities for completion of files.
  • Use semantic versioning: MAJOR.MINOR.PATCH (Bobrov et al., 2021; Bres et al., 2022). E.g.:
    • Ex1Test1_SmithE_v1.0.0.xlsx
    • Ex1Test1_SmithE_v1.2.5.xlsx
    • Ex1Test1_SmithE_v2.1.1.xlsx
  • Save milestone versions.
  • Store obsolete versions separately after backup.
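
Semantic version numbers in file names can be read and incremented programmatically. The sketch below assumes the _vMAJOR.MINOR.PATCH suffix used in the examples above. The function names parse_version and bump are hypothetical.

```python
import re

# Matches "_v1.2.5" directly before the final file extension.
VERSION_PATTERN = re.compile(r"_v(\d+)\.(\d+)\.(\d+)(?=\.[^.]+$)")


def parse_version(file_name):
    """Return (major, minor, patch) as integers from a versioned file name."""
    match = VERSION_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no semantic version found in: {file_name}")
    return tuple(int(number) for number in match.groups())


def bump(version, level):
    """Increase one part of the version and reset the parts to its right."""
    major, minor, patch = version
    if level == "major":
        return (major + 1, 0, 0)
    if level == "minor":
        return (major, minor + 1, 0)
    if level == "patch":
        return (major, minor, patch + 1)
    raise ValueError(f"unknown level: {level}")


names = ["Ex1Test1_SmithE_v1.10.0.xlsx", "Ex1Test1_SmithE_v1.2.5.xlsx"]
# Comparing the numbers places 1.2.5 before 1.10.0; a plain alphabetical
# sort of the names would place 1.10.0 first.
print(sorted(names, key=parse_version))
```

Sorting by the parsed numbers rather than by the file name matters as soon as any part of the version reaches two digits.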

Folder structure

Recommendations for folder structure

For a good folder structure, it is recommended to:

  • Invest time in planning the folder structure.
  • Choose a folder structure that is (Bobrov et al., 2021; Bres et al., 2022):
    • Clear (i.e. self-explanatory, with intuitive navigation, including for other team members)
    • Comprehensive
    • Efficient
    • Hierarchical, increasing findability
  • Have at most (Bres et al., 2022):
    • 4 levels
    • 10 elements per folder
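
The two limits above can be checked on an existing folder tree. The following is a minimal sketch in which the function name check_structure is hypothetical. It assumes that the top folder counts as level 1 and that both files and subfolders count as elements.

```python
from pathlib import Path

MAX_LEVELS = 4
MAX_ELEMENTS = 10


def check_structure(root):
    """Return a list of messages for folders that exceed the limits."""
    root = Path(root)
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    problems = []
    for folder in folders:
        relative = folder.relative_to(root)
        level = len(relative.parts) + 1  # assumption: the top folder is level 1
        if level > MAX_LEVELS:
            problems.append(
                f"{relative.as_posix()}: level {level} exceeds maximum of {MAX_LEVELS}"
            )
        count = sum(1 for _ in folder.iterdir())  # files and subfolders
        if count > MAX_ELEMENTS:
            problems.append(
                f"{relative.as_posix()}: {count} elements exceed maximum of {MAX_ELEMENTS}"
            )
    return problems
```

Calling check_structure on a project folder returns an empty list when both limits are respected, which makes it suitable for the regular checks described under "Sustain".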

Example of folder structure

  • Project
    • Data
      • Raw_data
      • Processed_data
      • Documentation
    • Code
      • Src
      • Output
        • Plots
      • Documentation
    • Protocols
  • Manuscripts
  • Conference_reports
  • Administrative_information
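
A structure like the one above can be created as an empty skeleton at the start of a project. The sketch below mirrors the example as listed. The function name create_skeleton and the idea of a common root folder are assumptions made for illustration.

```python
from pathlib import Path

# Folder paths taken from the example above, relative to a common root.
SKELETON = [
    "Project/Data/Raw_data",
    "Project/Data/Processed_data",
    "Project/Data/Documentation",
    "Project/Code/Src",
    "Project/Code/Output/Plots",
    "Project/Code/Documentation",
    "Project/Protocols",
    "Manuscripts",
    "Conference_reports",
    "Administrative_information",
]


def create_skeleton(root):
    """Create the folders under `root` and return all folders found there."""
    root = Path(root)
    for relative in SKELETON:
        # exist_ok=True makes the call safe to repeat on an existing tree.
        (root / relative).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())
```

Keeping the list of paths in one place also serves as documentation of the agreed structure, and it can be adapted per project.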

Further resources

5S methodology

File naming

Folder structure

Data organization in spreadsheets

Tools

References

  1. Assmann, C., Gadelha, L., Markus, K., & Vandendorpe, J. (2022). Workshop on Research Data Management.
  2. Bobrov, E., Adam, L.-S., Söring, S., Jäckel, D., Herwig, A., Lindstädt, B., Vandendorpe, J., & Shutsko, A. (2021). Workshop on Research Data.
  3. Bres, E., Rudolf, D., Lindstädt, B., & Shutsko, A. (2022). Research Data Management in Medical and Biomedical Sciences.