
Data Management Plan (DMP)

Questions to Ask Yourself

Data Production

  • What type(s) of data will be produced? (experimental measurements, observational or qualitative data, model simulations, existing data)
  • How will you capture, create, and/or process the data? (Identify the instruments, software, imaging methods, etc. you will use.)
  • What file format(s) will the data be saved as? Are those file formats proprietary? Will they degrade?
  • Will the data be reproducible?
  • Do you need tools or software to create/process/visualize the data?

Data Size

  • How much data will it be, and at what growth rate?
  • How often will it change?

Data Use

  • Who might use your data, both now and in the future?

Data Organization

  • What directory and file naming convention will be used?

How to Organize Your Data

Large research projects can generate hundreds of data files. Short descriptive file names and a simple file hierarchy make these files easier to navigate and locate. Set up conventions for your project, document them for all team members, and be consistent.

Recommended conventions:

Denote dates in YYYYMMDD format

DO: Use 20140403

DON’T: Use 04032014

BECAUSE: When files are sorted by name, YYYYMMDD dates fall in chronological order.
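A quick way to see this is to sort a few names in Python. This is a minimal sketch: the Survey_ file names are made up for illustration, and Python's sorted() stands in for sorting by name in a file browser.

```python
from datetime import date

# The strftime codes %Y%m%d produce the recommended YYYYMMDD form.
print(date(2014, 4, 3).strftime("%Y%m%d"))  # 20140403

# Hypothetical file names that differ only in how the date is written.
yyyymmdd = ["Survey_20140403.csv", "Survey_20131215.csv", "Survey_20140110.csv"]
mmddyyyy = ["Survey_04032014.csv", "Survey_12152013.csv", "Survey_01102014.csv"]

# sorted() compares strings character by character, much like sorting files by name.
print(sorted(yyyymmdd))
# ['Survey_20131215.csv', 'Survey_20140110.csv', 'Survey_20140403.csv']  oldest first
print(sorted(mmddyyyy))
# ['Survey_01102014.csv', 'Survey_04032014.csv', 'Survey_12152013.csv']  Dec 2013 sorts last
```

Because the year comes first, an ordinary text sort is also a date sort; with MMDDYYYY, the month dominates and years get interleaved.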

 

Use a short unique identifier (e.g. Project Name or Grant #)

DO: CHHM

DON’T: Centre for Hip Health and Mobility

BECAUSE: Short file names fit in file-browser columns without side scrolling or column resizing.

 

Include a summary of content (e.g. Questionnaire or GrantProposal) as part of the file name

DO: FileNm_Guidelines_20140409_v01.docx

DON’T: FileNm_20140409.docx

BECAUSE: Files will be easier to find.
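If your team generates file names in scripts, a small helper can apply the pattern consistently. The sketch below assumes the Project_Summary_YYYYMMDD_vNN.ext layout used in the examples on this page; build_filename is a hypothetical name, not part of any library.

```python
from datetime import date

def build_filename(project: str, summary: str, day: date, version: int, ext: str) -> str:
    """Assemble Project_Summary_YYYYMMDD_vNN.ext (hypothetical helper)."""
    # Zero-pad the version to two digits so v02 sorts before v10.
    return f"{project}_{summary}_{day.strftime('%Y%m%d')}_v{version:02d}.{ext}"

print(build_filename("FileNm", "Guidelines", date(2014, 4, 9), 1, "docx"))
# FileNm_Guidelines_20140409_v01.docx
```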

 

Use underscores (_) as delimiters instead of spaces. Avoid these special characters: & , * % # ( ) ! @ $ ^ ~ ‘ { } [ ] ? < > –

DO: FileNm_Guidelines_20140409_v01.docx

DON’T: FileNm Guidelines 2014 04 09 v01.docx

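BECAUSE: Different operating systems and applications handle spaces and special characters differently. Some characters are reserved or not allowed in file names, and others can change how files sort.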
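Before sharing or archiving a folder, you could scan file names for the characters above. A minimal Python sketch, assuming the characters to avoid are exactly the list above plus the space character; problem_characters is a hypothetical helper.

```python
# Characters from the list above, plus the space character (an assumption
# based on the DON'T example, which uses spaces as delimiters).
AVOID = set("&,*%#()!@$^~‘{}[]?<>–") | {" "}

def problem_characters(filename: str) -> set[str]:
    """Return the characters in filename that the naming convention says to avoid."""
    return {ch for ch in filename if ch in AVOID}

print(problem_characters("FileNm Guidelines 2014 04 09 v01.docx"))  # {' '}
print(problem_characters("FileNm_Guidelines_20140409_v01.docx"))    # set()
```

An empty set means the name passes; the full stop before the extension is deliberately allowed.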

 

Keep track of document versions either sequentially (e.g., v01, v02) or with a unique date and time (e.g., 20140403_1800)

DO: FileNm_Guidelines_20140409_v01.docx

DON’T: FileNm_Guidelines_20140409_Review.docx AND FileNm_Guidelines_20140409_Investigation.docx

BECAUSE: Two years from now, you won’t remember what “Review” and “Investigation” meant or which file is the latest.
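Either versioning scheme is easy to automate. This sketch assumes file names end in _vNN.ext as in the DO example; next_version and timestamp_label are hypothetical helpers. Pass only the files for one document, since next_version takes the highest version across everything it is given.

```python
import re
from datetime import datetime

# Assumes names end in _vNN.ext, e.g. FileNm_Guidelines_20140409_v01.docx
VERSION_RE = re.compile(r"_v(\d+)\.[^.]+$")

def next_version(filenames: list[str]) -> str:
    """Return the next sequential label: 'v03' if v01 and v02 exist, 'v01' if none do."""
    versions = [int(m.group(1)) for name in filenames if (m := VERSION_RE.search(name))]
    return f"v{max(versions, default=0) + 1:02d}"

def timestamp_label(moment: datetime) -> str:
    """Alternative scheme: a date-and-time label such as 20140403_1800."""
    return moment.strftime("%Y%m%d_%H%M")

existing = [
    "FileNm_Guidelines_20140409_v01.docx",
    "FileNm_Guidelines_20140409_v02.docx",
]
print(next_version(existing))                        # v03
print(timestamp_label(datetime(2014, 4, 3, 18, 0)))  # 20140403_1800
```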


Make folder hierarchies as simple as possible

DO: F:/Env/LIBR/DataMgmt_FileFormats_20140409_v01.docx

DON’T: F:/Environment/Library/Woodward/Data/Education/Materials/Draft/2014/04/DataMgmt_FileFormats_20140409_v01.docx

BECAUSE: Complex folder hierarchies are harder to navigate and offer more opportunities for filing errors. System back-ups may take longer.
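To spot overly deep paths, you could count the folders between the drive and the file name. This sketch uses Python's pathlib.PureWindowsPath (chosen because the examples use an F: drive letter); folder_depth is a hypothetical helper, and the limit of four folders is an arbitrary assumption, not a rule from this guide.

```python
from pathlib import PureWindowsPath

def folder_depth(path: str) -> int:
    """Count the folders between the drive (if any) and the file name."""
    p = PureWindowsPath(path)
    # parent.parts includes the anchor (e.g. 'F:\\') when there is one; don't count it.
    return len(p.parent.parts) - (1 if p.anchor else 0)

MAX_DEPTH = 4  # hypothetical team limit; choose your own

for path in [
    "F:/Env/LIBR/DataMgmt_FileFormats_20140409_v01.docx",
    "F:/Environment/Library/Woodward/Data/Education/Materials/Draft/2014/04/"
    "DataMgmt_FileFormats_20140409_v01.docx",
]:
    depth = folder_depth(path)
    print(depth, "OK" if depth <= MAX_DEPTH else "too deep")
# 2 OK
# 9 too deep
```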

 

Format

A computer file format is a particular way of encoding information within a computer file so that an application can recognize and read it. The format is usually indicated by the file name extension: a full stop followed by three or four letters (e.g., .csv or .tiff).

Open File Formats

Open file formats have publicly available specifications and can be used by anyone. Choose open file formats to:

  • increase your ability to open and read your files in the future
  • make your data accessible to more researchers immediately

Because their specifications are publicly available, the open-source software community can build and maintain tools that keep data stored in these formats accessible over the long term.

Proprietary File Formats

Proprietary file formats are designed to work with a particular vendor’s software. Their specifications are not freely available, so when that software is no longer supported, files in the format can become difficult or impossible to read.

Recommended File Formats

Databases: XML, CSV

E-Books: EPUB

Images: JPG, PNG, PDF, TIFF, BMP

Sound: MP3, FLAC

Text: TXT (ASCII or UTF-8 encoded), CSV, PDF/A

Video: MPG, MOV, AVI

Spreadsheets: CSV
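To audit an existing project folder against this list, you could compare file extensions. A minimal sketch with these assumptions: the extension reflects the actual format; alternate spellings such as .jpeg, .tif, and .mpeg count as the listed formats; and ASCII and UTF-8 are left out because they are character encodings rather than extensions. An extension check also cannot tell PDF/A from an ordinary PDF. not_recommended is a hypothetical helper.

```python
from pathlib import Path

# Extensions for the recommended formats listed above (lowercase, with the leading dot).
RECOMMENDED = {
    ".xml", ".csv", ".epub",                                   # databases, spreadsheets, e-books
    ".jpg", ".jpeg", ".png", ".pdf", ".tif", ".tiff", ".bmp",  # images (.pdf also covers PDF/A)
    ".mp3", ".flac",                                           # sound
    ".txt",                                                    # plain text
    ".mpg", ".mpeg", ".mov", ".avi",                           # video
}

def not_recommended(paths: list[str]) -> list[str]:
    """Return, sorted, the paths whose extension is not on the recommended list."""
    return sorted(p for p in paths if Path(p).suffix.lower() not in RECOMMENDED)

files = ["Survey_20140403.csv", "Notes_20140403.docx", "Scan_01.TIF", "Model_output.sav"]
print(not_recommended(files))  # ['Model_output.sav', 'Notes_20140403.docx']
```

To scan a real folder, you could pass [str(p) for p in Path("project").rglob("*") if p.is_file()], where "project" is a placeholder for your own directory.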