
Data Management: Describe Your Data

Metadata

Definition

Metadata is information about the context, content, quality, provenance, and/or accessibility of a set of data.  In order for your data to be accessible to you, your colleagues, and other researchers, it must be properly documented.  Put simply:

Metadata is:

  • Frequently required for depositing a data set in a disciplinary repository, or for publishing in a research journal;
  • Necessary for the longevity and reproducibility of research data;
  • Useful for analyzing the data in data files.

Formats

Metadata can exist in a variety of formats. Some of the most common include:

  • A text or HTML document: You can also use a text document to create a data dictionary. A data dictionary records information about the various metadata elements, sub-elements, and attributes, and provides sample content. It is a particularly good way to record which metadata standard you are using and any variations from that standard.
  • An XML document, either linked to the data files or embedded within them: If you are using XML, it is probably following an established metadata standard. For instance, in a Dublin Core record the XML tags, such as <dc:title>, correspond to a set of defined Dublin Core elements. Dublin Core is one of the most common metadata standards, and may meet most of your metadata needs.
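To make the XML case concrete, here is a minimal sketch of building a small Dublin Core record with Python's standard library. The namespace URI is the published Dublin Core elements namespace; the wrapper element name "metadata", the function name build_dublin_core, and the sample values are assumptions made up for illustration, not part of any required layout.

```python
import xml.etree.ElementTree as ET

# Namespace for the Dublin Core element set (the "dc:" prefix).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core(fields):
    """Return an XML string with one dc:<name> element per dict entry."""
    # "metadata" is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Illustrative values only; replace with a description of your own data set.
record = build_dublin_core({
    "title": "Stream temperature readings, 2012",
    "creator": "Example Researcher",
    "date": "2012-06-30",
    "format": "text/csv",
})
print(record)
```

The resulting string can be saved next to the data file it describes, or embedded in a larger XML document, as the second bullet above suggests.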

 

Metadata Standards

Metadata standards specify what pieces of information are included and how they are expressed in digital files. Some are generic enough to be useful across a wide array of disciplines, while others are highly specific to disciplinary areas.
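One practical consequence is that a record can be checked against the list of elements its standard defines. The sketch below does this for the fifteen elements of the simple Dublin Core element set; the function name check_record and the sample record (including the non-standard "instrument" field) are hypothetical and used only for illustration.

```python
# The fifteen elements of the simple Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def check_record(record, allowed=DC_ELEMENTS):
    """Split a record's field names into those the standard defines and those it does not."""
    known = sorted(name for name in record if name in allowed)
    unknown = sorted(name for name in record if name not in allowed)
    return known, unknown


# Hypothetical record: "instrument" is not a Dublin Core element.
known, unknown = check_record({
    "title": "Stream temperature readings, 2012",
    "creator": "Example Researcher",
    "instrument": "Handheld thermometer",
})
print("Defined by the standard:", known)
print("Not in the standard:", unknown)
```

Fields that turn up in the second list are candidates for a data dictionary entry, or a sign that a more discipline-specific standard would fit the data better.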

Various metadata standards are available for researchers to choose from.  Below is a table of some of the most common.

General

  • Dublin Core: Widely used in disciplinary and institutional repositories.
  • Altova Schema Library: A reference library of common (and uncommon) industry and cross-industry schemas.

Life Sciences

  • Darwin Core: Designed to facilitate the sharing of information about biological diversity. It is primarily based on taxa and their occurrence in nature as documented by observations, specimens, samples, and related information.
  • Ecological Metadata Language (EML): Based on work by the Ecological Society of America and maintained as a community specification. Consists of XML modules that can be used to document ecological datasets.

Humanities

  • Seeing Standards: A Visualization of the Metadata Universe: Information on 105 cultural heritage metadata standards.
  • Text Encoding Initiative: A widely used standard for representing textual materials in XML.

Social Sciences

  • DDI: A metadata specification for the social and behavioral sciences created by the Data Documentation Initiative. Used to document data throughout its lifecycle and to enhance dataset interoperability.

 

Additional Information

The Digital Library Federation has a wiki of best practices for shareable metadata. It includes general recommendations, recommendations for classes of data elements, and best practices for the technical aspects of metadata.

An NSF-funded project, DataONEpedia, is a database of best practices for data management. Its goals are to provide a place where data can be collected, managed, and updated by appropriate individuals, and to ensure that collected data can be reused and potentially presented in multiple ways.


Abridged from Tufts University Data Management Research Guide - Documentation & Metadata page.