Introduction to Metadata and MetaRaider: Data Specifics & Dataset Metadata Checklist

This guide explains what metadata is and how to use it, and shows how TTU's new tool, MetaRaider, can help users create descriptive metadata for their data.

Data and Research Data

• Data are numerical quantities or other factual attributes derived from observation, experiment or calculation.         

  - National Research Council, 1992a. "Setting priorities for space research: Opportunities and imperatives."

• Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors. Data in a database may be characterized as predominantly word oriented (e.g., as in a text, bibliography, directory, dictionary), numeric (e.g., properties, statistics, experimental values), image (e.g., fixed or moving video, such as a film of microbes under magnification or time-lapse photography of a flower opening), or sound (e.g., a sound recording of a tornado or a fire)... Data can also be referred to as raw, processed, or verified.

  - Committee for a Study on Promoting Access to Scientific and Technical Data for the Public Interest, National Research Council. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (1999). Available at: http://www.nap.edu/openbook.php?record_id=9692&page=15

• The term "data" is used in this report to refer to any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc. Such data may be generated by various means including observation, computation, or experiment. 

  - National Science Foundation (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. P.9. Available at: http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf

• Research data, unlike other types of information, is collected, observed, or created for purposes of analysis to produce original research results.

   - University of Edinburgh. How to manage research data: Defining research data.

• In the context of these Principles and Guidelines [Principles and Guidelines for Access to Research Data from Public Funding], “research data” are defined as factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings.   

    - Organisation for Economic Co-operation and Development (OECD, 2007). OECD Principles and Guidelines for Access to Research Data from Public Funding. P.13. Available at: http://www.oecd.org/dataoecd/9/61/38500813.pdf

Dataset

• Data set: A logically meaningful collection or grouping of similar or related data, usually assembled as a matter of record or for research, for example, the American FactFinder Data Sets provided online by the U.S. Census Bureau or the National Elevation Dataset available from the U.S. Geological Survey. Also spelled dataset.

   - Online dictionary for library and information science (ODLIS). Available at: http://www.abc-clio.com/ODLIS/odlis_A.aspx.

• A research data set constitutes a systematic, partial representation of the subject being investigated.

   - Organisation for Economic Co-operation and Development (OECD, 2007). Available at: http://www.oecd.org/dataoecd/9/61/38500813.pdf.

• DOE generates scientific research data in many forms, both text and non-text. Much of the Department's text-based R&D results are readily available via OSTI databases. OSTI has broadened efforts to make non-text scientific and technical information (STI) available as well, providing access to underlying non-text data such as numeric files, computer simulations and interactive maps, as well as multimedia and scientific images.         

   - Department of Energy (DOE). Available at: http://www.osti.gov/data/index.shtml

• Over the life course of a survey that results in a data set – from initial conceptualization to data publication and beyond – a huge amount of metadata is typically produced. These metadata can be recorded in DDI format and re-used as the data collection, processing, tabulation, and reporting/dissemination take place.

   - Arofan Gregory, Open Data Foundation (2011). The Data Documentation Initiative (DDI): An Introduction for National Statistical Institutes. Available at: http://odaf.org/papers/DDI_Intro_forNSIs.pdf

Research Data Types

Research data can be generated for different purposes and through different processes. According to the Research Information Network, research data can include the following types:

  • Observational: data captured in real-time, usually irreplaceable. For example, sensor data, survey data, sample data, neuroimages.
  • Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms, toroid magnetic field data.
  • Simulation: data generated from test models where model and metadata are more important than output data. For example, climate models, economic models.
  • Derived or compiled: data that is reproducible, but expensive to reproduce. For example, text and data mining, compiled database, 3D models.
  • Reference or canonical: a (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.

Dataset Metadata Checklist

Metadata and documentation are different things: documentation is meant to be read by humans, while some metadata is designed more for machine processing than for human readability. However, metadata can be regarded as a type of documentation. Create metadata for your research data and datasets throughout the research lifecycle so that the data can be preserved in the long run.

1. Consider what information is needed for the data to be read and interpreted in the future.

2. Understand your funder requirements for data documentation and metadata. Funder requirements for NSF, GBMF, IMLS, NEH, NIH and NOAA can be found at https://dmptool.org/guidance.

3. Consult available metadata standards in your field. You may refer to Common Metadata Standards and Domain Specific Metadata Standards for details.

4. Describe data and datasets created in your research lifecycle and use software programs and tools to assist in data documentation. Assign or capture administrative, descriptive, technical, structural and preservation metadata for the data. Some potential information to document:

  • Descriptive metadata
    • Name of creator of data set
    • Name of author of document
    • Title of document
    • File name
    • Location of file
    • Size of file
  • Structural metadata
    • File relationships (e.g. child, parent)
  • Technical metadata
    • Format (e.g. text, SPSS, Stata, Excel, tiff, mpeg, 3D, Java, FITS, CIF)
    • Compression or encoding algorithms
    • Encryption and decryption keys
    • Software (including release number) used to create or update the data
    • Hardware on which the data were created
    • Operating systems in which the data were created
    • Application software in which the data were created
  • Administrative metadata
    • Information about data creation (e.g. date)
    • Information about subsequent updates, transformation, versioning, summarization
    • Descriptions of migration and replication
    • Information about other events that have affected the files
  • Preservation metadata
    • File format (e.g. .txt, .pdf, .doc, .rtf, .xls, .xml, .spv, .jpg, .fits)
    • Significant properties
    • Technical environment
    • Fixity information

5. Adopt a thesaurus in your field or compile a data dictionary for your dataset.

6. Obtain persistent identifiers (e.g. DOIs) for datasets if possible to ensure the data can be found in the future.
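
As an illustration of steps 4 to 6, the following Python sketch builds a small metadata record for a single CSV data file. It is only one possible approach, not a MetaRaider feature or a formal metadata standard. The function names (sha256_of, guess_type, describe_csv), the record field names, and the choice of SHA-256 for fixity are all assumptions made for this example. The record groups a few items from the checklist (creator, title, file name, location, size, format, fixity information), adds a draft data dictionary with one entry per column, and leaves room for a persistent identifier once one has been assigned.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file (used here as fixity information)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def guess_type(values):
    """Roughly classify a column as integer, number, text, or unknown."""
    present = [v for v in values if v != ""]
    if not present:
        return "unknown"
    for cast, label in ((int, "integer"), (float, "number")):
        try:
            for v in present:
                cast(v)
        except ValueError:
            continue
        return label
    return "text"


def describe_csv(path, creator, title, identifier=None):
    """Build a metadata record (hypothetical layout) for one CSV file."""
    p = Path(path).resolve()
    with p.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        columns = reader.fieldnames or []

    # Draft data dictionary: descriptions and units must be filled in by a person.
    dictionary = []
    for name in columns:
        values = [(row.get(name) or "").strip() for row in rows]
        dictionary.append({
            "variable": name,
            "type": guess_type(values),
            "description": "",
            "units": "",
            "missing_count": sum(1 for v in values if v == ""),
        })

    return {
        "descriptive": {
            "creator": creator,
            "title": title,
            "file_name": p.name,
            "location": str(p.parent),
            "size_bytes": p.stat().st_size,
        },
        "technical": {"format": p.suffix.lstrip(".").lower()},
        "preservation": {
            "fixity_algorithm": "sha256",
            "fixity_value": sha256_of(p),
        },
        "identifier": identifier,  # e.g. a DOI, once one is assigned
        "data_dictionary": dictionary,
    }
```

The resulting record is a plain dictionary, so it can be saved next to the data file with json.dumps(record, indent=2). The type guesses are only a starting point; the description and units of each variable still need to be written by someone who knows the data.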

For your full data management plan, please refer to the Digital Curation Centre’s Checklist for a Data Management Plan.

(Source: DMPTool: https://dmp.cdlib.org/; Digital Curation: A How-To-Do-It Manual; Digital Curation Centre: http://www.dcc.ac.uk/)