You have data, we have questions!

15 Oct 2018, 14:30
15m
North Quad room 2435 (University of Michigan)

105 S. State St. Ann Arbor, MI 48109-1285
Lightning Talk Science Use-Cases Lightning Talks

Speaker

Susan Borda (University of Michigan - Library)

Description

Researchers put considerable time and effort into research, and the resulting data is a significant scholarly product. As with papers, articles, and presentations, data should be treated as a first-class research product, one that requires additional considerations. Publishers and funding agencies, for example, are requiring researchers to share supporting data. Repository managers, data curators, and archivists would like to assist researchers in meeting such data requirements. However, many questions arise for those who would like to assist, especially when it comes to more complicated data such as the variety at the center of this symposium: large, collaborative, model- or simulation-driven computational data. This lightning talk will address important questions for researchers to consider as they deposit their research data.

Here is a sampling of the issues researchers should think about:

Reproducibility goals:
Is there a further requirement for reproducibility or replicability? If so, is the dataset complete?

Preservation goals:
Is it worth keeping simulation data long-term, for more than ten years? If not, should anything remain, and what would be of use to future researchers? Does the development of better and faster simulation software and computing technology over time make the preservation of older simulation data redundant? As it becomes easier to replicate simulations and rerun them as needed, does the actual data output need to be preserved, or just the inputs and parameters? Is there some simulation output data worth keeping? If so, what are the criteria?

Re-use goals:
Should the raw data be shared or only the “final,” analyzed data that directly support the figures in the paper? Could the raw data be useful to someone else in the same discipline or another one? What role do repositories play in the potential reuse of data? If researchers want to reuse someone else’s data, are they more likely to access it through a repository or contact the researcher directly? Libraries keep talking about data re-use as an important driver for data sharing, but is it?

Answering these questions supports the immediate goals of the dataset and helps anticipate what will be necessary to preserve and possibly re-use the data. Depending on the real needs of the researcher and the dataset, the files and information included with the deposit may need to change. Data retention timelines may differ as well, depending on the method used to generate the data.

Authors

Susan Borda (University of Michigan - Library)

Scott Witmer (University of Michigan - Library)
