Symposium of the Center for Network and Storage Enabled Collaborative Computational Science

America/Detroit
North Quad room 2435 (University of Michigan)

School of Information, 105 S. State St., Ann Arbor, MI 48109-1285
Dan Meisler (University of Michigan), Shawn McKee (University of Michigan (US))
Description

The Center for Network and Storage Enabled Collaborative Computational Science is hosting a symposium at the University of Michigan on May 18 and 19, exploring the themes the Center was founded on. The Center seeks to address the challenges of extracting scientific results collaboratively from large, distributed or diverse data.

Venue: School of Information, North Quad room 2435, 105 S. State St., Ann Arbor, MI 48109-1285

The Challenge: Many scientific disciplines are rapidly increasing the size, variety and complexity of the data they must work with. As the data grows, scientists are challenged to manage, share and analyze it, and they are diverted from their scientific research to data-access and data-management concerns. Even more problematic is determining how to support many scientists sharing and accessing this ever-increasing amount of data.

The Center is working to respond to those challenges broadly. Included in the Center is the NSF-funded OSiRIS project, a collaborative, multi-university venture led by MICDE faculty, and hosted by ARC-TS.

The following questions illustrate some of the focus areas the Center is seeking to address:

  • What are the best practices for collaboratively working on large, potentially diverse or distributed, datasets?
  • What tools, technologies and techniques are most effective at addressing the challenges faced by such researchers?
  • How should data best be stored, organized, indexed and made accessible to improve the ability of scientists to jointly work with one another, especially across the dimensions of time and space?

This symposium is intended to bring together those interested in these questions to share experiences and best practices, and to discuss both challenges and possible solutions that enable scientists to work together on “big, distributed or diverse data”.

Speakers will come from a wide range of research domains, as well as federal funding agencies:

  • Amy Friedlander, Deputy Division Director, Division of Advanced Cyberinfrastructure (CISE/ACI), National Science Foundation
  • Richard Carlson, Program Officer, Advanced Scientific Computing Research (ASCR), Department of Energy
  • Nina Silverberg, Program Director, Alzheimer's Disease Centers program, Division of Neuroscience, National Institutes of Health 
  • Chris Hill, Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology
  • Franco Pestilli, Department of Psychological and Brain Sciences, Indiana University
  • Sara Aton, Molecular, Cellular and Developmental Biology, University of Michigan
  • Brian Arbic, Earth and Environmental Sciences, University of Michigan
  • Karthik Duraisamy, Aerospace Engineering, University of Michigan
  • Cindy Chestek, Biomedical Engineering, University of Michigan

We are looking forward to a rich program of presentations and discussions and hope you are able to participate. 

Participants
  • Alan Rask
  • Alauddin Ahmed
  • Ali Mazeh
  • Amanda Deacon
  • Amy Friedlander
  • Annette Ostling
  • Anthony Kremin
  • Avisek Das
  • Benjeman Meekhof
  • Bikash Kanungo
  • Bob Sabourin
  • Bobbie Wu
  • Brian Arbic
  • Brian Demczyk
  • Brock Palen
  • Carl Simon Adorf
  • Cathy Curley
  • Charlene Tao
  • Chris Hill
  • Chrono Nu
  • Colleen McCormick
  • Cynthia Chestek
  • Dan Meisler
  • Daniel Kessler
  • Daoheng Niu
  • Dave Daniszewski
  • DePriest Dockins
  • Efren Cruz
  • Elizabeth Wagner
  • Emily Hector
  • Eric Lakin
  • Eric Michielssen
  • Eunshin Byon
  • Evangeline Spindler
  • Faye Ogasawara
  • Gabriele Carcassi
  • Galina Grom
  • Haiyin Liu
  • Hiroko Dodge
  • Ian Stakenvicius
  • Jake LaRosa
  • James Koopman
  • Jamie Estill
  • Jay Lang
  • Jeffrey Sica
  • Jichao Li
  • Jill Davidson
  • Jim Bujaki
  • Jim Kenyon
  • John Simpkins
  • Jon Ravelo
  • JP Eldous
  • Karthik Duraisamy
  • Katarina Thomas
  • Keila Walton
  • Kelli Trosvig
  • Krishna Garikipati
  • Kristin Kovarik
  • Kui-Bin Im
  • Manish Verma
  • Mariana Carrasco-Teja
  • Matt McLean
  • Matt Wojick
  • Michael Ausilio
  • Mingyang Hei
  • Nina Silverberg
  • Pengyuan Xiu
  • Ping Hou
  • Rafael Meza
  • Rex Lau
  • Richard Carlson
  • Richard Gonzalez
  • Robert Krasny
  • Roman Gayduk
  • Roy Chartier
  • Sam Savant
  • Sara Aton
  • Saul Youssef
  • Sharon Broude Geva
  • Shawn McKee
  • Sherry Sparks
  • Soichi Hayashi
  • T. Charles Yun
  • Thomas Murphy
  • Todd Raeker
  • Vineet Raichur
    • 13:00 13:45
      Welcome and Overview North Quad room 2435

      Session with logistics, welcome and overview of the Symposium

      Convener: Shawn McKee (University of Michigan (US))
    • 15:15 16:00
      Coffee, Snacks and Discussions 45m North Quad room 2435

      Coffee, snacks and informal discussions

    • 16:00 17:00
      Science Use Cases: Session 1 North Quad room 2435

      Convener: Prof. Richard Gonzalez (University of Michigan)
      • 16:00
        Neural Interfaces for Controlling Finger Movements 30m

        Brain machine interfaces or neural prosthetics have the potential to restore movement to people with paralysis or amputation, bridging gaps in the nervous system with an artificial device. Microelectrode arrays can record from hundreds of individual neurons in motor cortex, and machine learning can be used to generate useful control signals from this neural activity. Performance can already surpass the current state of the art in assistive technology in terms of controlling the endpoint of computer cursors or prosthetic hands. The natural next step in this progression is to control more complex movements at the level of individual fingers. Our lab has approached this problem in three different ways. For people with upper limb amputation, we acquire signals from individual peripheral nerve branches using small muscle grafts to amplify the signal. After a successful study in animals, human study participants have recently been able to control individual fingers online using acute electrodes within these grafts. For spinal cord injury, where no peripheral signals are available, we implant Utah arrays into finger areas of motor cortex, and have successfully decoded finger flexion and extension with correlations above 0.8. Decoding “spiking band” activity at much lower sampling rates, we recently showed that power consumption of an implantable device could be reduced by 89% compared to existing broadband approaches, and fit within the specification of existing systems for upper limb functional electrical stimulation. Finally, finger control is ultimately limited by the number of independent electrodes that can be placed within cortex or the nerves, and this is in turn limited by the extent of glial scarring surrounding an electrode. Therefore, we developed an electrode array based on 8 µm carbon fibers, no bigger than the neurons themselves. We were able to insert arrays with 3x the density of the Utah array by temporarily shortening the fibers for penetration of the top cortical layers. This enabled chronic recording of single units with no apparent contiguous scarring over time. The long-term goal of this work is to make neural interfaces for the restoration of hand movement a clinical reality for everyone who has lost the use of their hands.

        Speaker: Prof. Cynthia Chestek (University of Michigan)
      • 16:30
        Brain-Life: Engaging neuroscience workforce in big data and reproducible research. 30m

        Neuroscience is engaging at the forefront of science by dissolving disciplinary boundaries and promoting transdisciplinary research. This is a process that, in principle, can facilitate discovery by convergent efforts from theoretical, experimental and cognitive neuroscience, as well as computer science and engineering. To assure the success of this process, the current lack of established mechanisms to guarantee reproducibility of scientific results must be overcome. Promoting open software and data sharing has become paramount to address reproducibility. This project addresses challenges to neuroscience reproducibility by providing integrative mechanisms for publishing data and algorithms while embedding them with compute resources to impact multiple scientific communities.

        Speaker: Prof. Franco Pestilli (Indiana University)
    • 17:00 19:00
      Social Dinner 2h

      Dinner at Grizzly Peak http://grizzlypeak.net/

      Doodle poll http://doodle.com/poll/bdzdhpypnd3r8ygq

    • 08:15 09:00
      Light breakfast, coffee 45m North Quad room 2435

      Light breakfast snacks and coffee

    • 09:00 09:30
      Science Use Cases: Session 2 North Quad room 2435

      Convener: Dr Mariana Carrasco-Teja (University of Michigan)
      • 09:00
        Storage challenges in ocean modeling 30m

        In this talk I will discuss the storage challenges of ocean modeling, using Navy and NASA ocean models as examples.

        Speaker: Prof. Brian Arbic (University of Michigan)
    • 09:30 10:00
      Complementary Technology Solutions: Session 1 North Quad room 2435

      User experiences, challenges and requests

      Convener: Dr Sharon Broude Geva (University of Michigan)
      • 09:30
        Data Ingestion at Scale 30m

        HPC traditionally handles data at rest. The acquisition of streaming data presents a different set of challenges that, at scale, can be difficult to tackle. The approach to building data ingestion infrastructure at ARC-TS involves treating every service as a swappable building block. With this pluggable design using Docker containers you are free to choose which component is best. We will use an example use case to show how data is being generated, ingested, and how each component in the stack can be replaced.

        Speaker: Jeffrey Sica (University of Michigan)
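
        The "swappable building block" idea in the abstract above can be illustrated with a minimal sketch. This is not the ARC-TS stack: the talk describes Docker containers, whereas this sketch uses in-process Python callables, and every name in it (run_pipeline, sensor_source, replay_source, add_units, ListSink) is a hypothetical stand-in chosen for illustration. The point is only that when each stage honors a small shared contract, any one stage can be replaced without touching the others.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def run_pipeline(
    source: Callable[[], Iterable[Record]],
    transforms: List[Callable[[Record], Record]],
    sink: Callable[[Record], None],
) -> int:
    """Pull records from the source, apply each transform in order,
    hand the result to the sink, and return how many records passed."""
    count = 0
    for record in source():
        for transform in transforms:
            record = transform(record)
        sink(record)
        count += 1
    return count


def sensor_source() -> Iterable[Record]:
    """Hypothetical stand-in for a live streaming source."""
    for i in range(3):
        yield {"sensor": "s1", "reading": i}


def replay_source() -> Iterable[Record]:
    """Hypothetical replacement source, e.g. replaying stored records."""
    yield {"sensor": "s2", "reading": 10}


def add_units(record: Record) -> Record:
    """Example transform: annotate each record without mutating the input."""
    return {**record, "units": "celsius"}


class ListSink:
    """In-memory sink; a database or message-queue writer could replace it."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def __call__(self, record: Record) -> None:
        self.records.append(record)


store = ListSink()
# The source is swapped between runs; transforms and sink stay unchanged.
n_live = run_pipeline(sensor_source, [add_units], store)
n_replay = run_pipeline(replay_source, [add_units], store)
```

        Swapping sensor_source for replay_source changes where the data comes from while the transform and the sink are reused as-is, which is the property the abstract attributes to its container-based design.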
    • 10:00 10:30
      Coffee Break 30m North Quad room 2435

      Coffee, snacks and informal discussions

    • 10:30 11:30
      Complementary Technology Solutions: Session 2 North Quad room 2435

      User experiences, challenges and requests

      Convener: Dr Sharon Broude Geva (University of Michigan)
      • 10:30
        Bootstrapping Big Data with Spark SQL and Data Frames 30m

        Apache Spark, a popular open source big data tool from the Hadoop ecosystem, is seeing rapid adoption across industry and academia, yet it is still generally not well known. In this talk we will demonstrate, with some large-scale samples, how easy it is for Python and R programmers to benefit from Spark SQL and Data Frames.

        Speaker: Brock Palen
      • 11:00
        Highly scalable metadata management with signac 30m

        Continually increasing computational resources and improved efficiency of parallelized software for data generation and manipulation in the field of scientific computation have led to the requirement of more systematic approaches for data management. We present a data management framework designed to work on both desktop computers and in high-performance computing environments with special emphasis on low entry barriers for both new and experienced users. The signac framework assists in the decentralized storage of data and metadata on the file system by providing all basic components needed for building simple to complex data pipelines largely agnostic of data source and format. These managed data spaces are immediately searchable through a homogeneous interface and in this way more accessible to data owners, but also collaborators. Sharing of data across different endpoints is simplified through the generation of metadata indices that contain information about data provenance and current location. The framework's data model is designed not to require absolute commitment to the presented implementation. This reduces barriers for the integration into existing workflows and increases the accessibility to archived data sets. The presented approach simplifies the production of scientific results and collaboration on shared data sets.

        Speaker: Carl S. Adorf (University of Michigan)
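
        The storage model described in the signac abstract above (data and metadata kept together on the file system, searchable through an index that records current location) can be sketched with the Python standard library alone. This is a sketch of the general idea under stated assumptions and deliberately does not use or reproduce the signac API: the function names (state_point_id, open_job, build_index, find), the metadata.json file name and the hashing scheme are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path


def state_point_id(state_point: dict) -> str:
    """Derive a stable directory name from the metadata itself."""
    canonical = json.dumps(state_point, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def open_job(root: Path, state_point: dict) -> Path:
    """Create (or reuse) a directory holding one data set and its metadata."""
    job_dir = root / state_point_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "metadata.json").write_text(json.dumps(state_point, sort_keys=True))
    return job_dir


def build_index(root: Path) -> list[dict]:
    """Scan the file system and collect each metadata record with its location."""
    index = []
    for meta_file in sorted(root.glob("*/metadata.json")):
        metadata = json.loads(meta_file.read_text())
        index.append({"metadata": metadata, "location": str(meta_file.parent)})
    return index


def find(index: list[dict], **criteria) -> list[dict]:
    """Return index entries whose metadata matches every given key/value."""
    return [
        entry
        for entry in index
        if all(entry["metadata"].get(k) == v for k, v in criteria.items())
    ]


def demo() -> list[dict]:
    """Store three small data sets, then search them by metadata."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for pressure in (1.0, 2.0, 3.0):
            job = open_job(root, {"pressure": pressure, "model": "ideal_gas"})
            (job / "result.txt").write_text(str(pressure * 2))
        return find(build_index(root), pressure=2.0)
```

        Because the directory name is derived from the metadata, the same parameters always map to the same location, and the index can be rebuilt at any time by rescanning the file system rather than being maintained in a central database.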
    • 11:30 13:00
      Lunch 1h 30m

      Lunch on your own. See http://restaurantsinannarbor.com/state-street/

    • 13:00 14:30
      Funding Agency Perspectives North Quad room 2435

      Presentations by federal funding agencies providing their perspective on the Symposium Topics

      Convener: Shawn McKee (University of Michigan (US))
      • 13:00
        Workflow Science: Moving from tool generation to Discovery 30m

        Workflow systems have emerged as the coordination engine behind large distributed data intensive science experiments. They manage the movement of data, the allocation of resources, and display of results for a growing number of science communities. However, existing workflow systems are typically simple, purpose built tools that automate some of the routine tasks a scientist performs. Future workflow systems will need to do more autonomous work, deal with more heterogeneous resources, and provide SMARTer interfaces to both scientists and facilities staff. To achieve this objective Workflow Science needs to move from a tool generation activity to a research and discovery process in its own right. Workflow scientists need to develop the methods, experiments, models, and simulations that can describe and validate the behavior of any workflow system ensuring that it is operating correctly and efficiently.

        Speaker: Mr Richard Carlson (DOE Office of Science)
      • 13:30
        Big Data – view from National Institute on Aging at the National Institutes of Health 30m

        Dr. Silverberg will describe some large NIH and NIA initiatives, such as BD2K, that begin to address the big data challenges of current health-related research. Additionally, multiple examples of NIH and NIA research projects will illustrate the ever-increasing need for big data solutions. Finally, a few relevant funding opportunity announcements will be shared.

        Speaker: Dr Nina Silverberg (National Institutes of Health)
      • 14:00
        Data in the Research Cyberinfrastructure Ecosystem 30m

        This presentation discusses some of NSF’s data programs in the context of advancing the research cyberinfrastructure. This includes interfaces with the physical layer (high performance computing, networking), software, and the research disciplines as well as the potential roles and responsibilities of different stakeholders.

        Speaker: Amy Friedlander (National Science Foundation)
    • 14:30 15:00
      Coffee Break 30m North Quad room 2435

    • 15:00 16:30
      Science Use Cases: Session 3 North Quad room 2435

      Convener: Prof. Brian Arbic (University of Michigan)
      • 15:00
        Revealing and examining the tempestuous Global Ocean through a multi-petabyte virtual ocean archive. 30m

        This talk will explore how computational science and evolving network
        and storage capabilities, together with ongoing improvements in remote
        and in-situ sensing, may be poised, possibly like never before, to
        have significant impacts on global ocean research. Simultaneous improvements
        across network, storage, computation and sensing technologies are beginning
        to create a new lens through which to view, explore and understand some
        of the key mathematics and observations used to describe and reason about
        physical, chemical and biological aspects of the Earth's oceans.

        Specifically this presentation examines a global one-kilometer horizontal
        resolution numerical ocean computation that embraces network and storage
        enabled computational science based approaches. The computation and some
        of its applications will be described. Some of the key network, storage
        and computational science technology ingredients that enable the work
        will be outlined.

        The computation examined is work that was recently undertaken using the
        NASA Pleiades computer. It is one of a new generation of ocean computations
        that include representations of tidal forcings and realistic synoptic
        meteorology. Including these aspects, at kilometer scale resolution, captures
        more of the rich dynamics present and observed in the real ocean. This
        qualitatively increases fidelity of the spatial and temporal variability
        represented numerically.

        Our calculation is initialized from a data constrained estimate of the
        real-world, large-scale global ocean state. It is driven with boundary
        conditions taken from high-resolution, data assimilating weather models. The
        domain is fully global. Interestingly, from a network and storage enabled
        computational science perspective, we chose to take a uniquely ambitious
        approach to storing and distributing the simulation solution. We sampled
        and archived computation state to a storage subsystem at hourly frequency
        and at full global resolution for a full year. This created a new and
        novel resource for ocean research. It is multi-petabyte in size and has
        global coverage.

        The resulting set of more than 10^15 spatially and temporally varying
        numerical values is supporting a variety of interesting and insightful
        studies. Many of these would not be easily possible without the underlying
        network and storage cyberinfrastructure. Advanced cyberinfrastructure
        underlies archive creation, enables distribution of sizable sub-samples from
        the archive, and provides tools used in multiple subsequent research studies.

        Storing the computation at high spatial and temporal resolution more readily
        reveals an ocean that is teeming with turbulent vortices and wave
        motions globally. A series of eye-catching visualizations illustrates
        this. They show what the ocean would look like to eyes that could discriminate
        components of vorticity and density surfaces, instead of visible light!

        Examining local regions in frequency wave number space, the stored solution
        provides notably more complete comparison with theoretical predictions and
        historical observations than previous generation ocean models. This increased
        fidelity, combined with the rich sampling archive, is allowing the effort to
        help guide and support focussed observational field campaigns both at specific
        locations and globally.

        High spatial and frequency capture also allows us to explore new directions in
        developing statistical relations between readily observable ocean fields and
        features of interest that are not as directly observable. One example of
        this is trying to reduce the stochastic uncertainty due to the ocean internal
        wave field that impacts acoustic travel time estimates. Underwater acoustics
        is a potentially powerful tool for measuring the ocean and for creating fully
        mobile sub-surface networks. It is notoriously challenging in part because of
        inherent low bandwidth, but also in part because of the complicated time
        dependent nature of the ocean as a transmission medium. We will illustrate how
        network and storage enabled approaches can be leveraged in this context.
        Leveraging these approaches allows us to develop new ways to determine aspects
        of the internal wave field statistics in a more complete manner. This work
        draws on the application of statistical methods prevalent in machine
        learning/big-data communities. Using those methods we can develop
        various semi-empirical regressions between observable fields and
        internal wave statistics. Application of these sorts of methods is
        fundamentally enabled by increasingly robust storage and network
        cyberinfrastructure technologies.

        Another example application looks at the role of high spatio-temporal frequency
        processes in shaping marine microbial patterns in the ocean. Microbial
        communities in the ocean form the base of the food chain and play a major, but
        uncertain, role in Earth's carbon, oxygen and nitrogen balance. Marine microbial
        community structure and ecosystem dynamics remain an area of active research. A
        highly sampled global fluid solution with spatial and temporal resolution down
        to scales of kilometers and hours supports new ways to explore possible ideas on
        governing mechanisms for these communities. Recent work in this context will be
        illustrated.

        Finally, we will also sketch briefly the network and storage technologies
        employed. We will describe approaches for storing data at adequate rates and
        for disseminating the solution across national networks. The approaches are
        allowing us to begin to share solutions widely, to local/regional facilities and
        to cloud services including Dropbox, AWS and Azure. The technical lessons from
        this exercise show great promise. They provide an illustration of the potential
        that future ongoing hyperconnected cyberinfrastructure investments could
        unleash - especially if key technologies are made more routine and
        implemented generally in a sufficiently interoperable, capable and
        cost-effective manner.

        Speaker: Chris Hill (MIT)
      • 15:30
        Experiences Sharing Data and Models from a Multi-Institutional Cancer Modeling Consortium 30m

        This abstract to be updated shortly

        Speaker: Dr Rafael Meza (University of Michigan)
      • 16:00
        Slow wave sleep oscillations coordinate neural ensembles during memory consolidation. 30m

        The brain routinely integrates polymodal sensory inputs into a coherent representation of events, which is subsequently stored in memory. A long-standing question in neuroscience is how fleeting experiences can modify neural networks to produce memories that are long-lasting, stable, and robust to interference. The advent of new recording technologies allows investigators to monitor and manipulate activity in hundreds or thousands of neurons simultaneously in vivo, during behavior. While this can be used to establish links between neural network activity and brain functions, two issues complicate this endeavor. First, neural activity patterns typically are quantified over milliseconds-to-minutes timescales, while behaviors evolve over longer timescales (seconds, days, or even years). Second, it is not obvious what features of network dynamics constitute a “signal” associated with a specific brain function, vs. “noise” which is irrelevant to that function. The Aton and Zochowski labs have recently developed metrics to characterize how hippocampal network dynamics change as a function of new learning in mice, during active long-term contextual fear memory formation. We have found that after new information is encoded in hippocampal area CA1 (i.e., following one-trial contextual fear conditioning [CFC]), network dynamics in this area become increasingly stable. This can be demonstrated statistically by calculating changes in mean CA1 network stability (from baseline) across a 24-h period following CFC, and comparing this with stability changes in animals which either 1) have undergone CFC, but had subsequent memory formation disrupted through brief post-CFC sleep deprivation (SD), or 2) have undergone a sham behavioral procedure instead of CFC (i.e., where no contextual fear memory is expected). We find that mean stability increases with memory formation, that this increase is disrupted (particularly during slow wave sleep [SWS]) following SD, that these changes are sustained for at least 24 h after learning, and that changes to stability during SWS can predict an individual animal’s behavioral contextual fear memory recall 24 h after learning. We also find that the same pattern of network connectivity is consistently repeated for several hours during post-CFC SWS. One feature of SWS which distinguishes it from other states is the presence of high-amplitude, low frequency network oscillations in various brain regions. Based on experimental data in which SWS network oscillations in the hippocampus are transiently disrupted (or mimicked in awake animals), we find that network stability is strongly linked to the presence of network oscillations. We propose that stabilization of network dynamics by SWS oscillations could serve as a mechanism to promote long-term memory storage.

        Speaker: Dr Sara Aton
    • 16:30 17:00
      Discussion and Wrap-up North Quad room 2435
