Workshop on the future of Big Data management

Europe/London
Lecture Theatre 3 (LT3) in the Blackett Laboratory (Imperial College London)

David Colling (Imperial College Sci., Tech. & Med. (GB)), Jens Jensen (CLRC-RAL), Wahid Bhimji (University of Edinburgh (GB))
Description
"Big Data" is now being managed by various academic and industry groups. This workshop is organised by the LHC community and will bring together participants from a range of disciplines working with Big Data. The aim is to explore the current and future challenges in data processing, storage, transfer and preservation. The workshop focuses on infrastructure, technologies and tools, with a view to bringing communities together.

This meeting will seek to achieve the following outputs:
- To build a cross-disciplinary community in Big Data whose members can exchange knowledge and best practice and work together as this field evolves.
- To produce, from the meeting discussion, a working document that represents the current state of knowledge and future plans of Big Data communities.

The video of the event is now available:
Thu: http://tinyurl.com/BigDataImpThu
Fri: http://tinyurl.com/BigDataImpFri
Minutes
Participants
  • Adam Huffman
  • Adrian Jenkins
  • Andrew Hanushevsky
  • Andrew Lyall
  • Andrew Prescott
  • Andrew Richards
  • Andrew Washbrook
  • Arnaud Wolfer
  • Baudouin Raoult
  • Ben Still
  • Brian Davies
  • Chris Kenny
  • Christopher Walker
  • Daniel Hanlon
  • Daniela Bauer
  • Dave Coughlin
  • David Colling
  • David Michel
  • Dirk Duellmann
  • Dugan Witherick
  • Duncan Rand
  • Ed Browne
  • Emyr James
  • Ewan Mac Mahon
  • Fabien Richard
  • Fiona Armstrong
  • Florian Geier
  • Fons Rademakers
  • Giulio Fella
  • Goncalo Correia
  • Guy Coates
  • Jake Pearce
  • James Abbott
  • James Coomer
  • Jamie Shiers
  • Jens Jensen
  • Jeremy Maris
  • John Gordon
  • John Swinburne
  • Jon Lockley
  • Jonathan Tilbury
  • Juan Bicarregui
  • Kati Lassila-Perini
  • Kenji Takeda
  • Klimentov Alexei
  • Lev Shamardin
  • Mahesh Pancholi
  • Marc O'Brien
  • Marcel van Drunen
  • Mark Rothwell
  • Mark van de Sanden
  • Matt Johnson
  • Matthew Viljoen
  • Michael Mueller
  • Michail Salichos
  • Nick Trigg
  • Oliver Duke-Williams
  • Oliver Keeble
  • Owen Embury
  • Patrick McGarry
  • Paul Lewis
  • Peter Clapham
  • Peter Clarke
  • Peter Gronbech
  • Philip Kershaw
  • Raymond Beuselinck
  • Rich Bemrose
  • Richard Bantges
  • Richard Mount
  • Robert Lowe
  • Roger Jones
  • Sarah Butcher
  • Shaun De Witt
  • Shun Liang
  • Simon Fayer
  • Simon Metson
  • Sonia Sousa
  • Stephen Pascoe
  • Steve Lloyd
  • Steve Loughran
  • Tristan Clark
  • Victor Cornell
  • Wahid Bhimji
  • Yotsawat Pomyen
    • 10:00
      Coffee
    • 1
      Welcome and Scene Setting
      Speaker: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
      Slides
    • Big data needs of different communities

      To establish the requirements driving the later discussion.

      Convener: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
      Questionnaire Responses
      • 2
        High Energy Physics inc LHC
        Speaker: Dr Richard Philip Mount (SLAC National Accelerator Laboratory (US))
        Slides
      • 3
        Astronomy inc SKA
        Speaker: Dr Paul Calleja (University of Cambridge)
        Slides
      • 4
        Cloud computing and data intensive research
        Speaker: Dr Kenji Takeda (Microsoft Research)
      • 5
        Earth Observation and Climate Modelling
        Speaker: Phil Kershaw (CEMS)
        Slides
      • 6
        Weather forecasting
        Speaker: Baudouin Raoult (European Centre for Medium-Range Weather Forecasts)
      • 7
        PanData and the Research Data Alliance
        Speaker: Juan Bicarregui (STFC)
      • 8
        Economic and Social Science
        Speaker: Fiona Armstrong (ESRC)
        Slides
    • 13:00
      Lunch
    • Big data needs of different communities

      To establish the requirements driving the later discussion.

      Convener: Jens Jensen (CLRC-RAL)
      Questionnaire Responses
      • 9
        Bioinformatics
        Speaker: Dr Guy Coates (Wellcome Trust Sanger Institute)
        Slides
      • 10
        ELIXIR: An infrastructure for biological information in Europe
        Speaker: Andrew Lyall (EMBL-EBI)
        Slides
      • 11
        Arts and Humanities
        Speaker: Prof. Andrew Prescott (King's College London/AHRC)
        Slides
    • 15:00
      Tea
    • Data Storage: Advanced filesystems and interfaces
      • Advances in cluster filesystems: Lustre; Ceph; HDFS; GPFS
      • Data access interfaces and protocols.
      • Storage management interfaces
      • Advances in storage hardware.
      • High-throughput storage strategies, caching.
      Convener: Shaun De Witt (Unknown)
      • 12
        GPFS
        Speaker: Vic Cornell (DataDirect Networks)
        Slides
      • 13
        HDFS
        Speaker: Steve Loughran (Hortonworks)
        Slides
      • 14
        Large scale solutions with Lustre
        Speaker: John Swinburne (Intel)
        Slides
      • 15
        An Intro to Ceph and Big Data
        Speaker: Patrick McGarry (Inktank)
        Slides
      • 16
        CERN experiences with EOS, S3 and Ceph
        Speaker: Dirk Duellmann (CERN)
        Slides
      • 17
        Discussion: Filesystem needs for different communities
    • 18
      Dinner
      The Workshop Dinner will be held at Med Kitchen on Gloucester Road and is kindly sponsored by DDN (http://www.ddn.com/). A map to Med Kitchen (with walking directions) can be found in the material attached to this event.
      Slides
    • Data Processing: Toolkits, Data structures, I/O optimisation
      • Analysis packages and tools for data processing.
      • Data visualisation
      • Serialisation formats
      • Layout and access optimisations
      • Benchmarking
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 19
        ROOT current architecture and plans
        Speaker: Fons Rademakers (CERN)
        Slides
      • 20
        Hadoop data processing
        Speaker: Steve Loughran (Hortonworks)
        Slides
      • 21
        Contrast between big data processing in academia and industry
        Speaker: Simon Metson (Cloudant)
        Slides
      • 22
        Optimising bioinformatics pipelines for clinical genomics
        Speaker: Dr Michael Mueller (Imperial College)
      • 23
        Marmal-aid: a tool for genomics processing
        Speaker: Dr Rob Lowe (QMUL)
        Slides
      • 24
        Astronomy toolkits and data structures
        Speaker: Dr Adrian Jenkins (Durham University)
        Slides
      • 25
        Discussion: Building on strengths of tools for all communities
    • 11:00
      Coffee
    • Data Storage: Hardware
      • Advances in cluster filesystems: Lustre; Ceph; HDFS; GPFS
      • Data access interfaces and protocols.
      • Storage management interfaces
      • Advances in storage hardware.
      • High-throughput storage strategies, caching.
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 26
        Hardware for big data: lessons learned
        Speaker: Marcel van Drunen (Dell)
        Slides
      • 27
        High performance storage solutions
        Speaker: James Coomer
    • 12:10
      Lunch
    • Data Transfer: Protocols and tools
      • File transfer services (FTS, iRODS...)
      • Remote access (Federated data stores...)
      Convener: Roger Jones (Lancaster University (GB))
      • 28
        Network developments
        Speaker: Paul Lewis (JANET)
        Slides
      • 29
        The Evolution of the FTS File Transfer Service Tool
        Speaker: Michail Salichos (CERN)
        Slides
      • 30
        Federated Data Stores - Volume, Velocity & Variety
        Speaker: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)
        Slides
      • 31
        EUDAT technology choices
        Speaker: Mark van de Sanden (SURFsara)
        Slides
      • 32
        Discussion: Future of data transfer
    • Data Management: meta-data, data discovery and preservation
      Convener: Richard Bantges
      • 33
        Environmental Data Archival
        Speaker: Stephen Pascoe (CEDA)
        Slides
      • 34
        The Application of Raimes' Rules to Long-Term Data Preservation
        Speaker: Dr Jamie Shiers (CERN)
        DPHEP@CHEP2013
        DPHEP Indico
        DPHEP.org
        G8 Statement
        H2020@CHEP
        Slides
    • 15:35
      Tea
    • Data management 2: Open access and preservation
      Convener: Jens Jensen (CLRC-RAL)
      • 35
        DOIs for tracking data
        Speaker: Mr Matthew James Viljoen (STFC - Science & Technology Facilities Council (GB))
        Slides
      • 36
        Digital preservation
        Speaker: Jonathan Tilbury (Tessella)
        Slides
      • 37
        Discussion
    • Roundup and ways forward
      Convener: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
      Slides