Workshop on the future of Big Data management

Lecture Theatre 3 (LT3) in the Blackett Laboratory (Imperial College London)

David Colling (Imperial College Sci., Tech. & Med. (GB)), Jens Jensen (CLRC-RAL), Wahid Bhimji (University of Edinburgh (GB))
"Big Data" is now being managed by various academic and industry groups. This workshop is organised by the LHC community and will bring together participants from a range of disciplines working with Big Data. The aim is to explore the current and future challenges in data processing, storage, transfer and preservation. The workshop focuses on infrastructure, technologies and tools, with a view to bringing communities together.

This meeting will seek to achieve the following outputs:
  • To build a cross-disciplinary community in Big Data that can exchange knowledge and best practice and work together as the field evolves.
  • To process the meeting discussion afterwards into a working document that represents the current state of knowledge and future plans of Big Data communities.

The video of the event is now available for both days (Thursday and Friday).

  • Adam Huffman
  • Adrian Jenkins
  • Andrew Hanushevsky
  • Andrew Lyall
  • Andrew Prescott
  • Andrew Richards
  • Andrew Washbrook
  • Arnaud Wolfer
  • Baudouin Raoult
  • Ben Still
  • Brian Davies
  • Chris Kenny
  • Christopher Walker
  • Daniel Hanlon
  • Daniela Bauer
  • Dave Coughlin
  • David Colling
  • David Michel
  • Dirk Duellmann
  • Dugan Witherick
  • Duncan Rand
  • Ed Browne
  • Emyr James
  • Ewan Mac Mahon
  • Fabien Richard
  • Fiona Armstrong
  • Florian Geier
  • Fons Rademakers
  • Giulio Fella
  • Goncalo Correia
  • Guy Coates
  • Jake Pearce
  • James Abbott
  • James Coomer
  • Jamie Shiers
  • Jens Jensen
  • Jeremy Maris
  • John Gordon
  • John Swinburne
  • Jon Lockley
  • Jonathan Tilbury
  • Juan Bicarregui
  • Kati Lassila-Perini
  • Kenji Takeda
  • Klimentov Alexei
  • Lev Shamardin
  • Mahesh Pancholi
  • Marc O'Brien
  • Marcel van Drunen
  • Mark Rothwell
  • Mark van de Sanden
  • Matt Johnson
  • Matthew Viljoen
  • Michael Mueller
  • Michail Salichos
  • Nick Trigg
  • Oliver Duke-Williams
  • Oliver Keeble
  • Owen Embury
  • Patrick McGarry
  • Paul Lewis
  • Peter Clapham
  • Peter Clarke
  • Peter Gronbech
  • Philip Kershaw
  • Raymond Beuselinck
  • Rich Bemrose
  • Richard Bantges
  • Richard Mount
  • Robert Lowe
  • Roger Jones
  • Sarah Butcher
  • Shaun De Witt
  • Shun Liang
  • Simon Fayer
  • Simon Metson
  • Sonia Sousa
  • Stephen Pascoe
  • Steve Lloyd
  • Steve Loughran
  • Tristan Clark
  • Victor Cornell
  • Wahid Bhimji
  • Yotsawat Pomyen
    • 10:00 AM 10:30 AM
      Coffee 30m
    • 10:30 AM 10:40 AM
      Welcome and Scene Setting 10m
      Speaker: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
    • 10:40 AM 1:00 PM
      Big data needs of different communities

      To establish the requirements driving the later discussion.

      Convener: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))
      Questionnaire Responses
      • 10:40 AM
        High Energy Physics inc LHC 20m
        Speaker: Dr Richard Philip Mount (SLAC National Accelerator Laboratory (US))
      • 11:00 AM
        Astronomy inc SKA 20m
        Speaker: Dr Paul Calleja (University of Cambridge)
      • 11:20 AM
        Cloud computing and data intensive research 20m
        Speaker: Dr Kenji Takeda (Microsoft Research)
      • 11:40 AM
        Earth Observation and Climate Modelling 20m
        Speaker: Phil Kershaw (CEMS)
      • 12:00 PM
        Weather forecasting 20m
        Speaker: Baudouin Raoult (European Centre for Medium-Range Weather Forecasts)
      • 12:20 PM
        PanData and the Research Data Alliance 20m
        Speaker: Juan Bicarregui (STFC)
      • 12:40 PM
        Economic and Social Science 20m
        Speaker: Fiona Armstrong (ESRC)
    • 1:00 PM 2:00 PM
      Lunch 1h
    • 2:00 PM 3:00 PM
      Big data needs of different communities

      To establish the requirements driving the later discussion.

      Convener: Jens Jensen (CLRC-RAL)
      Questionnaire Responses
      • 2:00 PM
        Bioinformatics 20m
        Speaker: Dr Guy Coates (Wellcome Trust Sanger Institute)
      • 2:20 PM
        ELIXIR: An infrastructure for biological information in Europe 20m
        Speaker: Andrew Lyall (EMBL-EBI)
      • 2:40 PM
        Arts and Humanities 20m
        Speaker: Prof. Andrew Prescott (King's College London/AHRC)
    • 3:00 PM 3:20 PM
      Tea 20m
    • 3:20 PM 5:20 PM
      Data Storage: Advanced filesystems and interfaces
      • Advances in cluster filesystems: Lustre, Ceph, HDFS, GPFS
      • Data access interfaces and protocols
      • Storage management interfaces
      • Advances in storage hardware
      • High-throughput storage strategies, caching
      Convener: Shaun De Witt (Unknown)
      • 3:20 PM
        GPFS 20m
        Speaker: Vic Cornell (DataDirect Networks)
      • 3:40 PM
        HDFS 20m
        Speaker: Steve Loughran (Hortonworks)
      • 4:00 PM
        Large scale solutions with Lustre 20m
        Speaker: John Swinburne (Intel)
      • 4:20 PM
        An Intro to Ceph and Big Data 20m
        Speaker: Patrick McGarry (inktank)
      • 4:40 PM
        CERN experiences with EOS, S3 and Ceph 20m
        Speaker: Dirk Duellmann (CERN)
      • 5:00 PM
        Discussion: Filesystem needs for different communities 20m
    • 7:30 PM 9:05 PM
      Dinner 1h 35m
      The Workshop Dinner will be held at Med Kitchen on Gloucester Road and is kindly sponsored by DDN. A map to Med Kitchen (with walking directions) can be found in the material attached to this event.
    • 9:00 AM 11:00 AM
      Data Processing: Toolkits, data structures, I/O optimisation
      • Analysis packages and tools for data processing
      • Data visualisation
      • Serialisation formats
      • Layout and access optimisations
      • Benchmarking
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 9:00 AM
        ROOT current architecture and plans 20m
        Speaker: Fons Rademakers (CERN)
      • 9:20 AM
        Hadoop data processing 20m
        Speaker: Steve Loughran (Hortonworks)
      • 9:40 AM
        Contrast between big data processing in academia and industry 15m
        Speaker: Simon Metson (Cloudant)
      • 9:55 AM
        Optimising bioinformatics pipelines for clinical genomics 20m
        Speaker: Dr Michael Mueller (Imperial College)
      • 10:15 AM
        Marmal-aid: a tool for genomics processing 10m
        Speaker: Dr Rob Lowe (QMUL)
      • 10:25 AM
        Astronomy toolkits and data structures 20m
        Speaker: Dr Adrian Jenkins (Durham University)
      • 10:45 AM
        Discussion: Building on strengths of tools for all communities 15m
    • 11:00 AM 11:30 AM
      Coffee 30m
    • 11:30 AM 12:10 PM
      Data Storage: Hardware
      • Advances in cluster filesystems: Lustre, Ceph, HDFS, GPFS
      • Data access interfaces and protocols
      • Storage management interfaces
      • Advances in storage hardware
      • High-throughput storage strategies, caching
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 11:30 AM
        Hardware for big data: lessons learned 20m
        Speaker: Marcel van Drunen (Dell)
      • 11:50 AM
        High performance storage solutions 20m
        Speaker: James Coomer
    • 12:10 PM 1:10 PM
      Lunch 1h
    • 1:10 PM 2:55 PM
      Data Transfer: Protocols and tools
      • File transfer services (FTS, iRODS...)
      • Remote access (federated data stores...)
      Convener: Roger Jones (Lancaster University (GB))
      • 1:15 PM
        Network developments 20m
        Speaker: Paul Lewis (JANET)
      • 1:35 PM
        The Evolution of the FTS File Transfer Service Tool 20m
        Speaker: Michail Salichos (CERN)
      • 1:55 PM
        Federated Data Stores - Volume, Velocity & Variety 20m
        Speaker: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)
      • 2:15 PM
        EUDAT technology choices 20m
        Speaker: Mark van de Sanden (SURFsara)
      • 2:35 PM
        Discussion: Future of data transfer 15m
    • 2:55 PM 3:35 PM
      Data Management: meta-data, data discovery and preservation
      Convener: Richard Bantges
      • 2:55 PM
        Environmental Data Archival 20m
        Speaker: Stephen Pascoe (CEDA)
      • 3:15 PM
        The Application of Raimes' Rules to Long-Term Data Preservation 20m
        Speaker: Dr Jamie Shiers (CERN)
        DPHEP Indico
        G8 Statement
    • 3:35 PM 3:45 PM
      Tea 10m
    • 3:45 PM 4:45 PM
      Data management 2: Open access and preservation
      Convener: Jens Jensen (CLRC-RAL)
      • 3:55 PM
        DOIs for tracking data 20m
        Speaker: Mr Matthew James Viljoen (STFC - Science & Technology Facilities Council (GB))
      • 4:15 PM
        Digital preservation 20m
        Speaker: Jonathan Tilbury (Tessella)
      • 4:35 PM
        Discussion 10m
    • 4:45 PM 5:00 PM
      Roundup and ways forward
      Convener: Dr David Colling (Imperial College Sci., Tech. & Med. (GB))