Workshop on the future of Big Data management

chaired by David Colling (Imperial College Sci., Tech. & Med. (GB)), Wahid Bhimji (University of Edinburgh (GB)), Jens Jensen (CLRC-RAL)
from Thursday, 27 June 2013 to Friday, 28 June 2013 (Europe/London)
at Imperial College London (Lecture Theatre 3 (LT3) in the Blackett Laboratory)
Description
"Big Data" is now being managed by various academic and industry groups.
This workshop is organised by the LHC community and will bring
together participants from a range of disciplines
working with Big Data. The aim is to explore the current and future
challenges in data processing, storage, transfer and preservation.
The workshop focuses on infrastructure, technologies and
tools, with a view to bringing communities together.

This meeting will seek to achieve the following outputs:
- To build a cross-disciplinary Big Data community that can exchange knowledge and best practice and work together as the field evolves.

- To produce, from the meeting discussion, a working document that represents the current state of knowledge and future plans of Big Data communities.

The video of the event is now available: 
Thu:
http://tinyurl.com/BigDataImpThu
Fri:
http://tinyurl.com/BigDataImpFri
Material:
Minutes unknown type file
  • Thursday, 27 June 2013
    • 10:00 - 10:30 Coffee
    • 10:30 - 10:40 Welcome and Scene Setting 10'
      Speaker: Dr. David Colling (Imperial College Sci., Tech. & Med. (GB))
      Material: Slides powerpoint file pdf file
    • 10:40 - 13:00 Big data needs of different communities
      To establish the requirements driving the later discussion.
      Convener: Dr. David Colling (Imperial College Sci., Tech. & Med. (GB))
      Material: Questionnaire Responses excel file
      • 10:40 High Energy Physics inc LHC 20'
        Speaker: Dr. Richard Philip Mount (SLAC National Accelerator Laboratory (US))
        Material: Slides powerpoint file pdf file
      • 11:00 Astronomy inc SKA 20'
        Speaker: Dr. Paul Calleja (University of Cambridge)
        Material: Slides powerpoint file pdf file
      • 11:20 Cloud computing and data intensive research 20'
        Speaker: Dr. Kenji Takeda (Microsoft Research)
      • 11:40 Earth Observation and Climate Modelling 20'
        Speaker: Phil Kershaw (CEMS)
        Material: Slides powerpoint file pdf file
      • 12:00 Weather forecasting 20'
        Speaker: Baudouin Raoult (European Centre for Medium-Range Weather Forecasts)
      • 12:20 PanData and the Research Data Alliance 20'
        Speaker: Juan Bicarregui (STFC)
        Material: Slides powerpoint file unknown type file pdf file
      • 12:40 Economic and Social Science 20'
        Speaker: Fiona Armstrong (ESRC)
        Material: Slides pdf file
    • 13:00 - 14:00 Lunch
    • 14:00 - 15:00 Big data needs of different communities
      To establish the requirements driving the later discussion.
      Convener: Jens Jensen (CLRC-RAL)
      Material: Questionnaire Responses excel file
      • 14:00 Bioinformatics 20'
        Speaker: Dr. Guy Coates (Wellcome Trust Sanger Institute)
        Material: Slides powerpoint file pdf file
      • 14:20 ELIXIR: An infrastructure for biological information in Europe 20'
        Speaker: Andrew Lyall (EMBL-EBI)
        Material: Slides powerpoint file pdf file
      • 14:40 Arts and Humanities 20'
        Speaker: Prof. Andrew Prescott (King's College London/AHRC)
        Material: Slides powerpoint file pdf file
    • 15:00 - 15:20 Tea
    • 15:20 - 17:20 Data Storage: Advanced filesystems and interfaces
      - Advances in cluster filesystems: Lustre; Ceph; HDFS; GPFS
      - Data access interfaces and protocols
      - Storage management interfaces
      - Advances in storage hardware
      - High-throughput storage strategies, caching
      Convener: Shaun De Witt (Unknown)
      • 15:20 GPFS 20'
        Speaker: Vic Cornell (DataDirect Networks)
        Material: Slides powerpoint file pdf file
      • 15:40 HDFS 20'
        Speaker: Steve Loughran (Hortonworks)
        Material: Slides powerpoint file pdf file
      • 16:00 Large scale solutions with Lustre 20'
        Speaker: John Swinburne (Intel)
        Material: Slides pdf file
      • 16:20 An Intro to Ceph and Big Data 20'
        Speaker: Patrick McGarry (Inktank)
        Material: Slides presentation file pdf file
      • 16:40 CERN experiences with EOS, S3 and Ceph 20'
        Speaker: Dirk Duellmann (CERN)
        Material: Slides pdf file
      • 17:00 Discussion: Filesystem needs for different communities 20'
    • 19:30 - 21:05 Dinner 1h35'
      The Workshop Dinner will be held at Med Kitchen on Gloucester Road and is kindly sponsored by DDN (http://www.ddn.com/).
      
      A map to Med Kitchen (with walking directions) can be found in the material attached to this event.
      Material: Slides pdf file
  • Friday, 28 June 2013
    • 09:00 - 11:00 Data Processing: Toolkits, data structures, I/O optimisation
      - Analysis packages and tools for data processing
      - Data visualisation
      - Serialisation formats
      - Layout and access optimisations
      - Benchmarking
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 09:00 ROOT current architecture and plans 20'
        Speaker: Fons Rademakers (CERN)
        Material: Slides pdf file
      • 09:20 Hadoop data processing 20'
        Speaker: Steve Loughran (Hortonworks)
        Material: Slides powerpoint file pdf file
      • 09:40 Contrast between big data processing in academia and industry 15'
        Speaker: Simon Metson (Cloudant)
        Material: Slides pdf file
      • 09:55 Optimising bioinformatics pipelines for clinical genomics 20'
        Speaker: Dr. Michael Mueller (Imperial College)
      • 10:15 Marmal-aid: a tool for genomics processing 10'
        Speaker: Dr. Rob Lowe (QMUL)
        Material: Slides powerpoint file pdf file
      • 10:25 Astronomy toolkits and data structures 20'
        Speaker: Dr. Adrian Jenkins (Durham University)
        Material: Slides powerpoint file pdf file
      • 10:45 Discussion: Building on strengths of tools for all communities 15'
    • 11:00 - 11:30 Coffee
    • 11:30 - 12:10 Data Storage: Hardware
      - Advances in cluster filesystems: Lustre; Ceph; HDFS; GPFS
      - Data access interfaces and protocols
      - Storage management interfaces
      - Advances in storage hardware
      - High-throughput storage strategies, caching
      Convener: Wahid Bhimji (University of Edinburgh (GB))
      • 11:30 Hardware for big data: lessons learned 20'
        Speaker: Marcel van Drunen (Dell)
        Material: Slides powerpoint file pdf file
      • 11:50 High performance storage solutions 20'
        Speaker: James Coomer
        Material: Slides powerpoint file pdf file
    • 12:10 - 13:10 Lunch
    • 13:10 - 14:55 Data Transfer: Protocols and tools
      - File transfer services (FTS, iRODS...)
      - Remote access (federated data stores...)
      Convener: Roger Jones (Lancaster University (GB))
      • 13:15 Network developments 20'
        Speaker: Paul Lewis (JANET)
        Material: Slides powerpoint file pdf file
      • 13:35 The Evolution of the FTS File Transfer Service Tool 20'
        Speaker: Michail Salichos (CERN)
        Material: Slides pdf file
      • 13:55 Federated Data Stores - Volume, Velocity & Variety 20'
        Speaker: Andrew Hanushevsky (STANFORD LINEAR ACCELERATOR CENTER)
        Material: Slides powerpoint file pdf file
      • 14:15 EUDAT technology choices 20'
        Speaker: Mark van de Sanden (SURFsara)
        Material: Slides powerpoint file pdf file
      • 14:35 Discussion: Future of data transfer 15'
    • 14:55 - 15:35 Data Management: meta-data, data discovery and preservation
      Convener: Richard Bantges
    • 15:35 - 15:45 Tea
    • 15:45 - 16:45 Data management 2: Open access and preservation
      Convener: Jens Jensen (CLRC-RAL)
      • 15:55 DOIs for tracking data 20'
        Speaker: Mr. Matthew James Viljoen (STFC - Science & Technology Facilities Council (GB))
        Material: Slides powerpoint file pdf file
      • 16:15 Digital preservation 20'
        Speaker: Jonathan Tilbury (Tessella)
        Material: Slides powerpoint file pdf file
      • 16:35 Discussion 10'
    • 16:45 - 17:00 Roundup and ways forward
      Convener: Dr. David Colling (Imperial College Sci., Tech. & Med. (GB))
      Material: Slides powerpoint file pdf file