Group Meeting

Europe/Zurich
513/1-024 (CERN)

Conveners: Alberto Pace (CERN), Oliver Keeble (CERN)
    • 14:00 14:15
      A milestone for DPM (Disk Pool Manager) 15m

      The DPM (Disk Pool Manager) system is a scalable, multiprotocol technology for Grid storage that supports about 130 sites
      for a total of about 90 Petabytes online.
      The system has recently completed the development phase announced in past years, which consolidates its core component,
      DOME (Disk Operations Management Engine), into a full-featured, high-performance engine that can also be operated with
      standard Web clients and exposes a fully documented REST-based protocol.
      Together with a general improvement in performance and a comprehensive administration command-line interface, this
      milestone also reintroduces features such as automatic disk server status detection and volatile pools for deploying
      experimental disk caches.
      In this contribution we also discuss the end of support for the historical DPM components (which include a dependency on
      the Globus toolkit); their deployment is now required only for the SRM protocols, so they can be uninstalled once a site
      no longer needs SRM.

      Speaker: Fabrizio Furano (CERN)
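
      As a minimal sketch of what "operated with standard Web clients" over a REST-based protocol can look like, the Python
      snippet below issues an HTTPS request to a DOME-style head node. The host name, command path, response fields and
      authentication handling are assumptions made for illustration only; the real command set is defined in the DPM/DOME
      documentation.

          # Hedged illustration only: URL, command name and JSON fields are hypothetical.
          import requests

          DOME_HEAD = "https://dpmhead.example.org:1094"  # placeholder head-node URL

          def get_space_info(session: requests.Session) -> dict:
              """Ask the head node for pool/space usage (illustrative command name)."""
              resp = session.get(f"{DOME_HEAD}/domehead/command/dome_getspaceinfo", timeout=30)
              resp.raise_for_status()
              return resp.json()

          if __name__ == "__main__":
              with requests.Session() as s:
                  # X.509 or token authentication would be configured on the session here.
                  print(get_space_info(s))
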
    • 14:15 14:30
      CERN Tape Archive (CTA): From Development to Production Deployment 15m

      The first production version of the CERN Tape Archive (CTA) software is planned for release by the end of 2018. CTA is designed to replace CASTOR as the CERN tape archive solution, in order to meet the scalability and performance challenges arriving with LHC Run 3.

      This contribution will describe the main commonalities and differences between CTA and CASTOR. We outline the functional enhancements and integration steps required to add the CTA tape back-end to an EOS disk storage system. We present and discuss the different deployment and migration scenarios for replacing the five CASTOR instances at CERN, including a description of how FTS will interface with EOS and CTA.

      Speaker: Michael Davis (CERN)
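
      To make the disk/tape split concrete, here is a minimal, purely conceptual sketch of the lifecycle of a file in a
      disk-fronted tape archive of the kind described above (a disk buffer in front, a tape back-end behind). All class,
      field and path names are invented for this illustration and are not EOS or CTA interfaces.

          # Toy model of a disk-fronted tape archive; names are illustrative only.
          from dataclasses import dataclass, field
          from enum import Enum, auto

          class CopyState(Enum):
              ON_DISK_ONLY = auto()      # file just written to the disk buffer
              QUEUED_FOR_TAPE = auto()   # archive request handed to the tape back-end
              ON_TAPE = auto()           # safely on tape; the disk replica may be evicted

          @dataclass
          class ArchiveFile:
              path: str
              size_bytes: int
              state: CopyState = CopyState.ON_DISK_ONLY

          @dataclass
          class TapeBackend:
              """Stand-in for the tape archive: it only tracks state transitions."""
              queue: list = field(default_factory=list)

              def archive(self, f: ArchiveFile) -> None:
                  f.state = CopyState.QUEUED_FOR_TAPE
                  self.queue.append(f)

              def flush_to_tape(self) -> None:
                  for f in self.queue:
                      f.state = CopyState.ON_TAPE
                  self.queue.clear()

          backend = TapeBackend()
          f = ArchiveFile("/eos/experiment/run3/raw/data.root", 4 * 1024**3)
          backend.archive(f)
          backend.flush_to_tape()
          print(f.path, f.state.name)  # -> ON_TAPE
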
    • 14:30 14:45
      Providing large-scale disk storage at CERN 15m

      The CERN IT Storage group operates multiple distributed storage systems and is responsible
      for supporting the infrastructure that accommodates all CERN storage requirements, from the
      physics data generated by LHC and non-LHC experiments to the personal files of CERN users.

      EOS is now the key component of the CERN storage strategy. It sustains high incoming
      throughput during experiment data taking while running complex concurrent production workloads.
      This high-performance distributed storage now provides more than 250 PB of raw disk and is the
      key component behind the success of CERNBox, the CERN cloud synchronisation service, which allows
      syncing and sharing files on all major mobile and desktop platforms and provides offline
      availability for any data stored in the EOS infrastructure.

      CERNBox has seen exponential growth in files and data stored over the last couple of years,
      thanks to its increasing popularity within the CERN user community and to its integration
      with a multitude of other CERN services (Batch, SWAN, Microsoft Office).

      In parallel, CASTOR is being simplified and is transitioning from an HSM into an archival system, focusing mainly
      on the long-term recording of primary data from the detectors and paving the way for the next-generation
      tape archival system, CTA.

      The storage services at CERN also cover the needs of the rest of our community: Ceph as the data back-end for
      the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy home-directory filesystem
      services, together with its ongoing phase-out; and CVMFS for software distribution.

      In this paper we summarise our experience in supporting all of our distributed storage systems and the ongoing work
      to evolve our infrastructure, including the testing of very dense storage building blocks (nodes with more than
      1 PB of raw space) for the challenges ahead.

      Speaker: Herve Rousseau (CERN)
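
      As a minimal sketch of the S3 functionality mentioned above, the snippet below talks to an S3-compatible endpoint
      (such as a Ceph RADOS Gateway) through the standard boto3 client. The endpoint URL, bucket name and credentials are
      placeholders, not actual CERN service parameters.

          # Hedged illustration: endpoint and credentials are placeholders.
          import boto3

          s3 = boto3.client(
              "s3",
              endpoint_url="https://s3.example.org",      # placeholder S3-compatible gateway
              aws_access_key_id="ACCESS_KEY",
              aws_secret_access_key="SECRET_KEY",
          )

          # Create a bucket, upload a small object, then list what is stored.
          s3.create_bucket(Bucket="demo-bucket")
          s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello storage")
          for obj in s3.list_objects_v2(Bucket="demo-bucket").get("Contents", []):
              print(obj["Key"], obj["Size"])
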
    • 14:45 15:00
      Scaling the EOS namespace 15m

      The EOS namespace has outgrown its legacy in-memory implementation, presenting the need for an alternative solution. In response, we developed QuarkDB, a highly available datastore capable of serving as the metadata backend for EOS. Even though the datastore was tailored to the needs of the namespace, its capabilities are generic.

      We will present the overall system design and our efforts to provide performance comparable to the in-memory approach: for reads, through extensive caching on the MGM, and for writes, through latency-hiding techniques based on a persistent, back-pressured local queue that batches updates to the QuarkDB backend.

      We will also discuss the architectural decisions taken when designing our datastore, including the choice of consensus algorithm for maintaining strong consistency between identical replicas (Raft), the choice of underlying storage backend (RocksDB) and communication protocol (the Redis serialization protocol, RESP), as well as the overall testing strategy used to ensure the correctness and stability of this important infrastructure component.

      Speaker: Andrea Manzi (CERN)
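
      The write path described above can be illustrated with a minimal sketch of a bounded, batching local queue: writers
      enqueue metadata updates and a background worker flushes them to the backend in batches, while the bounded size
      provides back-pressure when the backend falls behind. This is a toy illustration in plain Python, not QuarkDB or EOS
      code; in particular, the real queue is persistent so that updates survive restarts.

          # Hedged illustration of latency hiding via a bounded, batching queue.
          import queue
          import threading
          import time

          class BatchingQueue:
              def __init__(self, flush_fn, max_pending=10_000, batch_size=500):
                  self._q = queue.Queue(maxsize=max_pending)  # bounded => back-pressure
                  self._flush_fn = flush_fn
                  self._batch_size = batch_size
                  threading.Thread(target=self._worker, daemon=True).start()

              def put(self, update):
                  # Blocks the writer when max_pending updates are already queued.
                  self._q.put(update)

              def _worker(self):
                  while True:
                      batch = [self._q.get()]              # wait for at least one update
                      while len(batch) < self._batch_size:
                          try:
                              batch.append(self._q.get_nowait())
                          except queue.Empty:
                              break
                      self._flush_fn(batch)                # one backend round-trip per batch

          # Example "flush" that just prints; a real system would write to the datastore.
          bq = BatchingQueue(flush_fn=lambda batch: print(f"flushed {len(batch)} updates"))
          for i in range(5):
              bq.put({"op": "set", "key": f"file{i}", "value": "metadata"})
          time.sleep(0.2)  # give the background worker time to flush (demo only)
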
    • 15:00 15:15
      Testing of complex, large-scale distributed storage systems: a CERN disk storage case study 15m

      Complex, large-scale distributed systems are increasingly used to solve
      extraordinary computing, storage and other problems. However, developing such
      systems usually means working with several software components, maintaining
      and improving large codebases, and coordinating a relatively large number of
      developers, so faults are inevitably introduced into the system. At the same
      time, these systems often perform important, if not crucial, tasks, so
      critical bugs and performance-hindering algorithms are not acceptable in the
      production state of the software. Moreover, a larger team of developers works
      more freely and productively when it receives constant feedback that its
      changes remain in harmony with the system requirements and with other
      people's work; this also helps scale out manpower, meaning that adding more
      developers to a project can actually result in more work done.

      In this paper we present the case study of EOS, the CERN disk storage
      system, and introduce methods for achieving fully automatic regression,
      performance and robustness testing, as well as continuous integration, for
      such a large-scale, complex and critical system using container-based
      environments. We also pay special attention to the details and challenges of
      testing distributed storage and file systems.

      Speaker: Andrea Manzi (CERN)
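
      A minimal sketch of the container-based approach is shown below: a pytest fixture starts a disposable instance of the
      system under test in a container, the tests run against it, and the container is removed afterwards. The image name
      and the readiness/validation steps are placeholders, not the actual EOS test harness.

          # Hedged illustration: image name and checks are placeholders.
          import subprocess
          import time
          import pytest

          IMAGE = "example/storage-under-test:latest"   # placeholder container image
          NAME = "storage-ci-instance"

          @pytest.fixture(scope="session")
          def storage_container():
              subprocess.run(["docker", "run", "-d", "--rm", "--name", NAME, IMAGE], check=True)
              time.sleep(5)  # crude readiness wait; a real CI job would poll a health check
              yield NAME
              subprocess.run(["docker", "rm", "-f", NAME], check=True)

          def test_service_responds(storage_container):
              # Placeholder for a real regression test, e.g. write a file into the
              # containerised instance, read it back and compare checksums.
              result = subprocess.run(["docker", "exec", storage_container, "true"], check=True)
              assert result.returncode == 0
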
    • 15:15 15:30
      CERNBox: the CERN Cloud Storage HUB 15m

      CERNBox is the CERN cloud storage hub. It allows synchronising and sharing files on all major desktop and mobile platforms (Linux, Windows, macOS, Android, iOS), aiming to provide universal access and offline availability for any data stored in the CERN EOS infrastructure.

      With more than 12000 users registered in the system, CERNBox has responded to the high demand in our diverse community for an easily accessible cloud storage solution that also integrates with other CERN services for big science: visualisation tools, interactive data analysis and real-time collaborative editing.

      Collaborative authoring of documents is now becoming standard practice with public cloud services, and within CERNBox we are looking into several options: from the collaborative editing of shared office documents with different solutions (Microsoft, OnlyOffice, Collabora), to integrating Markdown and LaTeX editors, to exploring the evolution of Jupyter Notebooks towards collaborative editing, where the latter leverages the existing SWAN physics analysis service.

      We report on our experience managing this technology, on applicable use cases (also in a broader scientific and research context) and on its future evolution, highlighting the current development status and roadmap. In particular, we will highlight the planned move to a microservice-based architecture, which will make it easier to adapt and evolve the service as technology and usage evolve, notably to unify the CERN home directory services.
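
      As a minimal sketch of programmatic sync-style access to a CERNBox-like service, the snippet below uploads and
      retrieves a file over WebDAV, the kind of interface sync-and-share platforms commonly expose. The base URL, path and
      credentials are placeholders and not the documented CERNBox endpoints; real usage goes through the official sync
      clients or the documented service APIs.

          # Hedged illustration: URL, path and credentials are placeholders.
          import requests

          WEBDAV_BASE = "https://cernbox.example.org/remote.php/webdav"  # placeholder endpoint
          AUTH = ("username", "app-password")                            # placeholder credentials

          # Upload a local file with HTTP PUT, then download it again with GET to verify.
          with open("analysis_notes.txt", "rb") as fh:
              r = requests.put(f"{WEBDAV_BASE}/notes/analysis_notes.txt", data=fh, auth=AUTH, timeout=60)
              r.raise_for_status()

          r = requests.get(f"{WEBDAV_BASE}/notes/analysis_notes.txt", auth=AUTH, timeout=60)
          r.raise_for_status()
          print(len(r.content), "bytes downloaded")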

    • 15:30 15:45
      [WIP] Cloud Storage for data-intensive sciences in science and industry 15m

      In the last few years we have seen constant interest in technologies that provide effective cloud storage for scientific use, matching the requirements of price, privacy and scientific usability. This interest is not limited to HEP and extends to other scientific fields driven by rapid data growth: "big data" is now characteristic of modern genomics, energy and financial services, to mention a few.

      The provision of cloud storage accessible via synchronisation and sharing interfaces has become an essential element of the service portfolios offered by research laboratories and universities. "Dropbox-like" services have been created and now support HEP and other communities in their day-to-day tasks. The scope of these systems is therefore much broader than HEP: we will describe the usage of, and the plans to adopt, tools originally conceived for our community in other areas. The challenge we now face is adopting cloud storage services in the main data analysis workflow, extending the functionality of "traditional" cloud storage.

      What are the ingredients of these new classes of services? Is HEP today proposing solutions of interest to other future projects on the timescale of the High-Luminosity LHC?

      The authors believe that HEP-developed technologies will constitute the back-end of a new generation of services: our solution for exascale geographically distributed storage (EOS), the access to and federation of cloud storage across different domains (CERNBox), and the ability to offer effective, heavy-duty interactive data analysis services (SWAN) on top of this novel data infrastructure are the three key enablers of future evolution.

      In this presentation we will describe the use of these technologies to build large content-delivery networks (e.g. AARNet in Australia), collaboration with other activities (e.g. handling of satellite images from the Copernicus programme at JRC) and partnerships with companies active in this field.

    • 15:45 16:00
      CHEP Presentation - Disk failures in the EOS setup at CERN: A first systematic look at 1 year of collected data 15m

      The EOS deployment at CERN is a core service used for scientific data
      processing and analysis, and as the back-end for general end-user storage (e.g. home directories/CERNBox).
      The disk failure metrics collected over a period of one year from a deployment
      of some 70k disks allow a first systematic analysis of the behaviour
      of different hard disk types for the large CERN use cases.

      In this presentation we will describe the data collection and analysis,
      summarise the measured failure rates and compare them with other large disk
      deployments. In the second part of the presentation we will present a first
      attempt to use the collected failure and SMART metrics to develop a machine
      learning model that predicts imminent failures, in order to avoid service
      degradation and repair costs.

      Speaker: Alfonso Juan Portabales Gonzalez (Universidad Politecnica de Madrid (ES))
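
      As a minimal sketch of the kind of failure-prediction model mentioned above, the snippet below trains a classifier on
      per-disk SMART-style features with a "failed soon" label. The feature set, the synthetic data and the model choice are
      placeholders standing in for the metrics actually collected from the EOS deployment; in practice the rarity of failures
      makes class imbalance handling and time-based validation at least as important as the specific model.

          # Hedged illustration: synthetic data stands in for real SMART metrics.
          import numpy as np
          from sklearn.ensemble import RandomForestClassifier
          from sklearn.model_selection import train_test_split
          from sklearn.metrics import classification_report

          # Placeholder dataset: rows = disks, columns = selected SMART-style attributes
          # (e.g. reallocated sectors, pending sectors, power-on hours, temperature).
          rng = np.random.default_rng(0)
          X = rng.normal(size=(5000, 4))
          y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=5000) > 2).astype(int)

          X_train, X_test, y_train, y_test = train_test_split(
              X, y, test_size=0.3, stratify=y, random_state=0)

          model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
          model.fit(X_train, y_train)
          print(classification_report(y_test, model.predict(X_test)))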