Database Futures Workshop

Europe/Zurich
IT Auditorium (CERN)

Description
The aim of the workshop is to discuss possible future needs in the database area. In particular, we would like to understand likely future database applications from the different user communities (experiments, accelerator controls, engineering support, IT, AIS, …) and how these might map to different database technologies (relational/NoSQL) and implementations (Oracle/MySQL and then pick freely from http://en.wikipedia.org/wiki/NoSQL).
Participants
  • Alessandro Cavalli
  • Alexander Kalkhof
  • Alexander Loth
  • Andre Regelbrugge
  • Andrea Valassi
  • Andreas Motzke
  • Andreas Pfeiffer
  • Andrei Dumitru
  • Andres Pacheco Pages
  • Andreu Belmonte Pena
  • Anna Ksyta
  • Annika Nordt
  • Anton Topurov
  • Antonio Pierro
  • Barbara Martelli
  • Carlos Fernando Gamboa
  • Carlos Garcia Fernandez
  • Francois Chatal
  • Daniel Abler
  • Dario Barberis
  • Dave Dykstra
  • David Tuckett
  • Dirk Jahnke-Zumbusch
  • Dmitry Ustyushkin
  • Edward Karavakis
  • Elena Planas
  • Elisabeth Vinek
  • Eric Grancher
  • Eva Dafonte Perez
  • Fabian Lambert
  • Fabio Souto Moure
  • Faustin Laurentiu Roman
  • Frank Glege
  • Gancho Dimitrov
  • Giacomo Govi
  • Graeme Andrew Stewart
  • Greg Doherty
  • Hans von der Schmitt
  • Hironori Ito
  • Ignacio Coterillo
  • Illya Shapoval
  • Isabelle Laugier
  • Ivan Fedorko
  • Jacek Wojcieszuk
  • Jerome Belleman
  • Jerome Fulachier
  • John Gordon
  • Jose Carlos Luna Duran
  • Julius Hrivnac
  • Kajetan Fuchsberger
  • Kamil Wisniewski
  • Kate Dziedziniewicz-Wojcik
  • Keir Hawker
  • Luc Goossens
  • Luca Canali
  • Manuel Gonzalez Berges
  • Marcin Blaszczyk
  • Marcin Bogusz
  • Marco Clemencic
  • Mariusz Piorkowski
  • Mateusz Lechman
  • Mattia Cinquilli
  • Maxim Potekhin
  • Michael Dahlinger
  • Michal Nowotka
  • Nilo Segura Chinchilla
  • Omar Pera Mira
  • Osman Aidel
  • Peter Chochula
  • Peter Malzacher
  • Philippe Cheynet
  • Piotr Golonka
  • Raffaello Trentadue
  • Chris Roderick
  • Roman Sorokoletov
  • Ronny Billen
  • Salvatore Di Guida
  • Simone Campana
  • Stefan Roiser
  • Sylvain Chapeland
  • Szymon Skorupinski
  • Tim Bell
  • Tony Cass
  • Tony Wildish
  • Valentin Kuznetsov
  • Alexandre Vaniachine
  • Vasco Chibante Barroso
  • Vincent Bernardoff
  • Zoltan Mathe
  • Zornitsa Zaharieva
    • 10:30
      Coffee (IT Auditorium, CERN)

    • 1
      Welcome & Introduction (IT Auditorium, CERN)
      Slides
    • Requirements I (IT Auditorium, CERN)
      Conveners: Katarzyna Maria Dziedziniewicz (CERN), Tony Cass (CERN)
      • 2
        Database services for ALICE Detector Control System
        We describe the architecture and implementation of the ALICE DCS database service. The whole dataflow, from the devices to the Oracle database, as well as the interface to online and offline data consumers, is briefly reviewed. The operational experience with the present configuration, together with future plans and requirements, is summarized in this talk.
        Speaker: Peter Chochula (CERN)
        Slides
      • 3
        Use of MySQL in the ALICE data-acquisition system.
        MySQL has been used to store and access structured information for the ALICE data-acquisition system since 2004. It underpins nine distinct data repositories (configurations, logs, etc.) for the online subsystems deployed at the experimental area, each with different I/O patterns and requirements. We will review the architecture, performance, features and future needs of our online database systems, and give feedback on our positive experience with this tool.
        Speaker: Mr Sylvain Chapeland (CERN)
        Slides
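        As an illustration of the kind of repository access described in the abstract above (the table, columns and credentials below are invented placeholders, not taken from the ALICE system), a simple log-style repository could be written to and read back with the standard MySQLdb client:

          # Hypothetical schema and credentials; illustrative only.
          import MySQLdb

          conn = MySQLdb.connect(host="localhost", user="daq", passwd="secret", db="daq_logs")
          cur = conn.cursor()

          # Write-mostly pattern: append one log record per call.
          cur.execute(
              "INSERT INTO log_messages (facility, severity, message) VALUES (%s, %s, %s)",
              ("eventBuilder", "INFO", "run 142857 started"))
          conn.commit()

          # Read-mostly pattern: fetch the most recent messages for display.
          cur.execute(
              "SELECT facility, severity, message FROM log_messages ORDER BY id DESC LIMIT 20")
          for facility, severity, message in cur.fetchall():
              print("%s %s %s" % (facility, severity, message))

          conn.close()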
      • 4
        Database choices for CERN Drupal service
        CERN is deploying a new content management approach based on Drupal (http://drupal.org) for the main www.cern.ch site, as well as for departments and experiments. This talk will review the requirements and options for the database part of the deployment, to create an infrastructure capable of supporting millions of hits per day.
        Speaker: Tim Bell (CERN)
        Slides
    • 12:30
      Lunch (Restaurant, cafe or sandwich bar, Somewhere)

    • Requirements II (IT Auditorium, CERN)
      Conveners: Katarzyna Maria Dziedziniewicz (CERN), Tony Cass (CERN)
      • 5
        Overview of Data Management solutions for the Control and Operation of the CERN Accelerators
        The control and operation of the CERN accelerator complex is fully based on data-driven applications. The data foundation models the complex reality necessary for the configuration of the accelerator controls systems, and is used in an online, dynamic way to drive the particle beams and surrounding installations. Integrity of the data and performance of the data-interacting applications are key requirements, and challenges that have so far been met. This presentation will give an overview of what is currently in production, from the mission-critical data (controls configuration, operational settings, alarms, logging, ...) to the closely related offline information (layout, equipment details, ...) and the need for all of this to fit together (relationally). Figures on complexity and performance will be given, along with the means we have put in place to monitor, diagnose and track the usage of our data management services.
        Speakers: Mr Chris Roderick (CERN), Ms Zory Zaharieva (CERN)
      • 6
        Future Database Requirements in the Accelerator Sector
        For more than two decades, relational database design and implementations have satisfied the data management needs of the CERN Accelerator Sector. The requirements have always covered a wide range of functional domains, from complex controls-system configuration data to the tracking of high-volume data acquisitions. The volumes of data to be stored have grown by several orders of magnitude between the consecutive epochs from SPS to LEP to LHC. So far, scalability has been ensured by following hardware and software technology. Looking ahead (towards CLIC, for example), will we still be able to follow the same route, or will this strategy eventually fail? This presentation will outline some of these issues in the different domains and raise the questions that have to be addressed in due time.
        Speaker: Mr Ronny Billen (CERN)
        Slides
      • 7
        Administrative & Engineering Requirements
        We present the range of Administrative and Engineering applications together with expectations for future developments, growth and requirements.
        Speakers: Christophe Delamare (CERN), Derek Mathieson (CERN)
    • 15:30
      Coffee (IT Auditorium, CERN)

    • Requirements III (IT Auditorium, CERN)
      Conveners: Katarzyna Maria Dziedziniewicz (CERN), Tony Cass (CERN)
      • 8
        Experience with the CMS online DB and prospects for the future
        CMS has chosen to use an online DB located at IP5, both for security reasons and to be able to take data even without a GPN connection. The online DB (OMDS) is accessed by various applications: for data-acquisition configuration (through OCI libraries via TStore), for detector slow control (via PVSS), and for monitoring via Java or C++ libraries. It also contains the offline conditions data needed by the high-level trigger system, which runs a simplified version of the event reconstruction program on a cluster of a few hundred machines. A caching system based on Frontier reduces the load on the DB for this application, similarly to what is used for the offline DB. A web-based monitoring tool displays the run list and most of the monitoring information; it makes use of caches in order to reduce the load on the DB. Many other applications rely on the DB: the storage manager, the elog and the access control packages. Streaming is used to duplicate data for analysis access via lxplus for the detector experts. So far OMDS has collected about 1.5 TB of data per year. Heavy use of query optimization through appropriate indices and partitioning is made in the largest accounts; partitioning will allow archiving of old data if space limitations or performance become an issue. The experience with the online DB during 2010 data-taking is discussed, along with prospects for the future.
        Speaker: Frank Glege (CERN)
        Slides
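        The caching layers mentioned in the abstract above (Frontier for the high-level trigger, the caches of the web-based monitoring) all serve the same purpose: identical read-only queries should not hit the online DB repeatedly. A generic read-through cache, sketched below in plain Python with a dummy back-end standing in for the real database call, illustrates the idea (this is not CMS code):

          # Generic read-through cache sketch; the back-end function is a placeholder.
          import time

          class ReadThroughCache(object):
              def __init__(self, fetch, ttl_seconds=300):
                  self._fetch = fetch          # function that really queries the database
                  self._ttl = ttl_seconds      # how long a cached result stays valid
                  self._store = {}             # query text -> (timestamp, result)

              def query(self, sql):
                  entry = self._store.get(sql)
                  if entry is not None and time.time() - entry[0] < self._ttl:
                      return entry[1]          # cache hit: no database access
                  result = self._fetch(sql)    # cache miss: go to the database once
                  self._store[sql] = (time.time(), result)
                  return result

          def dummy_db_query(sql):
              return "result of: " + sql

          cache = ReadThroughCache(dummy_db_query, ttl_seconds=60)
          print(cache.query("SELECT * FROM run_summary"))   # hits the back-end
          print(cache.query("SELECT * FROM run_summary"))   # served from the cache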
      • 9
        CMS experience with offline Conditions Database and prospects for the future
        The CMS experiment is made of many detectors which together sum up to 60 million channels. Calibrations and alignments are fundamental to maintaining the design performance of the experiment. The conditions database contains the alignment and calibration data for the various detectors. Conditions data sets are accessed by a tag and an interval of validity through the offline reconstruction program CMSSW, written in C++. Performant access to the conditions data as C++ objects is a key requirement for reconstruction and data analysis. About 200 types of calibration and alignment exist for the various CMS sub-detectors. The sets are grouped in a so-called "global tag" which is valid for a given period of data-taking and for a given data set (Monte Carlo events or collision data). Only those data which are crucial for reconstruction are inserted into the offline conditions DB; this guarantees fast access to conditions during reconstruction and a small size of the conditions DB. The talk describes the experience with the offline reconstruction conditions database during 2010 and prospects for the future.
        Speaker: Giacomo Govi (Fermilab)
        Slides
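        The tag plus interval-of-validity access pattern described in the abstract above can be illustrated with a self-contained toy lookup (the tags, run ranges and payloads below are invented, and this is not the CMSSW API):

          # Toy interval-of-validity (IOV) lookup; all data are invented.
          import bisect

          # For each tag: a list of (first_valid_run, payload), sorted by run.
          conditions = {
              "EcalPedestals_v1": [(1, "pedestals set A"), (1000, "pedestals set B")],
              "TrackerAlignment_v2": [(1, "alignment 2010"), (1500, "alignment 2010B")],
          }

          # A "global tag" bundles one tag per kind of condition.
          global_tag = {"pedestals": "EcalPedestals_v1", "alignment": "TrackerAlignment_v2"}

          def payload_for(kind, run):
              """Return the payload valid for the given run under the global tag."""
              iovs = conditions[global_tag[kind]]
              starts = [start for start, _ in iovs]
              index = bisect.bisect_right(starts, run) - 1   # last IOV starting at or before run
              if index < 0:
                  raise KeyError("no conditions valid for run %d" % run)
              return iovs[index][1]

          print(payload_for("pedestals", 1200))   # -> pedestals set B
          print(payload_for("alignment", 42))     # -> alignment 2010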
      • 10
        Future Oracle use by CMS offline dataflow and workflow management
        We describe the current use of Oracle by the CMS offline dataflow and workflow components (T0, PhEDEx, DBS). We consider how the database use is expected to evolve over the next few years, in terms of data volume, data structure and application use.
        Speaker: Mr Tony Wildish (PRINCETON)
        Slides
    • 18:00
      Aperitif (Restaurant 2, CERN)

    • Implementations I (IT Auditorium, CERN)
      Conveners: Francois Chatal (CERN), Tony Cass (CERN)
      • 11
        LHCb Databases - Present and Future
        Several database applications are used by the LHCb collaboration to help organize day-to-day tasks and to assist data taking, processing and analysis. I will present a brief overview of the technologies used and of the requirements for the long-term support of both the current database applications and possible future ones.
        Speaker: Marco Clemencic (CERN PH-LBC)
        Slides
      • 12
        Evolution of Databases for the ATLAS experiment
        The use of databases in ATLAS is going through a continuous process of development, deployment and optimisation, in order to cope with the increasing amounts of data and new demands from the user community. In 2011 and 2012 work will concentrate on two major lines, namely the transition to Oracle 11g and re-optimisation of the existing database in Oracle, and the study of new technologies (NoSQL databases) for specific applications. This talk will give an overview of these activities and an introduction to specific talks.
        Speaker: Dr Dario Barberis (CERN)
        Slides
      • 13
        Enhance the ATLAS database applications by using the new Oracle 11g features
        It is planned that at the beginning of 2012 all ATLAS databases at CERN will be upgraded to Oracle 11g Release 2. With a view to making the ATLAS DB applications more reliable and performant, we would like to explore and evaluate the new 11g database features for development and performance tuning. The talk will describe the expected benefits of having some of the Oracle 11g enhancements in place, and the typical ATLAS use cases for which they would be best suited.
        Speaker: Gancho Dimitrov (BNL)
        Slides
    • 10:30
      Coffee (IT Auditorium, CERN)

    • Implementations II (IT Auditorium, CERN)
      Conveners: Francois Chatal (CERN), Tony Cass (CERN)
      • 14
        Databases for the ATLAS Detector Control System: experience and future requirements
        The ATLAS detector control system (DCS) archives detector conditions data in a dedicated Oracle database using a proprietary schema (PVSS Oracle archive), and represents one of the main users of the ATLAS online database service. The contribution will give an overview of the database usage and operation experience, e.g. with respect to data volume, insert rates and pending issues. Constraints and ideas for future requirements, in view of experiment operation and upgrades, are discussed.
        Speaker: Dr Stefan Schlenker (CERN)
        Slides
      • 15
        Development of a NoSQL storage solution for the Panda Monitoring System
        For the past few years, the Panda Workload Management System has been the mainstay of computing power for the ATLAS experiment at the LHC. Since the start of data taking, Panda usage has gradually ramped up to 840,000 jobs processed daily in the fall of 2010, and has remained at consistently high levels ever since. Given the upward trend in workload and the associated monitoring data volume, the Panda team is facing a new set of challenges in the areas of database scalability and the efficiency of its monitoring system. These challenges are being met with an R&D effort aimed at implementing scalable and efficient monitoring data storage based on a NoSQL solution (Cassandra). We present our motivations for using this technology, as well as the data design and the techniques for efficient indexing of the specific data, which have been tested in two different hardware configurations.
        Speaker: Dr Maxim Potekhin (Brookhaven National Laboratory (BNL))
        Slides
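        One widely used data-design idea for monitoring stores of the kind described above, sketched here in plain Python (the queue names, fields and bucket size are invented, and no Cassandra client is involved), is to bucket samples by time so that each row key stays bounded and recent data can be fetched with a single key lookup:

          # Time-bucketed row-key layout sketch; standard library only, data invented.
          import collections
          import time

          BUCKET_SECONDS = 3600   # one row key per queue and per hour

          rows = collections.defaultdict(list)   # row key -> list of (timestamp, sample)

          def row_key(queue, timestamp):
              bucket = int(timestamp) - int(timestamp) % BUCKET_SECONDS
              return "%s:%d" % (queue, bucket)

          def record(queue, sample, timestamp=None):
              timestamp = time.time() if timestamp is None else timestamp
              rows[row_key(queue, timestamp)].append((timestamp, sample))

          def read_hour(queue, timestamp):
              """All samples for the hour containing 'timestamp': a single-row read."""
              return rows.get(row_key(queue, timestamp), [])

          record("ANALY_CERN", {"running_jobs": 840})
          record("ANALY_CERN", {"running_jobs": 860})
          print(read_hour("ANALY_CERN", time.time()))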
      • 16
        ATLAS DDM/DQ2 & NoSQL databases: Use cases and experiences
        The Distributed Data Management system DQ2 is responsible for the global management of petabytes of ATLAS physics data. DQ2 has a critical dependency on relational database management systems (RDBMS), like Oracle, as RDBMS are well suited to enforcing data integrity in online transaction processing applications. Despite these advantages, concerns have recently been raised about the scalability of data warehouse-like workloads against the transactional schema, in particular for the analysis of archived data or the aggregation of data for summary purposes. We have therefore considered new approaches to handling vast amounts of data. More specifically, we investigated a new class of database technologies commonly referred to as NoSQL databases. This includes distributed filesystems, like HDFS, that support the parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value stores, like HBase, Cassandra or MongoDB. These databases provide solutions to particular types of problems: for example, NoSQL databases have demonstrated that they can scale horizontally, deliver high throughput, provide automatic fail-over mechanisms and offer easy replication support over LAN and WAN. In this talk, we will describe our use cases in ATLAS and share our experiences with NoSQL databases in a comparative study with Oracle.
        Speaker: Dr Vincent Garonne (CERN)
        Slides
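        One of the workloads mentioned above, the aggregation of archived data for summary purposes, maps naturally onto the document-store side of the NoSQL spectrum. A hedged illustration follows (the collection, fields and the choice of MongoDB are assumptions for this sketch, not a statement of what DQ2 deployed; it needs a local mongod and a reasonably recent pymongo):

          # Illustrative summary aggregation with pymongo; schema and sites are invented.
          from pymongo import MongoClient

          transfers = MongoClient("localhost", 27017).ddm_archive.transfers

          transfers.insert_many([
              {"site": "CERN-PROD", "bytes": 2 * 10**9, "state": "done"},
              {"site": "BNL-OSG2",  "bytes": 5 * 10**9, "state": "done"},
              {"site": "CERN-PROD", "bytes": 1 * 10**9, "state": "failed"},
          ])

          # Summary: total volume of successful transfers per site.
          pipeline = [
              {"$match": {"state": "done"}},
              {"$group": {"_id": "$site", "total_bytes": {"$sum": "$bytes"}}},
          ]
          for row in transfers.aggregate(pipeline):
              print("%s %d" % (row["_id"], row["total_bytes"]))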
    • 12:30
      Lunch
    • Technologies I (IT Auditorium, CERN)
      Conveners: Ignacio Coterillo (CERN), Tony Cass (CERN)
      • 17
        CMS requirements on CERN/IT provided NoSQL data stores
        We discuss potential future requirements for CERN/IT managed/provided "NoSQL" data stores, and provide some high level observations based on our experiences with these technologies.
        Speaker: Simon Metson (H.H. Wills Physics Laboratory)
        Slides
      • 18
        CMS Offline experiences with NoSQL data stores
        The CMS Offline project has been developing against "NoSQL" data stores since 2009 and has experience with three projects in particular: CouchDB, Kyoto Cabinet and MongoDB. We present how these tools are used in our software, why they were chosen and the lessons we have learnt along the way.
        Speaker: Valentin Kuznetsov (Cornell)
        Slides
      • 19
        NoSQL Databases and Monitoring
        Monitoring typically requires storing large amounts of metric samples recorded at a high rate. These samples must then be read back in bulk and reprocessed for analysis and visualisation purposes. For the past few years, different monitoring systems have been developed on top of NoSQL databases for the scalability they provide. Likewise, a monitoring system for the batch service is currently being developed at CERN, and the use of a NoSQL database as one of its components is under investigation. This talk describes to what extent NoSQL databases are suitable for working with monitoring information, as opposed to SQL databases.
        Speaker: Mr Jerome Belleman (CERN)
        Slides
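        Independently of the database chosen, the read-back and reprocessing step mentioned above usually amounts to reducing many raw samples to a few points per plotting interval. A small, database-agnostic sketch of that downsampling step (sample data invented, standard library only):

          # Downsample raw metric samples into per-interval averages.
          from collections import defaultdict

          INTERVAL = 300   # average the raw samples over 5-minute bins for plotting

          def downsample(samples):
              """samples: iterable of (timestamp, value); returns sorted (bin_start, mean)."""
              bins = defaultdict(list)
              for timestamp, value in samples:
                  bins[int(timestamp) - int(timestamp) % INTERVAL].append(value)
              return sorted((start, sum(values) / float(len(values)))
                            for start, values in bins.items())

          raw = [(0, 10.0), (60, 12.0), (290, 11.0), (310, 40.0), (600, 8.0)]
          print(downsample(raw))   # -> [(0, 11.0), (300, 40.0), (600, 8.0)]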
    • 15:30
      Coffee (IT Auditorium, CERN)

    • Technologies II (IT Auditorium, CERN)
      Conveners: Ignacio Coterillo (CERN), Tony Cass (CERN)
      • 20
        Future plans for CORAL and COOL
        This presentation will report on the current plans for the future maintenance and development of two Persistency Framework packages used by several LHC experiments for accessing Oracle databases: CORAL (the generic RDBMS access layer, used by ATLAS, CMS and LHCb) and COOL (the conditions database package used by ATLAS and LHCb). It will also cover the status and plans for the CORAL Server, the middle-tier technology (similar to Frontier/Squid) used by ATLAS online for the configuration of the High Level Trigger.
        Speaker: Dr Andrea Valassi (CERN)
        Slides
      • 21
        Frontier and HTTP caching in the future
        Frontier has been successfully distributing high-volume, high-throughput, long-distance data for CMS for many years, and more recently for ATLAS, greatly reducing the demands on the WLCG database servers. This talk will briefly describe the present status and cover the changes expected in the future. No major changes are foreseen, but improvements in robustness and security, and increased numbers of user projects, are expected. Even more applications are expected for HTTP caches, which will require enhancements to the configuration and monitoring of the WLCG squid network.
        Speaker: Dr Dave Dykstra (Fermilab)
        Slides
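        The model described above turns database queries into cacheable HTTP requests served through squid proxies. A generic client-side sketch follows (the proxy host and URL are placeholders, the requests library is assumed to be installed, and this is not the Frontier client itself):

          # Fetch a query result through an HTTP cache; hosts and URL are placeholders.
          import requests

          PROXIES = {"http": "http://squid.example.org:3128"}   # assumed local squid cache
          URL = "http://frontier.example.org/query?encoded_sql=..."

          response = requests.get(URL, proxies=PROXIES)
          print(response.status_code)
          # Squid typically reports whether the response came from its cache:
          print(response.headers.get("X-Cache", "no cache information"))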
    • Where Next? (IT Auditorium, CERN)
      • 22
        Qserv: Distributed Shared-nothing from MySQL and Xrootd
        The LSST catalog of celestial objects will need to answer both simple and complex queries over many billions of rows. Since no existing open-source database efficiently supports its requirements, we are developing Qserv, a prototype database-style system, to handle such volumes. Qserv uses Xrootd as a framework for data-addressed communication with a cluster of machines running standalone MySQL instances. Xrootd provides fault-tolerance and replication support, while MySQL provides a basic SQL execution engine. Using a spatial spherical-partitioning approach, Qserv fragments queries and aggregates results scalably, even for expensive spatial self-joins.
        Speaker: Daniel Wang (SLAC National Accelerator Laboratory)
        Slides
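        The fragment-and-aggregate pattern described above can be shown with a toy example in plain Python (the chunking scheme and catalogue rows are invented and far simpler than Qserv's spherical partitioning):

          # Toy query fragmentation over spatial chunks; not Qserv code.
          # The sky is split into 10-degree declination bands; each band is one "chunk".
          chunks = {
              0: [(12.5, 3.0, 18.1), (40.0, 7.5, 19.2)],      # (ra, dec, magnitude) rows
              1: [(100.0, 15.0, 17.0), (210.0, 12.0, 20.5)],
          }

          def chunk_id(dec):
              return int(dec // 10)

          def count_brighter_than(limit, dec_min, dec_max):
              """Fragment the query per chunk, run the fragments, aggregate the counts."""
              partial_counts = []
              for cid in range(chunk_id(dec_min), chunk_id(dec_max) + 1):
                  rows = chunks.get(cid, [])          # in Qserv each chunk lives on its own node
                  partial_counts.append(sum(1 for ra, dec, mag in rows
                                            if dec_min <= dec <= dec_max and mag < limit))
              return sum(partial_counts)              # aggregate the partial results

          print(count_brighter_than(19.0, 0.0, 20.0))  # -> 2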
      • 23
        Rapid Summary
        Speaker: Tony Cass (CERN)
        Slides