FORUM ON INTERFACING TO THE LHC LOGGING DATABASE FOR DATA ANALYSIS
==================================================================

which took place on 15 March 2010, at CERN.
See http://indico.cern.ch/conferenceDisplay.py?confId=87367

Summary Notes:
( LDB = LHC Logging DB, MDB = LHC Measurement DB )

The meeting concentrated on
- specifying what the users need, on the basis of concrete example use cases
- illustrating the use cases with already existing tools
- recalling the scope and current capabilities of the LDB/MDB and their interfaces

BE-BI:
- Some tools and APIs to the MDB/LDB have been developed:
  * two developments:
    1. API in Root/SQL
       NB: the DM team prefers Java for security and performance reasons.
       Tools in Python/Root.
    2. API in Java.
       Using Mathematica for analysis.
- Need access to several DBs (Layout, MTF, MDB, LDB, LSA, PM, ...)
  * to combine information
- Tools/interfaces must be easy to use and modify
- BLM: the purpose is large-scale analysis
- It was mentioned that another development has been made in Java (Mario Terra Pinheiro Fernandes Pereira) which combines information from the LDB and the Layout DB.
- It was also mentioned that scripts will need to be developed for regular performance reports of LHC operation, retrieving data from the LDB.

LHC experiments:
- DIP is used to obtain various LHC data online.
- Although much of this data is archived by the experiments, DIP does not provide a solution for offline analysis:
  * DIP uptime is < 100%,
  * data are reprocessed and corrected by the LHC experts,
  * possibly not enough information is exchanged over DIP (new variables get added with experience).
  => the LHC experiments all need access to the LDB and MDB.
- Most relevant variables for the experiments (as of today's knowledge):
  * bunch/beam intensities, beam losses, beam positions, beam sizes (emittances), collimator positions, some vacuum gauges;
  * operational parameters, like SMP flags, beam modes, PM info, fill number, etc.;
  * but also sporadically measured quantities such as crossing angles and beta functions.
- The experiments would very much support the idea that LHC data could be retrieved from the LDB, reprocessed, corrected and stored back into the same DB, using e.g. versioning (original data are never removed):
  * older versions of the data can be retrieved,
  * by default, the most recent data (the last version) are accessed,
  * the version number of the data can be queried,
  * versioning allows, among other things, proper scientific referencing,
  * use a different DB (with the same interface) for corrected data?
- Programmatic MDB/LDB data retrieval is needed (API):
  * possibility to make value-based queries over restricted time ranges (typically minutes), e.g. specify a time range and a value threshold and search the DB (see the first sketch after this section):
    o find channels with value-based conditional queries,
    o specify time and channel ranges;
  * filter the data of a channel according to value and a specified range;
  * more complex filtering?
  * possibility to filter an array according to index (like the bunch charge in slot "j");
  * time alignment of data from different variables by interpolation (see the second sketch after this section)?
- Access to the measured and/or expected machine lattice would also be useful (MAD online?). Via the same API?
- Access must be possible from the GPN and the TN.
- The APIs for the LDB and MDB should be identical (as in TIMBER, where the DB is selectable).
- Python or C++ would be the preferred interface languages, but Java can also be interfaced from Python or C++.
- Note: data analysis is often performed several months after the measurements, but also a few minutes after them (quasi-online analysis of specific events).
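As a concrete illustration of the value-based conditional query requested above, here is a minimal Java sketch. It is illustrative only: the "Sample" record is a hypothetical stand-in for one logged (timestamp, value) pair, and the real logging API uses its own types and method names.

    import java.time.Instant;
    import java.util.List;
    import java.util.stream.Collectors;

    // Hypothetical stand-in for one logged (timestamp, value) pair.
    record Sample(Instant stamp, double value) {}

    public class ValueBasedQuery {

        // Keep only the samples inside [from, to) whose value exceeds a
        // threshold: a value-based conditional query over a time range.
        static List<Sample> filter(List<Sample> series, Instant from,
                                   Instant to, double threshold) {
            return series.stream()
                    .filter(s -> !s.stamp().isBefore(from) && s.stamp().isBefore(to))
                    .filter(s -> s.value() > threshold)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            Instant t0 = Instant.parse("2010-03-15T10:00:00Z");
            List<Sample> series = List.of(
                    new Sample(t0, 1.0e9),
                    new Sample(t0.plusSeconds(60), 3.0e9),
                    new Sample(t0.plusSeconds(120), 0.5e9));
            // Which samples exceed 2e9 within the first two minutes?
            System.out.println(filter(series, t0, t0.plusSeconds(121), 2.0e9));
        }
    }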
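Time alignment by interpolation, also requested above, could look like the following sketch (again illustrative only; it reuses the hypothetical Sample record from the previous sketch). One variable is resampled at the timestamps of another by linear interpolation:

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    public class TimeAlign {

        // Linearly interpolate 'series' (assumed sorted by time) at instant 't'.
        static double interpolateAt(List<Sample> series, Instant t) {
            for (int i = 1; i < series.size(); i++) {
                Sample a = series.get(i - 1), b = series.get(i);
                if (!t.isBefore(a.stamp()) && !t.isAfter(b.stamp())) {
                    double span = b.stamp().toEpochMilli() - a.stamp().toEpochMilli();
                    double frac = span == 0 ? 0.0
                            : (t.toEpochMilli() - a.stamp().toEpochMilli()) / span;
                    return a.value() + frac * (b.value() - a.value());
                }
            }
            throw new IllegalArgumentException("t is outside the series");
        }

        // Resample 'other' at every timestamp of 'reference', so the two
        // variables can be compared point by point.
        static List<Sample> alignTo(List<Sample> reference, List<Sample> other) {
            List<Sample> out = new ArrayList<>();
            for (Sample r : reference) {
                out.add(new Sample(r.stamp(), interpolateAt(other, r.stamp())));
            }
            return out;
        }
    }

Note that the DM team's Java API already provides time alignment (see below), so a sketch like this would mainly matter for user-side toolkits in other languages.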
Data Management team:
- Direct database access must be avoided:
  * Not scalable across all clients:
    o number of connections,
    o security considerations,
    o volatile infrastructure.
  * Not secure:
    o badly written queries / application logic will crash the entire service!
  * Not performant:
    o most programming languages provide database access,
    o few are optimized to work with Oracle in a performant manner.
- A Java API to the Logging Service has been available for several years.
  * Well documented, see http://slwww.cern.ch/~pcrops/releaseinfo/pcropsdist/dm/logging-data-extractor-client/PRO/build/docs/api/
  * Easy to use:
    o provides time alignment and some filtering functionality.
  * Sample code available.
  * Heavily used (> 30 custom applications + TIMBER).
  * Fully optimized and instrumented, which is essential for us to monitor and guarantee the Service.
  * Provides secure access to databases hidden on the Technical Network.
- JDBC fulfils our requirements, particularly with respect to Oracle performance, as it supports (see the sketch at the end of these notes):
  * connection pooling,
  * statement caching,
  * bind variables,
  * flexible array fetching.
- A 3-tier architecture has many more benefits:
  * resource pooling (connections, statements),
  * database protection,
  * database isolation, since users don't need to care about:
    o the database schema,
    o server details and login credentials,
    o access to the Technical Network.
- Note: the MDB/LDB can be accessed from the TN and the GPN.
- The Logging DB is now ingesting ~ 100 GB/day.

How to continue?
----------------

Here is a proposal. The meeting focused on two aspects: 1) data retrieval and 2) data analysis.

1) For data retrieval:
- users should try the currently available Java API and give feedback to the DM team (who also offer support for users and try to implement requested changes, where possible);
- bridging from the preferred language should be explored/prototyped by the users;
- a user e-group to communicate about these developments will be created by the DM team and announced on the same mailing list as used here. Please subscribe and advertise it.

2) For data analysis:
- the users are the primary developers of the data analysis tools;
- the same e-group can be used for communication about toolkit developments;
- the DM team offers help with the management of software developments (toolkits, language bindings, ...); the proposal is to use, for example, CERN SVN (SVN = Subversion);
- a working group (with a reduced number of participants) will be started to review and coordinate these developments.
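To illustrate the JDBC features listed by the DM team (bind variables and flexible array fetching), here is a minimal sketch using the standard JDBC API. The connection string, table and column names are hypothetical, not the real LDB schema; hiding such details from users is precisely the point of the 3-tier architecture. Connection pooling and statement caching are configured on the Oracle driver/data source rather than shown here.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Timestamp;

    public class JdbcSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details (requires the Oracle JDBC driver).
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost.example.cern.ch:1521/LDB",
                    "user", "password")) {

                // Bind variables: the '?' placeholders let Oracle reuse the
                // parsed statement instead of re-parsing each literal query.
                String sql = "SELECT stamp, value FROM logged_data "
                           + "WHERE variable = ? AND stamp BETWEEN ? AND ?";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setString(1, "BEAM_INTENSITY");
                    ps.setTimestamp(2, Timestamp.valueOf("2010-03-15 10:00:00"));
                    ps.setTimestamp(3, Timestamp.valueOf("2010-03-15 10:10:00"));

                    // Flexible array fetching: pull rows from the server in
                    // batches of 1000 instead of one round trip per row.
                    ps.setFetchSize(1000);

                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getTimestamp(1) + "  "
                                               + rs.getDouble(2));
                        }
                    }
                }
            }
        }
    }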