dCache T1 data-management workshop

Europe/Zurich
FTU FZK (FZK)

Karlsruhe
Jos van Wezel
Description
Technical exchange on improving stability and reliability of the dCache data management system at WLCG T1 centers
Minutes
more information
Participants
  • Artem Trunov
  • Christopher Jung
  • Doris Ressmann
  • Erik Mattias Wadenstein
  • Flavia Donno
  • Gerard Bernabeu Altayó
  • Gerd Behrmann
  • Giacinto Donvito
  • Hironori Ito
  • Jamie Shiers
  • Jon Bakken
  • Jonathan Schaeffer
  • Jos van Wezel
  • Lionel Schwarz
  • Onno Zweers
  • Paco Martinez
  • Patrick Fuhrmann
  • Pedro Salgado
  • Reda Tafirout
  • Silke Halstenberg
  • Simon Liu
  • Ron Trompert
  • Xavier Mol
  • Wednesday, 14 January
    • 09:00 09:15
      Welcome 15m FTU FZK

      Opening and goals of the workshop. Motto: yes we can, stability is coming to dCache.
      Speaker: Jos van Wezel
      Slides
    • 09:20 13:30
      dCache T1 Administrators FTU room 156

      This is a closed session for admins only.

      • 09:20
        dCache T1 Administrators union 1h
        • Mutual introduction. Who's who in dCache land?
        • Reflection on the current practice of information exchange between T1 admins.
        • Improving stability of operations: what are the five most frequently seen issues, and how do they impact stability?
        • T1 strategy for the coming months: what can we sustain with the current tool set?
        • Are we prepared to update dCache or to change the setup? Should we have a fixed maintenance cycle? Can we improve pre-testing of the experiments' DDM systems at the site?
      • 10:20
        Coffee break 15m
      • 10:35
        dCache T1 Administrators union cont. 30m
        • Future meetings? Necessity? Frequency?
        • Do we know what our customers want? Do we know what the experiments want to see improved at least before data taking?
        • Dealing with experiment requests
        • How EGEE and OSG differ
      • 11:05
        Configuration and tune-ups at the sites 30m
        Presentation/discussion of (site-specific) configurations for logging, monitoring, failover/redundancy, and scalability.
        • What is special in your setup that other sites could benefit from?
        • What are your numbers for the JVM, for the movers per pool, for the size per disk, etc.?

        From this slot we move into generic configurations starting with database setup and tuning.

      Speaker: Mr Artem Trunov
  • 12:00
    Lunch 1h 30m
  • 13:30 19:30
    T1 Administrators technical session part 1: administration experiences and issues FTU FZK

    The administrators' sessions are forum discussions chaired by one of the T1 administrators. The chair may open the forum with a short presentation that, for example, lists subtopics to discuss, raises questions current among the T1 admins, or describes his/her own experience with the topic. The slot times are merely guidelines.

    • 13:30
      Postgres and Databases jamboree 1h
      • Backups and restore
      • Best practices
        • Vacuum & Analyze
      • High availability
        • Warm standby
        • Slony
      • Partitioning
        • Case: the billingDB at PIC
      Speakers: Francisco Martinez, Gerard Bernabeu
    • 14:30
      Coffee break 15m
    • 14:45
      HSM: Moving data to tape and back? 1h 45m
      Tape experience is building up and we need to synchronize.
      • What are the problems at the different sites, and how are the sites dealing with them?
      • Will the new HSM interface improve things?
      • Very different tape handling systems are in use at the T1s. What are their highlights, and what are their dark sides?
      • How to prevent and deal with missing data?
      • What is the experience with optimization of recalls and migrations?
      • What throughput do you get for reading and writing? What do the experiments expect? What throughput do you expect?
      • Are the Tim Bell metrics useful? What do they show for each site? https://twiki.cern.ch/twiki/bin/view/LCG/MssEfficiency
      Speakers: Jonathan Schaeffer (CC-IN2P3), Ron Trompert (SARA), Simon Liu (TRIUMF)
    • 16:30
      Data transfers 30m
      Presentation on how data transfer was/is/can/should/must be done with FTS, SRM or something else between X and Y.
      Speaker: n.n
  • Thursday, 15 January
    • 09:00 13:00
      T1 Administrators technical session part 2: upcoming administration tasks FTU FZK

      • 09:30
        gPlazma and VOMS 30m
        Speaker: Dr Silke Halstenberg
      • 10:00
        Chimera, ready for T1 deployment 30m
        Speaker: Erik Mattias Wadenstein
      • 10:45
        Coffee break 15m
      • 11:00
        Summary and conclusions of the administrators sessions 30m
        Speaker: Doris Ressmann
      • 11:30
        Optimisation of dCache 30m
      • 12:00
        Lunch 1h
    • 13:30 16:30
      All hands: sustaining dCache FTU room 156

      Administrators, developers and supporters of dCache, together with WLCG and experiment representatives, discuss the stability and reliability of data management, and of dCache in particular. We try to separate dCache issues from those that must be dealt with elsewhere: GridFTP, FTS, LFC, etc. interact with dCache and have an impact on stability.

      • 13:30
        Summary of experiment open issues with data management and dCache 30m
        What are the current showstoppers and must-haves for reliable data management according to the LHC experiments? Realize these are moving targets!
        Speakers: Dr Flavia Donno (CERN), Dr Giacinto Donvito (INFN-Bari)
        Slides
      • 14:00
        dCache T1 baseline services for LHC data taking 1h
        WLCG status of data management: downsides, upsides, management suggestions for improvement, and relationship with other services (FTS, LFC, OPN, etc.)
        Speaker: Dr Jamie Shiers (CERN)
        Slides
      • 15:00
        dCache status, developments and plans 1h
        • What seem to be the recurring issues at site XXX?
        • What can be done to improve error tracking?
        • What could dCache.org do to help?
        • Where can the T1s assist in getting problems cornered and fixed?
        • Plans to fix known bugs/problems and to improve/enhance dCache, categorized into three groups: soon, next year, never. Includes a vision for Internet data storage and management with dCache.
        • Ticket statistics. What needed a lot of attention? What always happens at site X but strangely never at site Y? Did the information you, the admin, provided include the facts dCache.org needed to solve the ticket?
        • Was your ticket answered effectively?
        • Are the current support and contact methods optimal?
        • Future relevant meetings
        Speaker: Dr Patrick Fuhrmann (DESY)
      • 16:00
        Conclusions 20m
        Speaker: Jos van Wezel