Production and PASTE Meeting, 15th May 2007, CERN
Chair: A.Tsaregorodtsev
Minutes: S. Paterson
CERN: M.Bargiotti, A.Casajus, G.Castellani, J.Closier, M.Dimou, R.Graciani, A.Smith, S.Paterson, R.Santinelli, A.Tsaregorodtsev
Phone: R.Bernet, N.Brook, U.Marconi, R.Nandakumar
Apologies: P.Charpentier
1. LHCb VO User Support ( M.Dimou )
Maria wished to clarify the LHCb procedure for the submission of GGUS tickets. LHCb users currently contact internal
mailing lists for Grid support, and experts then submit GGUS tickets if required. It was requested that older tickets
be cleaned up, but this was not possible without GGUS admin status. To facilitate this, it was agreed
to set up a VOMS group for LHCb VO support. Maria also highlighted the VO support pages at this link.
2. Production Status Report
The production meeting was cancelled this week due to the DIRAC3 discussions. Minutes from the last production
meeting are available here. Joel reported that no MC simulation was running. Nick mentioned that two new
sites were available (CNAF and Russian-based Tier-2s) and that the usage of these new resources should be
maximised. The new production requests will be submitted once the new version of Gauss is available (2.5M events are to be
produced).
A summary of the reconstruction status by site is below:
- CERN & IN2P3 OK
- PIC some hardware problems (tape robot)
- CNAF no jobs eligible
- GRIDKA & RAL had many stalled jobs
- NIKHEF a GGUS ticket will be submitted for file upload problems to the NIKHEF-tape SE ( ACTION: Joel ).
The DC06 cleanup is ongoing; the Tier-1 disks are still to be tidied, but Marianne is preparing the lists for this ( ACTION:
Marianne ). Joel reported that Gennady had provided a script, now in use, to target the destination site for
reconstruction jobs.
3. Production Operations Issues
The issue of SRM stability at the Tier-1s was mentioned and Nick agreed to escalate this on a site-by-site basis after
some definite information had been gathered. Stuart reported that the StagerAgent is now populating the StagerDB
with all site metadata timing information and this should be represented graphically somehow ( ACTION: Andrew,
Stuart ). Roberto agreed to add a metadata request to the SRM critical tests and to flag failures with an alarm. In this
way, sites would not be declared 'up' unless their SRM was responsive. The possibility to include VO critical tests in
the evaluation of site availability should also be discussed at the operations meeting ( ACTION: Roberto ).
The issue of some user jobs incorrectly staging data at one site and running at another had been fixed by the time of
writing these minutes. The Data Optimizer now forces a single destination site for all jobs, ensuring data cannot be
staged twice unnecessarily. There was also some discussion of the recent User system Stager activity. It was agreed
that jobs accessing files on tape should be restricted somehow. Stuart will work on this and draft an email to explain
the policy. ( ACTION: Stuart )
4. Definition of the LHCb VO Policies
A further discussion on the LHCb VOMS policy was held; it was concluded that all LHCb users will be mapped to the
LHCb group by default and the VO will retain specialized groups such as production managers. There was a
comprehensive discussion of the job prioritization strategies in DIRAC and Gianluca agreed to make a summary of this
( ACTION: Gianluca ).
5. DIRAC3 Tasks
The DIRAC3 Tasks discussion at the PASTE meeting was postponed due to the DIRAC3 workshop (being held
concurrently).
6. AOB
None