Production and PASTE Meeting, 14th October 2008, CERN

Chair: A.Tsaregorodtsev
Minutes: S.Paterson
CERN: J.Closier, Z.Mathe, S.Paterson, V.Romanovsky, R.Santinelli, A.C.Smith
Phone: N.Brook, R.Graciani, M.Sapunov, A.Tsaregorodtsev
Apologies: P.Charpentier

0. Ongoing Action List

(07) - 27/11/07 - Zoltan, Elisa - Ongoing new bookkeeping system development
(57) - 24/06/08 - Vladimir, Roberto - Automatic publishing of BDII / GOCDB information in the CS using GOCDB site names and provide a summary of recent changes
(79) - 26/08/08 - Vladimir/Joel - Check twiki contents and update CS to use new host aliases
(82) - 01/09/08 - Ricardo/Adri - Provide information from accounting to allow GridMap for LHCb VO T1 activity
(83) - 01/09/08 - Stuart, Roberto - Perform glexec tests
(86) - 09/09/08 - Andrei - Document the job finalization policy
(90) - 09/09/08 - Joel - Use new ProductionOutputData JDL parameter in modules instead of creating LFNs
(93) - 16/09/08 - Andrew - Improve RAW data monitoring tools and status of files in online runDB
(94) - 23/09/08 - Roberto, Marcos - Using SystemLogging service for dynamic services status monitoring
(95) - 23/09/08 - Roberto - SLS and SAM sensors development, upgrade SE tests with new DMS unit test
(96) - 23/09/08 - Roberto, Matvey - Dashboard SAM tests visualization, porting to DIRAC framework
(102) - 23/09/08 - Matvey, Adria - Documenting the DIRAC Web portal installation on volhcb11
(103) - 23/09/08 - Paul, Stuart - Produce structured HTML version of the Shifter's guide
(106) - 30/09/08 - Ricardo - Produce twiki documentation for dirac-install
(107) - 30/09/08 - Ricardo/Philippe/Joel/Hubert/MarcoC - Produce proposal for how to perform standardised installation using install_project/CMT
(108) - 14/10/08 - Roberto - Stop fake data publishing to SRMv1 endpoints and raise issue of why this is necessary
(109) - 14/10/08 - Matvey - Expose data logging interface on the web page
(110) - 14/10/08 - Stuart - Create guide on current
understanding of how to get to 100% processing efficiency
(111) - 14/10/08 - Philippe - Update file merging proposal after email discussions

Completed Actions:

Done - (98) - 23/09/08 - Philippe - Check SE definitions in the DIRAC2 CS for conformance with SRM v2 end-points
Done - (99) - 23/09/08 - Roberto - Arrange for LFC entries to be changed directly in the ORACLE back-end

1. Action Updates

- Action(07) - Prototype Bookkeeping
The GUI development is ongoing and Zoltan is checking inconsistencies between BK and LFC. The Oracle backend is also being optimized. Andrei requested that a migration plan for the new BK be prepared in coordination with Philippe. Several points remain outstanding, such as the migration to the production Oracle service, and all of them should be collected in a list.

- Action(57) - Automatic publishing of BDII / GOCDB information in the CS
Vladimir had no updates on this. Andrei commented on the recent renaming of some DIRAC site names and Vladimir clarified that this was performed at the request of some site administrators. It was suggested to prepare an agent to notify the SAM responsible via email of changes that should occur in the CS.

- Action(82) - Provide accounting information to allow GridMap for LHCb VO T1 activity
Ricardo mentioned that reporting accounting information over the past hour gives an unclear picture in some cases, so this should be moved to the WMS. Andrei suggested reformulating the action point and will follow this up with Adri.

- Action(83) - glexec tests
Stuart has not yet had time to do this but intends to start this week, using the new agent mode of submission to facilitate easy testing.

- Action(86) - Document job finalization policy
Ongoing.

- Action(90) - Production Output Data parameter
Ongoing. Joel will convey any missing parameters to Stuart to allow the LFNs to be used in one place.

- Action(93) - RAW data monitoring
A proxy portal is required to convey messages from the pit; this is ongoing.

- Action(94) - SystemLogging
Marcos is not receiving any clear errors that can be used and Roberto reported that this will be checked over time. It was suggested that the web page display more of the 'variable' parts of the error messages in order to better understand the system behaviour. Andrei asked Roberto to get in touch with Marcos regarding a threshold level of error messages that could flag problems in SLS / SAM.

- Action(95) - SLS / SAM
DIRAC SLS service monitoring has been improved over the past weeks. Roberto has split central DIRAC and T-1 VO-box services and changed the criticality accordingly. The CondDB SAM test is ongoing. Roberto mentioned that fake data is being reported about SRMv1 endpoints. After some discussion it was decided to drop the publishing of this information and raise the issue of why it was necessary. ( ACTION: Roberto )
Andrew reported that a unit test for LHCb SRMv2 operations was available and Roberto agreed to update the sensors.

- Action(96) - Dashboard
Porting of the Dashboard tests to the DIRAC3 framework is pending some selection features from the Dashboard developers.

- Action(102) - Documenting the DIRAC Web portal installation on volhcb11
Ongoing, will be linked to the production procedures page.

- Action(103) - Produce structured HTML version of the Shifter's guide
Stuart commented that the HTML is available but requires somewhere to be hosted. Andrei suggested volhcb06 for now, in the storage volume.

- Action(106) - Produce twiki documentation for dirac-install
Logic and usage pages should be made available on the DIRAC3 and Production Procedures twiki pages.

- Action(107) - Produce proposal for how to perform standardised installation using install_project/CMT
Ongoing.
Data logging via the web page should be exposed. ( ACTION: Matvey )

2.
Production Status Report

- Report from the Production meeting
Stuart reported that the recent lumi10 productions (3064, 3065) were complete but that tools to easily validate a production should be developed. In particular, a large number of Completed jobs do not seem to be moving to the Done state (BK registrations). One proposal is that the Production Manager sets the production to 'Validating', after which an agent provides a report on the production (including jobs), launches file consistency checks etc., and finally sets the status to 'Done' if all is ok. It would be very useful to run a full test on a reasonable sample of files in order to develop tools for arriving at 100% processed files. This would be most advantageous during a DIRAC3 week. Andrei suggested Stuart should compile a loose guide for how to proceed in this task. ( ACTION: Stuart )

- DIRAC releases
Andrei summarized that v3r4 and v3r5 are released (reports are in the ELOG). v4r0 is under preparation, with the new binaries being compiled by Ricardo.

- SRM v1->v2 migration
The migration is complete and most site v1 endpoints have already been retired.

3. CREAM CE status update

There was some discussion regarding the CREAM CE status. CREAM will be deployed over the coming months and it was agreed to pursue testing with a low priority on that timescale.

4. File merging proposal

Philippe's proposal is linked from the agenda page. It was agreed to update the existing proposal to reflect the discussions on the mailing lists. ( ACTION: Philippe )

5. DIRAC3 Tasks

- Security Logging
The security events for interactions with DIRAC3 services should be logged and persisted for a long time; this service facilitates that. Ricardo suggested having a single instance for production and development systems (volhcb06 was suggested).

6. AOB

Joel raised the point that CHEP abstracts should be prepared by mid-November. Andrei agreed to circulate a proposal for the topics.