Subject: Summary of LCG-2 RLS/replica manager/SE status
From: "Ian Bird"
Date: Thu, 5 Feb 2004 16:35:39 +0100
To:

Dear colleagues,

Below is a summary of the status of the problems with the replica manager and SE that we discussed at the GDA meeting on Monday. We will update the status on Monday.

Ian

Status of LCG-2 Data Management - 5 February 2004
-------------------------------

This note presents the status of the data management components of the LCG-2 release.

Replica Manager & RLS
---------------------

1) The naming problem:

The EDG Replica Manager writes entries in the RLS for LFNs, GUIDs and SFNs with a prefix as follows:

LFN  - lfn:this-is-my-logical-file-name
GUID - guid:73e16e74-26b0-11d7-b1e0-c5c68d88236a
SFN  - srm://lxshare0384.cern.ch//flatfiles/cms/data/05/x.dat

POOL stores the same entries without the prefix, and additionally can store entries in a format accepted by ROOT, which in certain cases does include a protocol prefix (e.g. rfio://... or dcap://....). Currently POOL does not understand the "srm:" syntax. Given these inconsistencies, an entry inserted by POOL into the RLS cannot always be understood by the EDG RM and vice versa.

Proposed solution:

The EDG Replica Manager is changed to store LFNs and GUIDs in the RLS in the same way that POOL does (i.e. without the prefixes). POOL does not directly support the srm: syntax, but it allows users to achieve the translation required to use the file from ROOT in several ways (either via catalog modification or on-the-fly rewriting). Local catalog entries of the form "rfio://" etc. should be translated into the accepted form when they are registered on the grid in the RLS.

This is proposed as a short-term solution; we still have to understand what the long-term solution is between POOL and RLS/RM. We will implement these fixes now and discuss how to resolve this more appropriately in the longer term.
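
To illustrate the prefix handling described in item 1, here is a minimal sketch in Python. The helper name strip_rm_prefix and the constant RM_PREFIXES are hypothetical and are not taken from the EDG Replica Manager, POOL or GFAL; the sketch also assumes that the prefixes are matched case-sensitively and that SFNs such as srm://... are left untouched.

```python
# Illustrative only: strip_rm_prefix and RM_PREFIXES are hypothetical names,
# not part of the EDG Replica Manager, POOL, or GFAL.

# Prefixes the EDG Replica Manager adds to LFNs and GUIDs (from the note).
RM_PREFIXES = ("lfn:", "guid:")


def strip_rm_prefix(entry: str) -> str:
    """Return the entry without a leading lfn: or guid: prefix.

    Entries that carry neither prefix (for example SFNs of the form
    srm://...) are returned unchanged. Matching is case-sensitive,
    which is an assumption of this sketch.
    """
    for prefix in RM_PREFIXES:
        if entry.startswith(prefix):
            return entry[len(prefix):]
    return entry


if __name__ == "__main__":
    examples = [
        "lfn:this-is-my-logical-file-name",
        "guid:73e16e74-26b0-11d7-b1e0-c5c68d88236a",
        "srm://lxshare0384.cern.ch//flatfiles/cms/data/05/x.dat",
    ]
    for entry in examples:
        print(entry, "->", strip_rm_prefix(entry))
```

Only a leading prefix is removed, so a name that merely contains "lfn:" further along is not altered.
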

2) The second issue for the compatibility of POOL and the EDG Replica Manager is that the schema of the RLS catalogue changed recently, so that entries that used to be "case sensitive" no longer are. We have identified a solution that requires changes in the RLS server to keep entries case-preserving, so as not to break current usage. This proposed fix will be tested as soon as possible - hopefully tomorrow or early next week.

Storage Element
---------------

The Storage Element (SE) for LCG-2 is intended to be based on the SRM grid interface. However, there are ongoing problems with the various independent implementations of SRM. The situation with the SRM today is as follows:

A) Castor SRM with MSS backend: (tested at CERN, also installed at CNAF, PIC)

get, put, getFileMetaData seem to work (but we haven't been able to test a "disk almost full" condition on the production CASTOR SRM). There was a problem with this condition in the previous version. The advisoryDelete method is not enabled on the production CASTOR. The latest version of the Replica Manager does make use of this method. GFAL works with this version as long as the "disk almost full" condition is correctly handled.

B) Castor SRM for disk-only systems:

This will not be supported or deployed.

C) dCache SRM with Enstore at FNAL:

Basic SRM tests work. GFAL-dcache SRM interfaces seem to work. We are waiting for a machine to test the Replica Manager with dCache SRM. The advisoryDelete method is not implemented, but the dCache developers are discussing with us how this can be fixed.

D) dCache SRM with disk-only system:

We are waiting for a packaged version (RAL is doing this). The advisoryDelete method needs to be available for disk-only sites to enable space management. Deleting files can be achieved through the gridftp interface, but the RM cannot talk to both the SRM and gridftp interfaces on the same SE (not unexpected - a given SE should be either one or the other).

We therefore propose to start now with "Classic" Storage Elements, using gridFTP to access them via the Replica Manager tools.

The Castor MSS at CERN is available immediately via gridftp (fixes to the error handling in Castor were required in the last few days to make this work). The replica manager also works against a "classic" disk-only SE through gridFTP. The Castor MSS installations at CNAF and PIC should also be accessible in the same way as at CERN. The FNAL MSS (Enstore/dCache) should also be available via gridFTP (and the replica manager).

Once we receive the packaged version of dCache with its SRM interface (RAL is doing this packaging) we intend to start deploying it to Tier 2 sites in parallel with the classic SE. Initially this will be used through the gridftp interface rather than the SRM interface. We expect to be testing this next week without the advisoryDelete method. This implies that initially sites/experiments will still have to manage the SE disk space "by hand". We are discussing with the dCache developers how advisoryDelete can be implemented.

The deployment of, and migration to, the full SRM SE solution will be delayed until we have been able to do more thorough testing of all the implementations and of the tools that rely on them. It has become clear in the last few days that most of the implementations are far less mature than was expected and need changes to function as a full SE.

GFAL will not be usable with filenames other than PFN until the SRM is deployed, and will need modifications to strip off the lfn: and guid: prefixes.

In summary:

For LCG-2 we begin immediately with "classic" SEs, using gridftp to talk between them and the RM. We will finish the SRM testing and in parallel set up SRM SEs and migrate from the "classic" to the SRM versions. dCache will be a packaged solution offered to sites without mass storage systems, so that they can manage their disk pools with an SRM and gridftp compliant interface to the grid.

It is also clear that we will have to continually face the problems of data migration - as the RLS evolves (at least through these needed changes), and as the SE evolves. We will need to prepare tools and procedures to facilitate these migrations in a controlled way.

Ian Bird
IT/DI