CCRC'08 Storage Solutions Working Group Con-call

28/R-015 (CERN)



Mailing list:

To join the call, dial +41227676000 (Main) and enter access code 0139097.

(Leader code is 0129124).

Brief notes from CCRC08 Storage Solutions Working Group of 25 Feb 2008

Agenda with attached material at :

Present: J.Shiers (chair), H.Renshall (notes), B.Koblitz, J-P.Baud, M.Branco, A.Pace, S.Campana, T.Cass
Phone: P.Fuhrmann, G.Oleynik, T.Perelmutov, B.Bockelman, P.Charpentier, S. De Witt, J.Gordon, F.Donno

Discussion of SRM v2.2 key methods and behaviour:

J.Shiers said that so far ATLAS, LHCb and CMS have uploaded their prioritisations of the list of SRM v2.2 issues to the agenda, though the CMS one is not yet the official position of the experiment. The three agree on the top two issues, though not in the same order, namely: protecting spaces from misuse, combined with implementations being fully VOMS aware; and handling space tokens for 'more than write' (i.e. PrepareToGet/BringOnline/srmCopy). JS asked what was currently lacking in VOMS awareness. BK replied that writing into CASTOR at CERN does not take the VOMS group into account. SdW added that CASTOR still uses the gridmap file. JG asked if dCache was fully VOMS aware and JS thought this was a long way off. JS suggested we should aim to test the two top-priority extensions to SRM 2 during the next annual readiness test before the year's LHC data taking, i.e. CCRC'09. GO said that, in considering priorities, they have not yet implemented 'change space for files' and asked whether this was still needed. PC said that we cannot have protection from misuse without VOMS awareness. JG gave his definition of VOMS awareness as access control based on VOMS groups and roles. PF said that they (for dCache) are sufficiently VOMS aware to support ACLs on space tokens. SdW said the same for CASTOR, proposing to add an SRM attribute to the space tables, but added that GridFTP continues to use the gridmap file. PC said we are asking for protection in the SRM interfaces, which sounds fairly easy to do in CASTOR. SdW promised he would quantify the effort and give a rough timescale.
For dCache PF said a bit of code was missing and estimated a month or two, but this would conflict with work on handling space tokens for 'more than write'. JS asked (again) whether both priorities could be done in all mass storage systems by this time next year, reminding everyone that for the May run of CCRC'08 we are really only considering workarounds rather than permanent solutions. PC asked whether, in that case, we can expect all the current 'annoyances' in Flavia's SRM 2 problem lists to be fixed for the May run. FD said that two of them, selecting tape sets and the dCache 'file exists' problem, are done. PC added that he meant issues like the non-return of space to a token after file deletes, recently seen in dCache at IN2P3. FD said this applied to T1D0 space; PF thought this should work both after a file delete (with a minute or two of delay) and after a migration to tape, and wondered if IN2P3 was running an older dCache release. FD thought they were and promised to come up with a list of remaining 'annoyances' for the next meeting. JS summarised that the plan is to fix all known problems by May, leaving misuse protection/VOMS awareness and tokens on recall until later, adding that we will need to define precisely what we mean by VOMS awareness and this may lead to another MoU extension. PF preferred to call the misuse protection 'space token protection' while TP wanted 'VOMS aware control of an ACL on a space token'. PF then remarked that he understood the LHCb statement on misuse protection but not that of ATLAS (both are attached to the agenda). MB reminded the meeting that he had circulated by email a clarification from ATLAS last week, which HR pointed out he had attached at the end of the last meeting's minutes (attached to this agenda). SdW reminded the meeting that changing storage class implies space token changes and asked if the need for ACLs is only on writing. PC said it was, but that this included the operation of bringOnline into a T0D1 space.
JS finally summarised that access control and bringOnline space tokens were the remaining priority issues. He proposed that the next CCRC'08 face-to-face meeting on 4 March would go through all CCRC'08 issues and look ahead to these two remaining priority issues, with further follow-up and planning at the April face-to-face meeting and the later WLCG Collaboration workshop.
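JG's definition of VOMS awareness (access control based on VOMS groups and roles) and TP's phrasing, 'VOMS aware control of an ACL on a space token', can be illustrated with a minimal Python sketch. It is not how dCache or CASTOR implement this: the classes Fqan, AclEntry and SpaceToken, the allows check and the example ACL are all hypothetical. The sketch assumes the caller's VOMS attributes arrive as FQAN strings of the usual form /atlas/Role=production/Capability=NULL, and that rights are granted per SRM v2.2 method, so that writes and bringOnline can be controlled as discussed.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Fqan:
    """A VOMS FQAN reduced to its group path and optional role."""
    group: str              # e.g. "/atlas" or "/atlas/prod"
    role: Optional[str]     # e.g. "production"; None when Role=NULL or absent


def parse_fqan(text: str) -> Fqan:
    """Parse an FQAN such as '/atlas/Role=production/Capability=NULL'."""
    group_parts: List[str] = []
    role: Optional[str] = None
    for part in text.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capabilities are ignored in this sketch
        else:
            group_parts.append(part)
    return Fqan("/" + "/".join(group_parts), role)


@dataclass(frozen=True)
class AclEntry:
    group: str                   # VOMS group the entry applies to
    role: Optional[str]          # required role; None accepts any role in the group
    operations: FrozenSet[str]   # SRM methods this entry permits


@dataclass
class SpaceToken:
    name: str
    acl: List[AclEntry] = field(default_factory=list)

    def allows(self, fqans: List[str], operation: str) -> bool:
        """True if any of the caller's FQANs matches an entry permitting the operation."""
        for text in fqans:
            fqan = parse_fqan(text)
            for entry in self.acl:
                # Assumption: membership of a subgroup implies membership of its parent.
                in_group = fqan.group == entry.group or fqan.group.startswith(entry.group + "/")
                role_ok = entry.role is None or entry.role == fqan.role
                if in_group and role_ok and operation in entry.operations:
                    return True
        return False


# Hypothetical example: production-role requests are accepted, generic ATLAS users refused.
example = SpaceToken(
    "ATLASDATATAPE",
    [AclEntry("/atlas", "production", frozenset({"srmPrepareToPut", "srmBringOnline"}))],
)
assert example.allows(["/atlas/Role=production/Capability=NULL"], "srmBringOnline")
assert not example.allows(["/atlas/Role=NULL/Capability=NULL"], "srmPrepareToPut")
```

Granting rights per operation, rather than a single allow/deny flag per token, mirrors the point made in the discussion that ACLs are mainly needed on writing, including bringOnline into a T0D1 space, and it lets a generic user be refused while a production role is accepted, which FD noted dCache cannot yet distinguish.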
    • 15:00 15:05
      Minutes of the meeting of 11 Feb 5m
      Brief notes from CCRC08 Storage Solutions Working Group of 11 Feb 2008

      Agenda with attached material at :

      Present: J.Shiers (chair), H.Renshall (notes), F.Donno, J-P.Baud, M.Branco, A.Pace
      Phone: P.Fuhrmann, G.Oleynik, M.Ernst, D.Petravic, T.Perelmutov

      Addendum to WLCG SRM v2.2 MoU:

      JS explained his intention that this be a lightweight addendum detailing what has to be changed, based on operational experience, which would then be checked by the technical people. Implementation of the agreed changes would then start. GO pointed out that Mat Crawford suggests a written report on how the eventual proposals in the MoU addendum are arrived at, and JS agreed, especially given the recent budget issues at FNAL.

      SRM v2.2 key concepts, methods and behaviour:

      FD presented her slides (attached to the agenda). The idea is to establish short, medium and long term goals with associated dates. The immediate short term goal is to select tape sets by means of tokens or directory paths. She thought dCache was in good shape for this after patch level 5, which allows tokens to be passed to the migrator. For CASTOR she did not know the status. PF said that at this stage he could not say what is short or long term and would prefer a list of goals which could then be sorted and prioritised. GO agreed, adding that FNAL budget problems made any other approach difficult. PF asked, regarding slide 4, what the issue was with srmGetSpaceMetaData. FD replied that it was important for srmGetSpaceMetaData to return correct sizes. GO then said he would like to understand the process for getting experiment input on the extensions to the key concepts, methods and behaviour, and on their priorities (slides 3 and 4 of Flavia's presentation). JS reminded the meeting that the experiments are on the mailing list of this series of meetings, but said he would also raise this at the MB.
PF said he had been hoping that more technical people from the experiments would attend this and the next meeting. Since M.Branco was present, it was agreed to start with comments from ATLAS, which he (MB) agreed to have ready in a week's time. JS thought we must bear in mind what can realistically be delivered before May. PF said that for dCache he thought the crucial things were security and honouring space tokens given in srmPrepareToGet and srmBringOnline. He would like to have the experiment positions for the next meeting. A more technical discussion followed, triggered by PF saying that we must have a consistent interpretation of any extensions across the different mass storage systems. TP said that instead of honouring space tokens for a recall, dCache could use other means to characterise the pool to be used for the recall. FD reminded the meeting that today dCache cannot distinguish between generic and production users. MB said that there will be restores of files that need to go to alternative pools, e.g. one time for reprocessing, another for export, and this can only be done by the restore accepting a space token. MB reminded the meeting that he had sent a short email outlining the ATLAS use case to the list of this meeting (the subject was 'tokens on "more-than-write"'; for completeness I have added the text after these minutes. HR). JS thought the only major long term issue is the behaviour of file restore from tape, and PF could not predict how long changes to dCache in this area might take. MB said that for ATLAS the most urgent goal is protecting spaces. FD said that in her list of proposed short term goals the first two (selecting tape sets and making the space token orthogonal to the path) are low priority or done, while the other two (protecting spaces from generic users and making implementations fully VOMS aware) are more important to the experiments and not yet done, the VOMS requirement really being part of the implementation of space protection.
In summary JS said we now need replies from the experiments on the work list and their priorities, in time for discussion in the week of 25 Feb, by when we will have had a little more production experience. We may end up with more than one MoU addendum. He will send a short email to the list of this meeting before the MB asking for experiment feedback by 22 Feb. The date of the next meeting will be decided later.

========================================================================
email from M.Branco to the SSWG list on 11 Feb

Hi,

on the discussion of use cases for tokens on "more-than-write" I mentioned during the meeting... small excerpt but I believe addresses the point Timur was referring to..

cheers

Our understanding of dCache LAN and WAN areas and staging

Data to be re-processed is initially written to the Tier-1 using a space token. The space token name is e.g. _ATLASDATATAPE (T1D0)_. The relevant space tokens that involve staging are of course those of type T1D0. The space token itself is also appended to the physical path by ATLAS: srm:// It appears this has to be done for dCache sites (see below), but it will also be done for CASTOR and DPM for the sake of consistent naming conventions.

When reprocessing starts, ATLAS will request sets of files called datasets. These datasets will be staged automatically by dCache to a default pool, which should be of type LAN. ATLAS will specify the pool the files were initially written to, but this is ignored by dCache. CASTOR, however, will make use of the attribute, so ATLAS will always pass it. As there is a risk of staging files into the default pool at the same time, dCache sites themselves may make use of the path (e.g. /atlasdatatape/) to create different LAN pools where staged files go to.

The output of a reprocessing job must be stored at the same site, as well as be sent to other Tier-1s.
The job will use the space token _ATLASDATADISKTAPE (T1D1)_ to write the file to the correct area. As our space tokens are of type WAN, the other Tier-1s will be able to fetch the file. Future reprocessing jobs proceed as before (staging from what had been written to a WAN area back to the LAN default pool, optionally making use of the file path). We assume jobs at the site should be able to read data written to the WAN area directly. That is, a file written to a space token of type T1D0 should be copied directly to the WN if it is still resident on the disk buffer of the tape area.

While the workflow above fulfills our reprocessing requirements, it may cause problems for transfers between Tier-1s. It is always possible that a dataset is required by another Tier-1 and is currently residing on tape. Therefore, using only SRM commands, how can dCache be configured so that, in this case, files are not staged to the default pool (which is LAN and not accessible from the outside) but instead to a WAN accessible area? Manual configurations or limiting the deployment model of our data management system (e.g. specific machines with sets of IPs making requests) have consequences in our system, as our data management infrastructure is designed to serve many sites from the same machines in parallel.

_For FDR1 we survive by accepting an extra disk copy from a LAN to a WAN disk, but for performance reasons this is not preferred as the final solution._

--
Miguel Branco, CERN - ATLAS Computing Group
MSN: +41 22 76 71268
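The recall question in the ATLAS email can be made concrete with a small Python sketch of the pool-selection decision. The site configuration (PATH_POOLS, TOKEN_POOLS, DEFAULT_LAN_POOL), the pool names and the functions are hypothetical and do not correspond to any dCache interface. honour_token=False stands for the behaviour described in the email (the token is ignored and sites may map path fragments such as /atlasdatatape/ to LAN pools), while honour_token=True stands for the requested 'tokens on more-than-write' behaviour. The TxDy helper assumes the usual WLCG reading of a storage class as x tape copies and y permanent disk copies.

```python
import re
from typing import Dict, Optional, Tuple

_STORAGE_CLASS = re.compile(r"T([01])D([01])")


def parse_storage_class(text: str) -> Tuple[int, int]:
    """Split a 'TxDy' storage class into (tape copies, permanent disk copies)."""
    match = _STORAGE_CLASS.fullmatch(text)
    if match is None:
        raise ValueError(f"not a TxDy storage class: {text!r}")
    return int(match.group(1)), int(match.group(2))


def needs_staging(storage_class: str) -> bool:
    """T1D0 files live on tape with disk only as a cache, so reads may need a recall."""
    tape, disk = parse_storage_class(storage_class)
    return tape > 0 and disk == 0


# Hypothetical site configuration; all pool names are invented for illustration.
PATH_POOLS: Dict[str, str] = {"/atlasdatatape/": "lan-atlasdatatape"}
DEFAULT_LAN_POOL = "lan-default"
TOKEN_POOLS: Dict[str, str] = {"ATLASDATATAPE": "wan-atlasdatatape"}


def choose_recall_pool(path: str, token: Optional[str] = None,
                       honour_token: bool = True) -> str:
    """Pick the disk pool a recalled file is staged into.

    honour_token=False: the token is ignored and only the path can steer the
    recall, ending in a LAN pool either way.
    honour_token=True: a known token sends the recall to its (WAN) pool.
    """
    if honour_token and token is not None and token in TOKEN_POOLS:
        return TOKEN_POOLS[token]
    for fragment, pool in PATH_POOLS.items():
        if fragment in path:
            return pool
    return DEFAULT_LAN_POOL
```

With honour_token=False every recall lands in a LAN pool, which is why ATLAS currently needs the extra LAN-to-WAN disk copy for FDR1; honouring the token removes that copy for exports without relying on per-site path conventions or on which machine issued the request.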
    • 15:05 15:45
      Discussion of key methods / behaviour 40m
      • ATLAS comments
      • LHCb comments
      • CMS comments
      • ALICE comments
    • 15:45 15:50
      Any new issues found during / around CCRC'08 activity? 5m
    • 15:50 15:55
      AOB 5m