Minutes for storage phone conference 29 March 2006
==================================================

Present:
Edinburgh - Greig (chair + minutes)
RAL - Jeremy, Jiri, Derek
Glasgow - Graeme
Lancaster - Brian, Matt

0. Quick review of actions

See below.

1. Status of Transfer tests

Graeme reported on what happened last week and on plans for some tests
this week. Matt Hodges will install the new version of the FTS server
at RAL, probably on Wednesday afternoon. The current plan is to run a
24 hour inbound test (to RAL) on Thursday morning. The aim is to
achieve a more stable transfer rate than was observed last week:

http://wiki.gridpp.ac.uk/wiki/SC4_Aggregate_Throughput#Wednesday_2006-03-22

Depending upon a decision from Andrew Sansum, an outbound test from RAL
to as many Tier-2s as necessary will take place in an attempt to break
the RAL firewall. The firewall vendor needs the logging/debug
information that such a break would cause.

Tier-2s for outbound test: (the fab 4) Glasgow, Birmingham, Edinburgh,
Bristol (Oxford and Durham as backups).
Tier-2s for inbound test: as many as required.

2. Storage metrics discussion

Jeremy reported that the requirement for some sort of accounting of
used and available disk per VO per site is being pushed by the UB.
Allocations already exist for the Tier-1, but not for the Tier-2s. The
VOs need to know what is being provided. An EGEE requirement is also
the provision of history metrics that will account for TB/hour/VO.
Jeremy reported that the developer communities had been involved via
an egee-storage-metrics list, but that activity on the list had been
nil for months now.

The VOs also require closer integration with VOMS in the SE, reflected
in privileges within directories in the SE. Graeme reported that DPM
supports VOMS in 1.5, but the 1.5.4 PPS release of DPM was broken.
Version 1.5.5 fixes this and is in the CERN testing environment at the
moment before going into gLite 3.
VOMS support will mean that there is no longer any mapping of grid
certificates to local user accounts. A list of UIDs and GIDs is held
internally in DPM. This allows for the allocation of permissions on
directories according to the UID. Jeremy mentioned that VOs initially
requested 10 groups and 3 roles per group within VOMS. This would lead
to a large number of pool accounts. For SC4, 3 groups (or 1 group with
3 roles) will be used.

Jiri did not think that there had been any real progress on VOMS
integration in CASTOR. Work had started on it a couple of weeks ago but
there is nothing available for testing as yet.

Graeme reported that the DPM developers have promised a function that
will return the amount of storage in a directory, i.e. per VO. However,
an alternative method would be to query the DPNS database for the
storage per GID, i.e. per VO. Graeme believed that he could get
something working within a week for DPM if it were given a high enough
priority by the UB. The information needs to be published regularly.

Derek reported that RAL perform a nightly `du` on the PNFS directories
of each VO, which gives them the storage used; they then place this
into the SRM's dynamic ldif file. This operation results in a heavy
load on the dCache, so it shouldn't really be carried out any more
frequently. Files over 2GB are not correctly reported due to NFS v2
limits.

3. Site Reports

No one from Manchester or Liverpool was present, so no report was given.

Graeme reported that he had helped fix the Durham DPM problem. The
issue was wrong permissions on one of the DPM filesystems. Graeme
intends to set up a DPM troubleshooting guide in the wiki.

4. Site SRM description

Sites are encouraged to fill out this template. Jeremy made the point
that sites should view this as a tool that will help the storage
community to better understand their systems and allow suggestions for
improvements to be made. See actions below.

5. AOB

Brian has decided not to pursue the installation of DPM on SL4 and is
using SL305 instead. Too many problems were experienced when going back
to a manual install, since there is no yaim for SL4.

Lancaster have an issue with the dCache PNFS database and missing
files. They should report to the list after further investigation.

------------------------------------------------------------------------
ACTIONS:

41  10/08/2005  Agree licence with DESY
    Owner: Jens    Status: Progress
    Jens to report back to legal people.

53  12/10/2005  Find reasonable % for SE uptime for SC4
    Owner: Jeremy    Status: Open
    Reassigned. Follow up with GDB et al.
    Jens asked about sites' experience with downtime. Most sites were
    happy with their uptime, although QMUL had had some problems due to
    their poolfs. Jeremy will try to find out more at the UB. It is
    difficult for VOs to know what is a reasonable time at the moment.

54  02/11/2005  Report on performance/scalability with pools on WNs
    Owner: Paul    Status: Open
    Progress. Will put in wiki. NB: dCache pools.

57  02/11/2005  Investigate StoRM
    Owner: Jiri    Status: Open
    Jens said he'd get the report from INFN; Greig reported that StoRM
    will be discussed at HEPiX.

60  16/11/2005  Add client wsdl->library recipe to wiki
    Owner: Jens    Status: Progress

86  08/02/2006  Extend monitoring to do sites per VO and VOs per site
    Owner: Greig    Status: Progress

91  22/02/2006  Add an SRM 2.1 tests page to the wiki
    Owner: Jiri    Status: Open
    See also #75.
    This is for the high level tests - the ones the experiments asked
    for almost a year ago now. Jiri reported that he may be able to
    work on high level tests soon; otherwise Jens and Shaun had both
    independently said they would have a go but have both,
    independently, got distracted into other stuff.

95  08/03/2006  Report high IO rate pool problem to list
    Owner: Mona    Status: Open
    Closed, but problem still being investigated.

97  15/03/2006  Create template for 10 storage questions
    Owner: Greig/Jens    Status: Open
    Closed, see URL above. Sites are encouraged to describe their own
    setup.

98  22/03/2006  Follow up with Mona re dCache info problem
    Owner: Owen    Status: Open
    Reassigned to Greig.

99  22/03/2006  Follow up with DESY re dCache->R-GMA publishing
    Owner: Owen/Greig    Status: Open
    Closed. The dCache developers believe that they have already
    provided the gridftp logging information in the correct format for
    it to be published into R-GMA. They do not have anyone working on
    this at DESY, but would be willing to support anyone from outside
    who is willing to work on the R-GMA integration that would lead to
    the information being published. Graeme suggested looking at the
    DPM GridView page in the wiki, which describes the relevant fields
    that R-GMA requires.
    http://wiki.gridpp.ac.uk/wiki/DPM_Enabling_Gridview
    http://wiki.gridpp.ac.uk/wiki/DCache_and_GridView

100 22/03/2006  Follow up with dteam re using testzone for
    gLite <-> SE testing
    Owner: Jens    Status: Open

---------------------------------------------------------------------
NEW ACTIONS

101 29/03/2006  Find out from Steve T about the problems he has
    experienced in publishing dCache gridftp information into R-GMA.
    Owner: Derek

102 29/03/2006  Send details to list about obtaining and publishing
    storage used per VO within a dCache instance.
    Owner: Derek

103 29/03/2006  Contact Manchester and Liverpool regarding their dCache
    upgrade problems.
    Owner: Greig

104 29/03/2006  Create example of SRM site description for Edinburgh.
    Owner: Greig

105 29/03/2006  Send email to dteam list prompting the Tier-2
    coordinators to encourage sites to fill out the "Site SRM Setup"
    page in the wiki.
    Owner: Greig

106 29/03/2006  Report to list on Lancaster's "missing files in dCache
    PNFS" problem.
    Owner: Brian/Matt