Minutes of Storage Phone conference, 16 Aug 2006

Present:
Edinburgh: Greig
Glasgow: Graeme, Jamie
Lancaster: Matt, Brian
Durham: Mark
Imperial: Olivier
DESY: Owen, Patrick (briefly)
RAL Tier 1: Derek
RAL Storage: Jens (chair+mins)

Apologies:
RAL Storage: Jiri

0. Review of actions (see below)

1. SE Monitoring and Accounting revisited

Greig and Graeme are working with Paul on building monitoring and alerts into the MonAMI framework for dCache and DPM, respectively. Both have received excellent support from Paul. It was suggested that MonAMI would also be useful for CASTOR.

Some monitoring will be site specific, such as alerts when a filesystem fills up. Other quantities can serve as progress metrics:

Greig is monitoring the number of gets, puts and copies. This is useful for persuading management that your SE is busy doing useful work (assuming, of course, that the numbers look impressive).

Graeme is monitoring space: space used and free space, as well as the state of individual filesystems, including whether they are disabled, read-only, etc.

This needs two actions and a wiki page. We have previously discussed useful progress metrics, and this could be a good time to review whether we can monitor them; conversely, Greig and Graeme should describe their existing work.

4. FTS and SRM, and PhEDEx too. And storage classes and failed transfers.

There are some problems with the way FTS interacts with SRMs, most obviously for CASTOR, but FTS support for SRM2 generally also has issues for DPM and dCache.

One problem is what to do with failed transfers. For DPM and dCache, advisoryDelete cleans up nicely; for CASTOR, one needs to log into a UI and run some cleanup commands, but this should be fixed in SRM2.

Another obvious problem is the space token, since each of the three implementations will handle the space token in an implementation-specific way.
Many FTS-CASTOR SRM problems could, at least in theory, be resolved by deploying SRM2, but that then has implications for dCache. (Jiri's testing has shown DPM's SRM2.2 to be the most mature of the lot, with the latest version being uncrashable.)

FTS itself is generally out of scope for this group; on the other hand, FTS is essential for the success of LCG, so the FTS interfacing to SRM is within the group's scope (we voted on this in an earlier phone conference, when we discussed whether FTS was in scope for this group).

Graeme says SRM 2.2 support in FTS is ongoing; no recent news. There is a user support mailing list, which Derek subsequently circulated to the list:
http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0608&L=gridpp-storage&T=0&P=5731

------------------------------------------------------------------------
OPEN ACTIONS

41  10/08/2005  Agree licence with DESY  Jens  Open
    Patrick and Jens have agreed that Owen can keep supporting GridPP (subject to availability and priority!). Conversely, GridPP recognises Owen's work at DESY on YAIM and dCache installation as important and valuable to GridPP.

53  12/10/2005  Find reasonable % for SE uptime for SC4  Jeremy  Open
    This needs to move to a wiki page. The CASTOR BDII was down this morning; although it is not being monitored, this prompted the question of whether BDII downtime is recorded in Greig's tool. Not yet; it's doable, but it may not be the best approach, because downtime is more complex. Alternatively, downtime should be monitored locally by some sort of daemon on the SE that keeps an eye on the pool and raises an alert.

86  08/02/2006  Extend monitoring to do sites per VO and VOs per site  Greig  Open
    No news since last week: only the visualisation is missing, and DK is still out.

105  03/05/2006  Re-poke DESY or FNAL about SRM (now 2.2) 2.1 for dCache  Owen  Open
    1.7.0 alpha is out. GridPP should volunteer to test. We also need to run Jiri's test tool, but Jiri is out for the next week.
    He built a release yesterday (Tuesday 15.08.06), so anyone can build and run tests. The release won't have guaranteed reservations and suchlike, but we are mainly interested in testing the get/put functionality.

116  31/05/2006  Progress of Durham-MAN networking discussions  Mark  Open
    Ongoing. More news expected later this year.

119  07/06/2006  Circulate next version of VO storage to list  Jens  Open
    No news.

121  05/07/2006  Get report from NGS on GPFS  Jens  Open
    No news. Low priority until told otherwise.

126  19/07/2006  Wiki page describing dCache specific steps when storage lost  Greig  Open
    Done. http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0608&L=gridpp-storage&T=0&P=1941
    It's a bash script that collects all precious SURLs belonging to a specific pool.

127  19/07/2006  Test out dCache Nagios plugin  Greig  Open
    Not done yet. Depends on the Edinburgh Nagios infrastructure, but the site admins haven't responded yet.

129  09/08/2006  Produce GridPP response to dCache licence  Jens/ALL  Open
    Prompted by Kostas' remarks, but no news yet.

130  09/08/2006  Get legal input on dCache licence  Jens  Open
    Also no news.

131  09/08/2006  Give sites heads up regarding next SC  Greig/Jamie  Open
    Done. See the GRIDPP-SC list: http://www.jiscmail.ac.uk/lists/GRIDPP-SC.html

132  09/08/2006  Circulate OSG email about recovering lost files to list  Graeme  Open
    Done. http://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind0608&L=gridpp-storage&T=0&P=5480

------------------------------------------------------------------------
NEW ACTIONS

133  16/08/2006  Summarise MonAMI monitoring for dCache/DPM on wiki  Graeme/Greig  Open
134  16/08/2006  Talk to CASTOR about MonAMI monitoring  Jens  Open
135  16/08/2006  Circulate FTS support to list  Derek  Closed 16/08/2006
136  16/08/2006  Locate/summarise progress metrics in wiki  Jens  Open