Chair: Alessandra
Minutes: Matt D
Attending: Dan & Terry, Elena, Gareth Roy, Ian Loader, John Bland, John Hill, Kashif, Rob Fay, Daniela, Federico, Sam Skipsey, Gareth Smith, Gang Qin, Gordon Stewart, Andy McNab, Andy Washbrook, Ewan MacMahon, Govind, Winnie, Raja, Raul Nadakumar, Liam Skinner, Oliver Smith, Brian, Catalin, Matt RB. Let me know if I missed anyone.
Apologies: Jeremy, Pete G

11:01 - 11:20 Experiment problems/issues 19'
Review of weekly issues by experiment/VO

- LHCb
No-one from LHCb at first.
Note about SL5 CEs not working for some proxies; this has since been fixed at their end.
Dan asked about the LHCb QM ticket regarding IPv6 problems. Raja will have a look.

- CMS
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel
See comment last week: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SpaceMonSiteAdmin
Nothing to report, other than Vidyo being terrible.

- ATLAS
See attached email on agenda.
Sites/clouds have been asked to give input on the evolution of the ATLAS clouds. Alastair will attend for the UK. What are the sites' thoughts? Is cloud support work important for sites? Or can we live without a team and move the work to shifters? Elena has doubts herself.
Ewan - will be slightly grumpy if anything changes; what we have works. It's very useful to have known, trusted experts, which is a different sort of relationship from the one with shifters. We should be very careful about changing it.
Alessandra is going to the meeting too. Not happy if they change things; for us it works. Cloud support should be kept for countries where it works.
ALL SITES ARE WELCOME TO JOIN THE MEETING - IT IS ASSUMED TO BE OPEN TO ALL. It would be appreciated if UK sites joined.
- ADC meeting on the 21st of July, 16.30 start (CERN time, I assume).
Some talk on tickets: the ECDF ticket concerns a test queue. The Manchester ticket was looked at. Some tickets were reopened (re squids and squid monitoring - see some talk in chat).
Ewan - they changed the IPs of the monitoring boxes (again)! Elena - will talk to the Frontier squid keepers.
Gareth S - these tickets are a good example of how relying on shifters for support just wouldn't work.
On the Glasgow-RAL transfer problems, caused by packet loss on the route: FTS settings were changed to help with this, and the ticket is resolved.
Lack of jobs at QM, caused by IPv6 problems (discussed elsewhere). Peter Love and Ewan are on the case.

- Other
-- DiRAC: Jens
No Jens.
-- LIGO: Catalin
No Catalin.
-- LOFAR: George
No George.
-- LSST: Alessandra
Trying to run a full analysis; only a few dozen CPU hours done so far, using NorthGrid at a few sites. Working with Catalin to enable CVMFS: the French repository doesn't work, but another one in the US does, as it's in the RPMs distributed by CERN. BNL has been contacted, as most BigPanDA users/data are at BNL. Users are using rsync at the moment; it may be an uphill battle to get them to change clients. An http option is available, but might not work with gfal-tools.
-- LZ: David
No David, but Elena reports that UK certs work for joining the VO. Not all done yet: does it need to be registered with EGI before UK sites can support it, given that it is currently a purely OSG VO?
AF - we can support whoever we (the sites) like. EGI registration will make life easier, but isn't mandatory.
AF - it's an uphill battle dealing with pure OSG, but "that's how it is".
Ewan - I thought we were trying not to duplicate the VOs, and only change if we hit a problem? If we haven't hit these problems yet (which might just be because we haven't got too far down the rabbit hole), should we just stick to plan A? AF - I agree.
Elena - will try to set up tools to support the VO, working with Edinburgh and IC.
-- UKQCD: Jeremy
No Jeremy.
-- UCLan/GalDyn: Tom
No Tom.
-- PRaVDA: Mark/Matt
No Mark/Matt.

- DIRAC status
-- http://www.gridpp.ac.uk/php/gridpp-dirac-sam.php?action=view
Brian - working on things, like why they're showing up as 'nil' in FTS usage. We have a robot cert; just getting them to use it.
Working on bringing together knowledge to help get things working with the other 3 DIRAC sites.

Meetings & updates 20'
With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

- General updates
There was a GDB at CERN last Wednesday. Agenda and minutes:
http://indico.cern.ch/event/319749/
https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20150708
Romain's GDB talk on threats is well worth reviewing:
http://indico.cern.ch/event/319749/session/0/contribution/2/attachments/616106/847753/20150709_Wartel_GDB.pdf
Raja noted: it looks like some certificates have problems when submitted to CEs running an RHEL5-variant OS underneath. Does this need further investigation/work?
Matt reports that the EMI WN tarball now contains the gfal CLI utilities.

- WLCG ops coordination
There is a fortnightly WLCG Operations Coordination meeting this Thursday, July 16th at 3:30pm CEST. There will be a presentation proposing a new Task Force studying the future of the Information System.
https://indico.cern.ch/event/393611/

- Tier-1 status
Gareth - on Tuesday 4th August a vendor will look at the router, with an outage of a few hours (8.30 to about 3). CASTOR will be stopped. What to do about FTS still needs to be figured out. The BDII will be stopped (as before). The LFC will be offline or intermittent as well.
Open days at RAL last week went really well.

- Accounting
QMUL and Sheffield appear to be lagging a week behind with publishing.
Please check your multicore publishing status (especially those sites mentioned in June). UK okay - Durham is a known problem that is being tackled; Oxford has decommissioned their node.

- Interoperation
Monday 13th July: there was an EGI Ops meeting today. Agenda: https://wiki.egi.eu/wiki/Agenda-13-07-2015
URT/UMD updates:
UMD 3.13.0 released on 01.07.2015: http://repository.egi.eu/2015/07/01/release-umd-3-13-0/
APEL 1.4.1
Argus PAP v. 1.6.2
gLExec-wn - v. 1.2.3 (lcmaps and mkdir)
storm 1.11.8
fetch-crl 3.0.16
cream 1.16.5
dpm-xroot 3.5.2
Xroot 4.1.1
Frontier Squid 2.7.24
CVMFS 2.1.20
GFAL2 2.8.4
GFAL2-PYTHON 1.7.1
UMD 3.13.1 released on 13.07.2015: http://repository.egi.eu/2015/07/13/release-candidate-umd-3-13-1-rc1/ (link was not updated correctly during release)
ARC Nagios probes 1.8.3
SR updates (small because it's summer):
gfal2 2.9.1
storm 1.11.9
srm-ifce 1.23.1....
gfal2-python 1.8.1
In verification:
gfal2-plugin-xrootd 0.3.4
Accounting [John Gordon] "Of the WLCG sites we now have 97%+ of cpu reported with cores. I expect you all saw my recent email to GDB naming 16 sites. If one German and one Spanish site and the four Russians start publishing we will jump to 99%+"
A new list of sites needing to update their multicore accounting is being prepared this evening (Monday) by Vincenzo.
SL5 decommissioning date: March 2016.
Next meeting: 10th August.

- Tickets
The main event is https://ggus.eu/index.php?mode=ticket_info&ticket_id=115017 - Dan's ticket about rogue IPv6 routes.

Discussion 20'
- Review GDB: Jeremy's notes
https://indico.cern.ch/event/432838/contribution/2/attachments/1124662/1605231/July-GDB-2015.pdf
Highlights - HEPiX in October. Machine/Job Features need testers. Watch out for phishing conference emails. IPv6: not much ATLAS effort into this so far, apart from Alastair. EGI Cloud: how involved are we (the UK)? Not very, we think. There are no GridPP/UK EGI Federated Cloud sites at the moment that we know of.
-- Agenda: http://indico.cern.ch/event/319749/
-- Minutes: https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20150708
-- Actions: https://twiki.cern.ch/twiki/bin/view/LCG/GDBActionInProgress

AOB
https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items
Test CASTOR with the WN tarball gfal2-utils - gone well. Maybe we should update the wiki.
Unsure about the "LSST talk" action (O-150609-01).
O-150310-01 - LIGO: the action might be done, but the objective is ill-defined.
O-150310-02 - requirement for GridPP members to be members of small VOs in order to provide decent support. There is no VO-binding agreement. The policy is agreed on, but where do we write it down?
Need to figure it out. The AUP is the wrong place; the action needs to be updated - Ewan will tweak the wording.

No actions recorded.

Chat Log
Matt Doidge: (14/07/2015 11:03) Daniela was here for a second... she's on my list.
Gareth Smith: (11:03 AM) I'm having problems getting to the GridPP Wiki to read the "operations bulletin". Anyone else having problems with that?
John Hill: (11:03 AM) yes
Dan and Terry: (11:03 AM) yes
Samuel Cadellin Skipsey: (11:03 AM) Yeah, looks like the wiki is a bit sad.
Matt Doidge: (11:04 AM) If it's broken, it's broken in the last 10 minutes.
John Hill: (11:04 AM) Yes, it was OK earlier this morning
Andrew McNab: (11:04 AM) Just having a look now
Ewan Mac Mahon: (11:05 AM) It's not just the wiki; my vac nodes were complaining about access to some user_data files earlier too, and the main www.gridpp.ac.uk front page was slow too.
Gareth Smith: (11:08 AM) Thanks Andrew.
Matt Doidge: (11:09 AM) I put that the meeting is open to all in all caps in the minutes.
Andrew McNab: (11:11 AM) Better now?
John Hill: (11:11 AM) yes, thanks Andrew
Ewan Mac Mahon: (11:13 AM) We should try to get a decent amount of site representation along - our ATLAS cloud support works really well, and I don't think our sites would be as useful without that site<->VO link.
Andrew McNab: (11:13 AM) Sorry: the webserver had filled the machine up with httpd processes
Federico Melaccio: (11:13 AM) Interesting. It would be nice to be informed about these changes, as we cannot whitelist all CERN addresses just for that, although they recommend allowing the full range: https://twiki.cern.ch/twiki/bin/view/Frontier/InstallSquid#Enabling_monitoring
Ewan Mac Mahon: (11:16 AM) And they've been told that's not going to happen on the grounds that it's daft. They need to stop f***ing it up, that's really all there is to it.
Federico Melaccio: (11:17 AM) I agree
Daniela Bauer: (11:19 AM) Sorry, my Vidyo is terrible this morning. CMS has nothing to report. All the T2s are green.
Samuel Cadellin Skipsey: (11:20 AM) LIGOwards, I've sent Tom W an email about some data management decisions we need to talk about.
Daniela Bauer: (11:21 AM) @Sam, do you think we can close the DIRAC storage ticket? Both Simon and I will be away next week, so it's all going to grind to a halt anyway.
Samuel Cadellin Skipsey: (11:21 AM) Yes, I guess so - it seems much improved. My email to Tom was actually about whether we should move LIGO to New DIRAC. So, yeah, close with "New DIRAC is much better".
Ewan Mac Mahon: (11:22 AM) Minor VO emails not going to gridpp-support make me sad ;-(
Samuel Cadellin Skipsey: (11:22 AM) If it makes you feel better, Ewan, I was going to email gridpp-support once I got feedback from Tom :D
Ewan Mac Mahon: (11:24 AM) A little. *sniff*
Elena Korolkova: (11:26 AM) sorry, it's drilling
Andrew John Washbrook: (11:28 AM) I mostly got that!
Ewan Mac Mahon: (11:28 AM) I think the other thing is that you need to join the VO before supporting it
Elena Korolkova: (11:29 AM) Do you?
Ewan Mac Mahon: (11:29 AM) That's our baseline ask of the VOs - that they allow appropriate GridPP support staff to be members. If we don't have anyone in the VO it makes it really hard to debug stuff.
Federico Melaccio: (11:29 AM) yes we can
Matt Doidge: (11:29 AM) Which Matt? Not me
Elena Korolkova: (11:30 AM) we support several VOs and I'm a member only of atlas and t2k.
Ewan Mac Mahon: (11:30 AM) Right, but the big VOs have lots of support effort; the small ones lean on us a bit more. We don't necessarily all need to be in each VO, but someone in the GridPP ops team needs to be in the VO. So, let's say you're in the VO: if someone else has a problem they can email you to try sending test jobs and whatnot. If no-one can do that it just makes life difficult.
Federico Melaccio: (11:31 AM) I cannot hear anything now
Ewan Mac Mahon: (11:32 AM) Me neither.
Raja Nandakumar: (11:32 AM) Cannot hear Alessandra again
Dan and Terry: (11:35 AM) except QM is passing all the tests; it's only the move to production that revealed issues
Ewan Mac Mahon: (11:36 AM) Just so we're all clear, the current working theory on the 'QMUL problem' is that QMUL is fine, and CERN have done a silly thing. The CERN machines' config is definitely wrong; the only thing we're not 100% clear about is the exact mechanism by which that wrongness then actually breaks things.
Daniela Bauer: (11:39 AM) Does that mean cetest02 now works (for multicore)? Because we upgraded the CE.
Ewan Mac Mahon: (11:46 AM) It sounds like Romain; it's got his "we're all doooooooomed" style.
Matt Doidge: (11:47 AM) Daniela - cetest02 has started publishing core count, looking at John's links
Alessandra Forti: (11:52 AM) https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items
Daniela Bauer: (11:55 AM) @Matt - glad to hear it
Matt Doidge: (11:57 AM) I'll action Ewan to update the action :-P
Ewan Mac Mahon: (11:58 AM) I'm taking action :-)
Federico Melaccio: (11:58 AM) thanks
Elena Korolkova: (11:58 AM) bye
Federico Melaccio: (11:58 AM) bye
Alessandra Forti: (11:58 AM) bye