GridPP Friday Tech meeting, 2016-01-22
--------------------------------------

Jobs in VMs
===========

VAC/Vcycle
----------

A new 0.20.0 release is due imminently, with multiple improvements including CloudInit compatibility and the removal of the requirement for an NFS server on factory nodes. Andrew reported that it's been in successful testing on Manchester's VAC resource, and that he's aiming to get the release out this afternoon.

On Vcycle, work is focussing on using the EC2 layer to talk to RAL's OpenNebula cloud - the Vcycle side is believed to be working, but a recent update to OpenNebula appears to have broken (or changed?) its EC2 API. Alex Dibbo said that the details remain uncertain, and that the upstream OpenNebula devs are also unclear what's happening, but it's being looked at.

ATLAS VMs
---------

Last year Andrew McNab & Peter Love got some CloudInit-based ATLAS VMs running on a hacked test VAC with reasonable success. This work is ongoing, but should eventually allow a single ATLAS VM image to run on the HLT farm, the CERN cloud, and VAC/Vcycle instances.

CMS in VMs
----------

Andrew Lahiff reported the use of the CMS HLT cloud for real production work over the Christmas period, and said that discussions are ongoing around using it in the inter-fill periods. Dave Colling reported that it has now been agreed that this will indeed happen, but went on to note related discussions around the involvement of CERN IT in cloud provisioning (see later).

LHCb in VMs
-----------

Andrew McNab reported similar work on the LHCb VMs, which has made them essentially generic Dirac clients, with experiment details downloaded at run time based on CloudInit contextualisation. Once the UK resources have moved to CloudInit-capable back ends, this should allow the use of these generic VMs both for LHCb and for the VOs supported on the UK Dirac service.
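To illustrate the contextualisation idea discussed above - an experiment-agnostic VM image that learns its VO identity at boot - a cloud-init user-data file might look something like the following. This is purely an illustrative sketch: the file paths, parameter names, and URL here are assumptions, not the actual LHCb or Dirac configuration.

```yaml
#cloud-config
# Sketch only: the VM image itself is generic; everything VO-specific
# arrives via this contextualisation data when the VM boots.
write_files:
  - path: /etc/vm-context/vo.conf        # hypothetical path
    content: |
      VO=lhcb
      DIRAC_SETUP=LHCb-Production        # hypothetical parameter names
runcmd:
  # Fetch and run the experiment-specific bootstrap payload at run time
  # (placeholder URL - the real source would be VO infrastructure).
  - [ sh, -c, "curl -sL https://example.org/lhcb-bootstrap.sh | sh" ]
```

The same image could then serve any VO simply by changing the user-data passed in by the factory or cloud layer.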
LHCb are also intending/hoping to ramp up VM hosting capacity across many of their sites, particularly with the aim of using multi-core VMs. Dave asked whether this would be a useful thing for LHCb T2Ds to get involved in, and Andrew said that it would, but that the VO had chosen not to make it a formal request so as not to over-burden the sites with hard requirements.

Site updates
============

Lancaster
---------

Matt reported that Robin has been working on an Ansible module for configuring VAC factories, which is currently in late testing. They were reminded of the general principle that publishing stuff is good, and Matt promised that that was the intention, but that the comments needed a degree of deswearinessing first.

CERN
----

Dave reported that a reorganisation of CERN IT and the forthcoming expiry of the Wigner data centre contract have prompted discussions about the potential future use of commercial clouds, with scenarios extending as far as maintaining simultaneous procurements from up to ~12 providers, with users being given either native cloud interfaces, or potentially with CERN IT handling that layer and experiments seeing simple Condor batch slots. This raises interesting questions around geographical and network diversity, both in coping with the complexity and in dealing with the common lack of transparency of commercial back ends. Dave, as GridPP Technical Co-ordinator, encouraged the experiments to get involved in the discussions. There was a discussion of the differing approaches of CMS (using the existing CERN cloud as a cloud) and LHCb (who use the LXbatch layer) and the possible differences in impact that this proposal may have on those approaches.
Dave particularly highlighted the importance CMS have found of the geographical and network location of squid servers - by close monitoring of the existing systems, CMS have been able to observe the practical difficulties involved in pulling something like this off.

Storage updates
===============

Sam reported that he was hoping to have an update from the HTTP Task Force, but that the recent meeting had fallen victim to scheduling problems. However, he did note that the planned HTTP tests are themselves now out of beta, so there can be a reasonable expectation that deploying them will genuinely test things, with the results being a fair reflection of the state of the test subject rather than of the test infrastructure.

Sam recapped the state of the T2C testing/development work in the UK, and the effects of Oxford scaling back effort; this has been discussed in greater depth at other meetings. Sam also noted that he is preparing a storage talk for the forthcoming WLCG meeting, and reminded everyone that he is soliciting feedback and contributions.

Dave noted the existence of a kick-off meeting to set up an RCUK/PDG 'data science' working group; this will take place on the 5th of Feb, and DC will report back afterwards.

Networking
==========

There was no report this week, but it was noted that a face-to-face meeting of the HEPiX IPv6 working group is currently taking place at CERN.

HEP Software Foundation
=======================

Andrew McNab reported that there are moves to shift from closed steering group meetings to open meetings, to aid wider participation.

Discussion - Production deployment planning
===========================================

Dave opened a general discussion on how we should make progress towards more production-scale deployment of the technologies, such as VAC, that we have been successfully running at test scale for some time.
Ewan started by saying that this was essentially a chicken-and-egg matter of policy - so far the VM systems have been seen as small-scale, non-critical test systems, and that holds back both deployments at sites and effort by the VOs. Dave asked how ready the VOs are for a serious move to VMs, and the general feeling was that they're all mostly there, and have plans to get the rest of the way when needed.

The focus then shifted to expanding GridPP's commitment, with Dave suggesting that we should find a site to move a significant fraction of its resources to VAC. Oxford was ruled out due to loss of staff, Glasgow are already pursuing a different approach, and Manchester already has a significant VAC resource, but is unsuitable as a 'generic' site since it is unique in having Andrew. It was suggested that Liverpool, who already have a decent commitment to VAC, might be a good choice.

The issue of accounting and funding was raised, and Dave suggested that it would be necessary to have a suitable agreement at the PMB to protect the site or sites involved from any negative impact arising from problems getting experiments running on the VAC resource. Andrew also noted that GridPP is currently not in an accounting period, so there is some space for sites to try things at the moment without any negative impact.

----------

Dave then moved the discussion to the separate, but related, topic of the current state of T2C testing. Most of the discussions have taken place in the Wednesday Storage Group meeting, with ATLAS-specific sections in the Thursday ATLAS meetings. It was agreed that Sam would try to pull together an organised summary (and possibly also arrange the attendance of the people involved) for the next or a future Friday tech meeting.