GridPP Friday Tech meeting, 2016-01-22
--------------------------------------

Jobs in VMs
===========

VAC/Vcycle
----------

A new 0.20.0 release is due imminently, with multiple improvements including CloudInit compatibility and the removal of the requirement for an NFS server on factory nodes. Andrew reported that it's been in successful testing on Manchester's VAC resource, and that he's aiming to get the release out this afternoon.

On Vcycle, work is focussing on using the EC2 layer to talk to RAL's OpenNebula cloud - the Vcycle side is believed to be working, but a recent update to OpenNebula appears to have broken (or changed?) its EC2 API. Alex Dibbo said that the details remain uncertain, and that the upstream OpenNebula devs are also unclear what's happening, but it's being looked at.

ATLAS VMs
---------

Last year Andrew McNab & Peter Love got some CloudInit-based ATLAS VMs running on a hacked test VAC with reasonable success. This work is ongoing, but should eventually allow a single ATLAS VM image to run on the HLT farm, the CERN cloud, and VAC/Vcycle instances.

CMS in VMs
----------

Andrew Lahiff reported the use of the CMS HLT cloud for real production work over the Christmas period, and said that discussions are ongoing around using it in the inter-fill periods. Dave Colling reported that it has now been agreed that this will indeed happen, but went on to note related discussions around the involvement of CERN IT in cloud provisioning (see later).

LHCb in VMs
-----------

Andrew McNab reported similar work on the LHCb VMs, which has made them essentially generic Dirac clients, with experiment details downloaded at run time based on CloudInit contextualisation. Once the UK resources have moved to CloudInit-capable back ends, this should allow the use of these generic VMs both for LHCb and for the VOs supported on the UK Dirac service.
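To illustrate the contextualisation idea discussed above - an experiment-agnostic VM image that learns its VO identity at boot - a cloud-init user-data file might look something like the following. This is purely an illustrative sketch: the file paths, parameter names, and URL here are assumptions, not the actual LHCb or Dirac configuration.

```yaml
#cloud-config
# Sketch only: the VM image itself is generic; everything VO-specific
# arrives via this contextualisation data when the VM boots.
write_files:
  - path: /etc/vm-context/vo.conf        # hypothetical path
    content: |
      VO=lhcb
      DIRAC_SETUP=LHCb-Production        # hypothetical parameter names
runcmd:
  # Fetch and run the experiment-specific bootstrap payload at run time
  # (placeholder URL - the real source would be VO infrastructure).
  - [ sh, -c, "curl -sL https://example.org/lhcb-bootstrap.sh | sh" ]
```

The same image could then serve any VO simply by changing the user-data passed in by the factory or cloud layer.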
LHCb are also intending/hoping to ramp up VM hosting capacity across many of their sites, particularly with the aim of using multi-core VMs. Dave asked whether this would be a useful thing for LHCb T2Ds to get involved in, and Andrew said that it would, but that the VO had chosen not to make it a formal request so as not to over-burden the sites with hard requirements.

Site updates
============

Lancaster
---------

Matt reported that Robin has been working on an Ansible module for configuring VAC factories, which is currently in late testing. They were reminded of the general principle that publishing stuff is good, and Matt promised that that was the intention, but that the comments needed a degree of deswearinessing first.

CERN
----

Dave reported that a reorganisation of CERN IT and the forthcoming expiry of the Wigner data centre contract have prompted discussions about the potential future use of commercial clouds, with scenarios extending as far as maintaining simultaneous procurements from up to ~12 providers, with users being given either native cloud interfaces, or potentially with CERN IT handling that layer and experiments seeing simple Condor batch slots. This raises interesting questions around geographical and network diversity, both in coping with the complexity and in dealing with the common lack of transparency of commercial back ends. Dave, as GridPP Technical Co-ordinator, encouraged the experiments to get involved in the discussions. There was a discussion of the differing approaches of CMS (using the existing CERN cloud as a cloud) and LHCb (who use the LXbatch layer) and the possible differences in impact that this proposal may have on those approaches.
Dave particularly highlighted the importance CMS have found of the geographical and network location of squid servers - by close monitoring of the existing systems, CMS have been able to observe the practical difficulties involved in pulling something like this off.

Storage updates
===============

Sam reported that he was hoping to have an update from the HTTP Task Force, but that the recent meeting had fallen victim to scheduling problems. However, he did note that the planned HTTP tests are themselves now out of beta, so there can be a reasonable expectation that deploying them will genuinely test things, with the results being a fair reflection of the state of the test subject rather than of the test infrastructure.

Sam recapped the state of the T2C testing/development work in the UK, and the effects of Oxford scaling back effort; this has been discussed in greater depth at other meetings. Sam also noted that he is preparing a storage talk for the forthcoming WLCG meeting, and reminded everyone that he is soliciting feedback and contributions.

Dave noted the existence of a kick-off meeting to set up an RCUK/PDG 'data science' working group; this will take place on the 5th of Feb, and DC will report back afterwards.

Networking
==========

There was no report this week, but it was noted that a face-to-face meeting of the HEPiX IPv6 working group is currently taking place at CERN.

HEP Software Foundation
=======================

Andrew McNab reported that there are moves to shift from closed steering group meetings to open meetings, to aid wider participation.

Discussion - Production deployment planning
===========================================

Dave opened a general discussion on how we should make progress towards more production-scale deployment of the technologies, such as VAC, that we have been successfully running at test scale for some time.
Ewan started by saying that this was essentially a chicken-and-egg matter of policy - so far the VM systems have been seen as small-scale, non-critical test systems, and that holds back both deployments at sites and effort by the VOs. Dave asked how ready the VOs are for a serious move to VMs, and the general feeling was that they're all mostly there, and have plans to get the rest of the way when needed.

The focus then shifted to expanding GridPP's commitment, with Dave suggesting that we should find a site to move a significant fraction of its resources to VAC. Oxford was ruled out due to loss of staff, Glasgow are already pursuing a different approach, and Manchester already has a significant VAC resource, but is unsuitable as a 'generic' site since it is unique in having Andrew. It was suggested that Liverpool, who already have a decent commitment to VAC, might be a good choice.

The issue of accounting and funding was raised, and Dave suggested that it would be necessary to have a suitable agreement at the PMB to protect the site or sites involved from any negative impact arising from problems getting experiments running on the VAC resource. Andrew also noted that GridPP is currently not in an accounting period, so there is some space for sites to try things at the moment without any negative impact.

----------

Dave then moved the discussion to the separate, but related, topic of the current state of T2C testing. Most of the discussions have taken place in the Wednesday Storage Group meeting, with ATLAS-specific sections in the Thursday ATLAS meetings. It was agreed that Sam would try to pull together an organised summary (and possibly also arrange the attendance of the people involved) for the next or a future Friday tech meeting.