GridPP Cloud Meeting 15th February 2013 2pm
Present: Ian Collier, Simon Fayer, David Colling (DC), Duncan Rand, Adam
Huffman, Andrew McNab, Robert Frank, Kashif Mohammad, Wahid Bhimji,
Andrew Lahiff, Jeremy Coles, Roger
1 Actions from previous meeting
2 CMS status - Andrew Lahiff
4 LHCb - Andrew McNab
5 GridPP
6 Other UK sites and interaction with them
7 Security
9 Next meeting
10 Actions
Actions from previous meeting
- The new contact address for the GridPP Cloud hosted at Imperial is:
- Each site to create a twiki page; only ECDF has done this so far
- IPv6
- DC discussed with Dave Kelsey
- Just be part of test-bed for now, nothing specific
- Discussions ongoing
- Contact with other cloud projects
DC contacted David Wallom, Matteo and Steve Newhouse
There will be a conference call today at 16:10, on which DC will
report at the next meeting
Ian's resources are not yet part of the EGI Federated Cloud because
their OCCI endpoint has not been installed
DC contacted Bob Jones at Helix Nebula, which is starting its second
phase now
He said there should be two ways to interact with Helix Nebula:
i As sites providing cloud resources, via the EGI Federation
ii Dialogue at country level over the provision of private and
public clouds
- DC said GridPP is interested at this level
- DC will follow up on what happens
- Ian posted links to VM information
- DC spoke to John Green regarding security and this work will be ongoing
CMS status - Andrew Lahiff
- No access to HLT farm since the last meeting
- Reached 4000 running jobs, at which point networking problems emerged,
namely the frequency of Condor's polling of the cloud scheduler and
the fetching of input data
- Decided to measure individual job performance/requirements and
change networking accordingly
- Upgraded to new Condor version, with important fixes, including
cleaner handling of instance shutdown
- No time to measure job performance properly yet
- Stage out plugin work not finished yet either
- LHC run has finished now, so Andrew has asked for access to the HLT
- GridPP Cloud tests
- Upgraded Condor to same version as HLT
- Much more stable now
- Using CRAB as well as glideinWMS, which allows for monitoring via
standard CMS dashboard
- Jobs running successfully now
- Some timeouts with jobs staging out to CASTOR at RAL
- Reading data in over xroot at IC
- Analysis job, using SRM to RAL
- Problem with CVMFS and jobs starting before it was ready
- Adam implementing scripted shutdown if basic CVMFS test fails at boot
- During work on this realised there was a CVMFS
misconfiguration, hopefully now fixed
- 72 jobs, 18 instances
- ACTION: Adam to create a /store/user area for Andrew at IC to
allow for faster, simpler storage access
- Duncan wants to be able to see these jobs on the dashboard
- ACTION: Andrew to distribute a link to the dashboard for these jobs
- Wahid said Peter was on holiday, so a status report should wait
until his return
LHCb - Andrew McNab
- Andrew spoke to Mario Ubeda Garcia, who is the main cloud person for LHCb
- Submitting to Hamster at CERN, pulling jobs from CERN certification
task queue
- Andrew plans to run those same images, with the same
contextualization system, at Manchester
- Andrew asked whether this group is purely for clouds or for VMs too
- DC said mainly clouds, but open minded
- Ian asked for more details of what he wants
- Andrew said e.g. GDB group, what about sites running VMs directly?
- Ian said that group is specifically for clouds
- DC said focus of this group is the same
- Hamster? At CERN, is it part of agile infrastructure?
- DC will be talking to Matteo about agile infrastructure anyway
- Ian said Hamster is a mechanism to create individual instances on
the CERN agile infrastructure
- DC asked whether LHCb has plans to run instances on external
sites, i.e. not just at CERN
- Andy plans to try to run VMs directly at Manchester over the next two weeks
- Wahid: storage testing; his plan is to run some jobs that open files directly
- ACTION: Adam to put documentation on the wiki
- Just plans to play with it at first
- ACTION: Wahid should aim to have plan of things to test by next meeting
Other UK sites and interaction with them
- DC wants communication between UK cloud sites, spreading best practice etc.
- Wants to form a community out of this
- Ian said there has been significant work on what's needed for
private clouds already e.g. GDB e-group
- DC asked Ian to think about how to generate this community, for the next meeting
- Need to form a community which is also part of other communities
- ACTION for DC and Ian, Jeremy to be part of this too
- John Green not present, action delayed
- DC sent list of proposals for talks at next GridPP meeting
- He will refine based on feedback
- DC asked how many people are subscribed to the GDB list
- Ian thought it might be useful to look at summary of last month's
discussion from GDB
- DC will circulate a link
- Will be ongoing e.g. meeting at KIT
Next meeting
- Meet again in two weeks' time, Friday 1st March 14:00
Actions
- David to report on conference call about EGI Federated Cloud
- Adam to create /store/user area at IC for Andrew L.
- Andrew L. to distribute a link to the CMS Dashboard for the cloud jobs
- Adam to put documentation of the cloud cluster on the wiki
- Wahid to have plan for storage (S3) testing for next meeting
- David, Jeremy and Ian to think about how to form a cloud community
that joins with other similar communities
- David to circulate a link to the discussion at last month's GDB