Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description
- This is the weekly GridPP ops & sites meeting.
- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234.
- To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial in numbers.
-- The London (UK) service is on +442030510622
-- The meeting extension is 9308582.

Apologies:
Minutes:

Attendance

Alessandra Forti
Andrew Lahiff
Andrew McNab
Daniel Peter Traynor (minutes)
Daniela Bauer
David Crooks
Elena Korolkova
Ewan Mac Mahon
Federico Melaccio
Gareth Douglas Roy
Gareth Smith
Gordon Stewart
Ian
Ian Robert Neilson
Jeremy Coles (Chair)
John Bland
John Hill
Kashif
Marcus Ebert
Matt Doidge
Paige Winslowe Lacesso
Peter Clarke
Peter Gronbech
Raja Nandakumar
Raul
Robert Fay
Robert Wolfgang Frank
Samuel Cadellin Skipsey
Steve Jones
Terry Froy
Tom Whyntie

LHCb
UK running well.
T1 storage downtimes for glibc update.

CMS
Multicore job work starting
( Federico Melaccio: From https://twiki.cern.ch/twiki/bin/view/CMSPublic/MulticoreDeploymentToT2s :
"CMS will employ multicore pilots to allocate resources at its computing sites.
These pilots are partitionable, meaning that they can internally rearrange to
schedule multiple single-core payloads, multicore payloads, or a combination of the two")
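
As a rough illustration of what "partitionable" means in the quote above, here is a toy sketch in Python. It is not CMS or glideinWMS code: the class `PartitionablePilot` and its methods are hypothetical names invented for this example, and the model ignores memory, wall-time and draining entirely. It only shows one fixed core allocation being shared between single-core and multicore payloads.

```python
class PartitionablePilot:
    """Toy model of a pilot that holds a fixed number of cores (hypothetical)."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.running = {}  # payload name -> number of cores it occupies

    @property
    def free_cores(self):
        # Cores not currently claimed by any running payload.
        return self.total_cores - sum(self.running.values())

    def start(self, name, cores):
        """Start a payload if enough cores are free; return True on success."""
        if cores < 1 or cores > self.free_cores or name in self.running:
            return False
        self.running[name] = cores
        return True

    def finish(self, name):
        """Release the cores held by a finished payload."""
        self.running.pop(name, None)
```

For example, an 8-core pilot could run one 4-core payload alongside several single-core payloads, and reuse the freed cores for a different mix once the 4-core payload finishes.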

ATLAS
Problem with ATLAS Frontier services on Friday
Several tickets for storage consistency checks
Ticket 118740: 5% of jobs failing at Brunel, much improved; ticket to be closed

Other VOs
https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator

SNO+: Storage discussion (more details at tomorrow's storage meeting)
How to move data from the online trigger buffer to long term storage sites?
Issues with VPN.
(Matt Doidge: Maybe we need a blog post! That would make Jens happy.)

LSST: See update 18 of https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator
(Matt Doidge: I'll double check the settings at Lancaster to see if there's anything weird.)

GridPP dirac - SAM tests running OK
Update of this page needed: https://www.gridpp.ac.uk/wiki/Cloud_%26_VM_status

Operations Bulletin
http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

gridpp.ac.uk has no AAAA record
AAAA record for GridPP web site
(Ewan Mac Mahon:  Yes, but good propaganda reasons.)
(Terry Froy: Ewan: So, I suppose HEPiX IPv6 Working Group might as well disband because we are
clearly only pushing IPv6 for 'propaganda' reasons.)
(Ewan Mac Mahon:  Well, no. We need substantial services on IPv6 for technical reasons,
having the front page of the website IPv6 accessible is not a technical requirement.
It's to avoid our looking silly when we say that IPv6 is really important to us, and that's a propaganda purpose.
It's a good one because it serves the substantial agenda of IPv6 enablement, but we have no specific need other
than the PR for the website to be IPv6-able.)
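
Whether a host name resolves to any IPv6 address can be checked with a few lines of Python. This is a minimal sketch using only the standard library. The function names `ipv6_addresses` and `has_aaaa` and the `resolver` parameter are illustrative and not part of any GridPP tooling. Note that `socket.getaddrinfo` asks the system resolver, which may also consult local sources such as a hosts file, so an empty result is only a strong hint that no AAAA record is published.

```python
import socket


def ipv6_addresses(hostname, port=80, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv6 addresses reported for hostname.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the lookup can be replaced in tests (an assumption of this sketch).
    """
    try:
        infos = resolver(hostname, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        # Resolution failed or no IPv6 address is known for this name.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET6 the sockaddr starts with the address string.
    return sorted({info[4][0] for info in infos})


def has_aaaa(hostname, resolver=socket.getaddrinfo):
    """True if at least one IPv6 address was found for hostname."""
    return bool(ipv6_addresses(hostname, resolver=resolver))
```

Calling `has_aaaa("gridpp.ac.uk")` from a networked machine would then report whether the situation noted above still holds.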


WLCG MB has decided to stop further deployment of glexec:
What about GridPP Dirac where it is used?
(Alessandra Forti: I have the impression that if wlcg goes, glexec goes. dirac is not using glexec atm)
(Matt Doidge:  http://indico.cern.ch/event/459359/contribution/7/attachments/1229167/1800972/MB-LisbonFollowUp-160216.pdf)

Are Steve's tests still useful?
Useful for metrics, not so useful for tests.
(Matt Doidge: I used to love this old page when it worked: http://pprc.qmul.ac.uk/~lloyd/gridpp/hammercloud.html
A rejig of the quarterly report would be appreciated.)

Multicore Deployment: note the setting of memory limits and the passing of parameters to the batch system
http://indico.cern.ch/event/466818/contribution/0/3/attachments/1230922/1804324/20160218_MCTF_APCY.pdf

Tier-2 Evolution: Issues with the ATLAS VM image (high rate of failure with the new image)

Documentation: new GridPP dirac documentation
Dirac job submission discussion: Ganga vs dirac
(Ewan Mac Mahon: I think it's slightly tricky - in principle using ganga should be the right answer. Whether it /is/ or not is a slightly different question. I think it probably is though.)
(Alessandra Forti: me too; is lhcb)
(Raja Nandakumar: LHCb uses the DIRAC Transformation system for everything. The LHCb specific stuff is mainly wrappers around this.)
(Alessandra Forti: joe could run jobs in a week with ganga and direct job submission; to move to dirac was changing one line)
(Ewan Mac Mahon: I think that aside from anything else here there's a likely action item for someone familiar with the LHCb version of this to find out how much of their expertise carries across to the Imperial Dirac and GridPP VO.)
(Raja Nandakumar: For <~3-4K jobs ganga does a decent job of resubmission also!)
(Marcus Ebert: Is the DIRAC Transformation system that LHCb uses also available for gridpp Dirac?)
(Raja Nandakumar: From my experience that is ...)
(Tom Whyntie: @Ewan: Agreed)
(Raja Nandakumar: Yes - the DIRAC transformation system is a part of DIRAC. Not LHCb-specific that is.)
(Tom Whyntie: @Peter C: I think once you're talking 10000s of jobs, we're way beyond the UserGuide)
(Alessandra Forti: ganga + dirac)
(Ewan Mac Mahon: And I'm pretty sure Ganga is fine for thousands of jobs. Hundreds of thousands I'd want to ask Matt+Mark.)
(Ewan Mac Mahon: But I wouldn't be surprised by their answer either way.)

Tickets:

Note Atlas consistency checks
(Matt Doidge: https://ggus.eu/?mode=ticket_info&ticket_id=118930)

(Samuel Cadellin Skipsey: Re: Glasgow and HTTP - the quick update is that I'm tinkering in the background. Should be closable soon thanks to stuff I did during this meeting.)

Discussion:
Note outcomes from Lisbon: https://indico.cern.ch/event/459359/contribution/7/attachments/1229167/1800972/MB-LisbonFollowUp-160216.pdf

(Ewan Mac Mahon: The computing for the high-luminosity upgrade will be fine provided someone blows up the instrument for a year again. That really saved our bacon the last time.)


Peter Gronbech: (23/02/2016 11:24)

yes we Oxford also dropped out, both Kashif and I

John Hill: (11:24 AM)

Also in Cambridge

Tom Whyntie: (11:26 AM)

@Ewan: Yep!

Alessandra Forti: (11:26 AM)

tb-support is for admins

Peter Gronbech: (11:26 AM)

yes I agree

Ewan Mac Mahon: (11:26 AM)

Good, that's agreed then :-)

Tom Whyntie: (11:27 AM)

TB-SUPPORT for wiki pages, GRIDPP-SUPPORT for UserGuide issues

Jeremy Coles: (11:29 AM)

https://www.gridpp.ac.uk/wiki/Cloud_%26_VM_status

Ewan Mac Mahon: (11:34 AM)

Yes, but good propaganda reasons.

Alessandra Forti: (11:35 AM)

I have the impression that if wlcg goes glexec goes

Terry Froy: (11:36 AM)

Ewan: So, I suppose HePiX IPv6 Working Group might as well disband because we are clearly only pushing IPv6 for 'propaganda' reasons.

Ewan Mac Mahon: (11:37 AM)

Well, no. We need substantial services on IPv6 for technical reasons, having the front page of the website IPv6 accessible is not a technical requirement. It's to avoid our looking silly when we say that IPv6 is really important to us, and that's a propaganda purpose.

It's a good one because it serves the substantial agenda of IPv6 enablement, but we have no specific need other than the PR for the website to be IPv6-able.

Alessandra Forti: (11:38 AM)

dirac is not using glexec atm

Matt Doidge: (11:41 AM)

http://indico.cern.ch/event/459359/contribution/7/attachments/1229167/1800972/MB-LisbonFollowUp-160216.pdf

I used to lvoe this old page when it worked:

http://pprc.qmul.ac.uk/~lloyd/gridpp/hammercloud.html

I rejig of the quarterly report would be appreciated.

*a rejig

Gareth Smith: (11:53 AM)

Sorry - I need to leave now. Tier1 report as in bulletin.

Tom Whyntie: (12:04 PM)

@Jeremy: np

Ewan Mac Mahon: (12:06 PM)

I think it's slightly tricky - in principle using ganga should be the right answer.

Whether it /is/ or not is a slightly different question.

I think it probably is though.

Alessandra Forti: (12:08 PM)

me too

is lhcb

Raja Nandakumar: (12:08 PM)

LHCb uses the DIRAC Transformation system for everything

The LHCb specific stuff is mainly wrappers around this

Other than the production portal that is ...

Alessandra Forti: (12:10 PM)

joe could run jobs in a week with ganga and direct job submission to move to dirac was changing one line

Ewan Mac Mahon: (12:10 PM)

i think that aside from anything else here there's a likely action item for someone familiar with the LHCb version of this to find out how much of their expertise carries across to the Imperial Dirac and GridPP VO.

Raja Nandakumar: (12:11 PM)

For <~3-4K jobs ganga does a decent job of resubmission also!

Marcus Ebert: (12:11 PM)

Is the DIRAC Transformation system that LHCb uses also available for gridpp Dirac?

Raja Nandakumar: (12:11 PM)

From my experience that is ...

Tom Whyntie: (12:11 PM)

@Ewan: Agreed

Raja Nandakumar: (12:11 PM)

Yes - the DIRAC transformation system is a part of DIRAC

Not LHCb -specific that is.

Paige Winslowe Lacesso: (12:12 PM)

Sorry, have to go

Tom Whyntie: (12:12 PM)

@Peter C: I think once you're talking 10000s of jobs, we're way beyond the UserGuide

Alessandra Forti: (12:12 PM)

ganga + dirac

Ewan Mac Mahon: (12:13 PM)

And I'm pretty sure Ganga is fine for thousands of jobs.

Hundreds of thousands I'd want to ask Matt+Mark.

Jeremy Coles: (12:14 PM)

Alessandra F; Andrew L; Andrew M; Dan T; Daniela B; David C; Duncan R; Elena K; Ewan M; Federico M; Gareth R; Gordon S; Ian; Ian N; Jeremy C; John B; John H; Kashif M; Marcus E; Matt D; Pete C; Pete G; Raja N; Robert F; Robert F; Sam S; Steve J; Terry F; Tom W; Gareth S.

Ewan Mac Mahon: (12:14 PM)

But I wouldn't be surprised by their answer either way.

Matt Doidge: (12:18 PM)

https://ggus.eu/?mode=ticket_info&ticket_id=118930

Samuel Cadellin Skipsey: (12:19 PM)

Re; Glasgow and HTTP - the quick update is that I'm tinkering in the background. Should be closable soon thanks to stuff I did during this meeting.

Jeremy Coles: (12:20 PM)

https://indico.cern.ch/event/459359/contribution/7/attachments/1229167/1800972/MB-LisbonFollowUp-160216.pdf

Ewan Mac Mahon: (12:22 PM)

The computing for the high-luminosity upgrade will be fine provided someone blows up the instrument for a year again. That really saved our bacon the last time.

Alessandra Forti: (12:27 PM)

ok bye

Terry Froy: (12:27 PM)

bye

Tom Whyntie: (12:27 PM)

Thanks, bye

David Crooks: (12:27 PM)

cheers, bye

 

    • 11:00 11:01
      Ops meeting minutes 1m
      * This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.
      * The team composition has been changing. If everybody contributes then the task comes around less often.
      * From the start of GridPP4+ those in fully funded GridPP positions will be expected to contribute. Others are welcome to volunteer!
      * The minutes should contain a list of who attended; apologies; note who took the minutes and highlight actions.
      * A count is maintained at https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items.
      * After uploading minutes to the agenda page the minute taker is expected to:
      ** Update the list of ops actions.
      ** Update their 'count' so the task can be shared fairly.
      Thank you for your support!
    • 11:01 11:20
      Experiment problems/issues 19m
      Review of weekly issues by experiment/VO
      - LHCb
      - CMS https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel
      - ATLAS
      - Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.
      - GridPP DIRAC status [Andrew McNab]
      -- https://www.gridpp.ac.uk/gridpp-dirac-sam
      - Status of pilot enabling across sites.
    • 11:20 11:40
      Meetings & updates 20m
      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
      - General updates
      - WLCG ops coordination
      - Tier-1 status
      - Storage and data management
      - Tier-2 Evolution
      - Accounting
      - Documentation
      - Interoperation
      - Monitoring
      - On-duty
      - Rollout
      - Security
      - Services
      - Tickets
      - Tools
      - VOs
      - Site updates
    • 11:40 12:00
      Site roundtable 20m
      ... or Lisbon follow-up: https://indico.cern.ch/event/459359/contribution/7/attachments/1229167/1800972/MB-LisbonFollowUp-160216.pdf
    • 12:20 12:25
      Actions & AOB 5m
      * https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items