GDB

Europe/Zurich
IT Auditorium (CERN)


Description
Monthly meeting of the WLCG Grid Deployment Board.

Note late start to avoid the Director's New Year Address.
Introduction (John Gordon)

A lot more data is coming this year. This means experiments need to work hard to keep resources free.
2012 meetings: continue with the second Wednesday of each month.
Forthcoming events: the new item is the TEGs meeting in Amsterdam.
GDB issues. Site dashboard to be discussed next time. CREAM today.
CVMFS deployment update. New stratum-0 service went live in December. There was an issue over Christmas but RAL recovered. A probe is being developed to check the freshness.
Middleware requirements are being gathered by EGI. Deadline is 15th January. Gather via NGIs or input directly into request tracker.
SLURM – an EGI survey showed a significant number of sites would be interested if the middleware supported this batch system.
7th February is a TEG day. All 6 presenting. Hope to have a high-level summary at the GDB on 8th. Also looking for site dashboard discussion and site views on 2011 (Tier-1s and Tier-2s).
Vidyo. Need to look at options. Testing today – audio an issue at the moment. Video looks okay.
For today: Information services, IPv6, middleware, RFC proxies and chroots.
 

WLCG workshop (Jamie Shiers)


Deadline for early registration is the end of this month.
The purpose of the outline agenda is to give people an idea of whether they want to attend the workshop or not. Suggestion is to start with a site ops overview, with the experiment tracks in series in the afternoon. It would be useful for the experiments to confirm. On the Sunday, something to expose the plans from February based on the TEGs.
John will be stepping down as GDB chair during 2012. A search committee is being set up to assess proposals. GDB members are now asked to propose candidates – these people do not have to be members of the GDB already. Proposals are required by March when a vote will take place and a handover is expected in May.
 

EMI status and plans (Laurence Field)

JT: DSR to GSRs. It’s pushing. What then defines who is allowed to push into a GSR?
LF: Don’t take too much notice of the arrows on the diagram. We do need policy engines in the architecture. Details are to be finalised.
MS: You said there will be a prototype in EMI-2. Is there a policy engine in that and can it be rolled out as a pilot service?
LF: Yes as a pilot service. The policy engine would be basic. It is not expected to be a production service.
MS: With EMI-3 we expect then to replace what we have?
LF: That is up to the providers. We are already talking about GOCDB overlaps etc. It is not EMI who decide what gets deployed. This is just a proposal and whether it is adopted is up to them.
LF: The registry is only for service discovery.
JT: People working on BDII are not doing this? BDII continues to be fully supported?
LF: Yes this is being developed by Unicore and ARC developers.
Slide 10: Clients, APIs and libraries. What is it that WLCG wants here? This is the question from EMI.
JG: If WLCG uses ARC and gLite parts of EMI, what benefits will there be from these changes?
LF: There have been issues with information providers – ARC vs gLite. This will bring a single schema so quality issues with ARC should disappear. There should also be improved support as there will be one implementation to configure.
JG: Common configuration comes with Glue 2.0 doesn’t it?
LF: Yes. This improves interoperability – comes by default when both using same schema.
JT: Slide 7. A comment on this as an abstract project. New developments should take into account experience over last ten years and also costs. Other TEGs should also cover these things.
MS: You know that we have these extra clients that transform LDAP into something we can work with. Querying LDAP is difficult for end users.
LF: The sort of interfaces people are suggesting do not even have the option to query. So it is not a strong argument to redo everything. People can query LDAP; it just takes a bit longer to learn.
JG: LDAP is also widely used and good for that reason.
TF: Question about the usage of messaging on slide 11. It says publish dynamic information using messaging. This would mean the possibility of consuming dynamic information. Would the top-BDII consume the dynamic information?
LF: Both. Top-level cache of state data would better use messaging. But if this is in messaging system then others can consume it too. Would need to consider if this is the direction we would want to go. The devil is in the details.
Information System – WLCG Issues (Markus Schulz)
This talk is more about what has been happening in this area and what will happen.
TF: Is the glue validator ready for deployment? I understood it was for EMI-2.0.
LF: The validator is ready and already in APEL. It is usable, and is used within EMI internally as a test for people writing information providers for Glue 2.0. What Markus is mentioning is for operational tests.
TF: Discussed in Amsterdam that on service start the validator checks information available. It is a different idea presented here.
LF: In Amsterdam we discussed the possibility of including this in the resource info service to run at configuration time, or as a wrapper around an information service. This is another approach – it was just discussed and no concrete outcome was agreed.
MS: Initially the information officer had a strong operational view.
JG: On the improvements slide: 55 sites out of 300 odd!
MS: We had no one checking the failover settings in the configuration options.
JG: Peter is working on that. What the NGIs are trying to do is have a good quality top-BDII in the region and have sites point to these as primary.
JT: The glue validator. How it was there and not being picked up. Does this have anything to do with the old validation checks? The IO wrote a series of tests…
MS: Became clear those tests overlapped with gstat tests.
JT: There were interpretations of the information that were very WLCG specific.
MS: Right now there are no tests that create tickets.
LF: Validator has concept of profiles – 1.0, 2.0, WLCG profiles etc. So this is solved in the new framework.
 
TF: A comment on support (slide 10). If a problem is raised people should be encouraged to use GGUS more rather than using the mailing list.
MS: Support that idea but LF can tell you that as a developer he gets involved in configuration issues often.
JG: In APEL the second level support has delivered little too.
TF: Perhaps we can take this offline and improve the way tickets are handled.
JG: Existing improvements being rolled out. Can you monitor the state that sites are at? With Nagios can we produce a list to name-and-shame at each meeting to encourage rollout?
MS: Not easy as not a new version but a different set of configurations.
LF: If the purpose of glite cluster is to remove duplications, then it is a very easy thing to get a list of duplicates? To get a list of caching BDIIs you would observe that they have 10-20% more entries than others – query all entries. So heuristically we can produce a list.
SB: It would be possible for the config to be published. Just add that flag.
LF: Possible but could require an additional intervention. Would need update and then configuration change.
JG: The client config can be seen from pakiti.
MS: Yes. You remember with glexec on WNs and CREAM the MB set clear targets. Right now we have nothing like that. Setting clear targets is needed.
JG: We have seen enough benefits from caching BDII to suggest it be more widely adopted – it is a config change in one of the later releases. gLite 3.2?
MS: Most important thing is to find a position where we want to follow up on improvements identified to ensure they are used. If we have a plan of how and who – say the MB….
JG: Would like a presentation on the merits of things like the gLite cluster. Will talk with Ian about that.
Massimo Sgaravatto: 55 sites running top-level BDII is not a problem but we may wish to set a deadline on moving sites from using the CERN top-level BDII.
JG: Is this observed connections or configured to use?
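
The heuristic LF describes above (caching BDIIs tend to hold 10-20% more entries than the others) could be sketched roughly as follows. This is only an illustration: the host names and entry counts are invented, the function name is hypothetical, and the 10% margin over the median is an assumption taken from the lower end of the quoted range. A real check would first have to query every top-level BDII for its entry count.

```python
from statistics import median


def flag_probable_caching_bdiis(entry_counts, margin=0.10):
    """Return hosts whose entry count exceeds the median by more than `margin`.

    entry_counts: dict mapping a BDII host name to its total number of entries.
    margin: fractional excess over the median treated as suspicious (assumed 10%).
    """
    if not entry_counts:
        return []
    baseline = median(entry_counts.values())
    return sorted(
        host
        for host, count in entry_counts.items()
        if count > baseline * (1 + margin)
    )


# Invented numbers purely for illustration.
counts = {
    "bdii-a.example.org": 100000,
    "bdii-b.example.org": 101000,
    "bdii-c.example.org": 115000,
    "bdii-d.example.org": 99500,
    "bdii-e.example.org": 120000,
}
suspects = flag_probable_caching_bdiis(counts)
print(suspects)
```

Using the median as the baseline only works while caching instances are a minority; once most sites enable caching, the comparison would need a fixed reference instead.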
 

IPv6 (Dave Kelsey)

JG: Perhaps Dave can report via me in a future meeting those who are involved in the test bed.
DK: We have quarterly meetings so once a quarter an update can be provided.
 

EMI-1 – Status (Cristina Aiftimiei)

No questions
 

CREAM (John Gordon)

Are we good to remove the LCG-CE from the availability calculation?
Have stressed for a while that sites should move to CREAM. SGE fix came out in November. No pressure to remove LCG-CEs but WLCG not interested in it and does not wish to support it so sites will only be judged on CREAM availability. Are all SGE sites now using the new version? Can we use CREAM only for the January calculation?

Action: JG will write to lists (such as GDB, rollout and Tier-2s) to confirm again that sites have no opposition to the switch.

There are still some underlying concerns about the stability of CREAM but running is currently possible.
ML: And the LCG-CE became very risky – like a house of cards.
 

RFC Proxies (Maarten Litmaath)

JG: Is it inherent that you cannot support two things at once?
ML: Not with the software at the moment. Difficult to provide extra flexibility. Standardisation is a good thing but there is a pressure here. We do have time, and plans, to get there over this year. We can make sure those components get into the release. Can make sure it works with RFC proxies. Clients can be deployed (e.g. ARGUS) as they are backward/forward compatible. Would only be able to switch to RFC if all components support it. Would be good to track deployment progress of the components.
PF: What are the migration states? Not all dCaches need to migrate at the same time. Once everyone has moved then recommend CAs create certificates with sha2 hash.
JT: The moment new dCache comes then anything not working with RFC will not work….
ML: dCache does support RFC. Must get rid of all old versions
JT: Migrate everything to RFC now?
ML: Yes. By the end of the year we can have gradually tested RFC usage without it being the default. Once RFC proxies are used everywhere then we can dictate the switch.
JT: The new dCache works with sha1 and sha2?
ML: Yes.
MS: Then we have a clear date when the last LCG-CE must go off line?
ML: Yes sure. LCG-CE does not support RFC proxies and anyway it is EOL at the end of April.
DK: There is an EUGRIDPMA meeting next week that I attend for WLCG. I need to get things straight. With your list. What is the situation with ARGUS, CREAM, WMS … are we waiting for RFC…
ML: ARGUS, CREAM… support sha2. DIRAC developers say it is compatible to have it working with RFC in the spring. It is the java implementation that has problems.
DK: They may say WLCG delays move to RFC but actually it is the support in the middleware.
ML: Pointed out that this issue will hit other communities, we are being more proactive. Perhaps late from IGTF perspective but we were not told that long ago that this time bomb was there.
DK: Deploying EMI-2 will take the year anyway so WLCG is not creating the delay.
ML: By second half of this year can do selective updates.
DK: CAs need to be pushed… for people yes but what about for host certificates?
ML: Yes, that needs to be thought about. Updated CA over next year.
DK: If end up with sha1 compromised during the year what is our plan B?
ML: Very unlikely. Yes it is a hot topic but it is tougher than the MD5 problem. It is a good time anyway to move to sha2 as soon as possible. We want to have all our stuff ready way before the end of the year with some RFC services we can test progressively.
DK: And we could turn it on early if needed?
ML: Yes but it would lead to disruption.
SC: RFC proxies tested for ATLAS. Fine. But nobody tried sha2.
BB: I have tried sha2 proxies thoroughly. The nice thing about CI logon…
SC: My question was how to get a certificate with the sha2 hash.
BB: Worried about jglobus2 transition. Primary coder was a graduate student who has since moved.
JG: The alternative would be to move to another java container?
PF: In worst case would have to take jglobus2 and support it ourselves. For now I do not know any other solution. RFC proxies is the future. We are working with BestMan developers too. To keep things simple I would stay with this route.
BB: If dCache is willing to do that then that’s fine.
ML: There are not many/any alternatives.
JG: What about standards?
Problem here comes to delegation and gftp uses globus … SRM would need to move to ssl. That would require big changes to all SRMs.
PF: If follow approach ML suggested then nothing strange on user side. Pressure is on dCache and BestMan. You do not need to worry about this… all you have to do is streamlined.
ML: Conclusion is that there will be no visible change for the majority of our community. Developers will be exposed to changes more.
JT: Will this recommendation also come out of the security TEG?
ML: It is on the list to be discussed within the TEG as a “real” problem.
JG: What about a plan B?
ML: If at some point IGTF feel they must move to sha2 then we could take the risk and stay with last sha1 release. Even if sha1 is compromised, with MD5 it took a few years but it did not require a supercomputer to compromise a proxy. For sha1 I do not see this happening for some time as there is no known exploit and researchers have a lot of compute resource to crack the keys.
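
ML's condition for the switch (RFC proxies can only become the default once every component supports them, so deployment progress should be tracked) can be sketched as a simple readiness check. This is a hypothetical illustration, not an existing tool: the function name and the `component-x` entry are invented, and only the dCache and LCG-CE values reflect statements recorded in these minutes.

```python
def rfc_switch_ready(support):
    """Return (ready, blockers) for a mapping of component name -> RFC support.

    ready is True only when every listed component supports RFC proxies;
    blockers lists the components that still prevent the switch.
    """
    blockers = sorted(name for name, ok in support.items() if not ok)
    return (not blockers, blockers)


status = {
    "dCache": True,       # minutes: "dCache does support RFC"
    "LCG-CE": False,      # minutes: "LCG-CE does not support RFC proxies"
    "component-x": True,  # hypothetical placeholder entry
}
ready, blockers = rfc_switch_ready(status)
print(ready, blockers)
```

In this sketch the LCG-CE is the only blocker, which matches the point made in the discussion that the last LCG-CE must go offline before the switch.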


Mixing SL5 and SL6 with ‘chroot’ (Brian Bockelman)

Pablo Fernandez: What is the experience in production with CMS and ATLAS?
BB: Very little. New nodes have been brought on with SL6 so CMS is now motivated locally to get this to production soon.
 
Slides for last item on Storage Accounting are online but were not presented. A more considered talk will be given at the next meeting.
Will try Vidyo again.
Meeting ended at 16:50
 

EVO chat
 
[13:02:23] Claudio Grandi joined
[13:02:23] Mario David joined
[13:02:23] Pierre Girard joined
[13:02:23] luca dell'agnello joined
[13:02:23] Pablo Fernandez joined
[13:02:24] Hélène Cordier joined
[13:02:25] peter solagna joined
[13:02:26] Denise Heagerty joined
[13:02:27] Tiziana Ferrari joined
[13:02:27] Stephen Burke joined
[13:02:28] Gonzalo Merino joined
[13:02:28] CERN 31-3-004 joined
[12:54:00] Stephen Burke i can now hear everything twice 
[12:54:48] Stephen Burke don't create any video black holes 
[12:55:36] CERN 31-3-004 can you hear me?
[12:55:39] Jeremy Coles Yes we hear you
[12:55:57] Pablo Fernandez We just see the room, not the desktop
[12:56:06] Stephen Burke now muted in vidyo and can still hear
[12:56:30] Pablo Fernandez now we see the video 
[12:59:22] Jeremy Coles Did anyone else lose audio?
[13:00:24] Tiziana Ferrari yes
[13:02:29] Tony Cass joined
[13:02:29] Cristina Aiftimiei joined
[13:02:29] Ron Trompert joined
[13:02:32] Alberto Aimar joined
[13:06:51] Maciej Gorski joined
[13:08:03] MIhnea Dulea joined
[13:09:53] Alberto Aimar left
[13:11:16] Alberto Aimar joined
[13:12:10] Alessandra Forti joined
[13:12:42] Alberto Aimar left
[13:13:47] Alberto Aimar joined
[13:16:05] Massimo Sgaravatto joined
[13:16:54] Alberto Aimar left
[13:17:17] Alberto Aimar joined
[13:17:36] Massimo Sgaravatto left
[13:17:44] Massimo Sgaravatto joined
[13:17:52] Alberto Aimar left
[13:18:43] Alberto Aimar joined
[13:27:22] Cristina Aiftimiei left
[13:29:28] Cristina Aiftimiei joined
[13:41:27] Maciej Gorski left
[13:42:08] Maciej Gorski joined
[13:48:40] John Gordon joined
[13:54:23] Stephen Burke it's fine from here!
[13:54:28] Stephen Burke Must be in the room
[13:54:30] Mario David it's fine for me
[13:55:12] Stephen Burke the support in ggus is still mostly me and Laurence!
[13:57:45] Stephen Burke how many years did CREAM rollout take?!
[13:57:45] MIhnea Dulea left
[13:57:47] Jim Shank joined
[14:03:00] Mario David just a version of the bdii
[14:04:23] Stephen Burke but the caching is an option, it can be on or off in a given version
[14:09:29] Massimo Sgaravatto FYI: glite-cluster is also documented in the CREAM twiki
[14:11:42] Andrea Ceccanti joined
[14:16:06] Maciej Gorski left
[14:21:19] Jim Shank left
[14:23:22] Jim Shank joined
[14:23:22] Jim Shank left
[14:25:14] CMS Centre Bologna joined
[14:27:42] Massimo Sgaravatto left
[14:27:53] Massimo Sgaravatto joined
[14:33:54] Mario David we hear good
[14:34:04] Mario David it's only cern room apparently
[14:41:15] Cristina Aiftimiei I immagine 
[14:41:25] Cristina Aiftimiei I'm using both EVO and Vidyo... just in case
[14:42:46] peter solagna left
[14:43:49] Massimo Sgaravatto left
[14:43:55] Massimo Sgaravatto joined
[14:46:00] Massimo Sgaravatto left
[14:51:27] Massimo Sgaravatto joined
[15:00:44] Phone Bridge joined
[15:00:44] Phone Bridge left
[15:10:36] CMS Centre Bologna left
[15:15:00] Stephen Burke I'm still watching the video on vidyo!
[15:32:06] John Gordon Brian is talking via vidyo
[15:32:49] Alberto Aimar left
[15:34:22] Soon Yung Jun joined
[15:34:48] Soon Yung Jun left
[15:36:37] Andrea Ceccanti left
[15:42:25] Jos van Wezel joined
[15:43:22] Pablo Fernandez what experience you have in production?
[15:43:38] Pablo Fernandez CMS / ATLAS?
[15:44:51] Pablo Fernandez Thanks
[15:45:24] Mario David left
[15:45:39] Jim Shank left
 
    • 13:00 - 13:30
      Introduction
      Convener: Dr John Gordon (STFC - Science & Technology Facilities Council (GB))
      WLCG Workshop
    • 13:30 - 14:30
      Information Service for WLCG
      Conveners: Mr Laurence Field (CERN), Dr Markus Schulz (CERN)
    • 14:30 - 15:00
      IPv6
      Convener: Dr David Kelsey (STFC - Science & Technology Facilities Council (GB))
    • 15:00 - 15:30
      Middleware
      Convener: Doina Cristina Aiftimiei (Istituto Nazionale Fisica Nucleare (IT))
    • 15:30 - 16:00
      CREAM
      Is it working for SGE and can we remove LCG-CE from the availability calculation?
    • 16:00 - 16:15
      RFC Proxies
      Convener: Maarten Litmaath (CERN)
    • 16:15 - 16:45
      Managing chroots
      Convener: Dr Brian Bockelman (University of Nebraska)
    • 16:45 - 17:00
      Storage Accounting
      Convener: Dr John Gordon (STFC - Science & Technology Facilities Council (GB))