WLCG AuthZ Call

Europe/Zurich
Description

Previous Actions:

  • We need to get the scitokens.org issuers working for ATLAS and CMS this month.

    • Doug will follow up in ATLAS.

    • Brian to be contacted for CMS.

  • HTCondor CE configuration examples, links etc. to be collected on our Twiki page.


Proposed agenda: 

  • Review of previous Actions
  • Update on the IAM service support status and plans
  • Review of AuthZ Milestones: https://docs.google.com/document/d/11fcZU8fEsfjDiSkjh95nVr4tNXLPCA_xwr2SwriBpiw/edit#heading=h.lzdl5i6720lh
  • Status of Token Issuers
  • Update on ARC token integration
  • AOB: 

Zoom meeting:

Link below, in the videoconference section. Please ensure you are signed in to Indico to see the meeting password!

Next Meeting: 

  • March 3rd
Videoconference
WLCG AuthZ Call
Zoom Meeting ID
61554826915
Description
Zoom room for WLCG AuthZ Call
Host
Tom Dack
Alternative hosts
Maarten Litmaath, Hannah Short

WLCG AuthZ 17/2/22

Attendees:

Adeel, Alessandra, Andrii, Brian B, Brian D, Dave D, David Cameron, David Crooks, David K, David S, Doug, Federica, Francesco, Ian, Irwin, Jeffrey, Jeny, Jim, John, Julie, Linda, Maarten, Marco, Marcelo, Mary, Max, Mischa, Paul, Petr, Raul, Roberta, Stefano, Tom (minutes), Thomas B, Thomas H

Review of Previous Actions:

  • We need to get the scitokens.org issuers working for ATLAS and CMS this month.
    • Doug will follow up in ATLAS.
    • Brian to be contacted for CMS.
  • HTCondor CE configuration examples, links etc. to be collected on our Twiki page.

 

ATLAS Update, Doug: David S is connected from ATLAS and can share views.

David S: Discussed and agreed that the best course of action is to follow the OSG extension and extend the deadline, using IAM when it’s ready.

CMS Update, Brian B: No major disagreement from CMS

 

Profile changes discussion:

Brian B raised a change to be made to the token profile. Petr had pointed out that even when Harvester has longer-lived tokens, the public key cache lifetime in the profile is still an issue.

Currently the profile suggests a minimum cache lifetime of 1 hr, a recommended lifetime of 6 hrs and a maximum of 1 day. This means that even if an access token lived for 4 days, it could not be verified once the issuer's static files had been down for longer than the cache lifetime.

Proposal:       

Dave: What problem are you attempting to address with increased cache lifetime?

Brian B: The cache in question is the public key cache, which is used to verify tokens. The current max is 1 day, with libraries defaulting to 4 hrs. This means that if CERN is offline for more than 4 hours, even a long-lived token could not be verified.
This change would allow the use of IAM with longer-lived tokens, rather than switching to temporary SciTokens issuers.
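The interaction between cache lifetime and issuer outages can be sketched as follows. This is a minimal illustration, not code from any of the libraries discussed: the `KeyCache` class, the `survives_outage` helper and the 60-hour "weekend" outage are assumptions made for the example, and it takes the best case in which the keys were fetched just as the outage began.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600
DAY = 24 * HOUR


@dataclass
class KeyCache:
    """Hypothetical verifier-side cache for an issuer's public keys."""
    lifetime: int                     # seconds a fetched key set stays usable
    fetched_at: Optional[int] = None  # time of the last successful fetch

    def can_verify(self, now: int, issuer_up: bool) -> bool:
        # A cached key that has not expired is enough to check a signature.
        if self.fetched_at is not None and now - self.fetched_at < self.lifetime:
            return True
        # Otherwise the keys must be fetched again from the issuer.
        if issuer_up:
            self.fetched_at = now
            return True
        return False


def survives_outage(cache_lifetime: int, outage: int) -> bool:
    """Can tokens still be verified in the last second of an outage?

    Assumes the keys were fetched at t=0, just as the issuer went down.
    """
    cache = KeyCache(lifetime=cache_lifetime)
    cache.can_verify(now=0, issuer_up=True)
    return cache.can_verify(now=outage - 1, issuer_up=False)


WEEKEND = 60 * HOUR  # assumed outage length, e.g. Friday evening to Monday morning

print(survives_outage(4 * HOUR, WEEKEND))  # library default quoted above
print(survives_outage(1 * DAY, WEEKEND))   # current profile maximum
print(survives_outage(4 * DAY, WEEKEND))   # proposed maximum
```

Under these assumptions only the 4-day cache lifetime keeps verification working through the whole outage, regardless of how long the token itself lives.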

Petr: The current profile says that tokens with a lifetime >6 hrs should be rejected. Whilst this is not known to be implemented anywhere at present, it would break things if it were.
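For illustration, the kind of check Petr describes might look like the sketch below. It is not taken from any existing verifier: the function name is hypothetical, and measuring the lifetime as `exp` minus `iat` is an assumption made here.

```python
MAX_LIFETIME = 6 * 3600  # seconds; the 6 hr figure quoted in the discussion


def lifetime_ok(claims: dict, max_lifetime: int = MAX_LIFETIME) -> bool:
    """Return False for a token whose total lifetime exceeds max_lifetime.

    Assumption: lifetime is taken as exp - iat (both in seconds since the epoch).
    """
    return claims["exp"] - claims["iat"] <= max_lifetime


short_token = {"iat": 0, "exp": 1 * 3600}
four_day_token = {"iat": 0, "exp": 4 * 24 * 3600}

print(lifetime_ok(short_token))
print(lifetime_ok(four_day_token))
```

A verifier enforcing such a check would turn away the 4-day tokens under discussion, which is why keeping the limit configurable matters for the transition.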

Brian: Suggests an agreement in the issuers to not implement this check until after WLCG is comfortable with shorter life tokens.

Paul: Raised that there are various hacks to get things working which don’t go away – if we don’t do this right from the start, will we ever?

Doug & David S: Raised the concern that we are trying to transition during data taking.

Dave D: Raised uncertainty over the concept of “IAM being ready in a couple of months”, notably not in a High Availability mode. Asked why, if that is a long way away, SciTokens should not be implemented for now as a start-up.

Brian: The issue here is that separate issuers are then configured across all sites in the grid. Long-lived tokens are one thing, but asking sites to reconfigure is different: once something goes into a CE, it never comes out.
Maarten: Don’t need to keep using the other issuers – you can leave them in the config and it won’t affect the site.

Dave Cameron: From an ATLAS perspective they have built the system to work with the official issuer; the view is that it is better to use an extended original issuer.

Brian: As a halfway point, perhaps Harvester can have a specific signing key to generate its own tokens and survive outages

Doug: What is really the issue with using 4D for a transition period? Rather than adding on extra stuff which could be avoided.

Brian: I am happy with that

Alessandra: Longer lived tokens is the simplest solution.

Maarten: This was already seen as an option a number of weeks ago, and this is for the submission of pilots, not the bigger picture we are aiming to get to later this year and beyond. If things are set as configurable, they can be squeezed and done differently if needed.

 

Comments on IAM & Support:

Maarten: Francesco may be able to offer comments on when HA IAM is available.

From the CERN side, by May 1st we expect that the CERN team should be stronger to handle things day-by-day. The aim is that every day there will be at least one expert who can check in.

This will not be 24/7, it will be 8/7.

Hopefully by that time Hannah will be back as the main service manager. The others will also have had extra Kubernetes training to be able to better understand underlying technology.

Looking to get another senior Kubernetes expert at CERN to “look over their shoulder”

Things have been worked on behind the scenes, but ultimately cannot go faster than this. It is possible that by May we may find the service is working nicely and that ticket response times are good, even if submitted on a Saturday.

The benefit of lifetime configurability is that it gives us a handle we can use to help smooth the transition.

Back to profile changes discussion:

Brian B: Proposal on the Table:

  • Make an edit to the token profile so that the max public key cache lifetime is 4d, and implement changes to libraries to use this.
  • Plan on using long-lived tokens with the current IAM issuers through to May, when improved support should be implemented.
  • This discussion can then be revisited in May

This will ultimately give us confidence that this will last a weekend outage and is a simple answer.

Tom D: call for any strong opposition to this proposal to be vocalised

No opposition was raised.

Action – Brian B: Update the profile and lead implementation of the above proposal, to be reviewed in May.

 

Further comments on above:

Brian: note to only make the public key change noted in the document as this should be a permanent change.

Paul: need to be careful as tokens pose a dangerous risk, and a max of 6hours or more is notably risky. A “Gentleman’s Agreement” is fine, but we should clearly document our intention to wind this time back down

Brian: that’s why this should be paired with Maarten’s timeline for improved support.

Note that there are 3 different ways to control tokens – expiration, audience & scope

Paul: Compared this to Pandora’s box, noting that there exists a special “works everywhere” audience. “Always a community that doesn’t have the capability”

Maarten: we don’t have that problem now – ATLAS and CMS specify the intended CE(s) as the audience.

Petr: all Atlas tokens have dedicated audiences.
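The three controls mentioned above (expiration, audience and scope) can be sketched as a single acceptance check on a CE. This is an illustrative sketch only: the function, the host name `ce01.example.org` and the `accept_any` switch are hypothetical, and the wildcard audience string and the `compute.create` scope are assumed to match the WLCG token profile.

```python
# Assumed value of the special "works everywhere" audience in the WLCG profile.
ANY_AUDIENCE = "https://wlcg.cern.ch/jwt/v1/any"


def token_allows(claims, now, my_audience, needed_scope, accept_any=False):
    """Check expiration, audience and scope of already signature-verified claims."""
    # 1. Expiration: reject tokens at or past their exp time.
    if now >= claims["exp"]:
        return False
    # 2. Audience: aud may be a single string or a list of strings.
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    named = my_audience in audiences
    wildcard = accept_any and ANY_AUDIENCE in audiences
    if not (named or wildcard):
        return False
    # 3. Scope: a space-separated list of granted capabilities.
    return needed_scope in claims.get("scope", "").split()


pilot = {"exp": 1000, "aud": "ce01.example.org", "scope": "compute.create compute.read"}
anywhere = {"exp": 1000, "aud": ANY_AUDIENCE, "scope": "compute.create"}

print(token_allows(pilot, 500, "ce01.example.org", "compute.create"))
print(token_allows(pilot, 500, "ce02.example.org", "compute.create"))
print(token_allows(anywhere, 500, "ce02.example.org", "compute.create"))
```

With a dedicated audience, as ATLAS and CMS use, a leaked long-lived token is only usable at the CE it names; a site can additionally decline the wildcard audience, which is the default in this sketch.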

 

Discussion about Profile Versioning

Doug: why can’t we set a version 1 with a longer lifetime with a later version 2 tightening things down once things are more operational?

Maarten: Versions are not concurrent – Version 2 will replace Version 1

“We are leading the pack” – when we know the software etc is in good shape, others in the future would not need to go through the same kind of pain.

By keeping things configurable we can have our tightly controlled tokens and others can make their own decisions.

This particular tightening will happen for us once the IAMs are well supported – from there we can go to where we want to be – it may be 4d, 1d, 6h but once we are at ease, we are there and should not need to go back.

David S: Concerns on timescales: we are coming up to data taking, and for my taste there does not seem to be a need for a huge discussion; 96h seems fine for 2022. Noted concern about implementing a “Gentleman’s Agreement”.

Alessandra: Doesn’t think there is a need to worry too much, as these discussions also happened with the proxies, which were extended. People started to have chains of proxies, that kind of stuff. Considers the use case quite limited and a good way to start.

Maarten: a good point, this is a good use case to start with.

Dave K: Those comments have made me more worried. At the time, with 24h proxies, there were lots of renewal issues that then led to the use of multi-day proxies, which have remained ever since. Hearing that this can’t be changed during the run gives an awful feeling that it will never decrease.

David Crooks: note that the landscape is very different now: industry standard techniques mean industry standard attacks. Prepare for short-lived tokens, but work with longer ones.

Maarten: Indeed, going to industry standards makes things easier for us and for them. We need to fend off attackers sufficiently. We want to get to where we originally wanted to be, it is just going to be a bit slower.

 

Token Subject Discussion:

Marco: A question, different discussion on the subject of tokens. Can the subject be not tightly coupled to the client ID and secret? The idea is what to do when a client ID and secret have been exposed and therefore revoked: we would not want to have CEs across the grid reconfigured with a new subject...

Brian: it can be set by IAM admins

Marco: also for changed credentials?

Brian: The subject in IAM can be set by hand, which is something an admin can do.

Marco: Contact?

Brian: Andreas Pfeiffer for CMS.

 

That was the end of the time for this week. Other agenda items can be addressed via email if so desired; otherwise we will return to them at the next meeting.

Next Meeting: 3rd March was planned, but we will give priority to the Rucio token meeting instead.

New Actions:

Brian B: Update the profile and lead implementation of the above proposal, to be reviewed in May.

Actions Carried forward:

HTCondor CE configuration examples, links etc. to be collected on our Twiki page.

Stefano Dal Pra already added nice tips there!
