WLCG AuthZ Call

Europe/Zurich
513/R-068 (CERN)

Description

Notes:

Previous Actions:

  • Action: Maarten to tidy up and review open issues and pull requests for the token profile, and then circulate a potential 2.0 draft
    • Has made very good progress!
  • Action: Maarten to look at reviving the RTE Task Force


Proposed agenda:

  • Next Profile Version
  • Token Accounting Cont - as needed

 

Zoom meeting:

Link below, in the videoconference section. Please ensure you are signed in to Indico to see the meeting password!

Next Meeting: 

  • Sept 25

Zoom Meeting ID: 61554826915
Description: Zoom room for WLCG AuthZ Call
Host: Tom Dack
Alternative hosts: Hannah Short, Maarten Litmaath

Present: Adrian (APEL), Dave D, Dave K, Enrico, Federica, John, Linda, Maarten (notes), Matt, Mischa, Stephan

Apologies: Tom, others (CERN holiday)

Notes:

Dave D introduces the Google doc in which we try to define a standard for JWKS caches that our MW stacks can take advantage of, for various reasons that are illustrated by motivating examples. There is also an issue open in the tracker for the WLCG Common JWT Profiles. Both places can be used for comments and discussion. Dave takes us through several comments. Mischa suggests it would be good to let the tool record when a particular download happened and what URL was used. Dave replies that a given tool to manage the cache can decide to record such details, but that readers of the cache should not depend on them. Mischa replies the logging would be for traceability and debugging. For example, while the URL could be derived, what if there is some change? Dave replies changes would be rare and the tool will anyway keep updating the cache frequently. As the proposed standard text already stipulates that anything that is not understood must be ignored, we could always add things later. Maarten adds we expect to see a first version become available this autumn, allowing us to get some experience and decide whether it is good enough or whether some additional feature is already needed. Dave adds the document is about agreement between the cache writers and readers.
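
To illustrate the "ignore what is not understood" rule mentioned above, here is a minimal Python sketch of a cache reader. It is not taken from the draft standard. The metadata names `downloaded_at` and `source_url` are hypothetical, as is the list of key members the reader chooses to understand. Only the top-level `keys` array and the member names come from the JWK specifications.

```python
import json

# Key members this hypothetical reader understands (names from RFC 7517/7518).
KNOWN_KEY_MEMBERS = {"kty", "kid", "use", "alg", "n", "e", "crv", "x", "y"}


def read_keys(cache_text):
    """Return the keys of a cached JWKS document, dropping unknown members."""
    doc = json.loads(cache_text)
    keys = []
    for entry in doc.get("keys", []):
        # Anything the reader does not understand is silently ignored.
        keys.append({k: v for k, v in entry.items() if k in KNOWN_KEY_MEMBERS})
    return keys


sample = json.dumps({
    "keys": [
        {"kty": "RSA", "kid": "key-1", "n": "abc", "e": "AQAB", "x-note": "extra"},
    ],
    # Hypothetical writer-side metadata: a tool may record it for traceability,
    # but the reader above never looks at it.
    "downloaded_at": "2000-01-01T00:00:00Z",
    "source_url": "https://issuer.example.org/jwks",
})

keys = read_keys(sample)
```

With this approach a cache-writing tool could start recording download details later without breaking existing readers.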

The next comment is about the formatting of issuer URLs: might canonicalization be required? After some discussion it is decided we should not try to address any concerns regarding URL formatting in this document. Instead, a statement is added: "Assume that the issuer URL has already been normalized." The next item concerns the choice of the hash algorithm and whether its results really need to be truncated to keep file names nice and short. Dave will follow up with co-authors Brian B. and Derek Weitzel. Next, Mischa suggests that unexpected (e.g. corrupted) contents should lead to suitable error messages and/or return codes. The comment is updated accordingly. Next, Stephan asks what determines the cache lifetime and update frequency. Dave replies we have our token profile for guidance. Stephan points out we need to be concerned with the introduction of new keys: if they are used too soon, some caches will not have them yet! Maarten agrees we need to say something about that in our profile document. He will create a PR and announce it for comments. Stephan suggests each key would ideally come with a lifetime and that keys would be rotated quite often. Maarten replies that the former would require changes upstream, while our profile already says that keys should be rotated fairly often, but that we first need operational experience before making stronger statements.
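
The file naming and error handling points can be sketched in Python as follows. This is an illustration under stated assumptions, not the scheme from the draft: the hash algorithm was still under discussion, so SHA-256 is only a placeholder, and the names `cache_file_name`, `load_jwks` and `CacheError` are hypothetical. The issuer URL is assumed to be already normalized, as the added statement says.

```python
import hashlib
import json


class CacheError(Exception):
    """Raised when a cache entry cannot be used (hypothetical error type)."""


def cache_file_name(issuer_url, length=None):
    """Derive a file name from an already normalized issuer URL.

    SHA-256 is an assumption made for this sketch. `length` optionally
    truncates the hex digest: shorter names, but a higher collision chance.
    """
    digest = hashlib.sha256(issuer_url.encode("utf-8")).hexdigest()
    return digest if length is None else digest[:length]


def load_jwks(text):
    """Parse a cache entry, failing loudly on unexpected contents."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        raise CacheError(f"cache entry is not valid JSON: {exc}") from exc
    if not isinstance(doc, dict) or not isinstance(doc.get("keys"), list):
        raise CacheError("cache entry has no 'keys' array")
    return doc["keys"]


full_name = cache_file_name("https://issuer.example.org/")
short_name = cache_file_name("https://issuer.example.org/", length=16)
```

A command-line tool built on such functions could map `CacheError` to a non-zero return code, which is one way to meet the request for suitable error messages and/or return codes.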

Next, Maarten describes that he will present the profile to the WLCG Management Board on Sep 16 essentially with its current contents, i.e. referring to version "x.y" rather than 1.1, to allow (small) changes still to be made for the actual v1.1, expected to be published on Sep 24. He has increased the font size of the PDF somewhat, to make the text a slightly more pleasant read compared to the previous formatting, available as "profile-x-small.pdf" in this preview area. There are no objections.

Next, Dave D brings up for discussion that CMS is using access tokens with lifetimes of 4.5 days and broad scopes in jobs. Stephan replies that those 4.5 days allow production workflows to ride out IAM outages lasting even a long weekend, adding that users get much shorter-lived tokens with narrow scopes. Maarten asks if those long-lived tokens have the modify scope. Stephan answers it is needed to clean up after failed uploads. Maarten asks if the plan is to make those broad scopes narrower. Stephan answers that production tokens are foreseen to be shrunk first to an area, then to a data set. Since it will require developments in the workflow management system, with the current system being in maintenance mode, the timeframe is estimated to be 2 years. Maarten replies the alternative to using tokens in such ways would be the continuing use of VOMS proxies, and that it will anyway take a few more years before we can hopefully observe that all is fine with tokens, though things are expected to continue getting better every year. Regarding VOMS proxies, Dave points out that we have CRLs to revoke those. Maarten replies that, to his knowledge, CRLs have never saved the day in our grid ecosystem and have caused us many more problems than benefits.

Dave and Mischa point to the danger of bad practice becoming a slippery slope and argue that a token subject ban / suspension list may be needed to have some means to deal with compromised tokens. That list could also be served via the JWKS cache machinery. In practice it may imply the whole VO getting blocked, though. Maarten replies the point is taken, but that experiment computing coordinators want their experiment to do well also in terms of security and that we are not stuck with any decisions taken today. We also need to be realistic in our expectations about the availability of the IAM services: we have quite good experience with them so far and we plan to make their deployment better still, but we will be unable to guarantee that a critical problem will quickly be fixed, particularly during a weekend. Chances are the problem would not be with the IAM code, but rather with an underlying service. Configuring a backup IAM service elsewhere would be expensive and lead to new failure modes, and who would even agree to run it? Stephan adds CMS wants to go to 6h tokens, but not risk wasting any production resources today. He suggests a compromised token could be dealt with by rotating its key! He also adds that for the time being, jobs will continue to be equipped also with VOMS proxies and that those allow much bigger abuse.

Enrico reminds us of the IAM introspection endpoint being available for checking if a given token has been revoked. Maarten replies that introspection would only be conceivable for low-rate tokens, as we have generally ruled it out because of the unpredictable instantaneous load it would cause on the IAM services. Stephan suggests a new token attribute might signal to a service that it must do the introspection for a given token. He adds that it would have been better for the infrastructure to have been sorted out before any use of tokens, but Maarten replies that nobody could give us a blueprint for dealing with our peculiar use cases: we will keep steering things in a better direction as we gain ever more experience.
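
Stephan's suggestion could look roughly like the Python sketch below. The claim name `must_introspect` is purely hypothetical, as no such attribute exists in the profile, and the endpoint URL is a placeholder. The request shape (an HTTP POST with a form-encoded `token` parameter) follows the OAuth 2.0 Token Introspection specification, RFC 7662. The client authentication that a real endpoint requires is omitted, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request


def needs_introspection(claims):
    """Check a hypothetical claim telling the service to introspect."""
    # "must_introspect" is an invented name, used for illustration only.
    return claims.get("must_introspect") is True


def build_introspection_request(endpoint, token):
    """Build (but do not send) an RFC 7662 introspection request."""
    body = urllib.parse.urlencode({"token": token}).encode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder endpoint and token, not real values.
request = build_introspection_request(
    "https://iam.example.org/introspect", "abc"
)
flagged = needs_introspection({"sub": "prod", "must_introspect": True})
unflagged = needs_introspection({"sub": "user"})
```

Only flagged tokens would then cause load on the IAM service, which is the property that makes the idea conceivable for low-rate tokens.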

The next meeting is planned for Sep 25. Maarten will be unable to make it because of another engagement, but will use the mailing list in particular for news about v1.1 of the profile.
