Speaker:
Christophe Haen
(CERN)
Notes / discussion
Brian
- Token effort started in 2016 with a proposal in 2017
- it took 2 years to agree on the protocol
- an additional 3 years to develop and deploy in production
- 3-4 years if we decide to move to pre-signed URLs
- Maarten:
- a lot of work has already been done to implement support for tokens
- in dCache, EOS, StoRM, XRootD, FTS, Rucio
- a similar amount of work would be needed for pre-signed URLs
- and there is not enough time between now and DC24
- it therefore seems best to do DC24 with tokens and see how that goes
- pre-signed URLs may get into the picture later
- Pre-signed URLs can come alongside tokens in the future
- it would be useful to have a storage that supports pre-signed URLs to be able to test this concept
- there are a lot of open questions with S3 that are not solved today
- not everything is solved for tokens either
Andreas (EOS/ALICE)
- ALICE has been using pre-signed URLs since ~2005 / in production for 20 years
- Our storages are not Amazon
- Amazon uses symmetric keys
- symmetric keys can't be used securely on 100+ storage sites
- ALICE uses public/private key pairs to overcome this issue
- cloud providers don't do third-party copy and symmetric keys are less of an issue
- Chris: maybe there is also an asymmetric key option for Amazon / S3 API (to be checked)
- If you leak a symmetric key, it's a disaster
- Rotating symmetric keys would be a mess / insane
- currently the same proxies are sent everywhere, but they have a limited lifetime
- thinking about symmetric keys for our infrastructure is generally insane (using Amazon model is bad)
- Performance concerns with IAM as a token issuer
- Dirac / Rucio could issue their own JWT tokens
- create access tokens for storage access that will be trusted
- ALICE already uses this approach
- Brian: this is what OSG does - it gives a service (e.g. Dirac) the IAM private key
- Chris: this can't work for multi-VO token issuers, e.g. EGI CheckIn
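To illustrate the symmetric-key concern raised above, here is a minimal sketch of a pre-signed URL scheme based on a shared secret. It is a toy scheme, not Amazon's actual signing algorithm; the function names `presign` and `verify`, the query parameters `expires` and `sig`, and the host `se.example.org` are assumptions made for illustration only. What it shows is that the storage needs the very same secret as the signer, so any site able to verify a URL could also create one.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def presign(url: str, secret: bytes, expires_at: int) -> str:
    """Append an expiry time and an HMAC-SHA256 signature to a URL (toy scheme)."""
    message = f"{url}\n{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{url}?{urlencode({'expires': expires_at, 'sig': signature})}"


def verify(signed_url: str, secret: bytes, now: int) -> bool:
    """Storage-side check. Note that it needs the SAME secret as the signer."""
    parsed = urlparse(signed_url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    # Recompute the signature over the URL without its query string.
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    message = f"{base_url}\n{expires_at}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    not_expired = now <= expires_at
    # Constant-time comparison of the two signatures.
    return not_expired and hmac.compare_digest(expected.encode(), signature.encode())


if __name__ == "__main__":
    shared_secret = b"hypothetical-shared-secret"  # would have to live on every storage site
    signed = presign("https://se.example.org/vo/data/file", shared_secret, expires_at=1000)
    print(signed)
    print(verify(signed, shared_secret, now=999))
```

With an asymmetric scheme, as described for ALICE, the storage would only hold the public key and could verify a signature without being able to produce one.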
Julien (TAPE)
- HTTP REST (for TAPE) standard was designed for tokens; changes would require additional effort
- signed URLs for REST API calls - it would be necessary to use e.g. SE-tokens
- Maarten: Dirac could have a key to create signed URLs for tape
Mihai (FTS)
- FTS must also support non-WLCG communities
- not everybody uses Rucio / Dirac, which could be in charge of generating pre-signed URLs
- FTS already supports storing shared keys for cloud storages
- not optimal for shared multi-VO instances
- Using just pre-signed URLs is not possible
- This would not make FTS devs' lives easier
Martin (Rucio)
- Even with locally pre-signed URLs we still have to consider overloading Rucio/Dirac with signing
- ATLAS already faced CPU performance issues with Rucio caused by transfers with cloud storages that rely on pre-signed URLs
- sub-optimal code used for listing file replicas
- still needs to be considered also for normal transfers in case all transfers need to be signed
- per transfer token / pre-signed URL granularity may be heavy on CPU usage
- ATLAS already stores keys in FTS for transfers with cloud storages
- Rucio also has non-WLCG communities - necessary to support OIDC tokens
- can't focus just on pre-signed URLs - as in the case of FTS, this doesn't simplify Rucio devs' lives
Stephan (CMS)
- CMS already started to deploy WLCG JWT token configuration on their storage sites
- Let's focus on error handling rather than supporting new features
- much higher value than exploring / new research with pre-signed URLs
Michael Davis
- what about tokens for compute?
- Chris: tokens for compute are fine
Petr
- It seems to me that e.g. Globus also relies on (access) tokens instead of pre-signed URLs
IAM performance questions / concerns
- CMS expects to generate ~ 200 tokens per hour for transfers (TPC?)
- at most a few thousand tokens per hour for ATLAS with reasonable access token granularity
- LHCb considered only per-transfer tokens
- this can easily reach IAM performance limits
- no official numbers from IAM devs / ops team
- 100 Hz access token rate
- maximum that can currently be reached with CERN IAM instances
- led to the CERN IAM Halloween incident after 2 days (slide 7)
- access tokens could be generated also by Dirac/Rucio
- IAM can advertise multiple keys with well-known jwks_uri
- this may be tricky with multi-VO services & issuers like EGI CheckIn
- may be useful e.g. for Rucio download / upload use-case
- concerns about traceability when multiple services can generate valid tokens
- can't be used for more complex workflows (token exchange, refresh tokens), because it would be necessary to re-implement OAuth2
- IAM keeps track of access tokens / store details in DB
- introspection most probably doesn't work for access tokens issued by a different party
- IAM is built on top of the OpenID Connect reference implementation MitreID
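The note above about advertising multiple keys via `jwks_uri` can be sketched as follows. In the standard JWT/JWKS model, a token header may carry a `kid` (key ID) and the issuer's key set lists one entry per key, so a verifier can pick the matching key even when several services sign with different keys. The key IDs `iam-main` and `rucio-1` and the helper names are hypothetical; the sketch only selects the key and deliberately does not verify the signature, which would need a cryptographic library.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as used in JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def select_key(token: str, jwks: dict):
    """Return the JWKS entry whose `kid` matches the token header, or None."""
    header_segment = token.split(".")[0]
    header = json.loads(b64url_decode(header_segment))
    kid = header.get("kid")
    if kid is None:
        return None
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


if __name__ == "__main__":
    # Hypothetical key set, shaped like the document served at a jwks_uri.
    jwks = {"keys": [{"kid": "iam-main", "kty": "RSA"},
                     {"kid": "rucio-1", "kty": "RSA"}]}
    header = b64url_encode(json.dumps({"alg": "RS256", "kid": "rucio-1"}).encode())
    payload = b64url_encode(json.dumps({"sub": "example"}).encode())
    token = f"{header}.{payload}.signature-placeholder"  # not a validly signed token
    print(select_key(token, jwks))
```

A matching `kid` only tells the verifier which key to try. It does not address the traceability or introspection concerns noted above, since IAM would still have no record of tokens it did not issue itself.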
Summary (by P.V.)
- Chris: come up with use cases that are not well covered (understood) by WLCG JWT tokens (e.g. WLCG-AuthZ-WG/#22)
- At least ATLAS & CMS will use WLCG JWT tokens for DC24 and we'll push service providers in this direction
- if LHCb / DIRAC cannot make tokens work for data management by that time, that is not a problem and LHCb can participate using X509 / VOMS
- it would be useful to have an in-person meeting / hackathon to move forward with DC24
- ~ one in summer at CERN and one at the end of calendar year
- to be discussed in next WLCG DOMA General (May 31, 16:00)
- It is up to LHCb if they choose to investigate different approaches, but that's not a short-term project (this should not be at the expense of our WLCG JWT progress)
- IAM performance concerns can be overcome by issuing simple access tokens directly by Dirac or Rucio (shared IAM key)
- P.V.: I would like to see more activity from IAM dev and ops team in addressing performance issues
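For scale, the token rates quoted in the IAM performance discussion can be put into the same unit. This is plain arithmetic on the numbers from the notes; the value 3000 for ATLAS is an assumption standing in for "a few thousand", and no figure is computed for per-transfer tokens because the notes give none.

```python
SECONDS_PER_HOUR = 3600


def per_hour_to_hz(tokens_per_hour: float) -> float:
    """Convert a tokens-per-hour rate into tokens per second (Hz)."""
    return tokens_per_hour / SECONDS_PER_HOUR


def hz_to_per_hour(rate_hz: float) -> float:
    """Convert a tokens-per-second rate (Hz) into tokens per hour."""
    return rate_hz * SECONDS_PER_HOUR


if __name__ == "__main__":
    cms_hz = per_hour_to_hz(200)        # CMS estimate from the notes
    atlas_hz = per_hour_to_hz(3000)     # assumption: "a few thousand" taken as 3000
    iam_per_hour = hz_to_per_hour(100)  # 100 Hz rate quoted for CERN IAM

    print(f"CMS:   {cms_hz:.3f} Hz")
    print(f"ATLAS: {atlas_hz:.3f} Hz")
    print(f"IAM:   {iam_per_hour:.0f} tokens/hour at 100 Hz")
```

On these numbers alone, the CMS and ATLAS estimates are far below the quoted 100 Hz rate, which is consistent with the notes singling out per-transfer token granularity as the case that could reach IAM limits.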