Present: Angela, Berk, Dave D, Dave K, Dimitrios, Enrico, Federica, Francesco, Hannah, John, Julien Leduc (CERN CTA), Linda, Maarten (notes), Matt, Michael Davis (CERN CTA), Mihai (FTS), Mine, Petr, Roberta, Stephan, Steve (FTS)
Apologies: Tom
Notes:
Maarten points out that the CHEP paper will need attention over the coming weeks, but proposes not to spend time on it during this meeting. Instead, we should take advantage of the presence of many data handling experts to see how far we can get with the pending proposal to adjust the meanings of the storage.read and storage.stage scopes.
Michael describes why he submitted the proposal: v1.0 of the profile states that storage.read is for online resources, whereas storage.stage is for nearline resources, which is not really desirable from the CTA perspective. He argues that the read scope should apply to all SEs and that it need not be accompanied by the stage scope for a tape SE, in which case only files that happen to be online can be read. That said, he is OK with the stage scope being a superset of the read scope, meaning that the stage scope alone would be sufficient for a file to be staged if needed and then read. Francesco agrees with such semantics.
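The semantics Michael proposes can be sketched as a small authorization check for a tape-backed SE. This is a minimal illustration, not the profile text or the CTA implementation: scopes are assumed to have the `storage.<capability>:<path>` form, a scope is assumed to cover its path and everything below it, and the helper names (`_covers`, `_granted`, `is_authorized`) are hypothetical. The stage scope is treated as a superset of the read scope, while the read scope alone only permits reading files that are already online.

```python
from typing import Iterable


def _covers(scope_path: str, path: str) -> bool:
    """True if path equals scope_path or lies below it, per path component."""
    scope_path = scope_path.rstrip("/") or "/"
    if scope_path == "/":
        return path.startswith("/")
    return path == scope_path or path.startswith(scope_path + "/")


def _granted(scopes: Iterable[str], capability: str, path: str) -> bool:
    """True if any scope 'storage.<capability>:<prefix>' covers path."""
    prefix = f"storage.{capability}:"
    return any(
        s.startswith(prefix) and _covers(s[len(prefix):], path)
        for s in scopes
    )


def is_authorized(scopes: list[str], operation: str, path: str,
                  file_online: bool) -> bool:
    """Hypothetical check for a tape-backed SE under the proposed semantics.

    - storage.stage is a superset of storage.read: it allows staging a
      nearline file and then reading it.
    - storage.read alone allows reading only files that are already online.
    """
    has_stage = _granted(scopes, "stage", path)
    has_read = _granted(scopes, "read", path)
    if operation == "stage":
        return has_stage
    if operation == "read":
        return has_stage or (has_read and file_online)
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    read_only = ["storage.read:/atlas/data"]
    print(is_authorized(read_only, "read", "/atlas/data/f1", file_online=True))    # True
    print(is_authorized(read_only, "read", "/atlas/data/f1", file_online=False))   # False
    print(is_authorized(read_only, "stage", "/atlas/data/f1", file_online=False))  # False
    print(is_authorized(["storage.stage:/atlas"], "read", "/atlas/data/f1",
                        file_online=False))                                        # True
```

Matching per path component (rather than by raw string prefix) keeps a scope for `/atlas/data` from accidentally covering `/atlas/dataX`.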
Next, Michael points out that a capability is also needed for querying the status of a file.
Francesco answers that the stage capability could also be used for those use cases and that StoRM has implemented it that way, but that this could easily be changed if needed. Steve points out that a dedicated polling scope may be desirable, as long lifetimes for such a scope would help avoid putting a high load on IAM. Hannah asks what the use case is. Steve replies that tape requests can refer to a very large number of files in one go (hundreds of thousands) and that the handling of such requests can take days: if each file has its own short-lived token, there can at times be enormous numbers of tokens to be refreshed regularly, particularly for staging data. He adds that so far, CERN FTS instances have only seen token refresh rates between 7 and 12 Hz, and not yet the 900 Hz values that Berk reported in the OTF meeting last week.
Note added after the meeting: those tests were run against the IAM instances on Kubernetes that the experiments will switch to, not against the OpenShift instances being used in production today.
Hannah replies that we should beware of making bad choices now and should rather look into achieving the required performance.
Mihai thinks we should not be looking into a new scope today, but should rather wait for the tape REST API to move forward first. Dave D questions whether a separate token per file is needed and suggests that a single token per request could be sufficient. Michael points out that there is an important difference between a poll scope and a read scope: one may want to hand out a token that allows querying the status of a file, but not reading it. Dave D argues that FTS instances could be deemed trusted.
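The trade-off between per-file tokens and a single token per request, and the effect of longer token lifetimes, can be made concrete with simple arithmetic: if every live token is renewed once per lifetime, the average refresh rate is roughly the number of live tokens divided by the lifetime. All figures below (300,000 files, 100 requests, 20-minute and 12-hour lifetimes) are hypothetical, chosen only to show the scaling; they are not measurements from CERN FTS or IAM.

```python
def refresh_rate_hz(num_tokens: int, lifetime_s: float) -> float:
    """Average refresh rate if each live token is renewed once per lifetime.

    Assumes refreshes are spread evenly over time (steady state).
    """
    if lifetime_s <= 0:
        raise ValueError("lifetime must be positive")
    return num_tokens / lifetime_s


# Hypothetical figures, for illustration only.
FILES_IN_FLIGHT = 300_000     # files across all pending tape requests
REQUESTS_IN_FLIGHT = 100      # bulk requests covering those files
SHORT_LIFETIME_S = 20 * 60    # assumed short token lifetime (20 minutes)
LONG_LIFETIME_S = 12 * 3600   # assumed lifetime of a long-lived poll scope

per_file = refresh_rate_hz(FILES_IN_FLIGHT, SHORT_LIFETIME_S)
per_file_long = refresh_rate_hz(FILES_IN_FLIGHT, LONG_LIFETIME_S)
per_request = refresh_rate_hz(REQUESTS_IN_FLIGHT, SHORT_LIFETIME_S)

print(f"one token per file, 20 min:    {per_file:.1f} Hz")       # 250.0 Hz
print(f"one token per file, 12 h:      {per_file_long:.1f} Hz")  # 6.9 Hz
print(f"one token per request, 20 min: {per_request:.3f} Hz")    # 0.083 Hz
```

Under these assumptions, both longer lifetimes and fewer tokens reduce the load on IAM; which lever is acceptable depends on the security considerations discussed below.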
Maarten concludes that we cannot yet decide what to do about polling, but that it looks OK to consider storage.stage a superset of storage.read. He intends to submit a new PR to adjust the profile text accordingly, referring to Michael's PR, which would then be closed at the same time.
Mihai points out that FTS can downscope tokens if that is deemed desirable: storage.stage could become storage.read when the requested file is online. He argues that the read scope could apply to a whole data set, whereas the stage scope may better be used per file. Julien points out that we also need a scope for evicting files!
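Mihai's downscoping idea amounts to computing a narrower scope list before obtaining a new token. The sketch below assumes `storage.<capability>:<path>` scope strings and a hypothetical `downscope` helper; it only computes the reduced list, while actually obtaining the narrower token (for example through an OAuth 2.0 token exchange with IAM) is not shown. Scopes other than storage.stage pass through unchanged.

```python
STAGE = "storage.stage:"
READ = "storage.read:"


def downscope(scopes: list[str], file_online: bool) -> list[str]:
    """Hypothetical helper: replace storage.stage:<path> by storage.read:<path>.

    Only applies when the requested file is already online; duplicates that
    arise from the replacement are dropped, preserving the original order.
    """
    if not file_online:
        return list(scopes)
    result: list[str] = []
    for scope in scopes:
        if scope.startswith(STAGE):
            scope = READ + scope[len(STAGE):]
        if scope not in result:
            result.append(scope)
    return result


if __name__ == "__main__":
    print(downscope(["storage.stage:/atlas", "storage.read:/atlas"], file_online=True))
    print(downscope(["storage.stage:/atlas"], file_online=False))
```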
Stephan argues that the stage scope should apply to the whole request, not to individual files. Dave D agrees that the token should just apply to a part of the namespace. Mihai replies that a request could have a single token that would then be duplicated in the FTS DB so that it applies to all the files in the request.
Maarten asks whether the stage scope could be used for all the related use cases we discussed, noting that the name "stage" would be misleading if it also covered evictions. Michael answers that evictions do in fact happen in staging workflows. Mihai adds that it must also be possible to abort a stage request and argues that the same stage scope should indeed be usable for all related operations.
Regarding stage being a superset of read, Francesco considers those two scopes to be orthogonal, but finds the proposal acceptable. He then asks what should happen if a disk SE is presented with the stage scope: should it ignore that scope or rather fail the request? Stephan points out that some dCache instances have part of their namespace only on disk, while another part is backed by tape, and argues that the disk part can just ignore the stage scope. Francesco counters that the client may expect a certain behavior and that an inappropriate scope should not be silently ignored: something like a "permission denied" error should be returned when a request for a disk file carries only the stage scope. Maarten argues this might make workflows more complex and that the stage scope could also be taken to supply all the capabilities that might be needed to read a given file, whether it resides on disk or on tape.
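The two positions on a disk-only file presented with only the stage scope can be contrasted in a short sketch. `DiskPolicy.STRICT` follows Francesco's suggestion of returning a permission error, while `DiskPolicy.SUPERSET` follows Maarten's suggestion that the stage scope supplies everything needed to read the file. The names `check_read`, `DiskPolicy` and `PermissionDenied` are hypothetical, and `capabilities` is assumed to hold only capabilities whose scope paths already matched the requested file.

```python
from enum import Enum


class DiskPolicy(Enum):
    # Reject a read of a disk-only file when only storage.stage is present.
    STRICT = "strict"
    # Treat storage.stage as supplying everything needed to read the file,
    # whether it resides on disk or on tape.
    SUPERSET = "superset"


class PermissionDenied(Exception):
    """Hypothetical error an SE might return for an unusable scope."""


def check_read(capabilities: set[str], on_tape: bool, online: bool,
               policy: DiskPolicy) -> str:
    """Return the action an SE would take for a read, or raise PermissionDenied.

    `capabilities` holds bare capability names ("read", "stage") whose scope
    paths already matched the requested file.
    """
    if on_tape and not online:
        # Nearline file: only the stage scope allows bringing it online.
        if "stage" in capabilities:
            return "stage, then read"
        raise PermissionDenied("nearline file requires storage.stage")
    # File is on disk (disk-only, or a tape-backed file that is online).
    if "read" in capabilities:
        return "read"
    if "stage" in capabilities:
        # For tape-backed files, stage is agreed to be a superset of read;
        # for disk-only files, the outcome depends on the chosen policy.
        if on_tape or policy is DiskPolicy.SUPERSET:
            return "read"
        raise PermissionDenied("storage.stage alone is not valid for a disk-only file")
    raise PermissionDenied("no matching storage scope")


if __name__ == "__main__":
    for policy in DiskPolicy:
        try:
            print(policy.name, check_read({"stage"}, on_tape=False, online=True,
                                          policy=policy))
        except PermissionDenied as err:
            print(policy.name, "denied:", err)
```

The strict policy makes misconfigured clients fail loudly, at the cost of requiring clients to know where each file resides; the superset policy keeps workflows simpler but grants read access through a scope whose name suggests otherwise.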
Dave K points out that we need to consider security aspects in these matters: what if a token with a lot of power and possibly a long lifetime is stolen, and how might it be abused? Steve points out that FTS currently has complete power over the namespace of each VO it serves and that a separate poll scope would be more secure than reusing the stage scope. Stephan points out that just about anything we do with tokens is already better than X509! Dave D points out that an important distinction between stage and poll scopes would be the number of files to which the scope applies, and that for FTS workflows it is a single machine that gets such powerful tokens. Stephan adds that a token would usually have just a single SE as its audience. Maarten concludes that we can and should take security decisions per workflow, rather than one-size-fits-all.
Next, Hannah asks when the experiments might switch to the IAM instances on Kubernetes. Maarten answers that he will adjust the ALICE production configuration and have ETF (SAM) prod switched as well, after which the OpenShift instance for ALICE can be stopped. Stephan answers that CMS is still chasing the last sites and that he does not yet know when the switch could happen, but hopefully still this year. Petr answers that he will contact Ivan, who is now in charge of such matters for ATLAS, but that ETF prod will need to be switched before ATLAS production.
Steve asks Stephan about the EOSCMS upgrade planned for Tuesday: what will change for tokens? Stephan replies that tokens work fine with WebDAV, but not yet with XRootD, and that CERN needs to upgrade first so that precise recipes can then be given for other sites to follow.
Finally, Maarten points out that this looks to be the last meeting of the year and that we have made good progress on tokens this year, with particular highlights being DC24, the VOMS(-Admin) phaseout and the use of tokens in production transfers of ATLAS and CMS. Further adventures await us next year!