Token Trust & Traceability WG

Europe/Zurich
513/R-068 (CERN)

Description

Fortnightly for the risk assessment season.

Zoom Meeting ID: 64974356171
Host: Matthew Steven Doidge

https://codimd.web.cern.ch/eeIohGpET4aDoU_nFOwU5g?view

 

# TTT 28th October 2025

Attending: Maarten, Donald, TomD, Linda, DaveD, Luna
Minutes: Matt

Apologies: Mischa, Marcus, DaveK

## From last time
We applied the assessment process to all the "generic" flows. The plan for today was to take on the two threats that have "per workflow" rows, but first to consider the generic version of each threat and see whether it applies to any of the specific workflows.

## Continuing the assessment

ML notes that this isn't the final assessment - the results will be versioned.

TR-4 is, almost "by design", going to be workflow-specific, due to the lack of a one-size-fits-all approach and the differences we have across the grid.

We now know that not all tokens will be short-lived.

We currently have 6 workflows; this might not be all of them, but nobody has noticed any missing yet.

Again, note that there will be updates in the future.

4a - how ATLAS and LHCb work. Narrowly scoped, long-lived tokens.

Disparity in likelihood votes - the most likely exposure is from misconfiguration; it was mentioned that the previous steps taken to hide tokens in logs could be reverted. But there is a large amount of scrutiny on these services.  
Settled on 2.

Impact - Some range, averaging 3. Discussion of future mitigations. Impact increased by the fact that any leak will most likely involve multiple tokens.

4b - CMS-style FTS, exchanged tokens, refreshed.

In general ignoring handshakes between IAM, FTS etc.

Likelihood - taking that into account we settle on a 2 - thinking the two "FTS" workflows have the same likelihood of going wrong.

Impact - Some vote high, with concerns about wide scope and token exchange.

A client needs to be privileged to be able to refresh.
But what if a token could be intercepted? Note also that deletion is possible with the current model, so if a storage element were broken into, an attacker could do damage.

Luna - Time doesn't feel like as much of a mitigation; other factors are needed to reduce impact.

ML - eventually tokens don't have modify scope.  
Not there yet, so it will be one to watch.

So 4b is riskier than 4a, which makes sense.

4c - grid jobs, long-lived tokens. Someone who had broken into a WN could "harvest" tokens, and could then steal payloads and credentials.  
Implemented for some VOs, potential damage "contained".

Discussion of ALICE workflows with job tokens. For ALICE the extra credentials are extremely narrow.  
Luna - once a job is running can it get back to the token that spawned it?  
ML - probably not, due to running in another container.  
Luna - with this token could you drain a queue?  
ML - will follow up with ALICE to see if there's a limit; ATLAS have a limit. But without it you could indeed drain a whole queue.

Likelihood vote - disparity again, discussion of motivation.

Could these be exposed by accident? Possible but unlikely.

Impact - tending to 2s, but Luna notes this could be a gateway to getting other tokens, and there is no way of submitting new jobs.

MD - not often we'd have an "impact" of a 1.  
ML - except single-user events.

4d - short-lived grid job tokens, how CMS currently works. Tokens for payloads could be narrow or not; currently not narrow (flexibility on where to upload). ALICE tokens are super-narrow.

Likelihood - slightly lower, due to the short-lived nature of the tokens.

Impact vote - discussion of storage modify scope in the tokens, which for some of us counteracts the benefits of shorter lifetimes.

Reminder that we need to get the mitigations really implemented.

Make sure the low-hanging fruit gets implemented.

4e - user data management, the equivalent of voms-proxy-init.

Discussion of power users.

Likelihood - quite high

Production manager/power user might need to be spun out.

Impact quite high - taking the worst-case scenario.

Could suggest curtailing power users in the future, e.g. 

4f - Job submission  
Also has the power user question.

Any user could do a lot of damage with a DDoS attack or similar.

High impact again, some discussion.

Finished after 130 minutes

## AOB/Next Meeting
Do we want a "bonus" meeting in November (targeting ~11th)? - Yes, Matt will send 

Note EGI CSIRT F2F 13th/14th Nov, where Matt aims to present our progress.

Do we want to submit something for CHEP2026? Will need a volunteer to present for the TTT. Abstract deadline 19th December. Don't need an answer today, will discuss over November's meetings.

Next regular meeting date Tuesday 25th November at 15.00 CET.

    • 3:00 PM - 3:05 PM
      Actions Since Last Meeting (5m)
    • 3:05 PM - 3:30 PM
      Discussion: Risk Analysis (25m)

      Inspiration may be taken from these assessments from EGEE and WLCG done many years ago:

      Work through the Workflows added by Maarten to the document, and review the scoring methodology.

      Continue discussion from the list.

    • 3:30 PM - 3:55 PM
      Discussion (25m)

      Probably just continuing the above.

      https://github.com/TTT-WG/TTT-WG/issues

    • 3:55 PM - 4:00 PM
      AOB, next meeting (5m)

      Early November meeting?

      Regular November date would be the 25th