Token Trust & Traceability WG

Europe/Zurich
Description

Fortnightly for the risk assessment season.

Zoom Meeting ID
64974356171
Host
Matthew Steven Doidge

https://codimd.web.cern.ch/KLZNGBJ-T5WRpN2DuNE-MA

 

# TTT Meeting 26th August 2025

Attending: Matt, Maarten, Luna, Giovanni, DavidC, DaveD

Apologies: DaveK, Linda, Mischa


## Continuing from before

### Review for Luna and Giovanni
- Giovanni introduces themself: works on Rucio, previously on the ESCAPE project.
- Interest in information in this area.
- ML describes our goals.
- Discussion of the documents

### Scoring
* A lot of focus on this last time; note another attempt with more rows.
    - Earlier advice was to define impact score entries for "1, 3 and 5", but in practice "1 and 2" are the hardest to define - for some events there are no minor impacts.
    - As a thought experiment I added a second take at a table, with a much more generic, broad definition for the areas of impact - unsure if it's useful, but it might inform our reasoning.

ML - Impact can affect multiple areas. Would take the worst score from the matrix.
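
A minimal sketch of that "take the worst" rule, assuming a 1-5 impact scale. The area names and scores are hypothetical placeholders, not entries from the WG spreadsheet.

```python
def overall_impact(impact_by_area: dict) -> int:
    """Return the worst (highest) impact score across all affected areas."""
    if not impact_by_area:
        raise ValueError("at least one impact area must be scored")
    for area, score in impact_by_area.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {area!r} must be between 1 and 5")
    return max(impact_by_area.values())


# Hypothetical areas and scores, for illustration only.
example = {"data integrity": 4, "availability": 2, "confidentiality": 1}
print(overall_impact(example))
```

Taking the maximum rather than an average means one severe impact is never diluted by several minor ones.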

Looking at alternative matrix

ML - we probably don't need to look at reputational damage for our risks

Luna - need to make sure that the columns/rows are compatible and scored on a similar scale.

Some bonuses to the meta table, perhaps for the appendix?

ML - reminder that risk assessment is not a hard science; we will discuss any differences we come to in our assessments, hammer out a consensus, and work on mitigations.

### Workflows and Risks
Discussion of potential mitigations

ML - reformatted spreadsheet, noticed "Current Mitigations" was empty. ML added in some.
This is a work in progress.

Reflects that this is all somewhat new to us.

ML notes having had to refer to the workflows for some of the mitigations, as they are workflow dependent.

MD - do we want a sheet for every workflow?
ML - TR-1, for example, we wouldn't want to replicate, but the others are workflow dependent?

Luna - good to explore how much we can summarise, we can later look at all findings, and put back what is universal.

ML - no one size fits all.
Not all workflows are equal, and mitigations won't work for all.
Our risk assessment should make clear that we can look up risks/mitigations for a specific workflow. This needs to be easy to read from the assessment.

ML - duplicate Risks spreadsheet N times (N=6 at the moment), though some risks might be orthogonal to workflow.
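
One way to picture the per-workflow lookup is sketched below. It assumes generic entries for risks that are orthogonal to workflow, plus per-workflow overrides. The `RiskEntry` fields, scores and mitigation text are all hypothetical placeholders, not the contents of the actual spreadsheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskEntry:
    likelihood: int           # placeholder 1-5 score
    impact: int               # placeholder 1-5 score
    mitigations: tuple = ()   # free-text mitigation notes
    comment: str = ""         # implications for this workflow


# Risks judged independent of workflow are stored once (placeholder values).
GENERIC = {
    "TR-1": RiskEntry(2, 3, ("placeholder generic mitigation",)),
}

# Workflow-dependent assessments, keyed by (workflow, risk) (placeholder values).
PER_WORKFLOW = {
    ("WF-3", "TR-4"): RiskEntry(3, 4, ("placeholder WF-3 mitigation",), "sample analysis"),
}


def lookup(workflow: str, risk: str) -> Optional[RiskEntry]:
    """Prefer the workflow-specific entry, fall back to the generic one."""
    key = (workflow, risk)
    if key in PER_WORKFLOW:
        return PER_WORKFLOW[key]
    return GENERIC.get(risk)


print(lookup("WF-3", "TR-4"))
print(lookup("WF-5", "TR-1"))
```

This keeps a risk such as TR-1 in one place while still letting a reader ask for a single workflow's risks and mitigations.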

Having a go at a sample risk analysis on WF-3, adding in a different sheet.

Some discussion of whether some workflows are more susceptible to TR-1 than others (they're likely not).

ML - AAI provider *not* just IAM, could be Vault or other central service.
Pure uploading not concerned.
Some discussion of Rucio renaming - this is a storage.create property, not storage.modify.
Of course could rename everything with a stolen token.
Can't use the token to do a list.
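
The scope reasoning above can be sketched as a small check. It assumes scopes of the form `storage.<name>:<path>` and an operation-to-scope mapping in which listing needs `storage.read`, renaming is allowed by `storage.create`, and deleting or overwriting needs `storage.modify`. The mapping and the function names are illustrative assumptions, not any service's actual implementation.

```python
import posixpath

# Assumed mapping from operation to the scope names that permit it.
ALLOWED_SCOPES = {
    "list": {"storage.read"},
    "read": {"storage.read"},
    "upload": {"storage.create", "storage.modify"},
    "rename": {"storage.create", "storage.modify"},
    "overwrite": {"storage.modify"},
    "delete": {"storage.modify"},
}


def path_within(path: str, base: str) -> bool:
    """True if path is the base directory itself or lies beneath it."""
    path = posixpath.normpath(path)
    base = posixpath.normpath(base)
    return base == "/" or path == base or path.startswith(base + "/")


def is_permitted(scope_claim: str, operation: str, path: str) -> bool:
    """Check a space-separated scope claim against one operation on one path."""
    for scope in scope_claim.split():
        name, _, base = scope.partition(":")
        if name in ALLOWED_SCOPES[operation] and path_within(path, base or "/"):
            return True
    return False


stolen = "storage.create:/vo/data"  # hypothetical stolen upload token
for op in ("upload", "rename", "list", "delete"):
    print(op, is_permitted(stolen, op, "/vo/data/file1"))
```

Under these assumptions a stolen upload-only token permits renames under its path but not listing or deletion, matching the points noted above.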

Some discussion of TR-1, should be considered individually, but most answers could be copy and paste

Need to have a comment column for implications.

Luna - TR-1, 2, 3 are all very similar

ML - notes the FTS request to block up-scoping. So TR-3 is more of a consideration for FTS than for Grid jobs.

TR-2
MD - jobs are possibly more

Likely a generic risk. More about a correct security procedure.

Luna - make a note that this applies to any service, not just IAM

TR-3
How concerned are we about a bad actor - this is somewhat per workflow (in terms of impact).
Need to be sure that IAM Issue 1020 is taken into account.

TR-4
Definitely to be considered for each of the 6 (current) workflows.

Generic question - (in a workflow) how concerned are we about stolen tokens?

Aim to have some actionable items - as with the current issue 1020.

Luna - Notes the list of "potential" mitigations that haven't been implemented

Can write the potential mitigations in terms of threats. 

ML - might need sub-rows as would have many threats.  
Mitigation might just be "don't do that."

Luna - not sure of the best way.

ML - Lots of handles, tuning. Not every experiment uses handles in the same way.  
But at the moment even, say, PanDA is a moving target.

In description of the workflows some considerations have been included - risks might "just be average".

ML - Ideally used in the description, e.g. LHCb jobs have the ability to clean up after a failed job, so need storage.modify.

Luna - mitigation there is extra controls on the path or similar.  
Mitigations on the things that we see happening

ML - encourages people to add to it, or make changes. Can add to what we have, put in extra sheet- we have back ups.

TR 5,6,7 "more of the same"

Pure VO perspective, but also the site/service admin perspective. Some discussion of VOboxes - audience can provide a protection.
Thought on CPU, Data, Network abuse.

Discussion of rogue jobs. User either "being bad" or "being hacked".

Luna - all pilot frameworks have ability to block a user.

Some discussion of direct submissions.

Can we block "regular" users from submitting? 

Talk about blocking users in IAM/frameworks.

All covered.

Likelihood and Mitigations are different between flows.

TR-6
Luna - pilot frameworks should also be auditing this.  
ML - in certain places logging is not quite sufficient.  
Not heard of complaints with CERN IAM.

Discussion of xrootd redaction in logs.  
ML - it would be useful to dump the contents of the tokens.
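
As an illustration of what dumping token contents could look like, the sketch below decodes the header and payload of a JWT without verifying the signature and leaves the signature out of the result. The sample token and its claims are made up for the example. This is not how xrootd or any other service does its logging.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def dump_token(token: str) -> dict:
    """Return header and payload claims; the signature is NOT verified or kept."""
    header_b64, payload_b64, _signature = token.split(".")
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "payload": json.loads(b64url_decode(payload_b64)),
    }


def _b64url(data: dict) -> str:
    """Helper used only to build the made-up sample token below."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Made-up token for illustration; the claims are placeholders.
sample = ".".join([
    _b64url({"alg": "none", "typ": "JWT"}),
    _b64url({"sub": "example-user", "scope": "storage.read:/"}),
    "signature-not-checked",
])
print(json.dumps(dump_token(sample), indent=2))
```

Logging only the decoded claims, and not the signature, keeps the record useful for traceability without writing a reusable token to the log.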

TR-7
ML - Assuming that things are configured in a sensible way. Not enough negative tests being done (CMS are doing this) - need more of that.
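
A negative test checks that something which should be refused really is refused. The sketch below shows the idea against a toy acceptance check, assuming only an audience match and an expiry time. The `accepts` function, the audience URLs and the claim values are hypothetical stand-ins for a real service endpoint.

```python
import time
from typing import Optional


def accepts(claims: dict, expected_audience: str, now: Optional[float] = None) -> bool:
    """Toy check: the audience must match and the token must not be expired."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return expected_audience in audiences and claims.get("exp", 0) > now


SERVICE = "https://storage.example.org"  # hypothetical audience value
NOW = 1_000_000.0                        # fixed clock so the tests are repeatable


def test_accepts_valid_token():
    assert accepts({"aud": SERVICE, "exp": NOW + 60}, SERVICE, now=NOW)


def test_rejects_wrong_audience():
    assert not accepts({"aud": "https://other.example.org", "exp": NOW + 60}, SERVICE, now=NOW)


def test_rejects_missing_audience():
    assert not accepts({"exp": NOW + 60}, SERVICE, now=NOW)


def test_rejects_expired_token():
    assert not accepts({"aud": SERVICE, "exp": NOW - 1}, SERVICE, now=NOW)


for test in (test_accepts_valid_token, test_rejects_wrong_audience,
             test_rejects_missing_audience, test_rejects_expired_token):
    test()
print("all checks passed")
```

In practice the same pattern would be pointed at a deployed endpoint, with the expectation that each bad token gets an authorization failure.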

Tokens are by-design less potentially damaging.

New area for many admins, so need for recommendations.

Could be N/A for some workflows, but different perspective for a site.

Useful exercise.

Getting there on this.

Discussion on each row, action on Matt to copy some notes from above into the relevant comment box.

``Goal: Produce a versioned (even 0.1) form of these documents by the end of September.``


## AOB, Next Meeting

Another extra meeting in the w/c 8th September?  

Okay to postpone the "regular" September meeting to the 30th (from the 23rd)? Matt may not be available on the regular date.

Some discussion of times.

9th is possibly an OTF, might be able to do the 16th.

1. Actions, Since Last Meeting
2. Discussion: Risk Analysis

   Inspiration may be taken from these assessments from EGEE and WLCG done many years ago:

   Work through the Workflows added by Maarten to the document, and review the scoring methodology.

   Continue discussion from the list.

3. Discussion

   Probably just continuing the above.

   https://github.com/TTT-WG/TTT-WG/issues

4. AOB, next meeting

   Extra meeting again in the first half of September?