Token Trust & Traceability WG

Europe/Zurich
Description

Fortnightly for the risk assessment season.

Zoom Meeting ID
64974356171
Host
Matthew Steven Doidge

https://codimd.web.cern.ch/s/aYZrRbC7m#

 

# TTT 22nd July 2025

Attending: Matt, DaveK, Maarten, Mischa, Linda, Hannah, Luna, Marcus   
Apologies: DavidC  

### Continuing work from last time

### Review Scoring
- Comments from the doc/list:
    - Is likelihood really the same as how often something happens?
    - For the SVG we typically rate DoS as at most moderate, but it depends also on the duration/type of incident/type of service. 
    - (Remote) Code Execution might be a good one to add.

Shall we do a 90 degree rotation of the table to allow more space in the guideline section for examples?

Linda changed the SVG guidelines, widespread disruption.

The likelihood definition needs spelling out, as it is really frequency.
ML - should keep using likelihood, but add that it is a probability per unit of time.
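
A minimal sketch of how "probability in unit of time" relates to frequency, assuming incidents arrive independently at a constant average rate (a Poisson model). The model, the function name `probability_within` and the example rate are illustrative assumptions, not something agreed in the meeting.

```python
import math


def probability_within(rate_per_year: float, years: float = 1.0) -> float:
    """Probability of at least one incident in the window.

    Assumes a Poisson process: independent incidents at a constant
    average rate (an illustrative assumption).
    """
    if rate_per_year < 0 or years < 0:
        raise ValueError("rate and window must be non-negative")
    return 1.0 - math.exp(-rate_per_year * years)


# Hypothetical example: an incident type seen about once every five years.
p_one_year = probability_within(rate_per_year=0.2, years=1.0)
p_five_years = probability_within(rate_per_year=0.2, years=5.0)
```

For low rates the yearly probability is close to the frequency itself, which may be why the two terms get used interchangeably; the difference only shows for frequent events or long windows.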

MS - in SVG, is the impact on the service or on the infrastructure? Example of DDoS.  
ML - often nothing can be done about a potential DDoS. It is a problem that will always be there, so rating it highly doesn't really help us; it gives a tilted image of the risk phase space. But we should at least recognise that a central service could be a particular target for this, and a project needs to decide whether to invest in this.  
MS - is the risk for the infra for the service or for the users?  
Stronger motivation to protect the infra.

MD - suggests using "worst case" scoring

Matt will clean up tables and add 

MS - notes remote code execution

MD - notes that the loss of any credential that has submission rights amounts to remote code execution

ML - turn table into a matrix, separate out data/compute, and availability/infra. Also impact on reputation.
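
A minimal sketch of how such a matrix could be scored, combining the per-area impacts with the "worst case" scoring suggested above. The four-level scale, the area names, the impact-times-likelihood product and the example values are all illustrative assumptions; the actual tables are still being reworked.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical four-step scale used for both impact and likelihood."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4


def worst_case_impact(impacts: dict) -> Level:
    """Return the highest impact across all areas ("worst case" scoring)."""
    if not impacts:
        raise ValueError("at least one impact area is required")
    return max(impacts.values())


def risk_score(impacts: dict, likelihood: Level) -> int:
    """Risk as worst-case impact multiplied by likelihood (an assumed convention)."""
    return int(worst_case_impact(impacts)) * int(likelihood)


# Hypothetical threat entry, one impact rating per area of the matrix.
example_impacts = {
    "data": Level.HIGH,
    "compute": Level.LOW,
    "availability/infra": Level.MODERATE,
    "reputation": Level.MODERATE,
}
example_score = risk_score(example_impacts, Level.MODERATE)
```

Keeping the per-area ratings in the matrix and only collapsing them at scoring time means each player can still find their own area in the assessment.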

Luna - CERN split impact into 3 areas: financial, reputational, scientific  
ML - not convinced that we need to worry so specifically.  
ML - disruption of service can be an issue for users, but all players should be able to recognise themselves in the assessment.

Some getting into the weeds.  
Luna - covered in WLCG risks.  
ML - we know this will get better with tokens. Should mention that abuse through stolen compute tokens is very unlikely.
Need to mention that token workflows "self-mitigate".  
The biggest impact for a site is that resources get abused to spread to or attack another site.  
Doesn't need a lot of text in our assessment, so the rest can focus on data.  

 

### Review Workflows
- Questions/Comments from the list
    - comment that workflows are WLCG centric, some discussion of scope
    - impact of unavailability of the token issuer based on how long it is down
    - should be considered, but treated orthogonal to token abuse

These workflows are described in detail in the document; the spreadsheet is just a starting point.  
These are obviously WLCG(-like) VOs.  
CMS for example also has web services, that would use more tokens, but we probably don't need to worry about that.  
Rucio implied in FTS sections  

Two types of users - "ordinary" and production managers. Any protections would be very needful for them.  
Tokens can be used to prevent powerful credentials ending up where they're not really needed.

Time to think on this whilst we rearrange scoring tables.

HS - are we going to assume systems are grid specific? Low risk anyway.  
Another is dynamic registration? oidc-agent, that sort of thing.  
ML - everybody has web services, dynamic clients are important in the token machinery.  
Please add (maybe a 3rd sheet)

DK - managing the VO admin/management roles - "privileged users"  
Also interactions with other users.

Interactions with IAM should be minimal (sign up, register groups, sign AUP yearly).

Making some assumptions that IAM is run properly. Recommendations and guidelines.   
Whether we need analysis on this is to be seen.  

Document will have mitigation, guidelines etc for admins, managers.  
Assumption that use of tooling is written down.  

Using command-line tools needs tokens on a machine, so that's something to point out. In some ways a browser might be "safer". Maybe...

Thinking about web services, they could have scientific/personal data/reputational issues.

Talk of major IAM security fixes in the past.


HS - is one of the things to come out of this what happens if a token issuer is down?  
ML - this WG is about how things go wrong if a token is misused. The WLCG token TF is using this document for proper use, and adding operational concerns from the VO perspective. It can feature here, as it is a risk, but it is also there for VOMS; should we really mention it here?

DK - where is the risk for dynamic clients?  
ML - dynamic clients can only be registered by those in VO, this was a change made to IAM.  

Discussion on this (moved a bit fast to minute).

DK - this assessment should include rogue clients.

- Make sure rogue clients are included in the list of Threats
    - Can be included in privilege escalation
 
Tokens have 3 ways of providing privilege; move away from using groups and restrict to subjects and scopes.  
Need to mention this.
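
A minimal sketch of an authorisation check that relies only on subject and scope and ignores group membership. It assumes the token has already been validated (signature, issuer, expiry) and decoded into a dict of claims; the function names, the example subject and the exact-match scope comparison are illustrative assumptions. Path-qualified scopes would need prefix matching, which is not shown here.

```python
from typing import Optional, Set


def has_scope(claims: dict, required: str) -> bool:
    """True if the space-separated `scope` claim contains the required scope."""
    granted = claims.get("scope", "").split()
    return required in granted


def is_authorized(claims: dict, required_scope: str,
                  allowed_subjects: Optional[Set[str]] = None) -> bool:
    """Authorise on subject and scope only; group claims are deliberately ignored."""
    if allowed_subjects is not None and claims.get("sub") not in allowed_subjects:
        return False
    return has_scope(claims, required_scope)


# Hypothetical, already-validated claims; the group claim plays no role in the decision.
example_claims = {
    "sub": "example-subject-id",
    "scope": "storage.read:/ compute.read",
    "wlcg.groups": ["/examplevo"],
}
can_read_jobs = is_authorized(example_claims, "compute.read")
can_submit_jobs = is_authorized(example_claims, "compute.create")
```

With this shape, a stolen token is limited to the scopes it was issued with, regardless of how many groups the user belongs to.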

Filling in the risk assessment will be difficult; we will see how painful it might be.

Pull in a few people from the experiments once we get started to add context.

Need another asset for this?

ML - just thought of key rotation that could be put into the document.

### AOB/next meeting
"Bonus" slot on Tuesday 5th of August or late PM or Friday 8th of August? Or another time?  
 - will discuss on list, but definitely want a bonus meeting.

Next "regular" slot Tuesday 26th of August (propose that we don't change this).

Aims laid out for next meeting:
- Matt to fix the impact/likelihood tables
- review assets/workflows
- next meeting discuss assumptions, goal to be in a position to be able to attempt assessments by the end of it.

 


Chat:

    https://confluence.egi.eu/display/EGIBG/Notes+on+Risk  
    3 pillars - confidentiality, integrity, availability

    • 15:00 - 15:05
      Actions, Since Last Meeting (5m)
    • 15:05 - 15:30
      Discussion: Risk Analysis (25m)

      Inspiration may be taken from these assessments from EGEE and WLCG done many years ago:

      Work through the Workflows added by Maarten to the document, and review the scoring methodology.

      Continue discussion from the list.

    • 15:30 - 15:55
      Discussion (25m)

      Probably just continuing the above.

      https://github.com/TTT-WG/TTT-WG/issues

    • 15:55 - 16:00
      AOB, next meeting (5m)

      Arrange time for next "bonus" meeting.