Token Trust & Traceability WG

Europe/Zurich
513/R-068 (CERN)

Description

Fortnightly for the risk assessment season.

Zoom Meeting ID: 64974356171
Host: Matthew Steven Doidge

https://codimd.web.cern.ch/9Ezh7kJyQI6PsMp1iXl87g?view

 

# Token Trust & Traceability 10/2/24

Attending: Matt, Maarten, Linda, DaveD
Apologies: Luna, TomD, DavidC, DaveK , Mischa , Donald

Note - clash with TIIME this week


## Actions, Last Meeting
Matt, Luna and Maarten took actions to attempt merging rows and to compress and streamline the spreadsheet.

Matt reports no success on this, having managed not much more than simple line merging.

Luna, however, has had much success with a refactoring (his proposal is pasted below).

 

## Discussing the new version of the spreadsheet

* Short term/long term token distinctions removed.
* Asset lists shouldn't be a primary key anymore - should be threats instead.
    * Assets are an informational column
    * Hard to contain threats to a single asset

Swap the Threat and Asset columns.

Some discussion of TR-1, 2 and 3 - looks okay but have to see how they sit in the new paradigm. TR-1 seems fine as is.

ML - mustn't try to capture all workflows in all categories. 

Note shuffling of TR 6/7 to TR 4/5. Some wondering about ordering for the future.

TR6 - new FTS columns combined.

Note on columns D (Workflow) and I (Mitigation/Control): in the new paradigm these might not be relevant anymore, or at least their content needs rephrasing.

Matt - Perhaps workflow column name doesn't work anymore 

ML - useful to keep the same scheme for different use cases. Note that the workflows have mitigations in the description that aren't mentioned in the column.  
Not much to worry about here.

Remember that the spreadsheet is a tool to get to the document.

Note that the mitigations are not listed.

- so to do, rename columns
- FTS merger looks okay

Looking at the Impact/Likelihood - the suggestion for mergers was to take the worst case scenario.
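
A minimal sketch of the worst-case merge suggested above. It assumes each spreadsheet row carries an integer Impact and Likelihood score and that risk is their product; the scoring formula is not stated in these minutes, so that part is an assumption. The `Row` class, its field names and the example scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One spreadsheet line; field names are illustrative, not the real column names."""
    name: str
    impact: int
    likelihood: int

    @property
    def risk(self) -> int:
        # Assumption: risk = impact * likelihood.
        return self.impact * self.likelihood


def merge_worst_case(name: str, rows: list[Row]) -> Row:
    """Merge rows by keeping the highest impact and the highest likelihood."""
    if not rows:
        raise ValueError("need at least one row to merge")
    return Row(
        name=name,
        impact=max(r.impact for r in rows),
        likelihood=max(r.likelihood for r in rows),
    )


# Hypothetical scores, not taken from the spreadsheet.
merged = merge_worst_case("merged", [Row("a", 2, 3), Row("b", 4, 1)])
print(merged.impact, merged.likelihood, merged.risk)
```

Under these assumptions, taking the maximum per column can give a merged risk that is higher than that of either original row, which is one reason to double check the numbers after a merge.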

Looking at FTS note that 

In document can add narrative describing the mitigation strategies and reasoning.


Discussion of merging Misuse/Token Reply - no objections, a stolen token and a misused token can be treated the same.

ML - the distinction was driven by the asset column, so we benefit from not being tied to an asset.

For example, the new TR-7 can be viewed as "threats coming from grid jobs".

Double checking the "numbers": happy with what we're seeing.

ML - Reminder of the cost of mitigation strategies.

TR8-11
ML - Ordinary users/power user split accepted, but should consider if we need to split data access and jobs.   
But again workflows are quite different.

Matt does wonder if for ordinary users need a split between job submission and data access.

Also, the border between ordinary and power user is sketchy.

ML - even an ordinary user might have more access than just to their own data (group members).  
This is a good reason to keep job submission and data management separate.

Matt reconsiders, having fallen into the trap of assuming that if the numbers are the same the threats are the same.

ML adds a comment to the list.

Revisit the new TR-8 scoring; impact feels low.

Discussion of mitigation - advise minimising number of members in a group that have "power user".

Can firmly define power user as "someone who can delete someone else's data". Can do a better job than current VOMS.
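
The proposed definition can be stated as a simple predicate. This is only a sketch: the `Permission` record and its fields are hypothetical and do not correspond to any existing VOMS attribute or token claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """Hypothetical record of one action a user may perform on data."""
    action: str       # e.g. "read", "write", "delete"
    data_owner: str   # identity that owns the affected data


def is_power_user(user: str, permissions: list[Permission]) -> bool:
    """A power user is someone who can delete someone else's data."""
    return any(
        p.action == "delete" and p.data_owner != user for p in permissions
    )


own_only = is_power_user("alice", [Permission("delete", "alice")])
others = is_power_user("alice", [Permission("delete", "bob")])
print(own_only, others)
```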

Looking at overlap between TR-7 and TR-9.

ML - different view point for user based threats.

TR-9 can leak into TR-7. Need to explicitly state the assumption that users can only submit to the task queue - direct submission should be strongly discouraged.

Need to revisit the numbers with a wider group.

Some discussion of high TR9 likelihood, but seems okay.

TR-10, power user data management - in line with the rest of the logic. ML puts a note about a potential solution in a "two-person" or "4-eyes" rule to help mitigate power user issues.
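
A minimal sketch of what a "two-person" check could look like for a destructive power-user operation. Everything here is hypothetical: the function name, the idea of collecting approver identities, and the requirement that the second person is also a power user are assumptions made for illustration, not something agreed in the meeting.

```python
def four_eyes_approved(requester: str, approvers: set[str], power_users: set[str]) -> bool:
    """Return True if at least one power user other than the requester approved."""
    # Assumption: only approvals from power users count, and never the requester's own.
    independent = (approvers & power_users) - {requester}
    return len(independent) >= 1


power_users = {"alice", "bob"}
self_approved = four_eyes_approved("alice", {"alice"}, power_users)
peer_approved = four_eyes_approved("alice", {"bob"}, power_users)
outsider_approved = four_eyes_approved("alice", {"carol"}, power_users)
print(self_approved, peer_approved, outsider_approved)
```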

DD notes different mitigations between stolen tokens/identities and a user going rogue - for example MFA.

Matt notes that insisting on MFA is easier for the smaller number of power users.

Finally onto TR-11, power user job submission. Main difference is quantity (fairshare) and priority.

Some talk over the figures, some differences between the original numbers - for example power users have 5 for Impacts, but 1s for Likelihood. Big need for a discussion.

Looks like we're close, one more discussion with a larger number of participants to hammer out numbers and we can start writing the report (and maybe the CHEP submission.)

 

## AOB, Next Meeting
Next meeting in the usual cycle would be 15.00 CET, 24th Feb.

-- seems okay.

# Token risk proposal

## Proposal

Simplify the current risk analysis for tokens. Explanations and mitigations to be merged together.

## Proposal 1: Merge by workflow/user

This is what I hinted at last time: merging TR-4 and TR-5 by actual workflow.
I added a new sheet in the current working document.

We combine the short- and long-lived lines (the mitigations can be explained), and for the merged "Theft" + "Misuse" lines we keep the highest score.

The asset changes, since we now mix the identity and the resource and transform it into what you can do with the token, so I suggest completely eliminating the asset column and just keeping the threats as is.

Having: Ordinary Users, Power Users, Automated workflows
And mixing it with the resources: Data, Compute

This will combine TR4 and TR5 in six lines (actual names are in the worksheet):

- TR-6 Data token from workflow: combines TR-4a, TR-4b, TR-5a, TR-5b
- TR-7 Compute token from workflow: combines TR-4c, TR-4d, TR-5c, TR-5d
- TR-8 Data token from ordinary user: TR-4e-O, TR-5e-O
- TR-9 Compute token from ordinary user: TR-4f-O, TR-5f-O
    - Risk max is 8, but I suggest to take the one on TR-4f-O (to 6)
- TR-10 Data token from power user: TR-4e-P, TR-5e-P
- TR-11 Compute token from power user: TR-4f-P, TR-5f-P
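
The six merged lines can be written down as a small lookup table, which makes it easy to check that every old TR-4/TR-5 line ends up in exactly one new line. This is a sketch only: the labels and the numbering from TR-6 are copied from the list above (the actual names are in the worksheet), and the variable names are illustrative.

```python
from itertools import product

# Token sources and resource types, in the order used in the list above.
SOURCES = ["workflow", "ordinary user", "power user"]
RESOURCES = ["Data", "Compute"]

# Old lines folded into each new line, copied from the list above.
OLD_LINES = {
    "TR-6": ["TR-4a", "TR-4b", "TR-5a", "TR-5b"],
    "TR-7": ["TR-4c", "TR-4d", "TR-5c", "TR-5d"],
    "TR-8": ["TR-4e-O", "TR-5e-O"],
    "TR-9": ["TR-4f-O", "TR-5f-O"],
    "TR-10": ["TR-4e-P", "TR-5e-P"],
    "TR-11": ["TR-4f-P", "TR-5f-P"],
}

# Build "TR-n" -> label by crossing sources with resources.
merged = {
    f"TR-{6 + i}": f"{resource} token from {source}"
    for i, (source, resource) in enumerate(product(SOURCES, RESOURCES))
}

for tr_id, label in merged.items():
    print(f"{tr_id} {label}: {', '.join(OLD_LINES[tr_id])}")

# Sanity check: no old line is assigned to two new lines.
all_old = [old for olds in OLD_LINES.values() for old in olds]
assert len(all_old) == len(set(all_old))
```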


## Discarded proposal 2: 

Keep "Theft" vs "Misuse" (so the TR-4/TR-5 distinction) but remove the internal distinction between "long-lived" and "short-lived" tokens (e.g., merging a/b and c/d), as the core mitigation for both is to minimize lifetimes and scopes. The lines were really the same in their explanations, so keeping this distinction does not offer a big advantage.


## Discarded proposal 3: Theft vs Misuse

Combine the lines on Theft and the ones on Misuse (so TR-5 and TR-6). This was the starting point, and we discarded it because we thought it was insufficient and highlighted the wrong differences, while some of the mitigations just repeat between the perspectives on 5 and 6.


## Addendum

At some point we might want to explore a new risk that arises when using OAuth from the user side/browser, highlighted by Mischa. In reality it concerns workflows and token exchange, but here it is shadowed by user consent, if we ever allow third-party apps that would automatically do token exchange with user consent. Mitigation for now would be that applications should be vetted and the duration of the refresh token limited. For now we can consider this risk akin to TR-3.


## References

- [1] WLCG Computer Security Risk Analysis
  - <https://indico.cern.ch/event/394780/contributions/1832624/attachments/1239210/1821442/WLCG_Risk_Assessment.pdf>

- [2] EGEE-III Overall Security Risk Assessment Strategy
  - https://edms.cern.ch/ui/file/1039446/1/EGEE-III-SCG-TEC-1039446-SecurityRisk-v0_9.pdf 
  - https://edms.cern.ch/file/1039446/1/OverallGridSecurityRA-fulldata-rev250309.xls  
- [3] WLCG Risk register
  - https://wlcg.web.cern.ch/risk-register


## Other interesting references

- LCG risk analysis: https://proj-lcg-security.web.cern.ch/RiskAnalysis/risk.html

- ERMAC: https://edms.cern.ch/ui/file/2245136/5/ERMFrameworkUpdatedMay2025.pdf

- Wise Templates: https://wise-community.org/risk-assessment-template/

- Presentation WLCG Risk Analysis: https://indico.cern.ch/event/169042/contributions/262182/attachments/208531/292439/WLCG_risk_analysis_MB.pdf
- Presentation on EGEE-III Risk Assessment: https://indico.cern.ch/event/293705/contributions/1653270/attachments/550271/758274/EGI-TF12-SecurityThreatRiskAssessment-LAC-2.pdf

    • 15:00 15:05
      Actions, Since Last Meeting 5m
    • 15:05 15:30
      Discussion: Risk Analysis 25m

      Inspiration may be taken from these assessments from EGEE and WLCG done many years ago:

      Work through the Workflows added by Maarten to the document, and review the scoring methodology.

      Continue discussion from the list.

    • 15:30 15:55
      Discussion 25m

      Probably just continuing the above.

      https://github.com/TTT-WG/TTT-WG/issues

    • 15:55 16:00
      AOB, next meeting 5m

      Next regular meeting date would be the 24th of February at 15.00 CET.