WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
600/R-001 (CERN)

Present: AlessandraF, AndrejF, DaveD, DavidC, MaartenL, MischaS, VincentB

Current draft discussions

The current draft was discussed in detail, up to the second scenario:

  • The new structure, with technology-agnostic recommendations which are then explained in various scenarios
  • Recommendations for Isolation:
    • Care for sensitive data (especially personal data) has been added as a SHOULD. This is deemed not to be the core of the policy, but is worth suggesting.
    • No other concerns were raised.
  • Using containers to isolate user payloads from VO pilots:
    • Concerns were raised about IPC namespaces, in particular because of MPI jobs. To avoid future failures and incompatibilities, IPC namespaces have been made a SHOULD. VOs are still highly encouraged to use them wherever possible.
    • No other concerns were raised.
  • Using VO pilots without VO framework capabilities:
    • The main use case for this scenario is sites that use containers as part of their batch system to provide the pilot job environment, as nested containers do not seem to work in most cases.
    • Many concerns were raised about the pilot credentials:
      • Removing the credentials from the file system is not an option [Explanation from after the meeting: Python requires files to establish TLS connections with client certificates. As a result, the pilot would need to write them to disk from time to time, leaving them open to theft by a malicious payload.]
      • Vincent suggested that the pilot could obtain more restricted credentials, which could be forbidden from taking user payloads of other jobs. Maarten noted that Dirac pilots are already able to dedicate themselves to a single user. However, Andrej, having looked at the code, pointed out that although this is indeed implemented in the pilot (as in Panda), the credentials themselves are not changed and could still be used by malicious payloads (again as in Panda).
      • Large HPC systems, which are likely to be in this scenario, are not directly affected by this discussion, as they don't interact directly with Panda, but via the Arc Control Tower, using a different mechanism.
      • Vincent suggested a transition period, depending on the scale of this scenario: if the impact is limited (a limited number of sites and resources), then, given the cost of implementing this credential restriction, the exposure resulting from these credentials not being restricted could be tolerated (to be defined/formalized) until the next authorization system is designed and implemented (per framework).
    • Unfortunately, due to time constraints, the meeting was adjourned, leaving this discussion without a conclusion.
  • All proposed modifications (the changes proposed by Vincent in November and all minor changes proposed by participants, where those did not raise comments) were either accepted or agreed to be accepted after the meeting [Note: Done]
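
Illustration of the pilot credential concern noted above: the sketch below shows why a Python pilot ends up with its client certificate on disk. It is not taken from any pilot framework; the helper names credential_file and client_context are hypothetical, and it assumes the certificate and private key are held in memory as a single PEM blob. The standard library call ssl.SSLContext.load_cert_chain accepts file paths only, so the credential has to be written to a file before a TLS client context can be built.

```python
import os
import ssl
import tempfile
from contextlib import contextmanager


@contextmanager
def credential_file(pem_bytes):
    """Write PEM data to a private temporary file and delete it afterwards.

    Hypothetical helper, for illustration only.
    """
    # mkstemp creates the file readable and writable by the owner only.
    fd, path = tempfile.mkstemp(suffix=".pem")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(pem_bytes)
        yield path
    finally:
        # Until this point the credential is present on the file system.
        os.remove(path)


def client_context(pem_bytes):
    """Build a TLS client context from a combined certificate + key PEM."""
    context = ssl.create_default_context()
    with credential_file(pem_bytes) as path:
        # load_cert_chain takes file paths, not in-memory data.
        context.load_cert_chain(certfile=path)
    return context
```

Even with owner-only permissions and immediate deletion, the file exists for a short window, and a payload running under the same account as the pilot could read it during that window.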

Next steps

  • A new meeting is to be scheduled (action on Vincent), no sooner than two weeks from now, to allow people to read the new draft, share it with their peers and comment on it
  • VOs are asked to compare the proposed scenarios with their own sites and workflows:
    • Identify the proportion of sites & resources falling into the proposed scenarios
    • For any site or resource not falling into any scenario:
      • Identify what change would be needed to an existing scenario
      • Identify whether a new scenario abiding by the recommendations could be created