WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-027 (CERN)


Present: Alessandra Forti (starting from 16:20, during CMS update), Brian Paul Bockelman, Dave Dykstra, David Crooks, Ian Neilson (left before AOB), Maarten Litmaath, Miguel Martinez Pedreira, Vincent Brillault


● Welcome and minutes from last meeting

No comments on the minutes.


● Traceability challenges

Presented Slides:

  • Comment on the slides: CMS has performed one or two internal self-assessments
  • In the past, such challenges (e.g. with ATLAS) required credentials to be given to the testers (to fiddle around), but this would not work here (the submitter has to be unknown)
  • Simpler alternative: pick Worker Nodes at CERN running user jobs from each VO and report their IP addresses and the corresponding times to the VO
    • Action on Vincent: check the feasibility of such a test and, if possible, run it
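
The simpler alternative above boils down to a lookup: given a Worker Node IP and a time, the VO should be able to name the user whose payload was running there. A minimal sketch of that lookup, assuming the VO keeps job records with start and end times; the record layout, field names, and sample values are hypothetical and not taken from any VO framework:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical record layout, for illustration only.
    worker_ip: str
    user: str
    start: datetime
    end: Optional[datetime]  # None while the payload is still running


def users_at(records: List[JobRecord], worker_ip: str, when: datetime) -> List[str]:
    """Return the sorted users whose payloads ran on worker_ip at time `when`."""
    found = set()
    for rec in records:
        if rec.worker_ip != worker_ip:
            continue
        still_running = rec.end is None
        if rec.start <= when and (still_running or when <= rec.end):
            found.add(rec.user)
    return sorted(found)


# Made-up sample data (documentation IP range, invented users).
records = [
    JobRecord("192.0.2.10", "alice", datetime(2017, 4, 5, 9, 0), datetime(2017, 4, 5, 11, 0)),
    JobRecord("192.0.2.10", "bob", datetime(2017, 4, 5, 10, 30), None),
    JobRecord("192.0.2.11", "carol", datetime(2017, 4, 5, 9, 0), datetime(2017, 4, 5, 12, 0)),
]
```

A challenge would then consist of calling `users_at` with the reported IP and time; an empty result would mean the VO could not trace the job.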

Update from CMS:

  • Fermilab wants to keep the local logging capabilities (of users) provided by GLExec
  • Existing related feature in Condor:
    • The job environment contains a variable holding the endpoint for reporting information back to Condor; the pilot can be configured to push information there
    • Condor can be queried to expose this data at run time (a snapshot of "now", no history)
  • Using this feature would still require development, which is currently ongoing:
    • Configure the pilot to send additional information that is currently not pushed (e.g. the user)
    • Add a Condor plugin to store historical information
  • Mostly a Proof of Concept for Fermilab at this point, but
    • It could be merged into Condor and enabled by default (review still needed, e.g. on the size or sensitivity of the logs), and thus pushed to other sites
    • For non-Condor sites that would like to keep the functionality, a service with a compatible API would be needed
  • In a similar fashion, it could be possible to keep local blacklisting capabilities with minimal optional services queried by the pilot (if required by a site)
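
The mechanism described above (an endpoint advertised in the job environment, a pilot pushing extra attributes, and a store that keeps history rather than a snapshot) can be sketched as follows. This is only a sketch under assumptions: the variable name `PILOT_REPORT_ENDPOINT`, the attribute names, the endpoint value, and the `HistoryStore` class are all hypothetical and do not reflect the actual Condor interface. The transport is passed in as a callable, so nothing is actually sent over the network.

```python
import json
from typing import Callable, Dict, List, Mapping

# Hypothetical variable name; the minutes do not name the real one.
ENDPOINT_VAR = "PILOT_REPORT_ENDPOINT"


def report_payload_user(env: Mapping[str, str], user: str, started_at: str,
                        send: Callable[[str, str], None]) -> bool:
    """Push the payload user to the endpoint found in the job environment.

    Returns False (and sends nothing) when no endpoint is advertised.
    """
    endpoint = env.get(ENDPOINT_VAR)
    if not endpoint:
        return False
    message = json.dumps({"payload_user": user, "started_at": started_at}, sort_keys=True)
    send(endpoint, message)
    return True


class HistoryStore:
    """Keeps every report, unlike a snapshot that only shows the latest one."""

    def __init__(self) -> None:
        self._reports: List[Dict[str, str]] = []

    def receive(self, endpoint: str, message: str) -> None:
        record = json.loads(message)
        record["endpoint"] = endpoint
        self._reports.append(record)

    def snapshot(self) -> Dict[str, str]:
        # What a "now only" query would expose: the latest report, if any.
        return self._reports[-1] if self._reports else {}

    def history(self) -> List[Dict[str, str]]:
        return list(self._reports)


store = HistoryStore()
env = {ENDPOINT_VAR: "collector.example.org:9999"}  # made-up endpoint
report_payload_user(env, "alice", "2017-04-05T09:00:00Z", store.receive)
report_payload_user(env, "bob", "2017-04-05T10:30:00Z", store.receive)
```

Here `snapshot()` only knows about the latest payload, while `history()` still holds the earlier one, which is the difference the proposed plugin is meant to cover.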

 


● Singularity update

  • CERN has started a security review of Singularity but cannot provide guarantees on the time scale or the completeness of said review
  • Red Hat seems to be more proactive about unprivileged user namespaces in RHEL 7.4, as seen in http://seclists.org/oss-sec/2017/q2/11:
    "As for unprivileged user namespaces, they were considered too
    insecure up to and including RHEL-7.3 to be enabled by default.
    There are plans to enable them (by sysctl parameter) in RHEL-7.4."
  • CMS update:
    • Singularity has been running in production at 3 CMS sites for 2 weeks already (not yet at 100%, but expected to reach 100% within a week)
    • Only one issue identified so far: if autofs restarts and tries to restart cvmfs, cvmfs cannot restart until all running pilots have drained and stopped using cvmfs. Discussion is ongoing with the developer (no bug opened yet), but it seems hard to fix.
      • Configuration systems, e.g. puppet, could trigger such a restart and thus kill WNs.
  • Long term support for Singularity?
    • It is not a grid project (it comes from HPC)
    • There is one paid developer, working for the US DOE: no crystal ball for the future...
    • Currently at least 4 people with commit access (incl. Brian)
    • Under a spike of popularity and commit activity right now
    • Seen as the way to run containers at HPC sites in the US (no Docker)
    • Experience also exists in Europe:
      • GSI has quite some experience, including in development
      • Already deployed at some sites (e.g. SiGNET and ARNES for host/pilot isolation)
    • Once SUID is dropped (unprivileged user namespace support in RHEL), development will no longer be security-critical, as Singularity would mostly be a wrapper over unprivileged namespace APIs
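
For sites checking whether a node could run without SUID, the sysctl mentioned in the Red Hat quote above can be inspected directly. A small sketch, assuming the parameter is `user.max_user_namespaces` (the upstream Linux name, readable at `/proc/sys/user/max_user_namespaces`); the minutes do not name the parameter, and a given RHEL release may need additional settings, so this is an illustration only:

```python
from pathlib import Path
from typing import Optional

# Assumed location of the sysctl; it is not named in the minutes.
SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def parse_max_user_namespaces(text: str) -> Optional[int]:
    """Parse the sysctl content; return None if it is not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def user_namespaces_allowed(path: Path = SYSCTL_PATH) -> Optional[bool]:
    """True/False if the limit can be read, None if the sysctl is unavailable."""
    try:
        value = parse_max_user_namespaces(path.read_text())
    except OSError:
        return None
    # A limit of 0 means no user namespace can be created.
    return None if value is None else value > 0


if __name__ == "__main__":
    print("user namespaces allowed:", user_namespaces_allowed())
```

A `None` result means the kernel does not expose the limit at that path, which is different from the feature being disabled.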

● Action review, AOB and next meeting

  • GDB session on Containers on April 12th afternoon: https://indico.cern.ch/event/578985/
  • Data access authentication/authorization/traceability: Brian has submitted an NSF project; an answer is expected in the coming months
    • 16:00 - 16:05
      Welcome and minutes from last meeting (5m)

      See https://indico.cern.ch/event/610915/note/

      Speaker: Vincent Brillault (CERN)


    • 16:05 - 16:20
      Traceability challenges (15m)
      Speaker: Vincent Brillault (CERN)

    • 16:20 - 16:35
      Singularity update (15m)
    • 16:50 - 17:00
      Action review, AOB and next meeting (10m)
      Speaker: Vincent Brillault (CERN)