WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)

Present: Alessandra Forti, Brian Paul Bockelman, Claudio Grandi, Dave Dykstra, David Crooks, Maarten Litmaath, Mischa Sallé, Vincent Brillault


● Previous meeting minutes

No comments were raised on the previous notes.


● Singularity update

  • Singularity 2.2 released, available in OSG development releases
    • No progress (for lack of time) on getting the latest version into EPEL (the current maintainer does not seem to be active)
    • In the worst case, the package could be added to the WLCG repo
  • Singularity added to CernVM 4 Beta: Tests welcome!

  • Test HTCondor cluster deployed at CERN (details to follow on mailing list)

  • SUID & Security review:

    • A team in Indiana might be able to do a security review (not kickstarted yet)

    • Compared to GLExec, Singularity has features that might make sites want to deploy it for their own purposes (e.g. two US CMS sites had already deployed Singularity because of other users' requirements)

  • OSG is integrating Singularity into GlideinWMS:

    • Currently enabled only for OSG Staff

    • Used to debug workflows

    • CMS might require sites that want RHEL/CentOS/SL 7-only worker nodes (not exposing an SL6 environment to the pilot) to support Singularity (allowing the pilot job to expose an SL6 environment to the payload)


● Traceability model & pilot workflow

  • Vincent presented two possible models to kickstart the discussion:
    • User credentials pushed to the job (as with GLExec)
    • Pilot doing storage operations on behalf of the job
  • The second option was not deemed realistic by members of the WG:
    • This adds a lot of constraints on the user job; the pilot job might not know which files are needed
    • This means no streaming/direct processing of the data: everything must first be downloaded to disk
    • No checkpointing possible
  • Maarten noted that, given our experience with security incidents, we might be able to lower the requirements a bit, as the risk does not seem that high
  • Two other designs were briefly discussed:
    • Each job receives a unique, non-guessable path to which it is supposed to write. That path should be open for writing during the time window in which the job is supposed to write
    • The payload announces in its job description all the file operations it needs. The pilot can then sign access tokens, given to the payload, for each access (the storage needs to be aware of this mechanism)
  • Agreement was reached that the WG should collect all VO data workflows, to better understand the situation, before discussing any model that could fit them

● Open and new actions

  • New Actions:
    • VOs: prepare a presentation of their data workflows in pilot jobs & payloads
    • Vincent: give access details for the CERN HTCondor testing cluster
  • Existing actions:
    • 20160914-01: No progress
    • 20160914-02: Ongoing: CMS is testing it, no feedback from other VOs
    • 20161123-01: Ongoing: Vincent got negative feedback from EGI CSIRT; Maarten could try to see if it is possible to revive some previous collaboration
    • 20161123-02: Ongoing: Effort/Team identified, not kickstarted yet
    • 20161123-03: Closed: HTCondor testing cluster deployed at CERN Site
    • 20161123-04: Closed: Added to CernVM 4 Beta
    • 20161123-05: Ongoing

● AOB & next meeting

Next meeting: Wednesday 1 Mar 2017, 16:00 → 17:30

    • 1
      Previous meeting minutes

      See https://indico.cern.ch/event/586323/note/

      Speaker: Vincent Brillault (CERN)


    • 2
      Singularity update
    • 3
      Traceability model & pilot workflow
    • 4
      Open and new actions

      Review https://wlcg-traceability-isolation-wg.web.cern.ch/content/ongoing-actions

    • 5
      AOB & next meeting
