
WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-027 (CERN)

Vincent Brillault (CERN)

Present: Alessandra Forti, Antonio Maria Perez Yzquierdo (joined at 16:52, at the end of the Singularity discussion), Dave Dykstra, Ian Neilson, Maarten Litmaath, Miguel Martinez Pedreira, Vincent Brillault


● Welcome and minutes from last meeting

No comments on the previous notes


● Traceability challenges

  • Traceability Challenge (see slides):
    • All VOs replied, with various delays (mostly due to communication issues)
    • Issues and improvements were identified and are to be followed up
    • Problem with the challenge: it was impossible to verify whether each VO identified the right job
      • It might be possible to identify the user/payload from artefacts in the running environment
      • A new challenge should be scheduled in late autumn using this information
  • VO framework & local submission:
    • ALICE: all submission should go through the central framework, but local submission still exists (users are on their own)
    • ATLAS: same situation (local submission is not forbidden); it might be more complex for small T3s with special configurations
    • The situation seems to vary case by case between sites (especially T3 or lightweight sites)
    • Singularity integration in pilot jobs might unify usage (controlled environment, no pool account mapping, etc.)
      • The situation should be re-evaluated in several months to a year
  • HTCondor CE 8.6.3 contains the patch needed to implement the traceability scheme mentioned at the last meeting (the pilot pushes information to the CE, which can then log it). Details on how to use it: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=6213
  • Traceability procedures within CMS: https://twiki.cern.ch/twiki/bin/view/Main/CmsTraceability
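
The verification gap noted above (no way to confirm that a VO identified the right job) could in principle be closed by comparing the VO's answer with artefacts recorded from the running environment. The following is only a minimal sketch: the field names `payload_user` and `payload_id` are hypothetical placeholders, since the actual artefacts still have to be identified per VO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artefacts:
    """Artefacts recorded from the running job (hypothetical fields)."""
    payload_user: str
    payload_id: str


@dataclass(frozen=True)
class VOAnswer:
    """What the VO reports back during the challenge (hypothetical fields)."""
    payload_user: str
    payload_id: str


def answer_matches(recorded: Artefacts, answer: VOAnswer) -> bool:
    """Return True only if the VO named both the user and the payload that were recorded."""
    return (recorded.payload_user == answer.payload_user
            and recorded.payload_id == answer.payload_id)


# Illustrative values only; they do not come from a real challenge.
recorded = Artefacts(payload_user="user-a", payload_id="job-123")
correct = answer_matches(recorded, VOAnswer("user-a", "job-123"))
wrong_user = answer_matches(recorded, VOAnswer("user-b", "job-123"))
```

With such a check, a challenge is only counted as passed when the VO's reply agrees with what was observed on the worker node, rather than when any reply is received.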

● Singularity update

  • The main developer of Singularity has created a company, SingularityWare, LLC: http://singularity.lbl.gov/2017-singularity-llc
    • The future licensing scheme and support model are unclear
    • It would be great to obtain an official statement, as we are currently planning to use Singularity as a building block for our infrastructure
  • Singularity has been deployed at CERN: all LXPLUS + HTCondor batch system
  • A new WLCG WG/TF is currently being created in order to coordinate the deployment and operation of Singularity on sites. All operational issues should be discussed at that new WG/TF rather than here.
  • ATLAS does not agree with the deployment scheme currently used by CMS to run Singularity, especially concerning which bind-mounts to use. The WG agreed that this needs to be settled between VOs and sites, but that the new WG/TF would be a better place for this discussion.
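
To make the bind-mount question concrete: each bind-mount is passed to `singularity exec` as a `--bind source:destination` option, so the list of pairs is exactly what VOs and sites would need to agree on. The sketch below only builds the command line and does not run it; the image path and the mount pairs are hypothetical examples, not the configuration used by any VO.

```python
import shlex


def singularity_exec_argv(image, command, bind_mounts):
    """Build the argv for `singularity exec`, with one --bind per (source, dest) pair."""
    argv = ["singularity", "exec"]
    for source, dest in bind_mounts:
        argv += ["--bind", f"{source}:{dest}"]
    argv.append(image)
    argv += list(command)
    return argv


# Hypothetical bind-mount list and image path, for illustration only.
agreed_binds = [("/cvmfs", "/cvmfs"), ("/pool/job-scratch", "/scratch")]
argv = singularity_exec_argv(
    "/cvmfs/example.org/images/el7",
    ["/bin/sh", "-c", "echo hello"],
    agreed_binds,
)
print(shlex.join(argv))  # shell-quoted form of the command, for inspection
```

Keeping the bind-mount list as data, separate from the code that launches the container, would let a site and a VO review and change it without touching the pilot logic.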

● Actions, next meeting & WLCG Workshop

New actions:

  • Identify, per VO, artefacts that can be used to identify the real payload/user from a running job
  • Redo the challenges in late autumn, using these artefacts to verify that each job was properly identified
  • Contact SingularityWare, LLC to confirm future licensing scheme for Singularity

 

Next meeting:

  • No meeting next month (WLCG Workshop) or during the summer.
  • A Foodle poll will be sent in (late) August to find a meeting date in September/October.
    • 16:00 16:05
      Welcome and minutes from last meeting 5m
      Speaker: Vincent Brillault (CERN)

    • 16:05 16:30
      Traceability challenges 25m

      Debriefing of the traceability challenges and next steps

      Speaker: Vincent Brillault (CERN)
    • 16:30 16:50
      Singularity update 20m
    • 16:50 17:00
      Actions, next meeting & WLCG Workshop 10m
