WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)

Description
WLCG Traceability and Isolation WG. See https://wlcg-traceability-isolation-wg.web.cern.ch/

● Previous meeting minutes

No comments on the minutes of the previous meeting


● VO log recommendations

  • Alice report (the only VO to report during the meeting):
    • The first document is high-level and was already discussed during the meeting: Alice already follows these recommendations
    • Alice keeps all the information listed in the second document. However, it uses a different format and relies on an internal messaging solution rather than syslog
  • Discussion on log format uniformity:
    • Maarten: If we were starting from scratch, it would be doable. But changing the log format now would be horribly difficult and would cost a lot of effort
    • Ian: We should still try to avoid duplication of effort and the use of different tools and formats
    • CMS can't directly modify its logging format: CMS relies on GlideinWMS, which may be open to suggestions, but a complete rewrite is impossible
    • Maarten proposed that the VOs could send their data to a central cluster (e.g. maintained by the Security Team), which could process and normalize it
    • Vincent replied that the CERN Security Team currently can't cope with such data/load
    • Agreement that we might find parts of each VO's logging infrastructure that could be enhanced, but that we should not try to rebuild a new logging system
  • Self assessment discussion:
    • Vincent asked if it would make sense for VOs to self-assess by:
      • Picking a random pilot job from the previous week
      • Picking a random job that ran during the lifetime of that pilot job
      • Checking how expensive it would be to identify the owner of that job and their activity/payload
    • Maarten commented that similar exercises were done in the past, in particular a security challenge run against PanDA/ATLAS. Such a real challenge could be very expensive.
    • Ian explained how, within the UK NGI, he was testing the Argus deployment by running a job that queries a non-existent file on a web server: the web server logs easily show where the job ran
    • Proposed self-assessment for VOs:
      • Given a restricted time period and the IP address of a worker node, identify the cost and potential bottlenecks of finding the matching jobs and the corresponding user and payload
      • The assessment should be done under the assumption that this WG will succeed in building a complete isolation layer between the pilot and the jobs, and should thus ignore pilot-job takeover and hidden daemonized process scenarios
    • Action: perform the proposed self-assessment
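Vincent's sampling procedure and the proposed self-assessment lookup can be sketched as follows. This is a minimal illustration under assumptions: the `JobRecord` fields and the `jobs_on_worker` and `sample_job` helpers are hypothetical and do not reflect any VO's actual logging schema. It also assumes a pilot counts as "from the previous week" if any of its jobs overlapped the last 7 days. In a real assessment the cost and bottlenecks come from querying each VO's own job database or logs, not from this in-memory filtering.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import random
from typing import List, Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-payload record a VO might keep; field names are illustrative."""
    job_id: str
    pilot_id: str
    worker_ip: str
    start: datetime
    end: datetime
    user: str      # e.g. the owner's certificate DN
    payload: str   # e.g. the executable or task identifier


def jobs_on_worker(records: List[JobRecord], worker_ip: str,
                   window_start: datetime, window_end: datetime) -> List[JobRecord]:
    """Return jobs that ran on worker_ip at some point inside [window_start, window_end)."""
    return [
        r for r in records
        if r.worker_ip == worker_ip
        and r.start < window_end and r.end > window_start  # time-interval overlap
    ]


def sample_job(records: List[JobRecord], now: datetime,
               rng: random.Random) -> Optional[JobRecord]:
    """Pick a random pilot active in the past week, then a random job that pilot ran."""
    week_ago = now - timedelta(days=7)
    pilots = sorted({r.pilot_id for r in records if r.end > week_ago and r.start < now})
    if not pilots:
        return None
    pilot = rng.choice(pilots)
    return rng.choice([r for r in records if r.pilot_id == pilot])
```

For each hit returned by `jobs_on_worker`, the `user` and `payload` fields are what the self-assessment asks the VO to recover. The interesting measurement is how long it takes to assemble such records from real logs in the first place.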

● Update on containers

  • Alice reported working on a Docker and Kubernetes deployment. While these do rely on containers, it was agreed that the goal differs from the mandate of this WG, as they require a special deployment at sites that gives root access to the VO.
  • CERN and FNAL opened feature requests for unprivileged namespace support in Red Hat Enterprise Linux 7.3, which is supposed to be released later this year.
  • Vincent proposed a pilot: use a more recent kernel/distribution to try to build a tool that uses unprivileged namespaces to provide a working SL6/7 environment for jobs, in order to test it on RHEL 7.2 or any preview of 7.3 and identify missing pieces. This proposal did not attract the interest of the Working Group members
  • Dave proposed another pilot: build glExec plugins relying on privileged namespaces (still requiring SUID) to create the same kind of isolation (which could also, for example, make it possible to obtain an SL6 environment on SL7). CMS would be interested in such a tool, which could ease the deployment of glExec at new sites (no special users or mappings needed). In theory, as soon as unprivileged namespaces are supported, adapting such a plugin to run as a normal user should be rather straightforward
  • Maarten commented that it may be worth it as long as it does not consume too much effort and investment
  • No agreement was reached on this topic, so no action was set before the next meeting
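One of the "missing pieces" either pilot would need to check on a given host is whether an ordinary user can create a user namespace at all. The following is a minimal probe sketched in Python, not a WG tool. It assumes Linux and Python 3.12 or later, where `os.unshare` and `os.CLONE_NEWUSER` are available, and the function name `userns_probe` is hypothetical. It forks a child to attempt the unshare, so the caller's own namespaces are untouched. The child is also single-threaded, which `unshare(CLONE_NEWUSER)` requires. It only tests user-namespace creation, not the mount setup needed for a full SL6/7 environment.

```python
import os
import sys
from typing import Optional


def userns_probe() -> Optional[bool]:
    """Hypothetical helper: can the current user create a user namespace?

    Returns True/False on Linux with Python >= 3.12, or None if the probe cannot run.
    Run it as the unprivileged account that would execute jobs; root always succeeds.
    """
    if not sys.platform.startswith("linux"):
        return None
    if not (hasattr(os, "unshare") and hasattr(os, "CLONE_NEWUSER")):
        return None  # os.unshare was added in Python 3.12
    try:
        pid = os.fork()
    except OSError:
        return None  # fork itself is not permitted here
    if pid == 0:
        # Child: a fresh single-threaded process; report the outcome via exit code.
        code = 1
        try:
            os.unshare(os.CLONE_NEWUSER)
            code = 0
        except OSError:
            pass  # e.g. EPERM when unprivileged user namespaces are disabled
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

Running `userns_probe()` as the job account on, for example, a RHEL 7.2 worker node would show whether the kernel allows unprivileged user namespaces there. That is the prerequisite Vincent's pilot depends on and Dave's plugin would eventually need in order to drop SUID.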

● Next meeting

The next meeting should be scheduled via a Foodle poll, on 6-7 September or the following week

    • 4:00 PM - 4:05 PM
      Previous meeting minutes 5m

      See https://indico.cern.ch/event/394829/note/

    • 4:05 PM - 4:30 PM
      VO log recommendations 25m

      Based on https://edms.cern.ch/document/428037/3 and https://edms.cern.ch/document/793208/1

    • 4:30 PM - 4:55 PM
      Update on containers 25m
    • 4:55 PM - 5:00 PM
      Next meeting 5m
