See https://twiki.cern.ch/twiki/bin/view/LCG/WLCGContainers for the working group page and actions. Agreed baseline doc is here.

WLCG Containers Working Group

Room 513/1-024 (CERN), Europe/Zurich time

# WLCG Containers Working Group -- 2019-01-30

Present: Alessandra F, Andrej F, Ben J, Brian B, Dave D, David C, Emmanouil V, Maarten L, Sébastien (IN2P3), Jakob B, Lukas H, Simone M, Vincent B, Xin Z

## Review of minutes from previous meeting

No comments on the minutes; minutes approved.

## Singularity update

- Singularity 3.0 bug: pulling from DockerHub doesn't work in unprivileged mode or without loop devices
  - Bug open upstream: https://github.com/sylabs/singularity/issues/2588
  - The initially proposed solution (adding a `-s` flag) is not enough: running in unprivileged mode or without loop devices should be completely transparent to the user
  - Dave thinks that a transparent solution could be implemented; to be confirmed
  - Without a patch, ATLAS cannot use 3.0.3
  - Lukas pointed out that the bug seems to have been introduced because Singularity now forces a conversion to its own image format, which is surprising given the stated intent to become OCI-compliant
  - Agreement: while we will need to move to 3.0 at some point, we cannot rush the transition. As this release is still considered buggy, the recommendation is to stay on 2.6.x until all blocking bugs are fixed.
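As an illustration of the interim recommendation, a site could check that its installed Singularity is still on the 2.6.x series. The sketch below parses a version string such as the output of `singularity --version`; the two output formats it accepts (`2.6.1-dist` for 2.x and `singularity version 3.0.3-1` for 3.x) are assumptions, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# Matches the first "major.minor.patch" triple in a version string.
# Assumed formats: "2.6.1-dist" (2.x) or "singularity version 3.0.3-1" (3.x).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a Singularity version string."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def follows_interim_recommendation(text: str) -> bool:
    """True if the version is in the 2.6.x series (hypothetical compliance check)."""
    major, minor, _patch = parse_version(text)
    return (major, minor) == (2, 6)


if __name__ == "__main__":
    for output in ("2.6.1-dist", "singularity version 3.0.3-1"):
        print(output, "->", follows_interim_recommendation(output))
```

The check deliberately looks only at major and minor version, since any 2.6 patch release satisfies the recommendation as recorded above.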

## Unprivileged namespaces vs. Singularity SUID

- Security trade-off between two different models and attack surfaces:
  - Unprivileged kernel features significantly increase the kernel's attack surface and have led to a few vulnerabilities in the recent past, but they are becoming standard, are under heavy scrutiny and are supported upstream.
  - Singularity presents a smaller, more controllable attack surface. However, recent incidents have revealed a non-negligible number of critical vulnerabilities, usually relatively basic ones.
- Results of the security review are expected in June 2019:
  - It is unclear whether a comparison of the two modes is included, but one would be appreciated
- Recommendation to sites:
  - We have to be careful about what we recommend to sites: concerns were expressed about making a strong recommendation before the security review is complete
  - It was nevertheless agreed that sites which were forced to install Singularity with SUID should be offered the option of using unprivileged namespaces instead, once tests have shown that this works
- Testing unprivileged namespaces:
  - Fermilab and Nebraska are blocked by a kernel bug
  - CERN offered to enable unprivileged namespaces on the part of the batch service recently migrated to CC7 (running 7.6)
  - Vincent pointed out that, under the plan agreed for GlideinWMS, unprivileged namespaces will not be tested at sites where SUID is still supported locally, unless CMS specifies a special `SINGULARITY_BIN` pointing to a Singularity installation without SUID
  - Current test results:
    - ATLAS works with 2.6.1 in non-SUID mode (see https://docs.google.com/spreadsheets/d/1SGKyja47Veu_8IUXlXWOOEferuFoD62O4m64pgTNgSk/edit)
    - SKA has some issues with SingularityHub in unprivileged mode; these are being mitigated by migrating to DockerHub (which, as noted above, doesn't work with 3.0 in unprivileged mode)
- Unprivileged namespaces are still considered the "end goal/game":
  - Stop doing anything special and only use standards: sites would not require anything beyond standard Linux (with unprivileged namespaces enabled)
  - ATLAS is already planning to support other unprivileged container engines, for example podman
- Actions:
  - CERN to enable unprivileged namespaces on batch wherever possible (essentially starting now)
  - CERN intends to switch off SUID mode completely on part of the batch capacity (running CentOS 7.6)
    - The testing path is still unclear; experts are encouraged to get in touch with Ben
    - An announcement will be made before switching it off
- lxplus@CERN was not discussed here:
  - Most of the capacity is still on SLC6, so it is too early to discuss it
  - Expected to be in line with batch @ CERN
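Before offering unprivileged mode on a node, a site can first check whether the kernel allows unprivileged user namespaces at all. The sketch below assumes the setting is exposed as the `user.max_user_namespaces` sysctl (readable at `/proc/sys/user/max_user_namespaces`, as on CentOS 7.6 and recent upstream kernels), where 0 means disabled; other distributions may use different switches, and the function names are hypothetical. Passing this check does not prove that containers will run; only actually starting a container in non-SUID mode does.

```python
from pathlib import Path

# Assumption: the user.max_user_namespaces sysctl is exposed at this path;
# a value of 0 disables user namespaces for unprivileged users.
DEFAULT_SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def userns_limit(text: str) -> int:
    """Parse the content of the max_user_namespaces sysctl file."""
    return int(text.strip())


def unprivileged_userns_enabled(path: Path = DEFAULT_SYSCTL_PATH) -> bool:
    """Return True if the sysctl allows at least one user namespace.

    A missing or unreadable file (or unparsable content) is treated as
    "not enabled", so the check stays conservative on other kernels.
    """
    try:
        limit = userns_limit(path.read_text())
    except (OSError, ValueError):
        return False
    return limit > 0


if __name__ == "__main__":
    state = "enabled" if unprivileged_userns_enabled() else "disabled or unknown"
    print(f"unprivileged user namespaces: {state}")
```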

## Unpacked.cern.ch

- Scalability:
  - At the moment, limited only by the scalability of the publisher node
  - More tests were suggested, but there was no clear proposal on what to digest or at what expected rate
- Long-term maintainability:
  - An automated life-cycle for user containers was suggested, with containers removed after a certain time:
    - Currently only one type of container is implemented; there is no plan to implement anything else unless proven necessary
  - OSG always syncs the images associated with a tag
    - When using "latest", previous versions are automatically garbage-collected after an update
    - Unpacked.cern.ch does the same
  - Agreement that debugging should first be done locally
    - Until their code is stable, users are encouraged to tag specific versions to avoid pushing too many images to unpacked.cern.ch, and to use "latest" only once the code is stable
  - Data preservation is not considered an issue here: containers should be preserved in another registry, so they can be deleted and re-imported as needed
- Big usability issue: "When can I submit my job?", i.e. how to decrease the latency until a newly pushed container is available
  - OSG is recording the first CVMFS version that contains a given container version
    - Pilot jobs use this information, combined with the container version required by a user job, to match jobs to worker nodes (WNs)
- Could the unpacked layers be used as a Singularity/Docker/... cache directory for building containers?
  - Unclear; this might require being able to configure both read-only and read-write caches
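The job-to-WN matching described above can be modelled in a few lines. This is a hypothetical sketch, not OSG's or unpacked.cern.ch's actual code: it assumes CVMFS revisions are monotonically increasing integers, that each worker node advertises the revision it has mounted, that a replaced "latest" image is garbage-collected in the same revision as its successor is published, and that each digest is published under a single tag. All class and function names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PublicationLog:
    """Hypothetical record of when each image digest appeared on (and left) CVMFS."""

    # digest -> first CVMFS revision containing the unpacked image
    first_revision: dict[str, int] = field(default_factory=dict)
    # digest -> first CVMFS revision in which it has been garbage-collected
    removed_revision: dict[str, int] = field(default_factory=dict)
    # tag -> digest currently published under that tag
    current: dict[str, str] = field(default_factory=dict)

    def publish(self, tag: str, digest: str, revision: int) -> None:
        """Record that `digest` is published under `tag` as of `revision`.

        Assumption: re-publishing a tag (e.g. "latest") garbage-collects the
        previous digest in the same revision.
        """
        previous = self.current.get(tag)
        if previous is not None and previous != digest:
            self.removed_revision[previous] = revision
        self.current[tag] = digest
        self.first_revision.setdefault(digest, revision)

    def available(self, digest: str, wn_revision: int) -> bool:
        """True if a worker node with `wn_revision` mounted can see `digest`."""
        first = self.first_revision.get(digest)
        if first is None or wn_revision < first:
            return False
        removed = self.removed_revision.get(digest)
        return removed is None or wn_revision < removed


def matching_nodes(log: PublicationLog, digest: str, nodes: dict[str, int]) -> list[str]:
    """Names of worker nodes whose mounted revision contains `digest`."""
    return sorted(name for name, rev in nodes.items() if log.available(digest, rev))


if __name__ == "__main__":
    log = PublicationLog()
    log.publish("latest", "sha256:aaa", revision=100)
    log.publish("latest", "sha256:bbb", revision=105)
    nodes = {"wn1": 99, "wn2": 102, "wn3": 106}
    print(matching_nodes(log, "sha256:aaa", nodes))
    print(matching_nodes(log, "sha256:bbb", nodes))
```

In this example a job needing the old "latest" digest can only run on `wn2`, while one needing the new digest must wait for a node that has mounted revision 105 or later; recording the upper bound as well as the first revision is what lets the model account for garbage collection.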

## Action review

(Unfortunately, Dave had to drop out before this)

- WC6 can now be closed (supported by EPEL 2.6 and upstream 3.0, as far as Vincent understood)
- WC7 can also be closed, see https://docs.google.com/spreadsheets/d/1SGKyja47Veu_8IUXlXWOOEferuFoD62O4m64pgTNgSk/edit
- WC8/WC9: no input (Gavin & Olga not present, Ben doesn't know)
- New action: CERN to enable unprivileged namespaces and disable SUID after tests
