WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)

Present: Andrew McNab, Brian Paul Bockelman, Dave Dykstra (left at 1700 CET), Ian Neilson (left at 1730 CET), Maarten Litmaath, Miguel Martinez Pedreira (arrived during Singularity discussion), Vincent Brillault


● Welcome and minutes from last meeting

Dave noted that no credentials are needed for one of the two models briefly discussed last time. The notes have been amended to reflect this.


● Singularity update

  • OSG has made significant progress in testing/integrating/using Singularity
    • 15 sites, 1M jobs this week, 40-60% of the pool
    • OSG sites seem to have no problem with SUID: sites trust OSG
    • ~200 lines of script needed to set up the environment properly
    • Isolation as expected: pilot credentials, environment and logs protected
  • CMS integration thought to be easy: same tools
    • As of April 1st, sites may expose an RHEL7 environment to the pilot if and only if they also provide Singularity (very few or no jobs otherwise)
    • GLExec is still expected if an RHEL6 environment is exposed (and no Singularity)
  • Container model for OSG: Docker 'images' are pulled (as flat files) into CVMFS
    • Some validation is done by the OSG team before merging, but images are essentially the responsibility of the user who requested them
    • Not a requirement from CMS (only two basic images needed: RHEL6 and RHEL7) but from OSG (especially for users coming from a Docker environment)
  • It is possible to run Singularity within a Docker container (though not in the default configuration):
    • Docker isolates pilots from each other and from the site
    • Singularity isolates user payloads from each other and from the pilot
  • Security review:
    • Brian (OSG) is still looking for effort through CTSC, which is still busy reviewing HTCondor-CE (requested by OSG a few months ago, before Singularity appeared). In the worst case, effort should be available after that review (end of summer/early autumn)
    • Maarten:
      • No success with the team that was in Barcelona and did the review for EGI (its leader is now in CTSC)
      • Another attempt with a team in Poland: not agreed yet, but not completely turned down either
      • A university from the WhiteHat program at CERN: nothing yet
  • Access to the small Singularity test cluster at CERN: still waiting for Ben to announce access for all VOs (currently used by CMS only)
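The pilot-side wrapping described above (payload isolated from the pilot's credentials, environment and logs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OSG wrapper script: the helper name build_singularity_argv, the image path and the job directory are hypothetical, and the choice of flags is an assumption of this sketch. The flags themselves are standard `singularity exec` options, but a production wrapper sets up far more of the environment.

```python
import shlex
from typing import List, Sequence


def build_singularity_argv(image: str, job_dir: str, payload: Sequence[str],
                           binds: Sequence[str] = ("/cvmfs",)) -> List[str]:
    """Return (but do not run) a 'singularity exec' command line for a payload.

    Hypothetical helper: the flag selection is an assumption of this sketch.
    """
    argv = [
        "singularity", "exec",
        "--contain",                  # minimal /dev and empty /tmp instead of host dirs
        "--ipc", "--pid",             # new IPC and PID namespaces: payload cannot see pilot processes
        "--home", f"{job_dir}:/srv",  # payload's $HOME is its own job directory only
        "--pwd", "/srv",              # start the payload inside that directory
    ]
    for path in binds:
        argv += ["--bind", path]      # explicitly expose e.g. CVMFS inside the container
    argv.append(image)
    argv.extend(payload)
    return argv


if __name__ == "__main__":
    # Hypothetical image path and job directory, for illustration only.
    cmd = build_singularity_argv("/cvmfs/example.org/images/rhel7",
                                 "/scratch/pilot/job-0001",
                                 ["./run_payload.sh", "--events", "100"])
    print(shlex.join(cmd))
```

Because only the job directory and the explicitly bound paths are mounted, the pilot's proxy stays out of reach as long as it is not stored under job_dir or a bind path; how a given site's singularity.conf adds default binds is outside this sketch.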

● VO Data workflows

● ALICE data workflow

Discussion/comments on the data model/workflow (see slides for details):

  • Except for the HLT farm, the X.509 proxy credentials are still accessible to the user payload (they could be isolated using Singularity)
  • No user proxy/credential for the job: the job only has a job token, used to obtain data access tokens from the central service
  • Custom protocol on the storage side:
    • ALICE-specific protocol, not a standard, but the code is public
    • Additional configuration required for sites (XRootD plugin)
  • Two models possible (can be combined):
    • Jobs get all data access tokens at start-up, with an extended expiration time
    • Jobs continuously ask the central service for tokens, with a shorter expiration time
  • File deletion could be blocked (not required by standard jobs; during the meeting it was not clear whether this is implemented)
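The token exchange and the two expiration models above can be illustrated with a minimal sketch. It is not ALICE's actual protocol or token format: the names issue_access_token and storage_allows, the shared SERVICE_KEY, the job-token registry and the token fields are all hypothetical, and an HMAC over a shared key stands in for whatever signature scheme the real central service and XRootD plugin use. The job only ever holds its job token; each data access token is scoped to one path, a set of operations (omitting "delete" shows how deletion could be blocked) and an expiry time.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, FrozenSet, Optional

# Hypothetical key shared by the central service and the storage plugin
# (a real deployment would more likely use public-key signatures).
SERVICE_KEY = b"hypothetical-shared-secret"
VALID_JOB_TOKENS = {"job-token-123"}  # hypothetical registry of running jobs


def _sign(payload: Dict) -> str:
    blob = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SERVICE_KEY, blob, hashlib.sha256).hexdigest()


def issue_access_token(job_token: str, path: str, ops: FrozenSet[str],
                       lifetime_s: int, now: Optional[float] = None) -> Dict:
    """Central service: exchange a job token for a per-file access token."""
    if job_token not in VALID_JOB_TOKENS:
        raise PermissionError("unknown job token")
    now = time.time() if now is None else now
    payload = {"path": path, "ops": sorted(ops), "exp": now + lifetime_s}
    return {**payload, "sig": _sign(payload)}


def storage_allows(token: Dict, path: str, op: str,
                   now: Optional[float] = None) -> bool:
    """Storage side: check signature, then path, operation and expiry."""
    now = time.time() if now is None else now
    payload = {k: token[k] for k in ("path", "ops", "exp")}
    if not hmac.compare_digest(token["sig"], _sign(payload)):
        return False  # token was tampered with
    return token["path"] == path and op in token["ops"] and now < token["exp"]


if __name__ == "__main__":
    inputs = ["/alice/data/run1/f1.root", "/alice/data/run1/f2.root"]  # hypothetical paths
    # Model 1: all tokens fetched at start-up, long lifetime (here 24 h).
    startup_tokens = {p: issue_access_token("job-token-123", p, frozenset({"read"}), 86400)
                      for p in inputs}
    # Model 2: one token per access, short lifetime (here 10 min).
    on_demand = issue_access_token("job-token-123", inputs[0], frozenset({"read"}), 600)
    print(storage_allows(startup_tokens[inputs[1]], inputs[1], "read"),
          storage_allows(on_demand, inputs[0], "delete"))  # prints: True False
```

The trade-off is visible in the lifetimes: model 1 needs no further contact with the central service but leaves long-lived tokens on the worker node, while model 2 keeps each token short-lived at the cost of one request per access.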

● CMS Data workflow

  • Discussion/comments on the current situation in OSG (see slides for details):
    • Users negotiate directly with sites
    • No restriction required on read access to users' folders, but sites may have implemented more
    • No concept of groups in CMS: all data owned by a single user, with quotas defined at the user level by the home institute
  • Discussion/comments on Brian's ideas (macaroon-based tokens):
    • Could converge in the long term with the ALICE model and implementation
    • Indigo-DC is doing similar work: macaroons currently rejected or postponed on their side, but we should keep in touch
    • Andrew: X509 proxies (i.e. with delegation/restrictions) can be used for the same purpose

● LHCb Data workflow

Comments/questions on the data workflow (see slides for details):

  • Users could in theory use their proxy to talk directly to the back-end and bypass restrictions (but users do not know how to do it)
  • Depending on the site configuration, proper ACLs may in fact already be in place (the site can know the real owner, as it has the user certificate and can map it)
  • The complete isolation currently available in LHCb's VMs could be obtained on the regular grid using Singularity

● ATLAS Data workflow

Unfortunately, nobody from ATLAS was able to join or to provide slides

● Discussion

  • There seem to be two models that avoid giving the full pilot/user token to jobs and services (storage):
    • Job obtains all data access at start-up:
      • No global credential given to the job
      • Requires predicting all possible fail-over scenarios
    • Job obtains a token that can be delegated further:
      • Already delegated from the user or the pilot, with restrictions
      • Can be further delegated/restricted by the job before being given to services
  • Agreement within the working group that we should concentrate on existing, maintained solutions such as macaroons, X.509 proxies, SAML assertions, ... and collaborate with other efforts (e.g. Indigo-DC)
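The second model (a token that each holder can restrict further before passing it on) is what macaroons provide. The sketch below follows the HMAC-chain construction used by macaroons, with only the Python standard library; it is not the pymacaroons API, not the macaroon wire format, and not the design Brian presented. The caveat syntax ("key = value", with directory-prefix matching for path), the identifiers and the root key are hypothetical. Any holder can append a caveat without knowing the root key, but cannot remove one, because the signature chains through every caveat.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Tuple


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


@dataclass(frozen=True)
class Macaroon:
    identifier: bytes
    caveats: Tuple[bytes, ...]
    signature: bytes


def mint(root_key: bytes, identifier: bytes) -> Macaroon:
    """Issuer (e.g. a storage service) creates an unrestricted token."""
    return Macaroon(identifier, (), _hmac(root_key, identifier))


def attenuate(m: Macaroon, caveat: bytes) -> Macaroon:
    """Any holder can add a restriction; the new signature keys off the old one."""
    return Macaroon(m.identifier, m.caveats + (caveat,), _hmac(m.signature, caveat))


def _caveat_holds(caveat: bytes, context: Dict[str, str]) -> bool:
    key, _, value = caveat.decode().partition(" = ")
    if key == "path":  # directory-prefix match: allowed at or below the given directory
        path = context.get("path", "")
        return path == value or path.startswith(value.rstrip("/") + "/")
    return context.get(key) == value


def verify(m: Macaroon, root_key: bytes, context: Dict[str, str]) -> bool:
    """Issuer recomputes the HMAC chain, then requires every caveat to hold."""
    sig = _hmac(root_key, m.identifier)
    for caveat in m.caveats:
        sig = _hmac(sig, caveat)
    if not hmac.compare_digest(sig, m.signature):
        return False  # forged, or a caveat was removed/altered
    return all(_caveat_holds(c, context) for c in m.caveats)


if __name__ == "__main__":
    root_key = b"hypothetical-storage-root-key"
    user_token = mint(root_key, b"user=alice")
    pilot_token = attenuate(user_token, b"activity = read")
    job_token = attenuate(pilot_token, b"path = /store/user/alice/run1")
    ok = {"activity": "read", "path": "/store/user/alice/run1/f.root"}
    bad = {"activity": "read", "path": "/store/user/alice/run2/f.root"}
    print(verify(job_token, root_key, ok), verify(job_token, root_key, bad))  # prints: True False
```

Here the pilot narrows the user's token to reading, and the job narrows it further to one directory before handing it to a storage service; only the issuer, which holds root_key, can verify it. X.509 proxies with restrictions achieve a similar chain using public-key signatures instead of a shared root key.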

● Action review, AOB and next meeting

  • No new actions decided during the meeting
  • 20170201-01: closed for ALICE/LHCb/CMS, still open for ATLAS
  • No date agreed upon; a Foodle poll will be opened for a meeting between March 27 and April 21
    • 16:00 16:05
      Welcome and minutes from last meeting 5m

      See https://indico.cern.ch/event/604836/note/

    • 16:05 16:20
      Singularity update 15m
      Speakers: Brian Paul Bockelman (University of Nebraska-Lincoln (US)), Vincent Brillault (CERN)
    • 16:20 17:20
      VO Data workflows 1h
      • ALICE data workflow 10m
        Speaker: Miguel Martinez Pedreira (Johann-Wolfgang-Goethe Univ. (DE))

      • CMS Data workflow 10m
        Speaker: Brian Paul Bockelman (University of Nebraska-Lincoln (US))
      • LHCb Data workflow 10m
        Speaker: Andrew McNab (University of Manchester)

      • Atlas Data workflow 10m
        Speaker: Alessandra Forti (University of Manchester (GB))

      • Discussion 20m
    • 17:20 17:30
      Action review, AOB and next meeting 10m