WLCG AuthZ WG Call



Attendees: Derek, Hannah, Michel, Vincent, Mischa, Andrea, Romain, Ioannis, Maarten, Bruce, Joel 

Recording: https://videos.cern.ch/deposit/project/aefb5d6eab4747008b54f305a9d721c5


  • SciTokens integration in HTCondor
    • Derek works on behalf of OSG and SciTokens project
    • Also talking about integration with things like box.com
    • Talk on SciTokens in general from recent HTCondor week: https://agenda.hep.wisc.edu/event/1201/other-view?view=standard#2018-05-22
    • New daemon, condor_credd, manages credentials on the submit machine; it can create and destroy them but cannot inspect or alter them
    • Credential monitor can refresh and manipulate credentials
    • Refresh token implemented 
    • User guided to get correct tokens, e.g. write for one service and read for another
    • No credential monitor daemon on the execute side
    • Cannot directly act as an individual, just bearer token authorisation
    • Tokens removed after job ends (couple of days)
    • Job output data put a fairly high load on the submit host, so it was moved off
    • Site specific pieces: the credential monitor must be configured by the site admin to specify which services it will support, e.g. box.com, LIGO
    • Flexible credential support
    • Upcoming version 8.8 of HTCondor will have documentation
    • Qs
      • How is the access token refreshed on the execution node? The execution agent queries the submit node every 5 minutes for a new access token. It may get the same one back if renewal isn't needed. An environment variable points to the directories that hold the tokens
      • User authentication and authorisation from web UI? The user gets a URL keyed to them to access a web portal and request tokens (and get OAuth tokens for e.g. box.com); they land on the submit host website and accept things. Token provisioning is quite fine grained but depends on the service. box.com gives a root access token first that can be exchanged for a more limited token
      • Scalability? Had to move to a more powerful machine? The move was because of the amount of data coming back to the submit machine; job output data had to be removed from the submit host. Access tokens are ~1 KB and the messages are about the same size, so they do not add a huge amount. They are sent per server, not per job. There may be growing pains with tokens and monitoring.
      • Can the user limit the lifetime of a credential? A new credential is requested every 5 minutes, but it is not necessarily refreshed if that isn't needed.
      • WLCG integration? Just another OAuth2 schema, should fit within the framework   
      • How do you handle traceability? For e.g. box.com we can't see inside the token. For SciTokens the subject attribute gives an indication of who used it, e.g. a DN or some kind of name, and the token has a JWT ID (jti) that can be traced up the stack to find the user who issued the token.
      • Python binding support added? Will use normal Condor submit; none of the credential handling has been ported to the Python bindings. Not sure of plans.
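A minimal sketch of the two points above (tokens found through an environment variable, and traceability through the subject and token ID), under stated assumptions: the environment variable name and token file name are hypothetical and passed in by the caller, not taken from HTCondor, and the payload is decoded WITHOUT signature verification, so it is only suitable for logging, never for authorisation decisions.

```python
import base64
import json
import os
from pathlib import Path


def read_token(dir_env_var: str, filename: str) -> str:
    """Read a token file from the directory named by an environment variable.

    Both the variable name and the file name are supplied by the caller;
    the values used by a real deployment are not given in these minutes.
    """
    token_dir = os.environ[dir_env_var]
    return (Path(token_dir) / filename).read_text().strip()


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a compact JWT without verifying the signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT: expected three dot-separated parts")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def trace_fields(token: str) -> dict:
    """Return the claims useful for traceability: issuer, subject, token ID."""
    claims = decode_jwt_claims(token)
    return {key: claims.get(key) for key in ("iss", "sub", "jti")}
```

A site log line could then record the output of trace_fields(read_token(...)) per job, leaving the issuer to map the jti back to a user. Opaque tokens such as those from box.com are not JWTs and would raise ValueError here.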
  • Traceability & Suspension
    • Requirements doc at https://docs.google.com/document/d/1hnsPWf9C7ODVXZ7JehsSEiEsQwf5UmqLfTwVDhuqHzk/edit 
    • Two important parts
      • Traceability in access
      • Traceability in execute
    • Who should be able to resolve the user? 
    • If we don't include a transparent ID in the token it is ONLY the VO (in this model) who can resolve the user
    • Sites should be able to block an ID, but we need to ensure a consistent mapping between users and persistent opaque IDs
    • Having the ability to suspend a user at the Site allows sites not to ban the whole VO, good to avoid financial biases against banning VO
    • Blocking between Sites is not necessarily in scope, persistent ID could be per site but perhaps that's not possible since you don't necessarily know where the job will land
    • Token exchange between sites is probably quite tricky
    • Discussion in Traceability group about letting pilots run multiple payloads - quite far removed from this scope
    • More discussion needed in Traceability & Isolation, or Containers group
    • Sites need some capability to block tokens locally - request suspension from issuer (VO) or have a local mechanism
    • Sites cannot block on a VO level, but should be able to control who they let in.
    • ***Needs to be a discussion on tooling for site level blocking*** probably not discussed by EGI/OSG. We should describe what it would imply in terms of technology, e.g. pilot job framework
      • Management board will have to decide whether to force a fine grained blocking tool deployment at each site. Cost and risk implications
      • Needs to be finished within the Traceability & Isolation WG
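A minimal sketch of what a consistent mapping between users and persistent opaque IDs, plus a local site block, could look like. This is one possible construction and was not proposed in the call: it assumes the VO holds a secret key and derives the ID with HMAC-SHA256, optionally mixing in a site name to get per-site IDs. The names opaque_id and SiteBlocklist are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def opaque_id(vo_secret: bytes, user_id: str, site: Optional[str] = None) -> str:
    """Derive a persistent opaque ID for a user.

    The same (secret, user, site) always yields the same ID, so a site can
    block it consistently, but only the holder of vo_secret (the VO in this
    sketch) can recompute it to resolve the ID back to a user.
    """
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    message = user_id if site is None else f"{site}\x00{user_id}"
    return hmac.new(vo_secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


class SiteBlocklist:
    """Local suspension list keyed on opaque IDs (assumption: held per site)."""

    def __init__(self) -> None:
        self._blocked = set()

    def block(self, pid: str) -> None:
        self._blocked.add(pid)

    def is_allowed(self, pid: str) -> bool:
        return pid not in self._blocked
```

With site=None the ID is the same everywhere, which allows a block to be shared between sites. With a site name the IDs differ per site, which runs into the problem noted above: the issuer has to know where the job will land before the token is created.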
  • July pre-GDB
    • Revisit Requirements Doc, include a specific mention of technical constraints of suspension (i.e. none)
    • @Hannah clean up Requirements Doc (and JWT one) before July pre-GDB
    • @Hannah send more info re pre-GDB to Ioannis & Nicolas
    • Pilot update
      • EGI Check-in pilot: COmanage plugin progressing, testing and minor changes left. VOMS integration ongoing.
      • Indigo IAM, unclear who will present