WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)


Present: Brian Paul Bockelman, Dave Dykstra, David Crooks, Miguel Martinez Pedreira, Mischa Salle, Vincent Brillault


● Welcome and minutes from last meeting

  • No comment on the minutes of the previous meeting
  • Vincent reported that a new WG, the "container" working group, was created at the last GDB. It will take care of the uniform deployment of an isolation solution (e.g. Singularity). As a result, this working group will now focus only on traceability.

● Compute traceability: artefacts & challenges

Two axes to work on (potentially in parallel):

  • Runtime traceability: finding more details (e.g. user identifier) about a running job from artefacts left by the pilot/VO framework (file names, log files, etc.)
    • Needed for a better challenge capable of validating the VO findings
    • Actions on VOs: identify what is already available and can be collected
  • Offer a possibility for sites to collect these logs (like FNAL for CMS):
    • HTCondorCE has a new feature that can produce audit logs if the jobs push back the right information.
      • Dave will share a command line that can be used to produce that data
      • This does not cover all CEs and might only be a stopgap solution
    • What should be required from VO?
      • First estimate: start/stop action, unique user identifier (opaque string)
      • More debug information from VOs? (e.g. pilot id, job id for the pilot framework, etc.)
      • Any format needed?
    • How to collect it from other CEs, for other VOs?
      • Suggestions welcome!
    • Discussions to be started by email
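
As a starting point for the email discussion, the first estimate above (start/stop action, opaque user identifier, optional pilot and job ids) could be serialised as one JSON object per line. This is only a sketch: no format was agreed at the meeting, and the names `PayloadEvent`, `to_log_line` and all field names are hypothetical. It is not the HTCondorCE audit log format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PayloadEvent:
    """One traceability record pushed by a pilot (hypothetical format)."""

    action: str                     # "start" or "stop"
    user_id: str                    # opaque string chosen by the VO
    timestamp: float                # seconds since the Unix epoch
    pilot_id: Optional[str] = None  # optional debug information
    job_id: Optional[str] = None    # optional debug information

    def __post_init__(self) -> None:
        # Reject records that a site could not interpret.
        if self.action not in ("start", "stop"):
            raise ValueError(f"unknown action: {self.action!r}")
        if not self.user_id:
            raise ValueError("user_id must be a non-empty opaque string")


def to_log_line(event: PayloadEvent) -> str:
    """Serialise an event as a single JSON line, omitting unset fields."""
    record = {key: value for key, value in asdict(event).items() if value is not None}
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    example = PayloadEvent(
        action="start",
        user_id="a1b2c3",
        timestamp=1500000000.0,
        pilot_id="pilot-42",
    )
    print(to_log_line(example))
```

Keeping the user identifier opaque means the site never has to interpret it; in an investigation it would only hand the string back to the VO.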
       

● Storage traceability

  • Brian presented a pilot using tokens to authenticate and authorize for storage (XROOTD) actions
    • This is based on standard technology and existing work (e.g. auth0)
    • As discussed at previous meetings, this approach is compatible with ALICE's current system, so convergence could be possible in the long term
    • As nobody from ATLAS or LHCb was present, there was no feedback from them.
    • This will be presented at the next GDB, at which time it will be possible to collect interest from all participants
  • Brian suggested running a storage challenge
    • This would be technically more difficult than the simple compute challenge currently discussed
    • What can be tested, and how, is to be discussed by email

● Actions & next meeting

  • AOB:
    • Dave noted that the name of the WG no longer matches the mailing list and the website; Vincent will see what can be done.
  • Actions:
    • VOs: identify and report artefacts that can be used for identifying jobs/users
    • Dave: share commands that can be run to populate the HTCondorCE logs from a running job
    • Vincent: start email discussions on:
      • How can pilots push user/job information to sites? What should be pushed?
      • Ideas on how to do a storage challenge
  • Given the lack of progress and activity, no meeting is scheduled yet. Once the email discussion has progressed enough, another meeting should be scheduled via Foodle.
    • 1
      Welcome and minutes from last meeting

      See https://indico.cern.ch/event/634743/note/

      Speaker: Vincent Brillault (CERN)
    • 2
      Compute traceability: artefacts & challenges

    • 3
      Storage traceability
    • 4
      Actions & next meeting
      Speaker: Vincent Brillault (CERN)