WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)

● Previous meeting minutes

See https://indico.cern.ch/event/544800/note/

No comments were made on the minutes of the previous meeting.


● VO self-assessment feedback

Feedback on action 20160629-01: perform the following self-assessment: identify what would be the cost and potential bottlenecks, given a restricted time period and the IP address of a worker node, of identifying the matching jobs and the corresponding user and payload. (Such an assessment should be made under the assumption that this WG will succeed in building a complete isolation layer between the pilot and the jobs, and should thus ignore pilot-job take-over or hidden daemonized-process scenarios.)

  • ALICE report:
    • Logs are stored forever
    • Test:
      • One host and a 3-hour period were chosen at random
      • Identified 10 matching jobs and 3 matching users.
      • Also able to identify the source of submission, the executable and the input. These last two, kept under version control, are under user control but have audit logs.
    • Test takes ~2h for recent traces.
      • Data mining is difficult; proper dedicated tools could help
    • It should be possible to do an inverted query: identify all jobs from a submission point
  • ATLAS report [submitted later by email]:

Given an IP address, it is possible to query BigPanda to find out which payloads have run on the specific WN. At the moment the minimum query interval is one day. I am checking with the developer whether we can also add a time selection to restrict it to a few hours, but this is really a refinement, so if needed it will be done with low priority. If specific batch IDs are known, it is also possible to query for those. In general, most ATLAS pilots do not run more than one payload unless the site specifically requests it, and even in that case the pilot always downloads the same user's payloads, or exits.

  • CMS report:
    • CMS didn't have the time/opportunity to run the test
    • Development is ongoing to keep all job records in a Kibana dashboard forever, allowing for easy queries
  • LHCb Report:
    • Logs are kept for weeks
    • Shifters can manually query the information using the web portal, going through jobs one by one. Shifters can then access the payload and identify the pilot, the user and the submission IP. Development would be needed to make it simpler, if required.
    • Experts can run MySQL queries directly
    • All the tools and data are available, but mainly to the experts; only part is available to the shifters.
  • Discussions again pointed out that a central big data solution for processing and querying these logs could simplify the query process.
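The lookup exercised in these self-assessments (given a worker-node IP address and a restricted time window, find the matching jobs and users) amounts to a filter over job trace records. A minimal sketch in Python, where the record fields and sample data are illustrative assumptions rather than any experiment's actual schema:

```python
from datetime import datetime, timedelta

# Illustrative job-trace records; the field names are assumptions,
# not the schema of any experiment's actual job database.
traces = [
    {"wn_ip": "188.184.10.1", "user": "user1", "start": datetime(2016, 9, 1, 10, 0)},
    {"wn_ip": "188.184.10.1", "user": "user2", "start": datetime(2016, 9, 1, 12, 30)},
    {"wn_ip": "188.184.99.9", "user": "user3", "start": datetime(2016, 9, 1, 11, 0)},
]

def matching_jobs(traces, wn_ip, t0, window=timedelta(hours=3)):
    """Return the job records that ran on the given worker node in [t0, t0 + window)."""
    return [t for t in traces if t["wn_ip"] == wn_ip and t0 <= t["start"] < t0 + window]

jobs = matching_jobs(traces, "188.184.10.1", datetime(2016, 9, 1, 10, 0))
users = {j["user"] for j in jobs}
```

A central store indexed on worker-node address and start time would reduce this same lookup to a single query, instead of a per-experiment data-mining exercise.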

● Update on containers

  • Unfortunately, Red Hat has declined to include unprivileged mount namespace support in Red Hat Enterprise Linux 7.3, due to security concerns
  • Singularity:
    • Please see the presentation for details about Singularity
    • Several experiment representatives present expressed their interest in this solution
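Part of the appeal is that a pilot could wrap each user payload in its own container with a single `singularity exec` invocation. A minimal sketch of how a pilot might assemble such a command line; the image path and payload below are hypothetical, and only the `singularity exec <image> <command>` form reflects the actual CLI:

```python
import shlex

def containerized_command(image, payload_argv):
    """Wrap a payload command so it runs inside a Singularity container image.

    Isolating each payload this way is what would give the traceability
    layer a clean pilot/payload boundary."""
    return ["singularity", "exec", image] + list(payload_argv)

# Hypothetical image path and payload, for illustration only.
cmd = containerized_command("/cvmfs/images.example/slc6.img", ["/bin/echo", "hello"])
print(shlex.join(cmd))
```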

● Next meeting

  • Due to CHEP/HEPiX, there will be no meeting in October
  • The next meeting will be organized via a Foodle poll for November