WLCG Traceability and Isolation WG (Vidyo meeting)

Europe/Zurich
31/S-028 (CERN)


● Previous meeting minutes

No comments on the minutes of the previous meeting


● VO self-assessment feedback

  • ALICE report:
    • Logs are stored forever
    • Test:
      • One host and a 3-hour period were chosen at random
      • Identified 10 matching jobs and 3 matching users.
      • Also able to identify the source of submission, the executable and the input. The last two are kept under version control; they are under user control but have audit logs.
    • The test takes ~2h for recent traces.
      • Data mining is difficult; proper dedicated tools could help
    • It should be possible to do an inverse query: identify all jobs from a given submission point
  • ATLAS report [submitted later by email]:

Given an IP address, it is possible to query BigPanda to find out which payloads have run on the specific WN. At the moment the minimum interval is 1 day. I'm checking with the developer whether we can also add a time selection to restrict the query to a few hours, but it is really a refinement, so if needed it will be done with low priority. If there are specific batchIDs, it is also possible to query for those. In general, most ATLAS pilots don't run more than one payload unless the site specifies so, and even in that case the pilot always downloads the same user's payloads or exits.

  • CMS report:
    • CMS didn't have the time/opportunity to run the test
    • Development is ongoing to keep all job records forever in a Kibana dashboard, allowing for easy queries
  • LHCb report:
    • Logs are kept for weeks
    • Shifters can manually query the information using the web portal, going through jobs one by one. Shifters can then access the payload and identify the pilot, the user and the submission IP. Development would be needed to make this simpler, if required.
    • Experts can run MySQL queries directly
    • All the tools and data are available, but mainly to the experts. Only part of them is available to the shifters.
  • Discussions pointed out again that having a central big data solution for processing and querying these logs could simplify the query process.
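
The self-assessment query discussed above (worker node IP plus time window to matching jobs and users, and the inverse query from a submission point) can be sketched as follows. This is only a minimal illustration, not any experiment's actual tooling: the `JobRecord` fields, the sample records and both function names are hypothetical, and real deployments would query their own job databases or log stores instead of an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical job record; field names are assumptions for illustration."""
    job_id: str
    user: str
    worker_ip: str     # IP address of the worker node that ran the job
    submit_host: str   # submission point the job came from
    start: datetime
    end: datetime


def jobs_on_host(records, worker_ip, window_start, window_end):
    """Return jobs that ran on worker_ip at any time during the window."""
    return [
        r for r in records
        if r.worker_ip == worker_ip
        and r.start < window_end and r.end > window_start  # intervals overlap
    ]


def jobs_from_submission_point(records, submit_host):
    """Inverse query: return all jobs submitted from submit_host."""
    return [r for r in records if r.submit_host == submit_host]


# Made-up sample data, only to exercise the two queries.
RECORDS = [
    JobRecord("j1", "alice", "10.0.0.5", "ui01",
              datetime(2016, 9, 1, 10, 0), datetime(2016, 9, 1, 11, 30)),
    JobRecord("j2", "bob", "10.0.0.5", "ui02",
              datetime(2016, 9, 1, 14, 0), datetime(2016, 9, 1, 15, 0)),
    JobRecord("j3", "alice", "10.0.0.7", "ui01",
              datetime(2016, 9, 1, 10, 30), datetime(2016, 9, 1, 12, 0)),
]

matches = jobs_on_host(RECORDS, "10.0.0.5",
                       datetime(2016, 9, 1, 9, 0), datetime(2016, 9, 1, 12, 0))
matching_users = sorted({r.user for r in matches})
from_ui01 = jobs_from_submission_point(RECORDS, "ui01")
```

The overlap test deliberately includes jobs that started before the window or ended after it, since such jobs were still running on the node during the period of interest.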

● Update on containers

  • Unfortunately, Red Hat has refused to include unprivileged mount namespace support in RHEL 7.3, due to security concerns
  • Singularity:
    • Please see the presentation for details about Singularity
    • Several experiment representatives present expressed their interest in this solution

● Next meeting

  • Due to CHEP/HEPiX, there will be no meeting in October
  • The next meeting will be in November; the date will be set via a Foodle poll
    • 4:00 PM - 4:05 PM
      Previous meeting minutes 5m

      See https://indico.cern.ch/event/544800/note/

    • 4:05 PM - 4:30 PM
      VO self-assessment feedback 25m

      Feedback on action 20160629-01: Perform the following self-assessment: Identify what would be the cost and potential bottleneck, given a restricted time period and the IP address of a worker node, of identifying matching jobs and the corresponding user and payload. (Such an assessment should be taken under the assumption that this WG will succeed in building a complete isolation layer between the pilot and the jobs and should thus ignore pilot-job take-over or hidden daemonized processes scenarios)

    • 4:30 PM - 4:55 PM
      Update on containers 25m
    • 4:55 PM - 5:00 PM
      Next meeting 5m