- UIUC PM on 7/17
- IU UPS replacement on 7/16
- Firewall hardening on IU nodes
- Moved 200TB from SCRATCHDISK to DATADISK per GGUS ticket #1000096
- Updating Elasticsearch cluster to 9.0 today (7/23)
- Pilot use of cgroups to enforce memory limits:
  - Paul Nilsson has working code that lets cgroups memory limits kill the payload without killing the pilot.
  - This code only works with very recent versions of HTCondor: 24.0.7 or higher.
  - He has been testing release candidates using something akin to HammerCloud to send about 8 jobs to MWT2 with new pilot versions.
  - These tests are successful in the sense that the pilot runs without errors, but the test jobs are well behaved and don't exceed the memory limits, so the kill path is never exercised.
  - I (Fred Luehring) have asked Paul to test that killing the payload without killing the pilot actually works.
  - Last night I suggested using the derivation jobs that are badly leaking memory to test the code at MWT2.
  - I had previously suggested switching MWT2 over to the new pilot for a large-scale test, but that was before we had a perfect request for testing the cgroups memory limits.
  - Doing tests at MWT2 requires sign-off from the whole MWT2 team.
  - The JIRA ticket tracking the cgroups work: https://its.cern.ch/jira/browse/ATLASPANDA-1251
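
For reference, a rough sketch of what "kill the payload without killing the pilot" looks like from the parent process's side. This is not Paul's pilot code, and it does not set up any cgroup. The names `run_payload`, `simulated_oom_payload`, and `parse_memory_events` are hypothetical, and the child simply sends itself SIGKILL to stand in for an out-of-memory kill. On a real worker node the `oom_kill` counter in the payload cgroup's `memory.events` file (cgroup v2) would be the more reliable evidence, since SIGKILL can come from other sources.

```python
import signal
import subprocess
import sys


def parse_memory_events(text):
    """Parse the 'key value' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def run_payload(cmd):
    """Run a payload as a child process and report how it ended.

    The parent (standing in for the pilot) survives whatever happens
    to the child and only inspects the return code. On POSIX a child
    terminated by signal N is reported with return code -N.
    """
    proc = subprocess.run(cmd)
    killed = proc.returncode == -signal.SIGKILL
    return {"returncode": proc.returncode, "killed_by_sigkill": killed}


def simulated_oom_payload():
    """Hypothetical stand-in payload: a child that SIGKILLs itself."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    return [sys.executable, "-c", code]


if __name__ == "__main__":
    result = run_payload(simulated_oom_payload())
    # The parent is still alive here and can report the failure upstream.
    print(result)
```

A real test would replace the simulated payload with a job that actually exceeds its memory limit, such as the leaking derivation jobs mentioned above.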