Zoom information
Meeting ID: 996 1094 4232
Meeting password: 125
Invite link: https://uchicago.zoom.us/j/99610944232?pwd=ZG1BMG1FcUtvR2c2UnRRU3l3bkRhQT09
OSG 3.6 targeted for end of February: https://opensciencegrid.atlassian.net/browse/SOFTWARE-4282 . Highlights include:
Eric (Xin had to leave unexpectedly)
- Smooth running during the break
- dCache filled up and the site was blacklisted for a couple of days around 12/31. ADC reacted promptly by deleting data. A full incident analysis is ongoing.
- FTS transfer nodes were disturbed by a security scan (GGUS:150057). Bug reported to dCache.
Updates on US Tier-2 centers
No GGUS tickets.
Smooth operation during the holidays. Jobs drained between the 1st and 3rd of January and started to ramp up again at midnight on the 3rd; other than that, Condor cluster usage remained high (>96%).
About 15 worker nodes lacked a squashfs RPM, which led to BOINC job failures. Reinstalled the RPM and re-enabled BOINC on those nodes.
Only one notable hardware incident: one R740xD2 crashed with multiple, complex symptoms (PCIe error from the NIC, memory bit error rate, iDRAC not reachable over HTTP). The fix involved swapping 2 DIMMs (possibly neither the root cause nor the root solution) and updating all firmware, especially BIOS and NIC. Dell recommends updating the BIOS (R740xD2 to >=2.8.2).
Continuing to gain experience with the UTA K8S test cluster to better understand the environment, and looking into how to interface it with the ATLAS workload management system.
Meanwhile, increased the cluster size to a total of 224 worker cores and upgraded it to the latest K8S version, 1.20.1.
Next step: try to set it up for, and run, ATLAS jobs.