• Added a reserved queue to the HTCondor batch system. It currently consists of 1 node and allows an analysis team to run whole-node jobs for the MadGraph application.
  • Working on taming the Ceph file system. It has some stability issues that warrant further investigation. Very high memory usage on the MDS is observed during incidents.
    • Volume mounts are monitored, with alerting, on both the HTCondor workers and the interactive login nodes.
    • Will work on the rook-ceph upgrade. Had some trouble last time due to some K8s deprecations. It appears that at least the newer Ceph version (v17) has an alertable metric (slow MDS ops) for a condition we usually observe during incidents.
    • Will also update the OS.
  • AnalysisBase image has been updated to the latest version: 24.2.37. All the libraries have been updated. It now uses a dev version of uproot that has many fixes for reading PHYSLITE data files.