Zoom information
Meeting ID: 996 1094 4232
Meeting password: 125
Invite link: https://uchicago.zoom.us/j/99610944232?pwd=ZG1BMG1FcUtvR2c2UnRRU3l3bkRhQT09
Writing to tape from ATLAS has been paused so that we can reduce the number of tapes used for writing. We have reduced the number of tape drives used for writing to 4 (each capable of writing 200 MB/s). With these we can write 67 TB a day to tape, and we have almost 400 TB to write from the internal HPSS disk cache to tape. We cannot reduce the number of drives any further.
We are having a problem staging files from tape to dCache disks. dCache is not pulling files from the HPSS cache, and we are seeing a lot of bad dCache restore requests. We are investigating and actively trying to clear up the situation so that data can flow.
This problem was triggered by a large request of more than 200k files around midday on Saturday. The exact source of the request is under investigation.
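The tape write figures quoted above can be sanity-checked with a short calculation. This is only a rough sketch: it assumes decimal units (1 TB = 1,000,000 MB) and that all 4 drives sustain their full 200 MB/s around the clock, neither of which is stated in the notes. The variable names are illustrative.

```python
# Rough check of the tape write backlog, using the numbers from the notes.
# Assumptions (not from the notes): decimal units, continuous full-rate writing.
DRIVES = 4                  # tape drives currently used for writing
MB_PER_S_PER_DRIVE = 200    # nominal write rate per drive
SECONDS_PER_DAY = 86_400
BACKLOG_TB = 400            # "almost 400 TB" waiting in the HPSS disk cache
QUOTED_TB_PER_DAY = 67      # daily rate quoted in the notes

# Nominal aggregate rate, converted from MB/day to TB/day.
nominal_tb_per_day = DRIVES * MB_PER_S_PER_DRIVE * SECONDS_PER_DAY / 1_000_000

# Days needed to drain the backlog at the nominal and at the quoted rate.
days_nominal = BACKLOG_TB / nominal_tb_per_day
days_quoted = BACKLOG_TB / QUOTED_TB_PER_DAY

print(f"nominal rate: {nominal_tb_per_day:.2f} TB/day")
print(f"days to drain at nominal rate: {days_nominal:.1f}")
print(f"days to drain at quoted rate:  {days_quoted:.1f}")
```

Under these assumptions the nominal rate comes out slightly above the quoted 67 TB a day, and either figure puts the backlog at roughly six days of writing, provided nothing new is added to the cache in the meantime.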
Updates on US Tier-2 centers
- last of 4 waves done 11-Aug-2021, mostly MSU T3, thus finishing equipment move from dept bldg to data center.
- purchase planning: 3x ESXi hosts + 1x NVMe storage + ~3 dcache storage nodes + as many 1U WNs as budget allows.
- FYI@MWT2: we may be decommissioning the EX9208 this week, to be confirmed.
- draining gate03 for condor-ce update (we lost 1000 jobs when we updated gate01 with jobs still running; to be cautious, we now drain the gatekeeper first)
- patched a bunch of nodes with ipv6 issues (adding ipv6 neigh rules manually).
- did 2 condor updates (8.4.13->8.4.14->8.4.15)
- rebuilt all Tier2 WN with CentOS7
- finished rebooting all nodes to the new 1160.36 kernel
MGHPCC scheduled maintenance.
NESE_DATADISK was down for an additional day for Harvard re-networking.
xrootd is working now, passing smoke tests, HTTP-TPC.
High priority items:
1) Prepare for worker node purchase
2) Expand xrd cluster, switch over from gridftp to xrootd
3) OSG 3.5 update
4) Finish ipv6
- Nothing to report, running well.
- Upgraded xrootd proxy (se1) to 5.3.1, seems to run well.
- Preparing to install the compute nodes + storage from our last purchase. Logistically this will allow us to move forward with the move / retirement of UTA_SWT2.
- About to install the latest version of XrootD on the HTTP-TPC test instance. Need to verify the ROCKS recipe for building the host as a final step prior to production deployment.
- Need to schedule a downtime to install the LAN networking upgrade hardware. Many needed software updates will occur during this outage.
- Recent operations generally stable, smooth.