Meeting ID: 996 1094 4232
Meeting password: 125
Add SRM tape vs disk service types in Topology: https://opensciencegrid.atlassian.net/browse/SOFTWARE-4732
Updates on US Tier-2 centers
1) Updated to the new CentOS 7 security-update kernel (1160.36);
applied firmware updates to all nodes and rebooted them into the new kernel.
Also in the process of rebuilding all remaining SL7 WNs to CentOS 7, for uniformity.
2) Applied the two HTCondor security updates (8.8.13 -> 8.8.14 -> 8.8.15);
updated the gatekeepers' HTCondor-CE to 4.5.24 (update path sketched after this list).
3) Since a reboot was needed for the new kernel, also updated dCache from 6.2.23 to 6.2.25;
the update went smoothly (firmware/BIOS updates included).
4) MSU site is moving the last batch of WNs to the Data Center today.
All nodes moved and powered, currently being connected; will be done by end of day.
5) Still have IPv6 issues at UM; we see them happening on the data switches too.
6) Had one instance of job draining when pending transfer jobs exceeded 4000.
7) Adjusted space tokens in dCache, increasing AGLT2DATADISK by 290 TB (see the sketch just below).
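For the space-token change just above, a minimal sketch of how such a reservation can be grown, assuming the dCache admin shell's SrmSpaceManager cell; the space id is hypothetical and the exact command syntax varies across dCache releases (check the dCache Book for 6.2):

    # from the dCache admin shell
    \c SrmSpaceManager
    ls spaces                           # list reservations and their ids
    update space -id=42 -size=<new-size>   # hypothetical id; grow the DATADISK reservation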
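And a sketch of the HTCondor update path from item 2, assuming yum-based installs with the stock condor and htcondor-ce packages (package and service names can differ by site layout):

    # worker nodes and submit hosts: step through the security releases
    yum update condor
    systemctl restart condor
    condor_version            # expect 8.8.15

    # gatekeepers: the CE is packaged separately
    yum update htcondor-ce
    systemctl restart condor-ce
    condor_ce_version         # expect 4.5.24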
0 GGUS tickets
1 HC bump yesterday: stage-in timeouts
MGHPCC annual maintenance down day coming up: August 9
XRootD containers working at BU; HTTP-TPC tests working, with a local adler32 checksum callout. Some deletion errors remain, which will probably disappear when we expand (see the config sketch after this list). Next steps:
Expand atlas-xrootd.bu.edu to all current gridftp endpoints
Do likewise for NESE storage endpoints (NESE_DATADISK, NESE_SCRATCHDISK)
Do likewise for NESE Tape endpoints
Need to update OIM with Mark
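A minimal sketch of the relevant xrootd config for the HTTP-TPC setup above, assuming the stock XrdHttp/TPC plugins; the callout script path is a placeholder, not the actual BU setup:

    # enable the HTTP protocol and the third-party-copy handler
    xrd.protocol XrdHttp libXrdHttp.so
    http.exthandler xrdtpc libXrdHttpTPC.so

    # delegate adler32 checksums to a local callout script (placeholder path)
    xrootd.chksum max 4 adler32 /usr/local/bin/xrdadler32.sh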
16 DTN endpoints arrived at NESE, racked and cabled.
IPv6 set up on perfSONAR nodes; we still need to test, then expand (see the sketch below).
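A minimal sketch of the first tests we'd run, assuming pscheduler on the nodes; the destination is a placeholder host:

    # force IPv6 for quick latency and throughput checks
    pscheduler task rtt --ip-version 6 --dest ps-remote.example.org
    pscheduler task throughput --ip-version 6 --dest ps-remote.example.org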
Preparing for major worker node purchase ASAP
Start planning for NESE Ceph storage purchase in the Fall
UMass is joining NET2: a new person will help with day-to-day operations; we will expand into UMass space at MGHPCC and collaborate with BU, Harvard, and UMass colleagues on a large shared pool of worker nodes, roughly along the lines of the shared-storage NESE project.
- Running well
- Had CVMFS problems on one worker node, which caused strange Rucio errors; a reboot fixed that (see the health-check sketch at the end of this section).
- Now using HTTP-TPC in production; it seems to run well.
- Two site issues:
(i) squid outage on 7/24
(ii) site drained on 7/30
- Equipment from the most recent purchase is arriving. We will need to schedule a downtime to install the LAN revamp.
- Ongoing work / testing:
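Related to the CVMFS incident noted above, a minimal sketch of checks worth trying on a suspect worker node before resorting to a reboot; atlas.cern.ch is the standard ATLAS repository name:

    # verify all configured repositories mount and respond
    cvmfs_config probe

    # detailed status for the ATLAS repository
    cvmfs_config stat -v atlas.cern.ch

    # if the local cache is corrupted, wiping it can avoid a full reboot
    cvmfs_config wipecache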