OSG 3.5.8 + 3.4.42
Next week at the earliest
Other
Working with WLCG IAM folks to request WLCG tokens for testing HTCondor-CE and job submission via Harvester
Tickets:
- currently none
- since last meeting: 4 tickets, all closed/solved now, caused by brief network issues at UM
Operation:
- switch problems at UM
adding a new switch to upgrade the T3-T2 link from 40 Gbps to 100 Gbps
caused some spanning tree and management interface access issues
current status: seems solved after second firmware update
- currently evolving issue with 1 of 2 Liebert units at UM.
Currently operating at reduced but sufficient capacity.
Repair will need either an 8h partial downtime or a wait for the 3rd planned unit to come online.
- as usual misc memory and disk issues for hardware under warranty or self-supported
New hardware:
- last of the new R740XD2 dcache servers almost online
still fighting with MSU IT automated SSL certificate issuance to get an IGTF-signed cert.
- will allow us to retire the 4 oldest MSU dcache servers
and free up one MD3260 for spares on self-supported medium-old storage
Apologies if nobody is present. Judith is OOTO and I am working on network equipment to bring up new purchases.
-David
GGUS Ticket #144542: Stage-in issues. Getting little help from CERN with debugging. As of the last update, Judith removed the secondary lsm mover from the production queue, as was requested.
UC:
IU
UIUC
Relatively smooth operations over the break, although some of our C6100 workers are starting to die for various hardware reasons. We'll be investigating.
Low-level DDM issue resolved by migrating away from "Let's Encrypt" host certs for NET2 and NESE gridftp endpoints.
NESE DDM started over the break with containers running on NESE gateways (NESE_DATADISK). Performance looks good so far. Adding gridftp endpoints and operations infrastructure.
NET2 storage from Dell has arrived. One of the R740XD2s will be grabbed for a SLATE node. We'll be in touch with the SLATE team as soon as that's up and running. Need to expand UPS to three new racks for this. Management switches still haven't arrived, but everything else is at Holyoke.
Still need to make a plan for IPv6. We see that about 50% of DDM sites have IPv6 addresses now.
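For reference, a rough sketch of how the fraction of endpoints with IPv6 addresses could be spot-checked from a list of hostnames. This is not how the 50% figure above was obtained. The function names and the injectable resolver argument are illustrative assumptions, and the hostnames would have to be supplied separately.

```python
import ipaddress
import socket


def has_ipv6(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves to at least one IPv6 address.

    resolver is injectable (hypothetical convenience) so the check can be
    exercised without network access; it defaults to socket.getaddrinfo.
    """
    try:
        results = resolver(hostname, None, socket.AF_INET6)
    except socket.gaierror:
        # No IPv6 (AAAA) answer, or the name does not resolve at all.
        return False
    for _family, _type, _proto, _canon, sockaddr in results:
        # sockaddr[0] is the address string; drop any "%scope" suffix.
        address = sockaddr[0].split("%")[0]
        try:
            if ipaddress.ip_address(address).version == 6:
                return True
        except ValueError:
            continue
    return False


def ipv6_fraction(hostnames, resolver=socket.getaddrinfo):
    """Fraction of the given hostnames that have an IPv6 address."""
    hosts = list(hostnames)
    if not hosts:
        return 0.0
    return sum(has_ipv6(h, resolver) for h in hosts) / len(hosts)
```

Calling ipv6_fraction() on a list of endpoint hostnames gives a number between 0 and 1. A result near 0.5 would match the rough estimate above.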
In the last 30 days, production at NERSC yielded 25.5 million events, using 8.5 M NERSC hours.
Very bursty usage. About once per week we get up to > 2k nodes (almost 300K cores) for a short time. Currently running with a modified pilot (will want to switch over to the new pilot in the next allocation cycle).
The 2019 ERCAP allocation ends Jan 14, 2020 7:00 PST. We will have some hours left over. Cori downtime will be Jan 14, 07:00 PST to Jan 15, 2020 07:00 PST. After this downtime Python 2 will not be supported.
On 20-Dec-2019, Lincoln, Marc and DB worked together at Univ of Chicago to produce a Docker container to run Harvester on the edge. This will be useful for the OLCF-Slate instance.