OSG network services are migrating to AGLT2_MSU's VMware instance.
The HEPiX Network Function Virtualization working group met today: https://indico.cern.ch/event/715631/
Technical details discussed at the regular XCache Monday meeting.
Wei and Xin got copies of the xcache robot cert. Wei and Shawn will try Lincoln's instructions to set up their own k8s clusters as soon as they have hardware in place. Ilija will test how things get deployed on the Utah cluster. Helm deployment development will start with a multinode xCache cluster (so not yet). Will have to discuss with Andy how to organize the cluster and its storage.
At the TCB meeting, presented the effects of pCache. An ARC site claims an ~80% cache hit rate from a 250 TB LRU cache!
Developing code to simulate different caching configurations.
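As an illustration of the kind of trace-driven simulation such code might perform, here is a minimal sketch of replaying an access trace through an LRU cache of fixed byte capacity to measure the hit rate (the `simulate_lru` helper, the trace format, and the sizes are illustrative assumptions, not the actual simulation code):

```python
from collections import OrderedDict

def simulate_lru(accesses, capacity):
    """Replay a sequence of (file_id, size_bytes) accesses through an LRU
    cache of the given byte capacity and return the hit rate."""
    cache = OrderedDict()  # file_id -> size; most recently used at the end
    used = 0
    hits = 0
    for file_id, size in accesses:
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
        else:
            # Evict least recently used files until the new one fits.
            while used + size > capacity and cache:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
            if size <= capacity:
                cache[file_id] = size
                used += size
    return hits / len(accesses) if accesses else 0.0
```

Running the same trace at several capacities gives a hit-rate curve, which is the comparison between caching configurations the simulation is after.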
NERSC using Harvester + Minipilot. Working smoothly. Allocation completely exhausted this past week. Containers being used for software distribution.
ALCF using Harvester + Yoda. Lots of debug work ongoing related to PanDA settings, JumboJobs, Athena performance improvements, etc. Aiming for mid-May to have Yoda tested, validated and ready for production jobs. Singularity containers now being used for software distribution.
OLCF testing Harvester + minipilot, but not yet in production. Aiming for mid-May for Harvester online. No ETA for containers in production, but no obvious hang ups to deploying them at this point.
Our OSG 3.4 / LCMAPS / HTCondor upgrade is done.
Our HTCondor problems have not appeared at Harvard or BU since the upgrade... so far. Fingers crossed.
We're ready to migrate away from GRAM. Coordinating with Jose and John Hover.
Installed Wei's version of GridFTP with a callout to our Adler32 code. Works fine.
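For reference, a streamed Adler32 checksum over a file can be sketched as below (the function name, chunk size, and hex formatting are illustrative assumptions, not Wei's actual callout code):

```python
import zlib

def adler32_file(path, chunk_size=1 << 20):
    """Compute the Adler-32 checksum of a file, streaming in 1 MB chunks
    so large transfers don't need to fit in memory."""
    value = 1  # Adler-32 initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")
```

Streaming in chunks matters here because the files moved by GridFTP are typically far larger than memory.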
Next up is migrating away from Bestman.
LHCONE peering resumed successfully after replacing a bad card at MANLAN.
NET3 "Northeast Tier 3" slowly growing. UMASS/Amherst buy-in ordered.
We strangely had a very high deletion rate for 2-3 days, causing SRM stress and a couple of trouble tickets. We made some adjustments, and then the deletion rate also mysteriously dropped by a factor of ~5.
There is lots of NESE activity. 10 PB raw deployment ordered.
SL7 migration coming soon.
- Currently in OSCER scheduled maintenance till this evening
- Lucille cooling failure, running with reduced capacity
- Had OU network problem last week, fixed
- Experienced 50% DDM transfer failures earlier this week, tracked down to checksum timeouts. Extended that timeout in our GridFTP server, which fixed the failures. Wei knows the details.