US ATLAS Computing Integration and Operations
- 13:00 → 13:05
Top of the Meeting (5m). Speakers: Eric Christian Lancon (BNL), Robert William Gardner Jr (University of Chicago (US))
Please be sure to register for the Facilities meeting at Argonne:
- 13:10 → 13:15
- 13:20 → 13:25
Production (5m). Speaker: Mark Sosebee (University of Texas at Arlington (US))
- 13:25 → 13:35
OSG-LHC (10m). Speakers: Brian Lin (University of Wisconsin), Matyas Selmeci
- 13:35 → 13:40
Data Management (5m). Speaker: Armen Vartapetian (University of Texas at Arlington (US))
Dark data cleanup at BNL is being followed up in the DDMops JIRA (https://its.cern.ch/jira/browse/ATLDDMOPS-5465). After the cleanup a significant leftover still remains (300-400TB for DATADISK and about 100TB for SCRATCHDISK), which could be a reporting issue or unreported usage. This needs to be checked on the storage side.
Independently of the previous point, BNL storage reporting has been stuck since Nov. 15, showing absolutely no change in the storage numbers for any token since then. This may result in the storage filling up. Mentioned this in the same ticket as well, with the BNL people in CC.
Storage reporting consistency issue at MWT2_UC_SCRATCHDISK: the storage numbers are below the Rucio ones. This looks like it started after a ~600K (~90TB) deletion on Nov. 8-9, with subsequent transfers filling the freed space.
SLACXRD_LOCALGROUPDISK reported space dropped a couple of days ago; this is probably just a reporting issue.
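The checks described above boil down to comparing, per space token, what the storage reports against what Rucio believes is stored, and noticing when the storage number stops moving. A minimal sketch, assuming one snapshot per day and byte counts for both sides; the Snapshot layout, the 10TB tolerance and the 3-sample window are illustrative choices, not values used by DDM operations.

```python
from dataclasses import dataclass

TB = 10**12  # decimal terabyte, assumed to match the units in the reports


@dataclass
class Snapshot:
    """One (assumed daily) accounting sample for a single space token."""
    day: str
    storage_used: int  # bytes as reported by the storage system
    rucio_used: int    # bytes Rucio believes are stored there


def discrepancy_tb(s: Snapshot) -> float:
    """Storage-reported minus Rucio-reported usage, in TB."""
    return (s.storage_used - s.rucio_used) / TB


def classify(s: Snapshot, tolerance_tb: float = 10.0) -> str:
    """Rough reading of the discrepancy; the tolerance is an arbitrary choice."""
    d = discrepancy_tb(s)
    if d > tolerance_tb:
        return "storage above Rucio: dark data or usage not reported to Rucio"
    if d < -tolerance_tb:
        return "storage below Rucio: storage numbers possibly stale or inconsistent"
    return "consistent within tolerance"


def reporting_stuck(history: list[Snapshot], n: int = 3) -> bool:
    """True if storage_used is identical across the last n snapshots."""
    if len(history) < n:
        return False
    return len({s.storage_used for s in history[-n:]}) == 1
```

A positive discrepancy corresponds to the BNL leftover case, a negative one to the MWT2_UC_SCRATCHDISK case, and reporting_stuck would flag a token whose numbers freeze as BNL's did after Nov. 15.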
- 13:45 → 13:50
Networking (5m). Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
Working on issues with the OSG/WLCG MaDDash instance: https://psmad.opensciencegrid.org/maddash-webui/
- Have issues with IPv6 (dual-stack) nodes because of an underlying library that MaDDash depends on. The perfSONAR developers are aware of the issue.
- Currently there are cases where we have "grey" boxes that indicate no data, BUT there actually is data. Most are due to the IPv6 issue, but in some cases there may be firewall issues.
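Since most of the grey boxes trace back to dual-stack hosts, it can help to know which mesh members are dual-stack. A minimal sketch using only Python's standard socket.getaddrinfo; it only checks DNS resolution (A vs AAAA records), not whether perfSONAR tests actually run over IPv6, and it cannot detect firewall problems. The script name and host names in the usage comment are hypothetical.

```python
import socket
import sys


def address_families(host: str) -> set[str]:
    """Return which IP families ("IPv4", "IPv6") host resolves to."""
    families = set()
    for family, _type, _proto, _canon, _addr in socket.getaddrinfo(host, None):
        if family == socket.AF_INET:
            families.add("IPv4")
        elif family == socket.AF_INET6:
            families.add("IPv6")
    return families


def is_dual_stack(host: str) -> bool:
    """True if host has both IPv4 (A) and IPv6 (AAAA) addresses."""
    return address_families(host) == {"IPv4", "IPv6"}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python3 check_stack.py ps-host1.example.org ps-host2.example.org
    for host in sys.argv[1:]:
        try:
            fams = address_families(host)
        except socket.gaierror as err:
            print(f"{host}: resolution failed ({err})")
            continue
        label = "dual-stack" if fams == {"IPv4", "IPv6"} else "/".join(sorted(fams))
        print(f"{host}: {label}")
```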
The PWA (pSConfig GUI) at https://psconfig.opensciencegrid.org has some issues getting all the hosts published in OIM and GOCDB. We are tracking down the problem in the code on GitHub: https://github.com/soichih/gocdb2sls
We have seen some cases where perfSONAR toolkit deployments have default limits set that prevent testing from working: the toolkits seem to be OK, but test results are not showing up. In some cases this is caused by a 10GByte directory size limit. For latency nodes, the file to check is /etc/owamp-server/owamp-server.limits, and the value to increase is 'disk=10G'. Increase it to at least 50G (assuming your disk can hold this much).
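The limit change above could be scripted across several latency nodes. A minimal sketch, assuming the limit appears as comma-separated disk=<N>G tokens on the "limit ... with ..." lines (check this against your own file first); values without a G suffix, such as disk=0, are left untouched. It only prints the proposed file to stdout, so review the output and replace the file yourself; restarting owamp-server afterwards is presumably needed for the change to take effect.

```python
import re
import sys

# Minimum suggested in the minutes; lower it if the disk cannot hold 50G.
MIN_DISK_GB = 50

# Assumed format: tokens such as "disk=10G" inside "limit ... with ..." lines.
DISK_RE = re.compile(r"\bdisk=(\d+)G\b")


def raise_disk_limits(text: str, min_gb: int = MIN_DISK_GB) -> str:
    """Return text with every disk=<N>G value below min_gb raised to min_gb."""
    def bump(match: re.Match) -> str:
        value = int(match.group(1))
        return f"disk={max(value, min_gb)}G"
    return DISK_RE.sub(bump, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name hypothetical):
    #   python3 bump_owamp_limits.py /etc/owamp-server/owamp-server.limits
    with open(sys.argv[1]) as fh:
        print(raise_disk_limits(fh.read()), end="")
```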
- 13:50 → 13:55
- 13:55 → 14:30
Site Reports
- 13:55
BNL (5m). Speaker: Xin Zhao (Brookhaven National Laboratory (US))
- BNL FTS issues recently
  - slow transfers from CERN to BNL, solved by raising the priority
  - wrongly formatted JSON file ??
- BNL FTS upgrade is planned for after Thanksgiving
- preparation for moving prod PQs to UCORE
  - John's script is ready; it adjusts the HTCondor accounting group quotas based on the pending jobs in the local queue
  - JobRouter changes are done.
  - Need to test them, but first we need to agree on the path forward on the analy vs prod issue
- another tape test, after increasing the dCache tape disk buffer, is planned for early Dec.
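The minutes do not describe how John's script computes the quotas. As a purely illustrative sketch, assuming a policy where quotas are proportional to the number of pending jobs per group (our assumption, not the script's documented behaviour), the core calculation might look like this. Group names are hypothetical; reading the pending counts (e.g. from condor_q) is left out.

```python
def split_quota(total_slots: int, pending: dict[str, int], min_slots: int = 1) -> dict[str, int]:
    """Split total_slots across groups in proportion to their pending jobs.

    Every group keeps at least min_slots; slots left over from integer
    rounding go to the groups with the largest remainders.
    """
    groups = sorted(pending)
    if not groups:
        return {}
    remaining = total_slots - min_slots * len(groups)
    if remaining < 0:
        raise ValueError("not enough slots to honour min_slots for every group")
    # With no pending work anywhere, fall back to an even split.
    weights = pending if sum(pending.values()) > 0 else {g: 1 for g in groups}
    total_weight = sum(weights[g] for g in groups)
    # Exact integer arithmetic: quotient is the guaranteed share, remainder ranks leftovers.
    shares = {g: divmod(remaining * weights[g], total_weight) for g in groups}
    alloc = {g: min_slots + q for g, (q, _r) in shares.items()}
    leftover = total_slots - sum(alloc.values())
    for name in sorted(groups, key=lambda n: shares[n][1], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc


def to_condor_config(alloc: dict[str, int]) -> str:
    """Render static quotas as HTCondor GROUP_QUOTA_<group> config lines."""
    return "\n".join(f"GROUP_QUOTA_{g} = {n}" for g, n in sorted(alloc.items()))
```

For example, split_quota(100, {"group_atlas.prod": 300, "group_atlas.analy": 100}, min_slots=0) gives 75 and 25 slots. The rendered lines would still have to be placed in the negotiator's configuration and picked up with condor_reconfig; a real deployment might prefer dynamic (fractional) quotas instead of static ones.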
- 14:00
AGLT2 (5m). Speakers: Dr Shawn McKee (University of Michigan ATLAS Group), Prof. Wenjing Wu (Computer Center, IHEP, CAS)
No tickets/incidents
We finished upgrading the slave PostgreSQL database for the dCache head node from SL6 to CentOS 7. ZFS is used to host the PostgreSQL database, and we upgraded PostgreSQL from 10.5 to 10.6 on both the host and slave nodes.
We built the OpenAFS 1.8.2 RPMs on the CentOS 7.5 node. The new OpenAFS client (1.8.2) is running well there, and we plan to test it on the SL6/7 nodes.
- 14:05
- 14:10
NET2 (5m). Speaker: Prof. Saul Youssef (Boston University (US))
- 14:15
SWT2 (5m). Speakers: Dr Horst Severini (University of Oklahoma (US)), Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US)), Patrick Mcguigan (University of Texas at Arlington (US))
- 14:30 → 14:35
AOB (5m)