Speakers:
Michael Ernst, Robert William Gardner Jr (University of Chicago (US))
Capacities
13:15 → 13:25
Production 10m
Speakers:
Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US))
summary
Mark reporting
Production has been busy, with many issues owing to the Rucio migration, fallout from the accidental DDM deletion, etc., leading to failing transfers.
Activity is picking up and several sites are full. More jobs than before.
ProdSys1 has been decommissioned. Note the change in task IDs with ProdSys2; the BigPanDA monitor must be used.
See last week's Jamboree for a tutorial on the monitor.
New pilot release from Paul.
See links to ADC weeklies for relevant talks.
13:25 → 13:30
Data Management 5m
Speaker:
Armen Vartapetian (University of Texas at Arlington (US))
Armen reporting
Main issue has been data loss due to the migration.
The initial estimate of 1M lost files has been revised to 3.3M.
As a result, deletion activities have been limited. The loss is not yet fully understood.
Automatic deletions are halted; only a low level of deletions is ongoing.
This mostly impacts Tier-1s, e.g. 400 TB waiting to be deleted.
New model will come into effect early next year. See related Jamboree notes from last week.
13:30 → 13:35
Data transfers 5m
Speaker:
Hironori Ito (Brookhaven National Laboratory (US))
Hiro reporting
No transfer issues, other than the space issues.
FTS at BNL is primarily being used (plus a couple of other clouds).
13:35 → 13:40
Networks 5m
Speaker:
Dr. Shawn McKee (University of Michigan ATLAS Group)
Shawn reporting
perfSONAR sites need to upgrade to version 3.4 or later after January 9.
The traceroute method has changed (the default was B → A); it requires BWCTL to be running.
Two fixes: mesh configuration agent (do a forward rather than reverse).
Traceroute tests will likely move to run between the bandwidth (BW) nodes. This is done centrally via the mesh URL.
SLAC instances need updates, as does BNL (Hiro is working on it).
LHCONE meeting on point-to-point circuits using NSI (a new standard for inter-domain implementations).
A few sites participate now; others are welcome to join.
Goal - demonstrate circuit usage.
13:40 → 13:45
FAX 5m
Speakers:
Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
Ilija reporting
Reconciling differences in job efficiency with Kaushik.
Also looking at data from Hadoop.
Will not be expanding overflow jobs until resolved.
Next pilot release will fix timeout issue for large files.
13:45 → 14:45
Site Reports
13:45
BNL5m
Speaker:
Michael Ernst (Unknown)
Michael
BNL networking is working with ESnet to finalize the configuration for transatlantic connectivity.
Probably a joint P2P activity in January.
Expecting delivery of worker nodes (WNs).
Storage is high on the list: the 2000-drive DDN machine (1.8 PB usable) is getting old, and failures are becoming more frequent.
Also thinking about storage R&D.
Talk of increasing ATLAS tape usage (10,000-slot library). Volunteered the US to work with ADC on the model.
13:50
AGLT2 5m
Speakers:
Robert Ball (University of Michigan (US)), Dr. Shawn McKee (University of Michigan ATLAS Group)
Shawn
Open ticket on some step09 files with a size mismatch. Checked locally, but the sizes differ from Rucio's; suspect a casualty of the Rucio deployment. Checksums match! Saul has observed the same at other sites and reported it.
Getting some equipment: 35 Dell R620s that were part of a large order that got cancelled. Specs: 256 GB memory, dual 10G NICs, redundant power supplies, E5-2670 v2 (10-core) CPUs.
Storage: an MD3460 storage shelf at MSU; at UM, 600 TB of MD3060s with 6 TB drives, running Lustre over ZFS.
13:55
MWT2 5m
Speaker:
Robert William Gardner Jr (University of Chicago (US))
Connect queues (analy and production) are working well to Stampede, HU, ICC, Mietc.
Well over 1,000 slots across the sites.
Procurement in progress.
CCC development
14:00
NET2 5m
Speaker:
Prof. Saul Youssef (Boston University (US))
Welcome Dave Caunt.
Working on procurement.
Nexus 7710 from MIT being set up in MANLAN; this will be how we peer with LHCONE.
Very little production.
Problem with APF and HU CE nodes.
Worldwide FTS performance studies: US performance looks good, except to SARA and NIKHEF.
Will be starting CondorCE on BU side
ATLAS Connect production
14:05
SWT2-OU 5m
Speaker:
Dr. Horst Severini (University of Oklahoma (US))
OU is on LHCONE!
Filled in the cloud support list comments.
14:10
SWT2-UTA 5m
Speaker:
Patrick Mcguigan (University of Texas at Arlington (US))
Revamping of internal network has resulted in much better performance.
4032 switch now throwing errors; contacted Dell. Upgraded the firmware and rebooted. Monitoring.
Early stages of planning next purchase.
14:15
WT2 5m
Speaker:
Wei Yang (SLAC National Accelerator Laboratory (US))
14:45 → 14:50
AOB 5m
Working on a storage purchase. Have to make a decision soon.
HTCondorCE is now working. Working on the job routing configuration.