Name: US ATLAS Computing Integration and Operations
Start: 2014-12-10T13:00:00-05:00
End: 2014-12-10T15:00:00-05:00
Location: Other Institutes

- 13:00 → 13:15
  
  Top of the Meeting 15m
  
  Speakers: Michael Ernst, Robert William Gardner Jr (University of Chicago (US))
  
  Capacities
- 13:15 → 13:25
  Production 10m
  
  Minutes
  
  Speakers: Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US))
  
  summary
  Mark reporting
  
  Production has been busy, with many issues owing to the Rucio migration and fallout from the accidental DDM deletion (failing transfers, etc.).
  
  Activity is picking up, and several sites are full. More jobs than before.
  
  ProdSys1 has been decommissioned. Note the change in task IDs with ProdSys2. Must use the BigPanDA monitor.
  
  See Jamboree last week for tutorial on the monitor.
  
  New pilot release from Paul.
  
  See links to ADC weeklies for relevant talks.
- 13:25 → 13:30
  Data Management 5m
  
  Minutes
  
  Speaker: Armen Vartapetian (University of Texas at Arlington (US))
  Armen reporting
  
  Main issue has been data loss due to the migration.
  
  Initial number was 1M. Revised to 3.3M lost files.
  
  Thus deletion activities have been limited. Not all has been understood.
  
  Automatic deletions are halted. Only a low level of deletion is ongoing.
  
  Mostly impacting Tier1s. E.g. 400 TB to be deleted.
  
  New model will come into effect early next year. See related Jamboree notes from last week.
- 13:30 → 13:35
  Data transfers 5m
  
  Minutes
  
  Speaker: Hironori Ito (Brookhaven National Laboratory (US))
  Hiro reporting
  
  No transfer issues, other than the space issues.
  
  FTS at BNL is being primarily used (plus a couple more clouds)
- 13:35 → 13:40
  Networks 5m
  
  Minutes
  
  Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
  Shawn reporting
  
  perfSONAR sites need to upgrade to 3.4 or later after January 9.
  
  Changed the traceroute method (default was B-->A); this requires BWCTL to be running.
  
  Two fixes: mesh configuration agent (do a forward rather than reverse).
  
  Will likely move traceroute tests to go between BW nodes. Done centrally with mesh URL.
  
  SLAC instances need updates. As does BNL (Hiro is working on it)
  
  LHCONE meeting: point-to-point circuits using NSI (new standard for inter-domain implementations).
  
  A few sites now; others are welcome to join.
  
  Goal - demonstrate circuit usage.
- 13:40 → 13:45
  FAX 5m
  
  Minutes
  
  Speakers: Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
  Ilija reporting
  
  Reconciling differences in job efficiency w/ Kaushik
  
  Also looking at data from Hadoop.
  
  Will not be expanding overflow jobs until resolved.
  
  Next pilot release will fix timeout issue for large files.
- 13:45 → 14:45
  Site Reports
  - 13:45
    BNL 5m
    
    Minutes
    
    Speaker: Michael Ernst (Unknown)
    
    Michael
    
    BNL networking working with ESnet on finalizing config for transatlantic connectivity
    
    Probably joint P2P activity in January
    
    Expecting delivery of WNs.
    
    Storage is high on the list. DDN 2000 drive machine 1.8 PB usable, getting old, failures more frequent.
    
    Also thinking about storage R&D.
    
    Talk of increasing ATLAS tape usage; 10,000 slot library. Volunteered US to work with ADC on the model.
  - 13:50
    AGLT2 5m
    
    Minutes
    
    Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)
    
    Shawn
    
    Open ticket on some step09 files with a size mismatch. Checked, but sizes differ from Rucio. Suspect a casualty of the Rucio deployment. Checksums match! Saul has observed this at other sites and reported it.
    
    Getting some equipment (35 Dell R620s that were part of a large order that got cancelled): 256 GB memory, dual 10G NICs, redundant PS, E5-2670v2 (10C).
    
    Storage MD3460 storage shelf at MSU. UM 600 TB MD3060s, 6TB, Lustre over ZFS.
  - 13:55
    MWT2 5m
    
    Minutes
    
    Speaker: Robert William Gardner Jr (University of Chicago (US))
    
    Connect queues working well (analy and production) to Stampede, HU, ICC, Mietc.
    
    Well over 1,000 slots between the sites.
    
    Procurement in progress.
    
    CCC development
  - 14:00
    NET2 5m
    
    Minutes
    
    Speaker: Prof. Saul Youssef (Boston University (US))
    
    Welcome Dave Caunt.
    
    Working on procurement.
    
    Nexus 7710 from MIT is being set up at MANLAN; this will be how we peer with LHCONE.
    
    Very little production.
    
    Problem with APF and HU CE nodes.
    
    Worldwide FTS performance studies: US performance looks good, except to SARA and NIKHEF.
    
    Will be starting CondorCE on the BU side.
    
    ATLAS Connect production
  - 14:05
    
    SWT2-OU 5m
    
    Minutes
    
    Speaker: Dr Horst Severini (University of Oklahoma (US))
    
    OU is on LHCONE!
    
    Filled cloud support list comments.
  - 14:10
    SWT2-UTA 5m
    
    Minutes
    
    Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
    
    Revamping of internal network has resulted in much better performance.
    
    4032 switch now throwing errors; contacted Dell. Upgraded firmware and rebooted. Monitoring.
    
    Early stages of planning next purchase.
  - 14:15
    
    WT2 5m
    
    Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))
- 14:45 → 14:50
  
  AOB 5m
  
  Minutes
  
  Working on a storage purchase. Have to make a decision soon.
  
  HTCondorCE is now working. Working on the job routing configuration.
