US ATLAS Computing Integration and Operations

US/Eastern
Other Institutes

Description
Notes and other material available in the US ATLAS Integration Program Twiki
    • 13:00 13:20
      Top of the meeting 20m
      Speakers: Michael Ernst (Unknown), Robert William Gardner Jr (University of Chicago (US))
      ADCTIM2014 Minutes

      US ATLAS Computing Integration and Operations

      November 12, 2014

      Attending: many
      Apologies: Shawn, Jason

       

      ADC TIM 2014 Chicago 

      • Minutes available
      • Remind Simone about action items.

      Condor-CE:

      • New interest in global ATLAS
      • How it appears in AGIS; SAM tests.
      • Need to define the gaps, and other issues identified by Bob, e.g.
      • Should discuss at the Jamboree

      Tier3 hardware

      • End of life hardware
      • Idea to augment Tier3s with hardware retiring from Tier2s, to be used for end-user analysis
      • Creation of a shared Tier3 pool, making these resources available to US physicists
      • 4-year versus 5-year retirement
      • Discussion of Tier2 sites opening resources for ATLAS Connect; all Tier2s have indicated a willingness to set this up.
      • Accounting discussions should take place in the RAC
      • Will need to look at priorities and enforcement.

       

      Welcome new personnel from NETier2.  John Brunelle left to become a Google Engineer in LA.  James Cuff has assigned two new people to the group - they'll be joining next week.  Congratulations!  Thanks to Saul for making it a smooth transition.

       

      Mayuko - LOCALGROUPDISK monitoring

      • Slides
      • The system manages space based on a facility-wide quota per user
      • It might be possible to add a mechanism that lets users delete datasets easily.
      • Who receives the messages about cleanup decisions?
      • Who is responsible for removing datasets?  Kaushik: hope that Rucio provides this capability, so users can do this themselves.  (At present, Armen has to do it.)
      • Not yet ready for users. Kaushik would like RAC approval.
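The facility-wide quota per user mentioned above can be illustrated with a minimal sketch. It assumes that a user's usage is summed over all sites before being compared with a single quota; the record layout, site names, and quota value are hypothetical and are not taken from the slides or from the actual monitoring system.

```python
from collections import defaultdict

# Hypothetical usage records: (user, site, bytes used on that site's LOCALGROUPDISK).
USAGE = [
    ("alice", "SITE_A", 4 * 10**12),
    ("alice", "SITE_B", 3 * 10**12),
    ("bob", "SITE_A", 1 * 10**12),
]

# Hypothetical facility-wide quota per user, in bytes (not a value from the minutes).
QUOTA_BYTES = 5 * 10**12


def facility_usage(records):
    """Sum each user's usage over all sites in the facility."""
    totals = defaultdict(int)
    for user, _site, used in records:
        totals[user] += used
    return dict(totals)


def over_quota(records, quota):
    """Return {user: bytes above quota} for users exceeding the facility-wide quota."""
    return {
        user: total - quota
        for user, total in facility_usage(records).items()
        if total > quota
    }


if __name__ == "__main__":
    for user, excess in sorted(over_quota(USAGE, QUOTA_BYTES).items()):
        print(f"{user} is {excess / 10**12:.1f} TB over the facility-wide quota")
```

In this sketch a user can be under quota at every individual site and still be flagged, which is the point of a facility-wide rather than per-site limit.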

       

      Dave Lesny - Stampede and StratumR

      • Slides
      • Michael: a good idea for other sites.  There is an offer from Mike Norman (SDSC) for ATLAS Connect.
      • What edge services are required?
      • Other HPC sites?  Revisit after SC14.

       

      Production - Mark

      • Summaries are posted
      • Things are quiet.
      • Production levels are low; expect fluctuations.
      • Now is a good time for downtimes
      • 8k jobs at BNL_CLOUD: this is due to a special request.  Running MCORE jobs at scale in AWS.  Studying scaling issues at various ends.  (Most of this is not mandatory production.)

       

      DDM issues

      • Armen: generally things are looking okay
      • Rucio migration planned during Thanksgiving week
      • Next week expect irregularities
      • Saul: all sites should move to the current version of dq2-home and rucio-home; the versions are slightly different.  Make sure to use "latest".
      • Dave: the pilot wrapper once called the DDM setup, and Jose would like to change the setup.  Horst and Saul have the setup in different places.  Leave things as they are for now and make the broader change in one step; the only urgent change is to make sure you do: source /cvmfs/atlas.cern.ch/repo/sw/ddm/latest/setup.sh.  Dave: this will always give you the latest.
      • Jose is looking for guidance.  Saul will take this up, with BU as a test site.  Dave will help as well.  Horst: should remember to get rid of atlas-wn.
      • Michael: we need some consolidation across the sites to make the setup coherent; then let Jose know how we want to proceed.  Consulting with Dave and Horst.

      Site reports

      • BNL
        • AWS scaling test, using BNL SE (setting up AWS SE).  FTS3 has added support for S3, will evaluate.  ESnet setting up direct connect between ESnet and Amazon.  20,000 cores - identified bottlenecks in the Amazon western region, leading to job losses.
      • AGLT2
        • Not much going on.
      • MWT2
        • Discussing purchases.
      • NET2
        • Purchasing coming soon.  Added an OSG queue.
      • SWT2
        • Patrick: quiet, nothing to report.  Horst: nothing to report, all three sites running well.
      • WT2
        • Looking to replace Thumpers and Thors.  No discounts coming from Dell.  R730 with 24 2.5-inch drives.  Higher than list!

      AOB

      • Bob: some user analysis jobs use more CPU time than wall time, and they are killed.  This seems to be related to jobs using an older version of RootCore.  Alden will circulate this to DAST.
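The CPU-time versus wall-time condition raised under AOB can be sketched as follows. The minutes do not say how sites detect or kill such jobs, so the check below is only an assumption: a job is flagged when its CPU time exceeds its wall time multiplied by the number of cores it was allocated. The `JobRecord` fields and the `tolerance` parameter are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical accounting record for one job."""
    job_id: str
    cpu_seconds: float
    wall_seconds: float
    cores: int = 1  # cores allocated to the job (assumed to be known)


def exceeds_wall_time(job, tolerance=1.0):
    """True if the job used more CPU time than its allocation allows.

    The allowed CPU time is wall time * allocated cores * tolerance.
    """
    return job.cpu_seconds > tolerance * job.wall_seconds * job.cores


def flag_jobs(jobs, tolerance=1.0):
    """Return the ids of jobs whose CPU time exceeds the allowed value."""
    return [job.job_id for job in jobs if exceeds_wall_time(job, tolerance)]


if __name__ == "__main__":
    sample = [
        JobRecord("single-core-ok", cpu_seconds=100, wall_seconds=200),
        JobRecord("single-core-over", cpu_seconds=500, wall_seconds=200),
        JobRecord("multi-core-ok", cpu_seconds=500, wall_seconds=200, cores=8),
    ]
    print(flag_jobs(sample))
```

A single-core job that spawns extra threads or processes would be flagged by this check, while a job with the same CPU time on an 8-core allocation would not.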
    • 13:20 13:35
      LOCALGROUPDISK Monitoring 15m
      Speaker: Mayuko Kataoka (University of Texas at Arlington (US))
      Slides
    • 13:35 13:55
      Stampede and StratumR 20m
      Speakers: Mr David Lesny (Univ. Illinois at Urbana-Champaign (US)), Robert William Gardner Jr (University of Chicago (US))
      Slides
    • 13:55 14:00
      Production 5m
      Speaker: Kaushik De (University of Texas at Arlington (US))
      notes
    • 14:00 14:20
      Site reports 20m