Registration open for the OSG CIC workshop at Duke University
See campusgrids.org
Michael
Moving towards IaaS
Pre-GDB meeting yesterday discussing this.
Lots of material from the experiments
From ATLAS - lots of progress has been made; in full production on 16k cores. Condor scaling issue encountered, resolved.
Next step is to look into the concept more generally. At CERN, IT has invited the experiments to evaluate the possibility of using IaaS without CEs or batch systems.
Making reasonable progress towards moving to SL6, but we're not there yet. By end of month for sure.
13:10 → 13:15
Integration points (5m)
Speaker: Robert William Gardner Jr (University of Chicago (US))
Review of storage deployment, SL6 migration, gLExec (5m)
Shuwei validating on SL6.4; believes ready to go. BNL_PROD will be modified - in next few days 20 nodes will be converted. Then the full set of nodes.
Doug - provide a link from the SIT page. Notes prun does compilation.
Main thing to consider is whether you upgrade all at once, or rolling.
BNL will be migrated by the COB today! Will be back online tonight. BNL did the rolling update.
Look at AGIS - changing panda queues much easier
Are the new queue names handled in reporting? If they are members of the same Resource Group.
What about $APP? Needs a separate grid3-locations file. But the new system doesn't use it any longer.
Schedule:
BNL
June 10 - AGLT2 - will do rolling
MWT2 - still a problem with validations; could start next week
SLAC - week of June 10
NET2 - all at once. Week of June 17
UTA - all at once. June 24. Lots of dependencies - new hardware, network. A multi-day outage is probably okay.
OU - all at once. Rocks versus Puppet decision. After July 5.
Goal: Majority of sites supporting the new client by end of June. May need to negotiate continued support
this meeting
BNL
MWT2
AGLT2: 1/3 of worker nodes were converted; ran into a CVMFS cache size config issue, but otherwise things are going well. The OSG app is owned by usatlas2, but validation jobs are now production jobs. Doing rolling upgrade. They are using the newest CVMFS release. N.B. change in cache location. Expect to be finished next week.
Fully migrated.
NET2: HU first, then BU. At HU - did big bang upgrade; ready for Alessandro to do validation. Ran into problem with host cert. 2.1.11 is production. One machine at BU. Hope to have this done in two weeks. BU team working on HPC center at Holyoke.
HU done
BU: GPFS testing is complete. Top priority. Augustine will be working on this non-stop.
SWT2 (UTA)
End of week will be migrated. OU will be coming on Monday to go over the Rocks6.
SWT2 (OU)
Will be visiting on Monday. OSCER queues doing validation now.
WT2: Failed jobs on test nodes - troubleshooting with Alessandro. Expect to be complete by end of next week.
Some validation jobs are failing, also in the Analysis queue.
Writing to CVMFS?
No response from Alessandro.
Updates from the Tier 3 taskforce?
last meeting
Report is due by July
Doing testing of Tier 3 scenarios using grid or cloud resources
Working with AGLT2 as a test queue.
Managed to get surveys from every Tier 3 site. Writing assignments will be set up for the final report.
Half the community does not have resources on their campus.
Solve the problem of data handling to local resources, as a fully supported DDM endpoint. GridFTP-only endpoints were never fully supported.
Survey report will be available in two weeks
this meeting
gLExec
See https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMinutes130620. The plan foresees making the gLExec SAM tests critical starting from October; experience shows that setting up gLExec and passing the tests is not difficult, it is well documented and it does not risk affecting any production workflow.
We will need a volunteer to test this out and give feedback to the group.
AGLT2 can start looking at it in a couple of weeks
NET2 would like to help
Torre: notes discussion in ADC management, preparing statement to the effect that this would be optional for sites.
Reviewing LHCONE connectivity for the US ATLAS Facility (Shawn)
last meeting(s)
June 1 is the milestone date to get all sites on.
BNL, AGLT2, and 2 of the 3 MWT2 sites (MWT2_IU needs action; see below).
SLAC
notes:
Updates?
OU - status unknown.
UTA - conversations with LEARN, UTA, I2 are happening. There has been a meeting. They are aware of the June 1 milestone.
NET2 - new 10g link is set up. 2 x 10g to HU. Chuck is aware of the June 1 LHCONE milestone. Saul will follow up shortly, expects no problem by June 1.
IU - plan is to decide Friday whether we need to bypass the Brocade and access the Juniper directly to peer with LHCONE. Fred is working closely with the engineers.
Shawn - Mike O'Conner has been putting together a document with best practices. Will have examples on how to route specific subnets that are announced on LHCONE.
Three configurations:
1. PBR (policy-based routing).
2. A dedicated routing instance: a virtual router for LHCONE subnets.
3. Physical routers as the gateway for LHCONE subnets.
NET2: have not been pushing it, but will get ball rolling again - will contact Mike O'Conner and provide feedback.
OU: there was a problem at MANLAN which has been fixed. Direct replacement from BNL to OU. Will start on LHCONE next.
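For sites still waiting on routing examples, the policy-based routing option noted above can be illustrated with a minimal sketch. This is not taken from the best-practices document and is not a router configuration; it only shows the decision PBR makes, namely that traffic takes the LHCONE path when both the source and destination match the policy lists. The prefixes are documentation ranges (RFC 5737) used as placeholders, and the names `LOCAL_LHC_SUBNETS`, `LHCONE_PREFIXES` and `select_path` are hypothetical.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 documentation ranges), NOT real LHCONE announcements.
LOCAL_LHC_SUBNETS = [ipaddress.ip_network("192.0.2.0/24")]    # site subnets allowed onto LHCONE
LHCONE_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]   # prefixes announced by LHCONE peers


def in_any(addr, networks):
    """Return True if addr falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


def select_path(src, dst):
    """Pick the next hop by policy: LHCONE only for matching source AND destination."""
    if in_any(src, LOCAL_LHC_SUBNETS) and in_any(dst, LHCONE_PREFIXES):
        return "lhcone"
    return "default"


if __name__ == "__main__":
    print(select_path("192.0.2.10", "198.51.100.5"))   # storage node to an LHCONE peer
    print(select_path("192.0.2.10", "203.0.113.7"))    # same node to a non-LHCONE host
```

Matching on the source address as well as the destination is what distinguishes this from ordinary destination-based routing: hosts outside the designated subnets stay on the default path even when talking to an LHCONE prefix.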
this meeting
Updates?
IU is actually on LHCONE, but not on the 100g link. Hopefully later this week.
OU and UTA: blocker is getting appropriate examples on how to do this.
NET2: no update.
Shawn will get names of people who should be involved. Will open an email.
Storage
UTA storage expected to be online by Friday.
Compute Server Subcommittee (5m)
Speaker: Robert Ball (University of Michigan (US))
13:15 → 13:25
Production and Operations (10m)
Speakers: Kaushik De (University of Texas at Arlington (US)), Mark Sosebee (University of Texas at Arlington (US))
Speaker: Hironori Ito (Brookhaven National Laboratory (US))
Claims the backlog is caused by DDM SS not submitting jobs quickly enough. There is now a separate DDM SS just for DATADISK, and the situation is improving dramatically. The backlog is not related to FTS, the network, or the sites at all. Production transfers do not have higher priority, so they are not chosen. Note also that the SS are operated at CERN.
13:35 → 13:40
Networking and Throughput (5m)
Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
13:40 → 13:45
FAX (5m)
Speakers: Ilija Vukotic (University of Chicago (US)), Wei Yang (SLAC National Accelerator Laboratory (US))
13:45 → 13:50
Site reports: BNL (5m)
Speaker: Michael Ernst (Unknown)
13:50 → 13:55
Site reports: AGLT2 (5m)
Speakers: Robert Ball (University of Michigan (US)), Dr Shawn McKee (University of Michigan ATLAS Group)
13:55 → 14:00
Site reports: MWT2 (5m)
Speakers: Mr David Lesny (Univ. Illinois at Urbana-Champaign (US)), Sarah.elizabeth Williams (Indiana University (US))
14:00 → 14:05
Site reports: NET2 (5m)
Speaker: Prof. Saul Youssef (Boston University (US))
14:05 → 14:10
Site reports: SWT2-OU (5m)
Speaker: Dr Horst Severini (University of Oklahoma (US))
14:10 → 14:15
Site reports: SWT2-UTA (5m)
Speaker: Patrick Mcguigan (University of Texas at Arlington (US))
14:15 → 14:20
Site reports: WT2 (5m)
Speaker: Wei Yang (SLAC National Accelerator Laboratory (US))