Software Update:

Updated the OSG software and HTCondor-CE to the most recent release on all 3 gatekeepers.

Frontier Squid was also updated to 4.10-1.1.osg34.el6.

We plan to upgrade all our SLC6 nodes to SLC7, including the dCache, HTCondor, and AFS services.

 

Job Errors:

A lot of jobs are failing with this error:

Non-zero return code from RAWtoESD (65); Logfile error in log.RAWtoESD: "AthMpEvtLoopMgr ERROR Failure in waiting or sub-process finished abnormally"

Some of the worker nodes failed 100% of their jobs. We identified and rebuilt around 15 affected worker nodes; after rebuilding, they no longer seem to fail many jobs (failure rate below 10%).

Note: This error also appears in jobs at 8 other sites; AGLT2 fails 1/5 of them. There is no ticket, and we are not sure whether the error comes from the job itself or from the worker nodes.
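
For reference, the per-node failure rates mentioned above could be tallied with something like the following. This is only a minimal sketch, not the procedure we actually used: the record format (worker node name, success flag), the function names, the node names, and the threshold are all hypothetical, and real input would have to be extracted from the job monitoring or batch system logs.

```python
from collections import defaultdict


def failure_rates(records):
    """Return {node: fraction of failed jobs} from (node, succeeded) pairs."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for node, succeeded in records:
        totals[node] += 1
        if not succeeded:
            failures[node] += 1
    return {node: failures[node] / totals[node] for node in totals}


def nodes_to_rebuild(records, threshold=1.0):
    """List nodes whose failure rate is at or above the threshold (assumed cutoff)."""
    rates = failure_rates(records)
    return sorted(node for node, rate in rates.items() if rate >= threshold)


# Hypothetical sample data: (worker node, job succeeded?)
records = [
    ("wn001", False),
    ("wn001", False),
    ("wn002", True),
    ("wn002", False),
    ("wn003", True),
]

print(nodes_to_rebuild(records))
```

Lowering the threshold (for example to 0.1) would also list nodes that only fail part of their jobs, which may help to tell node problems apart from problems in the jobs themselves.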