RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
13:30
Experiment Operational Issues
1
ATLAS Operations Report
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
2
CMS Operations Report
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
On Thursday and Friday we saw SAM test and transfer failures into Antares, apparently due to missing router ACLs on the new EOS front-end.
We again observed a large difference between the number of running cores reported by Vande and by CMS monit. The discrepancy seems much reduced today. Running some connection tests with FNAL to check whether there is (again) a problem connecting to the FNAL-based schedulers.
Transfer tests at CNAF yesterday and today, using RAL as a destination for reads from CNAF. Investigating some errors seen at RAL; the other CMS T1 destinations used show much lower (or zero) error rates. One error occurred only during a 2-minute window, possibly a network glitch. Tom Birkett contacted DI.
Overall good performance of jobs relative to other CMS T1s.
3
LHCb Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Operational issues:
- There was a spike of failed WGProduction jobs last Sunday. Not our fault: the jobs used a buggy xrootd client.
- There were some failed uploads from HLTFarm to Tier-1 sites (including RAL)
  - These errors can be ignored: HLTFarm does not have external connectivity, so these transfers should never have been submitted (but due to a bug in DIRAC they were)
- Low level of upload failures from other sites to ECHO
  - French sites seem to be the most affected
  - The transfers appear simply to be timing out due to low speed
- Almost all transfers from RAL to Lanzhou are failing
  - So far it is not clear which side is at fault
4
ALICE Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
5
LSST Operations Report
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- RC2 pipeline now complete
  - Required amendments to the run to output logging every 10 minutes so that PanDA didn't kill the job
  - Still need to investigate why RAL is taking longer than other sites, as LANCS are now staging data for jobs the same way we are (via https/davs through a gateway)
  - Now working with the CM team to enable data retrieval to the USDF for comparison and analysis of the sites' outputs
- IngestD update deployed at RAL: major version change, now running version 2.1
14:00
Tier-1 Projects
6
Antares Upgrade
New EOS nodes
Repack Progress
Speakers: George Patargias, Thomas Byrne
7
XRootD Development
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
8
Utilizing GPUs
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
9
SSD Storage Evaluation
10
Echo deployment
15:00
AOB
11
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
12
Any other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore