RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
→
13:31
Experiment Operational Issues 1m
-
13:35
→
13:40
ATLAS Operations Report 5m
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
13:40
→
13:45
CMS Operations Report 5m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
On Thursday and Friday we saw SAM test and transfer failures into Antares due to missing router ACLs(?) on the new EOS front-end.
Again observed that the number of running cores reported in Vande is very different from that in CMS MONIT. This discrepancy seems to be much reduced today. Doing some connection tests with FNAL to see if there is a problem (again) connecting to the schedulers based at FNAL.
Testing transfers at CNAF today and yesterday, using RAL as a destination for reads from CNAF. Investigating some errors seen at RAL, whereas other CMS T1 destinations show much lower (or zero) error rates. One error persisted for only 2 minutes, possibly a network glitch. Tom Birkett has contacted DI.
Overall good performance of jobs relative to other CMS T1s.
-
13:45
→
13:50
LHCb Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Operational issues:
- There was a spike of failed WGProduction jobs last Sunday. Not our fault: the jobs used a buggy xrootd client.
- There were some failed uploads from HLTFarm to Tier-1 sites (including RAL)
  - These errors can be ignored: HLTFarm does not have external connectivity, and these transfers should never have been submitted (but due to a bug in DIRAC they were)
- Low level of upload failures from other sites to ECHO
  - French sites seem to be the most affected
  - Transfers appear to be timing out due to low speed
- Almost all transfers from RAL to Lanzhou are failing
  - So far it is not clear which side is problematic
-
13:50
→
13:55
ALICE Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
13:55
→
14:00
LSST Operations Report 5m
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- RC2 pipeline now complete
  - Required amending the run to output logging every 10 minutes so that PanDA did not kill the job
- Still need to investigate why RAL is taking longer than other sites, as LANCS is now staging data for jobs the same way we are (via https / davs through a gateway)
- Now working with the CM team to enable data retrieval to the USDF for comparison and analysis of the sites' outputs
- IngestD update deployed at RAL - Major version change, now running version 2.1
-
14:00
→
14:01
Tier-1 Projects 1m
-
14:15
→
14:25
Antares Upgrade 10m
New EOS nodes
Repack Progress
Speakers: George Patargias, Thomas Byrne
-
14:25
→
14:35
XRootD Development 10m
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
14:35
→
14:45
Utilizing GPUs 10m
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
→
14:50
SSD Storage Evaluation 5m
-
14:50
→
14:55
Echo deployment 5m
-
15:00
→
15:01
AOB 1m
-
15:01
→
15:10
Summary of Operational Status and Issues 9m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
15:10
→
15:15
Any Other Business 5m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore