RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
Experiment Operational Issues
-
1
ATLAS Operations Report
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
2
CMS Operations Report
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
What to do with pledges?
For CMS:
CPU : 62800 -> 72767 (9967)
Echo: 7686 -> 9394 (1708)
Antares: 24528 -> 29438 (4910)
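The bracketed figures above are the absolute increases in each pledge. A minimal sketch for checking that arithmetic and expressing each increase as a percentage of the old pledge is given below. The names `pledges` and `pledge_change` are illustrative only, and the units are left unspecified because the notes do not state them.

```python
# Pledge figures copied from the notes above as (old, new) pairs.
# Units are not stated in the notes, so none are assumed here.
pledges = {
    "CPU": (62800, 72767),
    "Echo": (7686, 9394),
    "Antares": (24528, 29438),
}


def pledge_change(old, new):
    """Return (absolute increase, percentage increase relative to old)."""
    delta = new - old
    return delta, 100.0 * delta / old


for name, (old, new) in pledges.items():
    delta, pct = pledge_change(old, new)
    print(f"{name}: {old} -> {new} (+{delta}, {pct:.1f}%)")
```

The absolute increases computed this way match the bracketed values in the notes (9967, 1708 and 4910).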
Another period of low-efficiency CMS jobs, coinciding with long read times (for small amounts of data). Many T1s see the same. The job failure rate was again fine. CMS CompOps investigated and found an issue with remote reads. Remote reads were turned off (it is not clear how the data is then accessed). There is also a lot of discussion about the number of cores being used by the jobs.
As discussed previously, on the subject of reading data from Echo using AAA: reading jobs were timing out because they could not get hold of data at RAL. Increasing the throttling level to allow more connections pushed IOPS higher than we are comfortable with. There is an associated ticket: https://helpdesk.ggus.eu/#ticket/zoom/2837
-
3
-
4
ALICE Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
5
LSST Operations Report
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
Permissions are now being organised by Role and not by Group, allowing for more fine-grained control (thanks to Jyothish for the suggestion).
AuthDb is now correct and is accepting data and jobs from LSST again.
The DC2 run by the CM team at SLAC had been failing due to OOM errors. This has finally been traced to a config error on their end that was causing entire data sets to be loaded and assessed rather than just the two small patches of sky.
With the new config, it takes 6 minutes to do something that was previously failing after several hours.
Nalin joined the team on Monday as a graduate, to work on monitoring and testing for the DRP jobs and infrastructure at RAL.
Now successfully running DC2 weekly jobs at RAL for the first time, including creating and running a 'campaign' from scratch.
RAL LSST jobs:
-
14:00
Tier-1 Projects
-
6
Antares Upgrade
New EOS nodes
Tape Robotics downtime
Speakers: George Patargias, Thomas Byrne
-
7
XRootD Development
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
8
Utilizing GPUs
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
AOB
-
9
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
10
Any Other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-