RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
→
13:31
Experiment Operational Issues 1m
-
13:35
→
13:40
ATLAS Operations Report 5m Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
13:40
→
13:45
CMS Operations Report 5m Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
What to do with pledges?
For CMS:
CPU: 62800 -> 72767 (9967)
Echo: 7686 -> 9394 (1708)
Antares: 24528 -> 29438 (4910)
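The bracketed figures above are the differences between the current and proposed pledge values. As a quick cross-check, the minimal sketch below recomputes each increase and its percentage; the dictionary layout and the function name `pledge_changes` are illustrative assumptions, and the units are left unstated because the notes do not give them.

```python
# CMS pledge figures as quoted in the notes: (current, proposed).
# Units are not stated in the notes, so none are assumed here.
pledges = {
    "CPU": (62800, 72767),
    "Echo": (7686, 9394),
    "Antares": (24528, 29438),
}


def pledge_changes(pledges):
    """Return {resource: (absolute increase, percentage increase)}."""
    changes = {}
    for name, (old, new) in pledges.items():
        delta = new - old
        changes[name] = (delta, 100.0 * delta / old)
    return changes


for name, (delta, pct) in pledge_changes(pledges).items():
    print(f"{name}: +{delta} ({pct:.1f}%)")
```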
There was another period of low-efficiency CMS jobs, coinciding with long read times (for small amounts of data). Many T1s see the same. The job failure rate was again fine. CMS CompOps investigated and found some issue with remote reads. Remote reads were turned off (it is not clear how the data is then accessed). There is also a lot of discussion about the number of cores being used by the jobs.
As discussed previously, on the subject of reading data from Echo using AAA: reading jobs were timing out because they could not get hold of data at RAL. When the throttling level was increased to allow more connections, IOPS went higher than we are comfortable with. There is an associated ticket: https://helpdesk.ggus.eu/#ticket/zoom/2837
- 13:45 → 13:50
-
13:50
→
13:55
ALICE Operations Report 5m Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
13:55
→
14:00
LSST Operations Report 5m Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
Permissions are now being organised by Role and not by Group, allowing for finer-grained control (thanks to Jyothish for the suggestion).
AuthDb now correct and accepting data and jobs again from LSST
DC2 runs by the CM team at SLAC have been failing due to OOM errors. This has finally been traced to a config error on their end that was causing entire data sets to be loaded and assessed rather than just the two small patches of sky.
With the new config, it takes 6 mins to do something that was previously failing after several hours.
Nalin joined the team from Monday as a graduate, to work on monitoring and testing for the DRP jobs and infrastructure at RAL.
Now successfully running DC2 weekly jobs at RAL for the first time, including creating and running a 'campaign' from scratch.
RAL LSST jobs:
-
14:00
→
14:01
Tier-1 Projects 1m
-
14:15
→
14:25
Antares Upgrade 10m
New EOS nodes
Tape Robotics downtime
Speakers: George Patargias, Thomas Byrne
-
14:25
→
14:35
XRootD Development 10m Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
14:35
→
14:45
Utilizing GPUs 10m Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
→
14:46
AOB 1m
-
14:46
→
14:55
Summary of Operational Status and Issues 9m Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
14:55
→
15:00
Any other Business 5m Speakers: Brian Davies (Lancaster University (GB)), Darren Moore