RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
→
13:31
Experiment Operational Issues 1m
-
13:35
→
13:40
ATLAS Operations Report 5m
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
13:40
→
13:45
CMS Operations Report 5m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
Monday: issues with Echo because the system was busy rebalancing data onto new hardware. This made it a red day for SAM tests.
Some periods failing SAM tests for AAA machines in the last 2 weeks. Each time Katy did nothing and they 'fixed themselves'.
Antares downtime today for CTA version upgrade. New EOS front-end nodes are now fully installed and in production. These are apparently dual-stack, but I'm not seeing any green yet on the 'connection' SAM test that checks for this. Whether any of the tests are using the new nodes is still to be checked.
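As a quick independent check of the dual-stack claim, one option is to see whether a node's hostname resolves to both IPv4 and IPv6 addresses. The sketch below is only that: it is not what the SAM 'connection' test does, the helper names are made up for illustration, and the default port 1094 (the usual XRootD port) is an assumption about how the nodes are reached. DNS resolution also says nothing about whether the IPv6 path is actually routable.

```python
import socket


def address_families(addrinfo):
    """Return the IP families present in getaddrinfo()-style results."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return {names[entry[0]] for entry in addrinfo if entry[0] in names}


def is_dual_stack(host, port=1094):
    """True if `host` resolves to both an IPv4 and an IPv6 address.

    `port` defaults to 1094 as an assumption; change it to match the service.
    """
    info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return address_families(info) == {"IPv4", "IPv6"}


# Hypothetical usage (the hostname is a placeholder, not a real node):
# is_dual_stack("eos-frontend.example.org")
```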
-
13:45
→
13:50
LHCb Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
Operational issues:
- Upload and download failures to ECHO (GGUS:)
  - Caused by ECHO issues, looks much better now
  - There was a small spike of failed uploads this morning, which may be related to a different issue
- LHCb drained yesterday.
  - Lack of jobs, not our fault.
- Faulty LHCbDIRAC release deployed this morning.
  - May cause drain again.
- 13:50 → 13:55
-
13:55
→
14:00
LSST Operations Report 5m
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- Investigating job slowness compared to other sites in LSST (RAL 3:10:12, IN2P3 1:17:53, LANCS 1:39:05)
  - IO from Echo?
    - Current thinking from LSST is that reading files from Echo takes twice as long as at other sites
      - 0.6s per file at RAL, over 0.3s at others. Not much, but over 20,000 files it adds up! (And this is a small test data set, not a real one)
      - LANCS mentioned today in the storage meeting that they are creating a pool just for LSST with 4+2 to try to improve performance
  - CPU / memory bound?
- Request to all sites asking whether they have a shared space in which to create an SQLite database within a DAG (job of jobs) to track progress within the DAG
  - Informed the requester of the infrastructure at RAL, and they seem happy that the local scratch for jobs and ECHO would be fine
- Will begin movement of ComCam data to RAL this week
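As a rough sanity check on the IO hypothesis, the sketch below multiplies the quoted per-file read times by the quoted file count and compares the result with the gaps in job wall time. It treats 20,000 files and 0.3s per file elsewhere as exact, although the notes say "over" for both, so the result is indicative only.

```python
from datetime import timedelta

# Figures quoted in the notes; treating them as exact is an assumption.
FILES = 20_000
READ_S = {"RAL": 0.6, "other sites": 0.3}
WALL = {
    "RAL": timedelta(hours=3, minutes=10, seconds=12),
    "IN2P3": timedelta(hours=1, minutes=17, seconds=53),
    "LANCS": timedelta(hours=1, minutes=39, seconds=5),
}


def total_read_time(per_file_s, files=FILES):
    """Total time spent reading `files` files at `per_file_s` seconds each."""
    return timedelta(seconds=per_file_s * files)


# Extra time RAL would spend on reads alone, relative to the other sites.
extra_read = total_read_time(READ_S["RAL"]) - total_read_time(READ_S["other sites"])
print("extra read time at RAL:", extra_read)

for site in ("IN2P3", "LANCS"):
    print(f"wall-time gap RAL vs {site}:", WALL["RAL"] - WALL[site])
```

Under these assumptions the extra read time comes to 1:40:00, against wall-time gaps of 1:52:19 (IN2P3) and 1:31:07 (LANCS). That is consistent with file reads accounting for most of the difference, but it does not rule out a CPU or memory contribution.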
-
14:00
→
14:01
Tier-1 Projects 1m
-
14:15
→
14:25
Antares Upgrade 10m
New EOS nodes
Repack Progress
Speakers: George Patargias, Thomas Byrne
-
14:25
→
14:35
XRootD Development 10m
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
Error 500 issue narrowed down to IPv6 routing problems at Glasgow. SciTags forced the transfers onto IPv6, which is why they were failing. Work is in progress on the Glasgow side to resolve this. lcgfts3 has been configured to disable IPv6 for that site and could be used as an emergency measure if needed.
XRootD 5.8.4 is out; it contains no improvements significant enough for RAL to need an urgent update.
-
14:35
→
14:45
Utilizing GPUs 10m
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
→
14:50
SSD Storage Evaluation 5m
-
14:50
→
14:55
Echo deployment 5m
-
15:00
→
15:01
AOB 1m
-
15:01
→
15:10
Summary of Operational Status and Issues 9m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
15:10
→
15:15
Any other Business 5m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore