RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
→
13:31
Experiment Operational Issues 1m
-
13:35
→
13:40
ATLAS Operations Report 5m
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
13:40
→
13:45
CMS Operations Report 5m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
No issues to report. A spike in job failures in the middle of last week was attributed to a bad campaign of CMS jobs.
Tape deletions have started. About 3PB has been deleted so far, with deletions starting again this morning (670TB to be done). Another phase of the deletion is in preparation, with nearly another 1PB planned for deletion at RAL likely in the second half of March.
-
13:45
→
13:50
LHCb Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
- XRootD nproc issue is still present (GSTSM-327)
- Possible mitigations are being discussed.
- Some sprucing jobs are failing due to unstaged input data
- Data is supposed to be staged at CERN, so not our fault
-
13:50
→
13:55
ALICE Operations Report 5m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
13:55
→
14:00
LSST Operations Report 5m
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
A list of the datasets to be moved to RAL for LSST has been acquired. I am verifying which user and which RSE I should use to create the Rucio rules, to ensure consistency (LSST are separating data in Rucio by RSE, so there are 3 RSEs per site just for disk).
Once this is verified, data movement will start, followed by local registration / URI replacement.
Following ingestion of the data into the RAL Butler, the CM team can start sending various pipeline tests that use real data to RAL.
-
14:00
→
14:01
Tier-1 Projects 1m
-
14:05
→
14:15
Preparing for mini UK Data Challenge 10m
Speakers: Mr James Adams, Katy Ellis (Science and Technology Facilities Council STFC (GB))
Mini-DC next week focused on Echo. CMS, ATLAS and LHCb taking part:
Mon: Half fill the 2x100Gbps OPN link with data from CERN to Echo. Total traffic approx 100Gbps. Try to balance the rates between the VOs e.g. using FTS config. Observe deletions.
Tues: Keep data flowing as Monday. Cut the OPN (James Adams can do this) and observe the performance of the fallback to LHCONE.
Wed: Re-instate the OPN. Increase the rate to try to fill the pipe (200Gbps). Additional disk space (i.e. the pledge for the next year) provided by RAL so as to not be reliant on deletion rate.
Thur: Read test for Echo, sending data to CERN and other Tier 1s (some OPN, some not?)
Fri: Tests with Tier 2s. AM: writes to Echo. PM: reads from Echo.
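As a rough back-of-envelope check on why the extra disk space matters for the Wednesday test, the sketch below converts a sustained network rate into a daily data volume and splits a total rate between the participating VOs. The function names and the equal per-VO shares are illustrative assumptions, not agreed Mini-DC settings. Volumes use decimal units (1 PB = 10^15 bytes).

```python
def daily_volume_pb(rate_gbps: float, hours: float = 24.0) -> float:
    """Volume in PB written at a sustained rate (Gbit/s) for the given hours."""
    bits = rate_gbps * 1e9 * hours * 3600
    return bits / 8 / 1e15


def split_rate(total_gbps: float, shares: dict) -> dict:
    """Divide a total rate between VOs in proportion to their shares."""
    total_share = sum(shares.values())
    return {vo: total_gbps * share / total_share for vo, share in shares.items()}


if __name__ == "__main__":
    # Monday/Tuesday target (half-filled OPN) and Wednesday target (full OPN).
    for label, rate in [("half-filled OPN", 100), ("full OPN", 200)]:
        print(f"{label}: {rate} Gbps is about {daily_volume_pb(rate):.2f} PB/day")

    # Equal shares are an assumption made purely for illustration.
    print(split_rate(100, {"CMS": 1, "ATLAS": 1, "LHCb": 1}))
```

At a sustained 100 Gbps this comes to roughly 1.08 PB per day, and about 2.16 PB per day at 200 Gbps, so a multi-day test can outpace deletions unless additional space is available.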
-
14:15
→
14:25
Antares Upgrade 10m
New EOS nodes
Tape Robotics downtime
Speakers: George Patargias, Thomas Byrne
Title: Notification of a 4-day outage of the RAL tape endpoint from 28th April to 1st May 2025.
The tape robotics and much of the hardware behind the Antares tape service at RAL are now 5 years old. We are about to commence a major programme of work to service, replace, and upgrade relevant parts of the hardware. Originally this was planned to take place during the winter shutdown; however, due to delays in the procurement process outside our control, most of the hardware will now be delivered in March.
While much of the work can be carried out transparently or with only minor degradation to the service, it will be necessary to have a complete downtime to work on the robotics for 4 days, from Monday 28th April at XXX until Thursday 1st May at YYY [1]. We appreciate that this is very close to the planned start of stable beams and therefore wanted to give experiments as much notification as possible, so they have time to plan any necessary mitigating actions. Unfortunately we do not have any flexibility in the dates.
The overall programme of work includes:
Service / preventive maintenance on the Tape Robot.
Upgrade of the Spectra TFinity Robot to LumOS.
Installation of TS1170 drives and 83PB of additional media.
Installation of additional Tape Servers and Fibre Channel equipment.
Replacement of the EOS buffer front end. These nodes will have IPv6 enabled and will have direct access to the LHCOPN link as well as being on LHCONE.
If you have any questions about this, please contact your relevant RAL VO contact: Alex Rogovskiy (LHCb, ALICE), Brij Jashal (ATLAS), Katy Ellis (CMS), or submit a GGUS ticket to RAL-LCG2.
Thank you for your understanding.
Alastair
RAL Tier-1 Manager
[1] INSERT GOCDB LINK to downtime
-
14:25
→
14:35
XRootD Development 10m
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
- Writable WN gateways
- New xrd-ceph version is deployed to the LHCb-only test WN
- Applies buffering only for writes, so memory consumption should be reduced
- The LHCb config that allowed jobs from the preprod farm to write to Echo via XRootD was removed ~2 weeks ago
- The hotfix was missed during the new DIRAC release
- Re-applied this morning
-
14:35
→
14:45
Utilizing GPUs 10m
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
-
14:45
→
14:46
AOB 1m
-
14:46
→
14:55
Summary of Operational Status and Issues 9m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
14:55
→
15:00
Any other Business 5m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore