RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
13:30
Experiment Operational Issues
-
1
ATLAS Operations Report
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
2
CMS Operations Report
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
Red days for SAM on Tuesday/Wednesday due to the continuation of the network issues/interventions reported last week.
Observed a large discrepancy between CMS (monit) and RAL (Vande) monitoring of running cores. This turned out to be a scheduling inefficiency on the CMS side, with slow ramp-up of scheduling agents at FNAL.
Tape downtime caused CMS to go into drain several times. The Rucio status (for Echo) was overridden by Data Management. Katy overrode the status to keep jobs running. CMS needs to handle this better and not send sites into drain just because tape is down.
A few spikes in job failures and low efficiency, which may be related to network blips (long read times).
RAL FTS removed entirely from CMS-Rucio operations.
-
3
LHCb Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
4
ALICE Operations Report
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
5
LSST Operations Report
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
- DC2 pipeline complete, but working with CM team to standardise the data return process
- The data return script is hard-coded in places, all of which rely on the URI, which is currently incorrect
- Working on moving data to the correct location now
- Approx 4 million files
- Writing a Python script that uses multiprocessing to move the data
- Will run DC2 again with the weekly code from week 18, from Friday (1st May)
- And will continue to run every other week going forward to assist with site and code base testing and troubleshooting
- After the DC2 data return process is complete, will work on getting the RC2 data registered in a butler repository for a smaller test, but with precursor data
- LSST want to use a UK FTS for European transfers, so will want to use either the SKA FTS or LCGFTS, or at least use one for failover
- Not using an FTS for a time and then wanting to use it for failover could lead to further failures
- Do ATLAS and CMS use a single central FTS for all transfers, or do they also use the geographically closest ones?
- From talking to other devs, Rucio currently uses only the FTS associated with the destination
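For reference, the multiprocessing approach mentioned above for moving the roughly 4 million files could look something like the sketch below. This is not the LSST team's script, which was not shown at the meeting. It assumes both locations are visible as ordinary filesystem paths (data on an object store such as Echo would need a different client), and the names `plan_moves`, `move_one`, `move_tree`, `old_root` and `new_root` are all hypothetical.

```python
import shutil
from multiprocessing import Pool
from pathlib import Path


def move_one(pair):
    """Move one file; return (source, error-or-None) so failures can be retried."""
    src, dst = pair
    try:
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(src, dst)
        return src, None
    except OSError as exc:
        return src, str(exc)


def plan_moves(old_root, new_root):
    """Yield (source, destination) pairs, keeping the relative directory layout."""
    old_root, new_root = Path(old_root), Path(new_root)
    for src in old_root.rglob("*"):
        if src.is_file():
            yield str(src), str(new_root / src.relative_to(old_root))


def move_tree(old_root, new_root, workers=8, chunksize=256):
    """Move every file under old_root to new_root with a pool of worker processes."""
    old_root, new_root = Path(old_root).resolve(), Path(new_root).resolve()
    # Assumption: the destination lies outside the source tree, otherwise the
    # directory walk would rediscover files that have already been moved.
    if new_root == old_root or old_root in new_root.parents:
        raise ValueError("new_root must not be inside old_root")
    failures = []
    with Pool(processes=workers) as pool:
        # imap_unordered consumes the generator lazily, so the full list of
        # files never has to be held in memory at once.
        results = pool.imap_unordered(
            move_one, plan_moves(old_root, new_root), chunksize=chunksize
        )
        for src, err in results:
            if err is not None:
                failures.append((src, err))
    return failures


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    failed = move_tree("/data/dc2/wrong_location", "/data/dc2/correct_location")
    print(f"{len(failed)} files failed to move")
```

Returning failures rather than raising lets a single bad file be logged and retried without stopping the remaining moves. The worker count and chunk size are placeholders that would need tuning against the actual storage.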
-
14:00
Tier-1 Projects
-
6
Antares Upgrade
New EOS nodes
Tape Robotics downtime
Speakers: George Patargias, Thomas Byrne
-
7
XRootD Development
Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
-
8
Varnish For ATLAS
Speaker: Brij Kishor Jashal (Rutherford Appleton Laboratory)
-
14:45
AOB
-
9
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
10
Any other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-