RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
13:30 → 13:31 Experiment Operational Issues (1m)
13:35 → 13:40 ATLAS Operations Report (5m)
Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
13:40 → 13:45 CMS Operations Report (5m)
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
The Antares tests, which had been in yellow (warning) state, were 'fixed' at the test end by tweaking how the test is evaluated, making it more appropriate for sites with a low number of links (such as RAL and CERN tapes). They have been green since.
Busy period writing to tape.
Another period of low-efficiency CMS jobs doing mostly I/O. Many T1s see the same. The job failure rate was fine.
Some issues with other sites reading data from Echo via AAA. Reading jobs were timing out because they could not get hold of data at RAL; Jyothish (again) increased the throttling level to allow more connections, but this time it seems to be too many for Echo to process easily. Remember that CMS jobs are likely making small, sparse reads. The IOPS went higher than we are comfortable with. There is an associated ticket: https://helpdesk.ggus.eu/#ticket/zoom/2837
13:45 → 13:50 LHCb Operations Report (5m)
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
News:
- LHCbDIRAC downtime next week, drain starts this Friday
- Database upgrade; may take up to 1 week (though more likely a few days)
- Token-based ETF tests are being added to the LHCb testing framework
- Some issues so far, though we can expect working tests from the preprod ETF machine soon
Operational issues:
- A lot of failed transfers to/from ECHO last Friday, due to a Ceph roll-back
- Repeated yesterday, but the scale was significantly smaller
- Some failed jobs, due to problems with CERN-EOS-PILOT, not RAL's fault.
13:50 → 13:55 ALICE Operations Report (5m)
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
13:55 → 14:00 LSST Operations Report (5m)
Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
New IngestD versions deployed to automate ingestion when data movement happens for the CCMS1 test.
Tim is working on getting the next dataset, RC2, to RAL so that all currently used datasets are available at RAL (he got access to the Lancs UI to obtain various metadata and exports from the butler). Once this is in place, the tentative intention is to alternate testing of the weekly pipeline builds (i.e., the actual processing jobs) between Lancs and RAL.
The DC2 dataset is now at RAL, after a brief hiccup with a bad file that I quickly found and replaced.
Step 1 of the pipeline was completed over the weekend.
Brian Yanny is now on leave and has passed responsibility to Jen Adelman-McCarthy, with whom I am working.
Step 2 failed to start with an out-of-memory error, but I think that was a miscommunication with the CM team, and I believe it can be resolved on their side.
BUT, creating the mapping for the step has been noted to take more memory and time than at other sites: for some reason pipetaskInit took 1.5 hours and used 6 cores and a lot of memory, even though this task usually takes just a couple of minutes.
14:00 → 14:01 Tier-1 Projects (1m)
14:15 → 14:25 Antares Upgrade (10m)
New EOS nodes
Tape Robotics downtime
Speakers: George Patargias, Thomas Byrne (14:25 → 14:35)
14:35 → 14:45 Utilizing GPUs (10m)
Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
14:45 → 14:46 AOB (1m)
14:46 → 14:55 Summary of Operational Status and Issues (9m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
14:55 → 15:00 Any Other Business (5m)
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore