RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
14:00
→
14:01
Major Incidents Changes 1m
-
14:05
→
14:06
Summary of Operational Status and Issues 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore, Kieran Howlett (STFC RAL)
-
14:10
→
14:11
Experiment Operational Issues 1m
-
14:15
→
14:16
VO Liaison CMS 1m
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
As mentioned last week, the cap on CMS running cores was removed. In the last few days CMS has taken over the farm! However, for the last day or more farm utilisation has been below 100%, so I guess this is exactly how it should work. Other VOs should send us more work! ATLAS appears to have pending jobs, but their running core count is still low.
Relative to the number of jobs running, performance is very bad (both failures and efficiency). Some of the failures may be coming from the CMS side. However, the LHCOPN 100G link is currently saturated, and I think this is what is hurting efficiency.
The new SAM tests for the WebDAV protocol on Antares, added following the introduction of the Tape REST API, were failing. For some reason the WebDAV test suite differs from the root test suite, which of course has been green for some time. The new list of tests included a PROPFIND test on the /store/ directory, and this was the one failing. George added the DN to give read permission, and the test has been green since Saturday.
We are also following up on the tape load tests, which have been failing for many months, probably since they started.
A couple of issues with Echo storage related to the new version of XRootD: Jyothish rolled back and made some other fixes. See also Alex's issues this week.
Katy created docs for the installation and usage of Shoveler. It's been running in test for 2 years! Hoping that Jyothish(?) will find time to bring it into production soon. Katy is planning to validate the numbers for CMS and, in collaboration with other VOs, write a CHEP paper on this.
(The CHEP abstract deadline is this Friday, but it will be extended by one week.)
-
14:20
→
14:21
VO Liaison ATLAS 1m
Speakers: Dr Brij Kishor Jashal (RAL, TIFR and IFIC), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
14:25
→
14:26
VO Liaison Others 1m
-
14:30
→
14:31
VO Liaison LHCb 1m
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
LHCb:
- Checksum setup was incorrect on the preprod farm last week.
  - As a result, all downloads were failing
- Issues with CNAF-RAL transfers
  - CNAF storage uses the authentication key in their HTTP headers; our xrootd version only accepts Authentication (case sensitive)
  - Xrootd upgrade attempted yesterday caused even more problems (though the issue with the keys was solved)
  - Rolled back to the old version
- After the prefetch change roll-out some WGprod jobs are still causing the xrootd proxy to run out of memory
  - The scale of the problem is not clear yet
- Copying to RAL from RRCKI is finished
  - Some final cleanup may be necessary
  - Lists of files affected by the xrootd bug are ready
ALICE:
- CS-147 ticket needs reaction
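For illustration of the CNAF-RAL header issue noted above: HTTP header field names are defined as case-insensitive (RFC 9110), so a receiver would normally compare them without regard to case. The sketch below is not xrootd code; the helper name `get_header` and the token value are hypothetical, and it only shows the kind of lookup that would accept both spellings.

```python
def get_header(headers, name):
    """Return the value for header `name`, matching the field name case-insensitively.

    `headers` is a plain dict of header name -> value (hypothetical input).
    Returns None when no matching header is present.
    """
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Hypothetical request headers as sent by two different clients.
from_cnaf = {"authentication": "example-token"}
from_other = {"Authentication": "example-token"}

# A case-sensitive dict lookup misses the lower-case spelling...
assert from_cnaf.get("Authentication") is None
# ...while the case-insensitive lookup accepts both.
assert get_header(from_cnaf, "Authentication") == "example-token"
assert get_header(from_other, "Authentication") == "example-token"
```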
-
14:35
→
14:38
VO Liaison LSST 3m
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
Data is still ingesting from Lancaster: 15267 of 19852 datasets ingested so far.
Just ran into a broken pipe issue, so the ingest code will need modifying to skip datasets already ingested, as the butler ingest-raw command does not filter out those already done.
Re-ingestion should restart this afternoon, completing this Friday if the issues do not persist.
Once the RAWs are ingested, they can be configured into a collection in the Butler and the DC2 tests can be completed at RAL.
Starting movement of raw data to RAL using LSST Rucio.
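A minimal sketch of the "skip datasets already ingested" change mentioned above, assuming the ingest code works from a list of candidate file paths and that a list of already-ingested file names can be obtained separately. How that list is obtained is not shown here, and the function name and paths are hypothetical; this is not the actual ingest code.

```python
from pathlib import Path


def remaining_to_ingest(candidates, already_ingested):
    """Return the candidate paths whose file name is not in `already_ingested`.

    Both arguments are iterables of path strings (hypothetical inputs).
    Matching is by file name only, so directory prefixes may differ.
    The order of `candidates` is preserved.
    """
    done = {Path(p).name for p in already_ingested}
    return [p for p in candidates if Path(p).name not in done]


# Hypothetical example: two of three files were ingested before the failure.
candidates = ["/data/raw/a.fits", "/data/raw/b.fits", "/data/raw/c.fits"]
ingested = ["a.fits", "/elsewhere/c.fits"]
todo = remaining_to_ingest(candidates, ingested)
assert todo == ["/data/raw/b.fits"]
```

Only the files in `todo` would then be passed on to the ingest step, so a restart after a broken pipe does not repeat work already done.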
-
14:40
→
14:41
VO Liaison APEL 1m
Speaker: Thomas Dack
-
14:45
→
14:46
AOB 1m
-
14:50
→
14:51
Any Other Business 1m
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore