RAL Tier1 Experiments Liaison Meeting
Access Grid
RAL R89
-
-
14:00
Major Incidents Changes
-
1
Summary of Operational Status and Issues
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore, Kieran Howlett (STFC RAL)
-
14:10
Experiment Operational Issues
-
2
VO Liaison CMS
Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))
A smooth week for CMS, with good performance and green tests.
A small number of repeatedly failing production tape transfers (file not on Echo) - Katy to investigate.
There is a new AAA proxy machine - Katy to add it to monitoring in Vande (and to Shoveler once we have the production instance).
Jyothish did upgrades for the old AAA machines ceph-gw10/11.
Asked George to remove the dependence of the cms-rucio-services machine on the VOMS infrastructure. CMS wish to remove this (ATLAS did so this week), at the very latest by the end of the month. This machine also needs upgrades and a reboot.
Any schedule yet for tokens on the batch farm?
Tentative plans for UK transfer tests (at DC24 levels?) after data-taking, in November or December. Perhaps additional tape tests in spring, before data-taking in 2025, if that is when Antares gets connected to the OPN.
DC24 report is here: https://zenodo.org/records/11401878
-
3
VO Liaison ATLAS
Speakers: Dr Brij Kishor Jashal (RAL, TIFR and IFIC), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
-
4
VO Liaison Others
-
5
VO Liaison LHCb
Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
-
6
VO Liaison LSST
Speaker: Timothy John Noble (Science and Technology Facilities Council STFC (GB))
Setting, in /etc/lsst/prolog.sh,
export LSST_RUN_TEMP_SPACE="/tmp/lsst/sandbox"
will not work for Rubin jobs. For Rubin jobs, LSST_RUN_TEMP_SPACE is a space shared between jobs on different worker nodes. Rubin jobs differ from ATLAS jobs, where every single job is independent; Rubin jobs are not independent. The first Rubin job in a workflow creates a directory and writes some information to LSST_RUN_TEMP_SPACE, then the other jobs read and write information in LSST_RUN_TEMP_SPACE. At the end, the final job reads all the information in the workflow directory in LSST_RUN_TEMP_SPACE. SLAC has a shared POSIX file system mounted on all nodes.
Rubin can support different storage backends; S3 is supported, so the S3 interface to Echo could be used.
Fabio Hernandez (1:50 PM): At our site, our Slurm compute nodes have some local storage capacity. We use that capacity as working storage for jobs; once a job finishes, the data in that area is deleted. There is another area that is needed by PanDA, and I think that is what you refer to. That area stores some information needed by a campaign to update the Butler registry database with the data produced and stored by the jobs in the Butler data store. The data stored in that area is relatively small, but it does need to be available to the so-called "Final Job" to do its data registration work. In our case, that area resides in CephFS and is mounted by all the Slurm compute nodes. I think this is described in this document: https://panda.lsst.io/admin/site_environments.html. The relevant piece is the area named LSST_RUN_TEMP_SPACE.
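A minimal sketch of a Rubin-compatible prolog, assuming a shared POSIX file system (e.g. a CephFS mount) is visible at the same path from every worker node; the mount point /mnt/shared-fs below is an illustrative placeholder, not an agreed RAL path:
# /etc/lsst/prolog.sh (sketch only; /mnt/shared-fs is a hypothetical shared mount)
# LSST_RUN_TEMP_SPACE must resolve to the same shared storage on every worker
# node, so that later jobs in a workflow (including the final job) can read
# what earlier jobs wrote there.
export LSST_RUN_TEMP_SPACE="/mnt/shared-fs/lsst/run_temp"
mkdir -p "$LSST_RUN_TEMP_SPACE"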
-
7
VO Liaison APEL
Speaker: Thomas Dack
-
14:45
AOB
-
8
Any other Business
Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
-
14:00