Eddie had a discussion with Christophe Haen at CHEP. He has agreed to add a feature to FTS to allow the intermediate hop between two endpoints to be specified in the FTS configuration. This will allow LHCb to request a transfer between two endpoints as normal and FTS will take care of the multi-hop transparently. See ticket FTS-1483.
Michal is taking over gfal development and will implement the new query prepare protocol. Querying using stat/extended attributes will be removed. Overall this will simplify the gfal code. No changes are required in FTS.
1. Michael: Add Michal to castor-tape-dev
2. Michael: Update PREPARE document with JSON response format for query prepare
3. Michal: Update gfal with query prepare protocol
CMS_fam tapepool has been split into two. The two smaller tapepools have been test migrated successfully, so this problem is solved.
Michael and Giuseppe discussed some issues around migrating files with two tape copies. Giuseppe has proposed a solution, but it makes several assumptions about how the data is organised in CASTOR. Michael is working with Giuseppe and Vlado to clean up a few corner cases where the assumptions did not hold.
The only outstanding issue to be agreed with CMS is how to partition files between the experiment and user instances.
Cédric gave an update. Several issues came up to be discussed with the rest of the team next week:
1. Repack is working. A test was planned with ~40 tapes, but in the end the test was only performed with 2 tapes. Julien was unhappy with the multiple archive mount issue. Does this need to be fixed or is it an optimisation that we can defer until later?
2. In CASTOR, repack operations were managed by some external scripts written by Daniele Kruse. Do these scripts solve the problems with repack and can they be adapted to work with CTA?
Comment from Vlado after the meeting: "The scripts do not deal with failed repacks in any way. They leave it to the operator to resubmit the repack or do something else. It is in my MERIT to adapt those scripts to CTA. If anyone prefers to do it, I am not against that."
3. Do we need to implement drive dedications?
Comment from Vlado: YES, YES, YES. Did I say YES?
Steve is working on a new space-aware Garbage Collector. See also Steve's comments below (under AOB) on space-aware back pressure logic.
There are several issues we don't yet understand well:
1. The low-level details of the JAlien workflow
2. The setup of the 5 PB disk space, in particular whether it will be used simultaneously for reading and writing. There was talk of a 10 GB/s data rate but we are not sure if this is simplex or duplex, or even if it is correct.
1. Michael: Follow up on Oliver's suggestion of contacting Maarten Litmaath to fill in the gaps in our understanding.
2. Meeting with Costin to be scheduled after we have met with Maarten and know what we are talking about.
Steve has fixed the EOS rate limiting parameters. However, rate limiting is not doing what we thought it did. When a user exceeds the specified limit for an action (archive or prepare), then all of that user's requests are limited. So EOS rate limiting cannot be used to solve the problem of load balancing between archives and retrieves.
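The behaviour described above can be illustrated with a toy model. This is a sketch only, not EOS code: the class name, the method name and the use of a simple request counter instead of a time window are all assumptions made for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy model (hypothetical, not EOS code): limits are configured per action,
// but once a user exceeds the limit for ANY action, ALL of that user's
// requests are limited.
class PerUserRateLimiter {
public:
  explicit PerUserRateLimiter(int limitPerAction) : m_limit(limitPerAction) {}

  // Records one request. Returns true if it goes through unthrottled,
  // false if it is limited.
  bool submit(const std::string& user, const std::string& action) {
    // A user who has already tripped a limit is limited for every action.
    if (m_limitedUsers.count(user) > 0) {
      return false;
    }
    int& count = m_counts[{user, action}];
    if (count >= m_limit) {
      m_limitedUsers.insert(user);
      return false;
    }
    ++count;
    return true;
  }

private:
  int m_limit;
  std::map<std::pair<std::string, std::string>, int> m_counts;
  std::set<std::string> m_limitedUsers;
};
```

In this model, a user who exceeds the archive limit also has their prepare requests limited, while other users are unaffected. The limit therefore acts on the user as a whole and cannot be used to trade archives off against retrieves.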
When we dug into the causes of the problem, it became apparent that the MGM-WFE interface code was written as a quick hack. It would benefit from being refactored. This will be necessary in any case if we decide to implement bulk requests from EOS to CTA.
1. Julien: Deploy new EOS version with Steve's fixes and repeat the archive/retrieve tests. Then we can decide if what we have is good enough for ATLAS or if we need to come up with another solution.
Steve would like us to get rid of the CERN-specific Oracle RPMs. There is one outstanding Oracle bug which prevents this: libocci.so.19.1 should be added as a "Provides:" entry to the "basic" Instant Client RPM.
Steve asked Nilo if this could be fixed. Nilo has opened the following ticket with Oracle:
SR 3-21491336321 : libocci.so not advertised in 19c Instant client linux RPM files.
Back pressure logic
With respect to the back pressure logic being parameterized by space:
The cta-taped daemon needs to know the name of the EOS space to be used when querying EOS for free space. There are at least two possible solutions. The first solution is to extract the name from the new destination URLs used to write retrieved files to EOS disk. The new structure of a destination URL is as follows:
    destStream << "root://" << gOFS->HostName << "/" << fullPath << "?eos.lfn=fxid:"
               << fxidString;
    destStream << "&eos.ruid=0&eos.rgid=0&eos.injection=1&eos.workflow=" <<
               RETRIEVE_WRITTEN_WORKFLOW_NAME <<
               "&eos.space=" << gOFS->mPrepareDestSpace;
The cta-taped daemon would extract "eos.space=XXXX". The second solution is to add a new SPACE column to the DISK_SYSTEM table within the CTA catalogue database. This table effectively gives the configuration parameters to be used by the back pressure logic of the cta-taped daemon.
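As an illustration of the first solution, the extraction of "eos.space=XXXX" from a destination URL could look roughly like the sketch below. The function name extractEosSpace is hypothetical and this is not existing cta-taped code. The sketch assumes that the query string starts at the first '?' and that parameters are separated by '&', as in the URL structure shown above.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns the value of the "eos.space" query parameter
// of a destination URL, or std::nullopt if the URL has no query string or
// the parameter is absent.
std::optional<std::string> extractEosSpace(std::string_view url) {
  constexpr std::string_view key = "eos.space=";

  // Everything after the first '?' is treated as the query string.
  const auto queryStart = url.find('?');
  if (queryStart == std::string_view::npos) {
    return std::nullopt;
  }
  std::string_view query = url.substr(queryStart + 1);

  // Walk the '&'-separated parameters and look for the key.
  while (!query.empty()) {
    const auto amp = query.find('&');
    const std::string_view param = query.substr(0, amp);
    if (param.substr(0, key.size()) == key) {
      return std::string(param.substr(key.size()));
    }
    if (amp == std::string_view::npos) {
      break;
    }
    query.remove_prefix(amp + 1);
  }
  return std::nullopt;
}
```

For a made-up URL such as "root://host.example//eos/some/file?eos.lfn=fxid:1a2b&eos.ruid=0&eos.rgid=0&eos.injection=1&eos.workflow=WORKFLOW&eos.space=retrieve", the helper returns "retrieve". Returning an empty optional when the parameter is missing lets the caller decide whether to fall back to a default space or to treat the request as an error.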