FTS3 Steering Meeting

31/S-028, CERN (Europe/Zurich)


FTS Steering Meeting - 14/03/2018

 

Participants:

Remotely: Brian, Catalin, Chris, Ilija

Locally: Andrea, Maria, Eddie, Mario, Joachim, Cedric, Martin, Giuseppe, Alessandro DiGi

Minutes: Eddie

 

FTS team reorg:  

  • Brian asked whether Aris will be physically at CERN as a fellow; Andrea confirmed this.

FTS News and Plans:

  • Mario: How did you get the FTS logo? Maria: I first contacted Melissa from IT communications, and she put me in contact with someone from the CERN graphics design team. I did exactly the same for the EOS logo.

  • Brian: On memory usage, what is the situation now? Andrea: Currently 4 MB per fts-url-copy. We need to work on gfal and all its dependencies to reduce the memory footprint. The target, if possible, is 2 MB per transfer after the optimisations.

  • Brian: For the signed URLs, where do the credentials live? On the FTS server? Answer: They are stored on the FTS server; there is a way to configure them per VO or per endpoint. Mario: In Google, different credentials are used per bucket, not per hostname. Andrea: That is not possible at the moment; you would need to use different certificates, or the same certificate with a different role. Brian: Would it be possible to send a signed URL to FTS? Andrea: Yes, you can do it, but it will be exposed in the monitoring and in the logs. Brian: It would be useful to have an interface to pass secrets to FTS. (A minimal signed-URL sketch follows this list.)

  • Ilija’s proposal: We will study further what he reported. The basic message: the main worry in ATLAS is that the links are not saturated; they need this checked and fixed. We will check this behaviour on more links and more FTS instances and decide what to do. We plan to work on the scheduler this year. We are working on automatic session reuse right now; once it is in use it will improve the situation with small files. Ilija: That will be nice, but it is second order. Andrea: It is not a quick fix; the scheduler does not know about throughput, so we will see what needs to be done. Ilija: Should we try manually changing settings and configs to optimise specific links, for example to Chile? Andrea: You can already start playing. Brian: For the scheduler, do we have enough stats and metrics to monitor it, for example the queue size as a function of time? Andrea: You can already check queue time and the number of actives per link. Maria: The scheduler data can be provided through messaging into Kibana, and you can then explore the data there. Andrea: And you can use Kibana to make new plots (a small plotting sketch follows this list).

  • Brian: For the topology, can we provide topology info? Answer: CRIC/AGIS will be used to cache this information; it will not be static info inside FTS. This acts as hints to the scheduler.

  • Ilija, on network info integration: You cannot rely on information from perfSONAR or whatever else we have. Let’s start by using what we already have to improve the scheduler behaviour.

  • Joachim: I think you already have the throughput in the optimiser. Answer: Yes, it is per link.

  • Mario: Can we do anything on the experiment side to help you with scheduler improvements? Andrea: To test what we will implement and really check that it is better.

  • Mario: We already retrieve some data from FTS to make Rucio aware of queues. Maybe we could do the same across different FTS instances: query every 5 minutes, for example from CERN to RAL, and see the queue levels. We could expose an API reporting the current load and how close we are to the limit. To be checked.

  • Andrea: What limits scalability is the load each VM puts on the DB; every new VM adds more pressure on the DB.

  • Mario: Issue with delegation? Andrea: This is done.

  • Mario: Load balancing, an issue coming from NERSC. The site does not provide load balancing, so we have to do it on our side; right now we could do a random shuffle on the client side, but we would like to move this into FTS. Andrea: We have to check and get back to you. (A client-side shuffle sketch follows this list.)

  • Mario: We are now setting up Rucio as a site token issuer. Andrea: Already there in fts-devel; there is an endpoint you can use to play with. It will be released in 3.8.0.
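
As context for the signed-URL discussion above, here is a minimal sketch of generating a signed (pre-signed) URL against an S3-compatible endpoint with boto3. The endpoint, credentials, bucket and key are placeholders; this is not FTS code, only an illustration of the mechanism.

    import boto3

    # Hypothetical S3-compatible endpoint and placeholder credentials;
    # this only illustrates the signed-URL mechanism discussed above.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.example.org",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # The URL grants time-limited read access without exposing the
    # credentials themselves, but anything that records the URL
    # (monitoring, transfer logs) exposes the grant, as noted above.
    url = s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": "example-bucket", "Key": "path/to/file"},
        ExpiresIn=3600,  # validity in seconds
    )
    print(url)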

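On the scheduler-monitoring question, a hedged sketch of plotting queue size as a function of time. It assumes the per-link metrics have already been exported (for example from the messaging/Kibana pipeline mentioned above) into a CSV; the file name and the columns timestamp, link, queued are all made up.

    import csv
    from collections import defaultdict
    from datetime import datetime
    import matplotlib.pyplot as plt

    # Assumed export: one row per sample, columns timestamp,link,queued
    # (ISO timestamps); both the file name and the schema are hypothetical.
    series = defaultdict(lambda: ([], []))
    with open("fts_queue_metrics.csv") as f:
        for row in csv.DictReader(f):
            times, queued = series[row["link"]]
            times.append(datetime.fromisoformat(row["timestamp"]))
            queued.append(int(row["queued"]))

    # One curve per link: queue size as a function of time.
    for link, (times, queued) in sorted(series.items()):
        plt.plot(times, queued, label=link)
    plt.xlabel("time")
    plt.ylabel("queued transfers")
    plt.legend()
    plt.show()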
 
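For the NERSC load-balancing item, a minimal sketch of the client-side random shuffle Mario describes; the door host names are invented, and moving this logic into FTS itself is the open question.

    import random

    # Hypothetical transfer doors at a site that provides no load balancer.
    doors = [
        "gsiftp://dtn01.example.gov",
        "gsiftp://dtn02.example.gov",
        "gsiftp://dtn03.example.gov",
    ]

    def pick_door():
        # Shuffle client-side so successive submissions spread the load
        # across doors instead of always hitting the first one.
        shuffled = list(doors)
        random.shuffle(shuffled)
        return shuffled[0]

    print(pick_door())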

GridFTP vs XRootD evaluation

  • Ale: Did you submit periodically, or 1M files at once? Maria: It starts with 1M files at once, then continuously submits until a timeout.

  • Ale: What about checksums? Maria: There was no checksum in this test, but I tested checksums for EOS to CASTOR with xroot/gridftp, and also other endpoints such as DPM, during the checksum generalisation work in the previous FTS release. The EOS to CASTOR checksum works properly; that is not the case for DPM with xroot. (A gfal2 checksum sketch follows this list.)

  • Maria and Giuseppe: The gateway becomes a bottleneck with gridftp, so it is obvious that you gain a lot.

  • Ale: Well done, this is what we wanted. Thanks.
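
As an illustration of the checksum comparison above, a sketch using the gfal2 Python bindings to request the same checksum type from two replicas over different protocols; the URLs are placeholders and ADLER32 as the checksum type is an assumption.

    import gfal2

    # Placeholder replicas of the same file behind two protocols.
    urls = [
        "root://eos.example.ch//eos/test/file.dat",
        "gsiftp://castor.example.ch//castor/test/file.dat",
    ]

    ctx = gfal2.creat_context()
    for url in urls:
        # Ask each storage endpoint for its checksum; the values should
        # match across replicas if both endpoints implement it correctly.
        print(url, ctx.checksum(url, "ADLER32"))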
