xrootd monitoring discussion


Attended:

Alessandra Forti, Ilija Vukotic, Diego Davila, Derek Weitzel, Marian Zvada, Borja Garrido Bear, Maarten Litmaath, Albert Rossi, Robert Currie, Frank Wuerthwein, Andrea Sciaba, Costin Grigoras, David Smith, Julia Andreeva

 

Short summary of the discussion that followed the presentations by Derek and Diego (see attached slides).

We cannot rely on UDP over the WAN.

Julia asked Costin whether ALICE relies on client-side monitoring. Contrary to what we had thought, ALICE relies on server-side monitoring. The interesting aspect of the ALICE approach is that they consider only file-close reports. A special configuration of the xrootd servers, sent by Costin after the meeting, makes it possible to resolve the source and destination of a given operation from the file-close reports. The question was how the duration of an operation is then calculated. Costin explained that ALICE uses a simplified approach which has proved to work well and to give reliable results. The aggregation period is fixed, say X seconds. Only file-close reports, which contain the amount of data transferred for the complete operation, are considered. The sizes of all files closed during a given X-second bin on any A-to-B channel active in that bin are summed up and divided by X seconds to obtain an aggregated rate for A to B. Any single bin taken by itself may be wrong, but the overall picture normally looks right. Costin provided further details in a private follow-up mail:

======================================================================

The trick is to divide the amount of data by a fixed interval, which is the accumulation period. Let's say 1 minute. The interval is not important as long as it's a fixed, known value.

Regardless of how long the file was kept open, the value is always divided by 1 minute.

Let's say two messages were received in the last reporting interval:

- Message A (file X, opened one hour ago, moved 1 GB)
- Message B (file Y, opened 30 seconds ago, moved 1 GB)

What the collector would do is report sum(all) = 2 GB / 60 s = 33.3 MB/s for the last minute of activity.

Just this bin in itself is obviously wrong. But:

- the integral of data is correct

- with enough accessed files, the activity averages out

- only the very high granularity plots would have a noticeable difference from the network interface monitoring

- averaging rates at arbitrary time intervals is mathematically correct; the integral of accessed data is always correct

- as soon as you look at 5-10 minute bins you can't tell the difference from the network interface counters

And more importantly, we don't need to keep any history of client ids, file names, timestamps of when they were opened, or any other unique identifiers.

Another advantage is that one doesn't have to go back in time and distribute the bytes to each bin that covers the access period. So really no memory is needed for the collector apart from the amount of data in and out.

======================================================================
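
To illustrate, here is a minimal sketch of this aggregation scheme in Python; the class and report fields are illustrative assumptions, not the actual ALICE/MonALISA implementation:

# Minimal sketch of the ALICE-style aggregation described above.
from collections import defaultdict

BIN_SECONDS = 60  # the fixed, known accumulation period ("X seconds")

class Collector:
    def __init__(self):
        # bytes moved per (source, destination) channel in the current bin
        self.bytes_per_channel = defaultdict(int)

    def on_file_close(self, source, destination, bytes_moved):
        # Only complete file-close reports are considered; no per-file
        # history (client ids, file names, open timestamps) is kept.
        self.bytes_per_channel[(source, destination)] += bytes_moved

    def flush_bin(self):
        # Regardless of how long each file was open, the accumulated sum
        # is divided by the fixed bin length, then the state is reset.
        rates = {channel: total / BIN_SECONDS
                 for channel, total in self.bytes_per_channel.items()}
        self.bytes_per_channel.clear()
        return rates  # bytes/s per channel for the last bin

# Costin's example: two 1 GB closes received in the same bin
collector = Collector()
collector.on_file_close("A", "B", 10**9)  # file X, opened an hour ago
collector.on_file_close("A", "B", 10**9)  # file Y, opened 30 seconds ago
print(collector.flush_bin())              # {('A', 'B'): ~33.3 MB/s}

Note that the collector's only state is one counter per active channel, matching the point above that no history of client ids or open timestamps is needed.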

Maybe this approach can be used in the new implementation of the collector, which might make it much lighter: a rough estimate suggests more than 60% less data to process. We still need to understand whether we can get information about the VO from the file-close report. For ALICE this is not an issue, since they use dedicated xrootd servers; that is not always the case across the overall WLCG infrastructure, so we need this info to be part of the file-close report itself. Currently that might be impossible.

We will ask the xrootd developers whether file-close reports can be extended with the following info: VO, start timestamp of the operation, and application-level metadata. In principle this should be enough for our current goals for monitoring of xrootd traffic.
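
For concreteness, a hypothetical shape of such an extended file-close record; all field names here are illustrative assumptions, since xrootd's current file-close reports do not carry the last three:

# Hypothetical extended file-close report; field names are illustrative.
extended_close_report = {
    "src": "server.site-a.example",  # source, available today
    "dst": "client.site-b.example",  # resolvable with the special
                                     # configuration mentioned above
    "bytes_read": 1_000_000_000,     # available today
    # requested extensions:
    "vo": "atlas",                   # VO of the client
    "open_time": 1600000000,         # start timestamp of the operation
    "app_info": "application-level metadata",
}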

However, as Frank pointed out, we should also consider other use cases, namely monitoring of caches. Ilija has started this work using the g-stream monitoring flow. By the time we discussed it at the meeting Ilija had already left; we should ping him to report on this work at the next meeting.

Apart from the monitoring flow sent by native xrootd servers, we should also cover the use case of dCache with an xrootd door. This should not be a problem, since dCache has all the necessary information in its database, so implementing reporting of the agreed set of information should not be difficult. But we need to follow up with the dCache developers.

We briefly discussed the overall architecture and deployment model. There seems to be a consensus that we can rely on UDP only on the LAN; therefore we may need to foresee a site-level component which translates UDP packets into message-queue reports, like the 'shoveler' suggested in the architecture described by Derek and Diego (see the sketch below). This component should not be experiment-specific, but rather an extension of xrootd itself, deployed at every site.
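
A minimal sketch of what such a site-level shoveler could look like, assuming for illustration a RabbitMQ broker reached via the pika client (host, port, and queue names are placeholders, not the actual shoveler implementation):

# Sketch of a site-level "shoveler": receive xrootd monitoring UDP
# packets on the LAN and forward them over a message queue for the
# reliable WAN hop. Broker, port and queue names are assumptions.
import socket
import pika  # RabbitMQ client; any message-bus client would do

UDP_PORT = 9930             # port xrootd is configured to report to
MQ_HOST = "mq.example.org"  # hypothetical central broker
QUEUE = "xrootd.monitoring"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", UDP_PORT))

connection = pika.BlockingConnection(pika.ConnectionParameters(host=MQ_HOST))
channel = connection.channel()
channel.queue_declare(queue=QUEUE, durable=True)

while True:
    # UDP is acceptable on the LAN; the message queue provides the
    # reliable transport towards the central collector.
    packet, _addr = sock.recvfrom(65536)
    channel.basic_publish(exchange="", routing_key=QUEUE, body=packet)

Such a component carries no experiment-specific logic, matching the requirement above.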

We agreed to try to meet again in a few weeks. Meanwhile, we will ping the xrootd developers to ask about the possibility of extending file-close reports with the additional info described above.

People are asked to check the Google doc

https://docs.google.com/document/d/11TcJwtHtU2yGJbDYJbhJJslA2QmAV_ly-52xDXXD9A8

and provide their ideas/comments/concerns there.

 
