Data transfers

513/R-068 (CERN)



Main topic to discuss: Data transfers

Minutes taken by Julia


We did not go through the tracker; instead we concentrated on the data transfer monitoring presentation from Luca.

Luca mentioned a problem with AAA traffic: the monitored traffic is apparently much lower than the real traffic, as was detected when following up the test submitted at the NotreDame site. The Dashboard does not miss any file records, but the number of read bytes reported by GLED is much lower than what is claimed in the job log file. To be followed up with Matevz.
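A minimal sketch of the kind of cross-check behind this observation, assuming per-file read-byte counts are available both from the monitoring side and from the job log. The function name, the 5% tolerance, the file names and all numbers are hypothetical and only illustrate the comparison; they are not taken from the NotreDame test.

```python
def find_underreported(monitored, job_log, tolerance=0.05):
    """Return files whose monitored read bytes fall short of the job log value.

    Both arguments map a file name to a number of read bytes.
    The tolerance is a hypothetical threshold, not an agreed value.
    """
    suspicious = {}
    for filename, logged_bytes in job_log.items():
        # A file missing from the monitoring data counts as zero bytes reported.
        reported = monitored.get(filename, 0)
        if logged_bytes > 0 and reported < logged_bytes * (1 - tolerance):
            suspicious[filename] = (reported, logged_bytes)
    return suspicious


# Made-up numbers, for illustration only.
monitored = {"/store/a.root": 100, "/store/b.root": 990}
job_log = {"/store/a.root": 1000, "/store/b.root": 1000}
result = find_underreported(monitored, job_log)
```

Here only the first file is flagged, since its monitored value is far below the job log value while the second is within the tolerance.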

Discussion of how to deal with sites that are shared between several VOs, and how to distinguish traffic between VOs. This is already a problem for DPM and dCache sites. Multi-VO support will be provided in XrdMon, but it is not ready for the data challenges. The Dashboard can do the separation on its side, provided that mixed sites send reports to a dedicated topic and the Dashboard collector separates them based on the user_VO field. The problem is that this field is not always properly defined, since CMS does not strictly require VOMS-stamped proxies. More info can be found here:
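As a rough illustration of the collector-side separation discussed above, a minimal sketch assuming that each report arrives as a dictionary carrying a user_VO field. The record layout, the read_bytes field and the "unknown" bucket are assumptions made for the example, not the actual Dashboard collector. Reports with a missing or empty user_VO are kept apart instead of being guessed.

```python
from collections import defaultdict

# Hypothetical bucket for reports without a usable user_VO value.
UNKNOWN_VO = "unknown"


def split_by_vo(records):
    """Group monitoring reports from a mixed site by their user_VO field."""
    by_vo = defaultdict(list)
    for record in records:
        # Treat a missing, None or blank field as "not properly defined".
        vo = (record.get("user_VO") or "").strip().lower()
        by_vo[vo or UNKNOWN_VO].append(record)
    return dict(by_vo)


# Made-up reports, for illustration only.
records = [
    {"user_VO": "cms", "read_bytes": 10},
    {"user_VO": "atlas", "read_bytes": 20},
    {"user_VO": None, "read_bytes": 5},
    {"read_bytes": 7},
]
grouped = split_by_vo(records)
```

The size of the "unknown" bucket gives a direct measure of how often the field is not properly defined at a given site.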

Topology resolution discussion


ALICE has a service that resolves the closest site. We need to look into whether it can be taken on board by other experiments.

Data processing discussion


Costin asked why statistics are not calculated in memory at the collector level.

Essentially this is the direction we want to move in, but we also want to have batch-like processing.
Costin asked why batch-like processing is needed at all.

The answer is that we would like to come up with a generic monitoring platform, and there are clearly cases where data reprocessing would be needed (SAM), so a dual approach with batch-like and in-memory real-time processing is required.
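The dual approach can be sketched as follows, assuming records are dictionaries with a read_bytes value and that statistics are sums grouped by several fields. The class, the function and the field names (site, user_VO) are hypothetical and do not describe any existing collector; the point is only that the same aggregation logic can serve both the real-time path and a later reprocessing of stored raw data.

```python
from collections import defaultdict


class InMemoryAggregator:
    """Keeps running totals of read bytes, grouped by several fields."""

    def __init__(self, fields):
        self.fields = tuple(fields)
        self.totals = defaultdict(int)

    def update(self, record):
        # Real-time path: called once per record as it arrives.
        key = tuple(record[f] for f in self.fields)
        self.totals[key] += record["read_bytes"]


def batch_recompute(raw_records, fields):
    """Batch path: rebuild the same totals from stored raw records."""
    aggregator = InMemoryAggregator(fields)
    for record in raw_records:
        aggregator.update(record)
    return dict(aggregator.totals)


# Made-up records, for illustration only.
raw = [
    {"site": "T2_A", "user_VO": "cms", "read_bytes": 10},
    {"site": "T2_A", "user_VO": "cms", "read_bytes": 15},
    {"site": "T2_B", "user_VO": "atlas", "read_bytes": 7},
]

live = InMemoryAggregator(["site", "user_VO"])
for rec in raw:
    live.update(rec)

recomputed = batch_recompute(raw, ["site", "user_VO"])
```

If the grouping or the computation changes, the batch path can be rerun over the retained raw data, which the in-memory path alone cannot do once the records have been consumed.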

What is the retention policy?


For monitoring purposes we do not need to keep raw data for longer than a month or so, but since xrootd monitoring is coupled with data popularity, which requires raw data to be preserved for much longer, the retention policy is defined by the requirements of the popularity application.

What is the plan for raw data: would it be kept in Oracle? No, in future only in HBase.

Elasticsearch: can we work with AI?

AI does not offer Elasticsearch as a service; it mostly offers expertise. Our use cases are more complicated (multi-field grouping).

Costin, would you be interested in reusing what ALICE is doing regarding in-memory calculation?

Luca is in favour of a dual way of processing (batch and real-time, based on an open source solution).


Topics for the next meeting (up to Pablo to decide)

Julia suggested an update on the migration of SAM tests to CondorG submission. Luca will be on vacation; this is possible if Marian is around.

Maarten suggested an update on the status of the new WLCG monitoring UI.


    • 2:00 PM 2:05 PM
      JIRA actions for July 5m
      Review of the JIRA actions scheduled for this and next month
      Speaker: Pablo Saiz (CERN)
    • 2:05 PM 2:35 PM
      Data transfers 30m
      Speaker: Luca Magnoni (CERN)
    • 2:35 PM 2:55 PM
      Discussion 20m
    • 2:55 PM 3:00 PM
      Next meeting 5m