IT-protoDUNE coordination (Single Phase and Double Phase)

Name: IT-protoDUNE coordination (Single Phase and Double Phase)
Start: 2019-10-03T15:00:00+02:00
End: 2019-10-03T16:00:00+02:00
Location: CERN

Thursday 3 Oct 2019, 15:00 → 16:00 Europe/Zurich

513/R-070 - Openlab Space (CERN)

Xavier Espinal Curull (CERN)

Description

Coordination: JIRA, Minutes, etc. https://twiki.cern.ch/twiki/bin/viewauth/ProDUNEIT/WebHome

Room: Elisabetta, Xavi, Ignacio
Remote: Denis, Steve, Geoff

## Round table

3 meetings during the last couple of months
    - data management
    - compute model
    - collaboration meeting

Compute model: Top-level discussions regarding datacenters, power targets,
workflows, data reduction, etc. We got the data models.

Interested in EOS token testing.
X: Different storage providers are trying to adapt to this (e.g. dCache).
According to the EOS developers, this should be testable in ~1 month.
From the WLCG PoV we need to ensure that all the components understand
the same type of tokens.

X: Another important point is the LHC1 - Fermilab trusted network, which is being
set up.

X: Is there something I missed in the last 2 months in DUNE governance?
- Singularity work going smoothly.
- All Single-Phase items going correctly.

EOS timeouts during the last week, or EOS running very slowly.
X: A faulty card in a router affected a big chunk of the CCC.

### NP02

Elisabetta: The JSON file issue detected yesterday was identified and fixed.
The root cause was the modification of the objects to include the
number of events.

Yesterday there was a power cut in Prevessin which affected all DAQ machines
-> Several hours to recover (no UPS available). Agreed with Filippo that this situation
is not acceptable and needs to be addressed. (The error was restricted to Prevessin.)

#### NP02 Slides

X: Regarding the ~1% failure rate in the transfers via the EHN1 -> IT link, it would
be interesting to know whether the errors show some pattern/trend in the long term, to locate
the root cause.
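
A minimal sketch of how such a pattern could be looked for, assuming the transfer log can be exported as (timestamp, success) pairs. The function name, the record layout and the sample records are all hypothetical and do not come from the NP02 monitoring; the idea is only to group failures by hour of day and compare the rates.

```python
from collections import Counter
from datetime import datetime


def failure_rate_by_hour(records):
    """Return {hour_of_day: failure_rate} for an iterable of
    (ISO-8601 timestamp string, succeeded) pairs.

    The record layout is an assumption made for this sketch.
    """
    total = Counter()
    failed = Counter()
    for timestamp, succeeded in records:
        hour = datetime.fromisoformat(timestamp).hour
        total[hour] += 1
        if not succeeded:
            failed[hour] += 1
    # Only hours with at least one transfer appear in the result.
    return {hour: failed[hour] / total[hour] for hour in sorted(total)}


# Hypothetical sample records, for illustration only.
sample = [
    ("2019-10-01T02:10:00", True),
    ("2019-10-01T02:40:00", False),
    ("2019-10-01T14:05:00", True),
    ("2019-10-02T14:30:00", True),
]
rates = failure_rate_by_hour(sample)
```

The same grouping could be done by day, by source host or by file size; a rate that stands out in one bucket would hint at where to look for the root cause.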

Grafana Dashboards: Running locally, exposed to Lyon via SSH tunnel

(Ignacio: Add note to the MONIT config and dashboards)

Steve: When do you think you will be ready to compress this data?
Not in the short term, but when we compress, the file size will stay the same; we'll just
increase the number of events per file.
For processing, it should be ready in a couple of weeks.
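
As a rough illustration of the point about compression: if the file size is kept fixed and each event shrinks by some compression ratio, the number of events per file grows by that ratio. The function name and the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def events_per_file_after_compression(events_uncompressed, compression_ratio):
    """Events that fit in a file of unchanged size once each event
    is compression_ratio times smaller (rounded down)."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return int(events_uncompressed * compression_ratio)


# Hypothetical numbers: 100 events per file today, 4x compression.
events_after = events_per_file_after_compression(100, 4.0)
```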

Is this the data in np02/reco? Do you want to copy it to Fermilab?
S: In the next couple of weeks it will be added to Rucio, I'll contact you regarding that

X: Once the data is on tape, it can be safely deleted from EOS (as we are now recommending
to use EOS as a staging area).

S: What would the timeline for CTA be?
X: CTA is moving the big 4 experiments before the beginning of Run 3. Probably the public
EOS instance will still be running on Castor until LS3.
For experiments relying on FTS and Rucio, the change is just a matter of endpoint name, it
doesn't really affect you.

S: How close are you to the operational voltage?
E: We are close to 3 kV, and are testing different stable setups to solve the problem of the
bubbles mentioned in the slides, in order to get out of the high-pressure cycles, because they can't
be maintained for long and require lots of preparation/downtime to recover normal pressure.

#### Geoff

- Plan to run for two weeks with the cosmic ray tagger on (detector add-on). Maybe in November
- Will top LAr with Xe and study the effects
- Converting readout to FELIX (ATLAS platform, modified to be used for DUNE)

#### Ignacio

New Linux Contact: Alejandro Iribarren

Tentative date: 7 November

- 15:00 → 15:05
  
  Coordination update 5m
  
  Speaker: Xavier Espinal Curull (CERN)
- 15:20 → 16:00
  
  Round-table 40m
  
  Speaker: All