- Grafana metrics: might introduce additional rate metrics that subtract the header overhead to show the pure payload rate (low priority).
- Merged workflow fails if outputs are defined after being used as inputs
- needs to be implemented by Giulio
- Cannot override options for individual processors in a workflow
- requires development by Giulio first
- Problem with 2 devices of the same name
- Using valgrind in an external terminal: the test case currently causes a segfault, an unrelated problem that must be fixed first. Reproduced and investigated by Giulio.
- Run getting stuck when too many TFs are in flight.
- Do not use string comparisons to derive the processor type, since DeviceSpec.name is user-defined.
- Support in DPL GUI to send individual START and STOP commands.
- Add an additional check at the DPL level to make sure that the firstOrbit received from all detectors is identical when creating the TimeFrame first orbit.
- Implement a proper solution to detect whether a device is firstInChain
- Deploy topology with DPL driver
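The firstOrbit consistency check mentioned above can be sketched as follows. DPL itself is C++; this Python version only illustrates the logic, and every name in it (common_first_orbit, FirstOrbitMismatch, the detector-to-orbit dict used as input) is hypothetical, not part of the DPL API.

```python
from typing import Dict, List, Mapping


class FirstOrbitMismatch(Exception):
    """Raised when detectors disagree on the first orbit of a TimeFrame."""


def common_first_orbit(first_orbits: Mapping[str, int]) -> int:
    """Return the firstOrbit shared by all detectors, or raise if they differ.

    `first_orbits` maps a detector name (e.g. "ITS") to the firstOrbit it
    reported for the TimeFrame being assembled (hypothetical input format).
    """
    if not first_orbits:
        raise ValueError("no detector reported a firstOrbit")
    distinct = set(first_orbits.values())
    if len(distinct) != 1:
        # Group detectors by the orbit they reported, so the error names the offenders.
        by_orbit: Dict[int, List[str]] = {}
        for det, orbit in sorted(first_orbits.items()):
            by_orbit.setdefault(orbit, []).append(det)
        raise FirstOrbitMismatch(f"inconsistent firstOrbit values: {by_orbit}")
    return distinct.pop()
```

For example, common_first_orbit({"ITS": 1000, "TPC": 1000}) returns 1000, while a dict in which one detector reports a different orbit raises FirstOrbitMismatch listing which detectors reported which orbit.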
PDP-SRC issues
- Check whether the dependencies on /home/epn/odc/files in DPL workflows can be removed, to eliminate the dependency on the NFS
- reading / writing already disabled
- remaining checks for file existence?
- after Pb-Pb, check by removing the files and identifying the remaining dependencies
- logWatcher.sh and logFetcher scripts modified by EPN to remove the dependency on the epnlog user
- node access privileges fully determined by e-groups
- new log_access role to allow access in logWatcher mode to retrieve log files, e.g. for on-call shifters
- to be validated on STG
- waiting for further feedback from EPN and for modifications of the test setup
- computing estimate for 2024 Pb-Pb
- originally assumed 305 EPNs to be sufficient, but 340 EPNs (5 % margin) were needed in the end
- 11 % difference
- estimate from 2023 Pb-Pb replay data with 2024 software
- average hadronic interaction rate of Pb-Pb replay timeframes with pile-up correction for ZDC rate
- formula: IR_had = -ln(1 - rate_ZDC / (11245*nbc) ) * 11245 * nbc * 7.67 / 214.5
- 2023, 544490, nbc=1088, rate_ZNC=1166153.4 Hz: IR_had = 43822.164 Hz
- 2024, 560161, nbc=1032, rate_ZNC=1278045.2 Hz: IR_had = 48417.767 Hz
- 10.5 % difference in IR from the 2023 replay to the 2024 replay
- a 7 % increase (to 47 kHz) was assumed for the 2023 replay data (?) when estimating the required resources
- could explain at least part of the difference between the estimated and observed margins
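The IR_had formula above can be reproduced with a short script. The constant names are ours; reading 11245 as the LHC revolution frequency in Hz and 7.67 / 214.5 as the ratio of the hadronic to the ZDC cross section (in barn) is our interpretation, not stated in the notes. The -ln(1 - p) term is the usual Poisson pile-up correction: p is the fraction of colliding bunch crossings with a ZDC signal, and -ln(1 - p) is the mean number of ZDC interactions per colliding crossing.

```python
import math

LHC_REV_FREQ_HZ = 11245.0  # 11245 in the formula, read as the LHC revolution frequency
SIGMA_HAD_B = 7.67         # 7.67 in the formula, read as the hadronic cross section (b)
SIGMA_ZDC_B = 214.5        # 214.5 in the formula, read as the ZDC cross section (b)


def hadronic_ir(rate_zdc_hz: float, nbc: int) -> float:
    """Pile-up-corrected hadronic interaction rate derived from the ZDC rate."""
    bc_rate = LHC_REV_FREQ_HZ * nbc                   # colliding bunch crossings per second
    mu_zdc = -math.log(1.0 - rate_zdc_hz / bc_rate)   # mean ZDC interactions per crossing
    return mu_zdc * bc_rate * SIGMA_HAD_B / SIGMA_ZDC_B


ir_2023 = hadronic_ir(1166153.4, 1088)  # 2023 replay, 544490
ir_2024 = hadronic_ir(1278045.2, 1032)  # 2024 replay, 560161
print(f"2023: {ir_2023:.1f} Hz, 2024: {ir_2024:.1f} Hz")
print(f"IR increase 2023 -> 2024: {100 * (ir_2024 / ir_2023 - 1):.1f} %")
```

Running it should reproduce the IR_had values and the ~10.5 % increase quoted above, which makes it easy to redo the estimate for other runs by changing only the ZDC rate and nbc.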
- environment creation
- cached topologies
- in practice, this only works when a single detector is selected or when the Detector list (Global) is defined explicitly in the EPN ECS panel
- when using "default", the list of detectors is taken from default variables in ECS
- not yet clear where this is set; it obviously depends on the selected detectors
- the order of the detectors differs every time, even for identical environments; therefore the topology hash also differs and the cached topologies are not used
- investigating together with ECS team
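Why the detector order breaks the topology cache can be illustrated with a toy hash. How the real topology hash is computed is not described in these notes; the SHA-256 over a comma-joined list and the topology_hash helper are hypothetical, chosen only to show that any hash over an ordered list changes when the order changes, and that hashing a canonical (sorted) list would make it order-independent.

```python
import hashlib
from typing import List


def topology_hash(detectors: List[str], canonical: bool = True) -> str:
    """Toy hash of a detector list; with canonical=True the list is sorted first."""
    items = sorted(detectors) if canonical else list(detectors)
    return hashlib.sha256(",".join(items).encode()).hexdigest()


env_a = ["TPC", "ITS", "TOF"]
env_b = ["ITS", "TOF", "TPC"]  # same detectors, different order
print(topology_hash(env_a, canonical=False) == topology_hash(env_b, canonical=False))  # False
print(topology_hash(env_a) == topology_hash(env_b))  # True
```

Whether a canonical ordering should be applied in ECS or in the topology generation is part of the open investigation above.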
- start-up time
- ~80 s spent in state transitions from IDLE to READY
- will profile the state transitions with export DPL_SIGNPOSTS=device to determine whether we wait for single slow tasks or whether some other part (e.g. DDS) is slow