Color code: (news during the meeting: green, news from this week: blue, news from last week: purple, no news: black)
Event Display Commissioning
- Purchase of the ED PC is in progress; an old NVIDIA PC serves as fallback for the time being.
Problems during operation
Issues on EPN farm affecting PDP:
- AMD GPUs are currently not working on CS8 (under investigation); for the moment the EPN must stay on CC8.
Issues currently lacking manpower, waiting for a volunteer:
- Tool to "own" the SHM segment and keep it allocated and registered for the GPU. Tested that this works by running a dummy workflow in the same SHM segment in parallel. Still needed: implement a proper tool, add the GPU registration, and add the FMQ feature to reset the segment without recreating it. Proposed to Matthias as a project. Remarks: It might be necessary to clear the Linux cache before allocation. What do we do with the DD-owned unmanaged SHM region? Related issue: what happens if we start multiple async chains in parallel? In that case we must also guarantee good NUMA pinning.
- This is becoming an urgent topic; the most complicated part will be the integration with EPN control. To be discussed when Andreas is back from vacation.
- For debugging, it would be convenient to have a proper tool that (using FairMQ debug mode) can list all messages currently in the SHM segments, similarly to what I had hacked together for https://alice.its.cern.ch/jira/browse/O2-2108
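Illustration of the "owner" idea from the first item above, as a minimal sketch only: it uses Python's standard multiprocessing.shared_memory instead of FairMQ, does no GPU registration or NUMA pinning, and runs the stand-in workflows one after another in a single process. All names (SegmentOwner, run_workflow, read_prefix, the segment name) are hypothetical, and reset() merely zeroes the buffer as an analogy for resetting a segment without recreating it.

```python
import os
from multiprocessing import shared_memory


class SegmentOwner:
    """Creates a shared memory segment and keeps it allocated until close()."""

    def __init__(self, name, size):
        # create=True allocates the segment; it persists while the owner holds it.
        self.shm = shared_memory.SharedMemory(name=name, create=True, size=size)
        self.name = name

    def reset(self):
        """Zero the content in place, without unlinking and recreating the segment."""
        self.shm.buf[:] = bytes(len(self.shm.buf))

    def close(self):
        """Release the mapping and remove the segment from the system."""
        self.shm.close()
        self.shm.unlink()


def run_workflow(name, payload):
    """Stand-in for a workflow: attach, write, detach (never unlink)."""
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:len(payload)] = payload
    shm.close()


def read_prefix(name, n):
    """Attach to the segment and return a copy of its first n bytes."""
    shm = shared_memory.SharedMemory(name=name)
    data = bytes(shm.buf[:n])
    shm.close()
    return data


if __name__ == "__main__":
    # Hypothetical segment name; the PID suffix avoids clashes between runs.
    owner = SegmentOwner(f"demo_seg_{os.getpid()}", 4096)
    try:
        run_workflow(owner.name, b"run-1")
        print(read_prefix(owner.name, 5))  # data outlives the workflow
        owner.reset()                      # same segment, content cleared
        run_workflow(owner.name, b"run-2")
        print(read_prefix(owner.name, 5))
    finally:
        owner.close()
```

The point of the sketch is only the lifetime split: workflows attach and detach, while a separate long-lived owner decides when the segment is cleared or removed.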
Workflow repository
- Waiting for AliECS to implement new fields in the GUI.
- Need a new ODC version before we can make the O2 version selectable (currently fixed to the latest).
EPN DPL Metric monitoring:
- Johannes has added the required parts to the Telegraf configuration.
- Tested, and it seems to be working.
- CPU and memory metrics are missing; after discussing with Giulio, we have to add another command-line option to enable them.
- Currently we are sending too many metrics; the metric size must be reduced before this can be used in production. Discussed in Jira here: https://alice.its.cern.ch/jira/browse/O2-2583