CMS feedback
- stability of the infrastructure, especially during downtimes of CERN services, since monitoring information is most valuable at these times
- how to separate it from those services and run it in high-availability mode
- since we rely on ES/InfluxDB, we need tutorials on their query languages
- with the growth of the infrastructure and the number of dashboards, we need an easy tool to find the appropriate information, similar to Google search
- it may require data annotation, indexing, etc.
- we need to be periodically informed about the R&D efforts and directions MONIT is planning so that we can contribute to the discussion on these subjects, e.g. through an internal Jira (ticketing system) that we can consult
- ability to specify the severity level of tasks/tickets
CMS adaptation to MONIT
- overall, we are starting to move more aggressively to the MONIT infrastructure
- usage of ES/Kibana/Grafana is growing among different CMS groups
- usage of HDFS is mostly limited to experts
- HDFS workflows are hard for an average user to write, execute, and use, so an additional layer may be desirable; Job Monitoring with ES+Spark is a good example
- we are starting to see growth in the usage of the MONIT CLI
CMS Plans
[1] https://indico.cern.ch/event/908539/contributions/3822566/attachments/2030916/3398965/2020_05_Rumble.pdf