# WLCG Site Monitoring Migration
MONIT team - 07.05.2020
---
## Overview
* Proposal to move WLCG Site Monitoring (SAM3) to MONIT
* ==No changes== from the WLCG Management / Site Admins perspective
  * Same ETF and ALICE tests
  * Same profiles (aggregation logic)
  * Same PDF report output
  * Just a different infrastructure handling the data
---
## Old SAM3 Dashboards
* Based on the old Dashboards infrastructure
* Test results from ETF + ALICE
* Enriched using VOFeed data
* Dashboards
  * One dedicated instance per VO
  * Latest view: most recent test results
  * Historical view: site and service availability, test history
* Reports
  * Generated monthly and sent to WLCG
  * All_Sites, Tier1_History, Tier1_Summary, Tier1_VO
---
## New MONIT Site Monitoring
* Based on the MONIT infrastructure
* Test results from ETF + ALICE
* Enriched using VOFeed data
* Dashboards
  * In the MONIT Grafana WLCG organization
  * Latest view
  * Historical view
* Reports
  * Generated monthly and sent to WLCG
  * All_Sites, Tier1_History, Tier1_Summary, Tier1_VO
---
## New Homepage
:::info
http://cern.ch/monit-wlcg-sitemon
:::
![](https://codimd.web.cern.ch/uploads/upload_ba90805c1a0a4fdaea3e17bb90fbc6e7.png)
---
## Dashboards
* Available from the MONIT Grafana WLCG organization
* Latest and Historical views as before
* Working on adding info about recomputation requests
* Proposal for data retention (still being agreed):
  * 1 year for site/service availability in dashboards
    * but PDF reports kept forever
  * 1 year for raw test data in dashboards
    * but archived in HDFS for several years
---
## Dashboards
![](https://codimd.web.cern.ch/uploads/upload_a507555b149554f88d8269ab8008c655.png)
---
## Monthly Reports (I)
* Output format
  * Today we have: PDF, JSON, CSV, HTML
  * Is the HTML output still needed? (Initial feedback: no)
* Federation availability
  * In the old infrastructure, federations with multiple sites ignore sites without data
  * This leads to 100% federation availability even if one site was not available at all
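
A toy illustration of the federation effect above. It assumes an unweighted mean of site availabilities and shows "count a site without data as 0%" only as one possible alternative; the weighting, the function name, and the site names are hypothetical and do not come from the MONIT or SAM3 code.

```python
from statistics import mean
from typing import Optional


def federation_availability(site_avail: dict[str, Optional[float]],
                            ignore_missing: bool) -> float:
    """Hypothetical unweighted federation availability.

    site_avail maps site name -> availability in [0, 1], or None if the site had no data.
    """
    if ignore_missing:
        # Old behaviour: sites without data are dropped from the average
        values = [a for a in site_avail.values() if a is not None]
    else:
        # Illustrative alternative: a site without data counts as 0% available
        values = [0.0 if a is None else a for a in site_avail.values()]
    return mean(values)


sites = {"SITE-A": 1.0, "SITE-B": None}
print(federation_availability(sites, ignore_missing=True))   # 1.0
print(federation_availability(sites, ignore_missing=False))  # 0.5
```

With one fully available site and one silent site, ignoring the silent one reports the whole federation as 100% available.
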
---
## Monthly Reports (II)
* Unknown status
  * In the old infrastructure, UNKNOWN time is computed on top of OK
  * This leads to availabilities above 100%
* No ETF data
  * In the old infrastructure, missing data is replaced by OK
  * Sites without testing for a while show close to 100% availability
* All of this can be solved by issuing recomputation requests
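
A toy model of the two legacy effects above. It assumes availability = OK time / (OK + CRITICAL time), with UNKNOWN time excluded from the denominator; the `legacy` flag encodes one plausible reading of the old behaviour (UNKNOWN added on top of OK, missing ETF data replaced by OK). The formula and parameter names are illustrative assumptions, not the actual MONIT or SAM3 algorithm.

```python
from typing import Optional


def availability(ok: float, critical: float, unknown: float,
                 missing: float, legacy: bool) -> Optional[float]:
    """Hypothetical availability over a period.

    Arguments are hours spent in each state; `missing` is time with no ETF data at all.
    Returns None when there is no time with a known OK/CRITICAL status.
    """
    if legacy:
        ok = ok + missing          # old: periods without ETF data replaced by OK
        numerator = ok + unknown   # old: UNKNOWN counted on top of OK
    else:
        numerator = ok             # UNKNOWN and missing time do not count as OK
    known = ok + critical          # assumption: UNKNOWN excluded from the denominator
    return numerator / known if known > 0 else None


# 90 h OK and 10 h UNKNOWN: the legacy logic exceeds 100%
print(round(availability(90, 0, 10, 0, legacy=True), 3))   # 1.111
print(availability(90, 0, 10, 0, legacy=False))            # 1.0

# A site with no ETF data for a 30-day month still looks 100% available
print(availability(0, 0, 0, 720, legacy=True))             # 1.0
print(availability(0, 0, 0, 720, legacy=False))            # None
```
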
---
## Profiles Definition
* Managed internally by the MONIT team
* Only the VO critical profiles are available so far
> ALICE_CRITICAL
> ATLAS_CRITICAL
> CMS_CRITICAL
> LHCB_CRITICAL
* Good opportunity to clean up legacy profiles
* Please open a SNOW ticket to request missing profiles
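
To make "aggregation logic" concrete, here is a hypothetical profile evaluated with a simple worst-of/best-of scheme: a service instance is as bad as its worst test, a service flavour is as good as its best instance, and the site is as bad as its worst required flavour. The profile name, flavours, hostnames, severity order, and the rule for a missing flavour are all assumptions, not the definition of the real `*_CRITICAL` profiles.

```python
# Assumed severity order (higher is worse); real profiles may also handle UNKNOWN, downtimes, etc.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# Hypothetical profile: every listed service flavour must be working
EXAMPLE_PROFILE = {"name": "EXAMPLE_CRITICAL", "flavours": ["CE", "SRM"]}


def worst(statuses):
    return max(statuses, key=SEVERITY.__getitem__)


def best(statuses):
    return min(statuses, key=SEVERITY.__getitem__)


def site_status(profile: dict, results: dict[str, dict[str, list[str]]]) -> str:
    """results maps flavour -> {hostname -> [test statuses]}."""
    flavour_statuses = []
    for flavour in profile["flavours"]:
        instances = results.get(flavour, {})
        if not instances:
            # Assumption: a required flavour with no instances counts as CRITICAL
            flavour_statuses.append("CRITICAL")
            continue
        # Instance = worst of its tests; flavour = best of its instances
        flavour_statuses.append(best(worst(tests) for tests in instances.values()))
    # Site = worst of its required flavours
    return worst(flavour_statuses)


results = {
    "CE": {"ce01.example.org": ["OK", "OK"], "ce02.example.org": ["CRITICAL"]},
    "SRM": {"se01.example.org": ["OK", "WARNING"]},
}
print(site_status(EXAMPLE_PROFILE, results))  # WARNING
```

Here one healthy CE is enough for the CE flavour, so the site status is driven by the SRM warning.
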
---
## Recomputation Requests
* Managed by Experiment representatives
* Based on GitLab: one simple JSON document per request
* Built-in tracking of request history
* Detailed documentation provided in the repository
```json
{
  "dst_site": "T2-BR_SPRACE",
  "periods": [
    {
      "start_time": "2020-01-01 00:00:00",
      "end_time": "2020-01-06 20:00:00",
      "status": "OK"
    }
  ],
  "vo": "cms"
}
```
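
Since each request is a small JSON document, a local check can catch mistakes such as trailing commas or inverted periods before a request is committed. This is a hypothetical sketch: it only knows the fields shown in the example above, the set of accepted `status` values is an assumption, and the documentation in the repository remains the reference.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format used in the example request
ALLOWED_STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}  # assumption


def validate_request(text: str) -> list[str]:
    """Return the problems found in a recomputation request (empty list if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for key in ("dst_site", "vo", "periods"):
        if key not in doc:
            problems.append(f"missing field: {key}")
    for i, period in enumerate(doc.get("periods", [])):
        try:
            start = datetime.strptime(period["start_time"], TIME_FORMAT)
            end = datetime.strptime(period["end_time"], TIME_FORMAT)
        except (KeyError, TypeError, ValueError) as exc:
            problems.append(f"period {i}: bad or missing time ({exc})")
            continue
        if start >= end:
            problems.append(f"period {i}: start_time must be before end_time")
        if period.get("status") not in ALLOWED_STATUSES:
            problems.append(f"period {i}: unexpected status {period.get('status')!r}")
    return problems
```

Returning a list of messages rather than raising on the first error lets one run report every problem in a request at once.
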
---
## Current Status
* :white_check_mark: Test results and downtimes integrated in MONIT
* :white_check_mark: Availability and reliability computed per service and site
* :white_check_mark: Equivalent Grafana dashboards available in WLCG org
* :white_check_mark: Exact same PDF reports generated
* :white_check_mark: Data validated against the old infrastructure
* :large_orange_diamond: Add recomputation requests info in dashboards
* :large_orange_diamond: Stop old dashboards and infrastructure
---
## Migration Plan
* May:
  * 01: New dashboards available for testing/feedback
* June:
  * 01: May draft reports from **Old** and **New** infrastructure
  * 16: May final reports from **Old** infrastructure
* July:
  * 01: June draft reports from **Old** and **New** infrastructure
  * 16: June final reports from **New** infrastructure
  * 31: Stop old dashboards but keep infrastructure running
* August:
  * 31: Retire the old infrastructure (dashboards and reports)
---
## Next Steps
* From the MONIT team
  * Add recomputation requests info in dashboards
  * Support users based on the feedback provided
* From WLCG/Experiments
  * Validate new reports for May and June
  * Provide feedback on dashboards, reports, and tools
  * Request any missing profiles
---
## Thank You
http://cern.ch/monit-support
---