MEETING OBJECTIVE - DEFINE STEPS REQUIRED IN ORDER TO BEGIN CONTACTING SITES
IF THERE IS TIME REMAINING:
Wednesday, 7 October 2015 from 16:00 to 18:00 (Europe/Zurich)
Enrico Vianello, Dave Dykstra, Hung-Te Lee, Xavier Mol, Georgios Bitzes, Sam Skipsey, Mario Lassnig, Oliver Keeble, Fabrizio Furano, Cédric Serfon, Christophe Haen, Marian Babik, Andrea Ceccanti
Georgios – Doc on how to install and run locally; probe updates (increased verbosity, failed-PUT logic)
Oliver – GGUS support unit
Cédric – Topology feed → Marian
Marian – Move to pre-production and opening of the firewall
Christophe – LHCb statement for sites
Cédric – Atlas statement for sites
Sam – Contact GridPP sites and request validation volunteers
> Automatic integration of topology with SAM/Nagios
LHCb: already agreed with Stefan; the VO feed will be used. Atlas: Cédric to pass Marian a code snippet which extracts the relevant list from AGIS.
> Opening the firewall
This was not discussed, but further information is added here. As this is a sensitive service (it schedules jobs with powerful credentials), the firewall will only be opened when we move to pre-production. This requires a security audit, puppetisation of the service, some scale testing and valid topology feeds from the VOs.
> Functional completeness of probe & ancillary tests for operational debugging
The principle to be adopted is to make the probe as verbose as possible now and to trim the output later if necessary. The probe should print the full request and response headers. A test that shows SSL info would also be useful for debugging.
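As an illustration of printing "full request and response headers", the sketch below renders one HTTP exchange in a curl -v-like style. The helper name `format_exchange` and the `>`/`<` prefixes are hypothetical choices, not part of the existing probe. Headers are taken as lists of (name, value) pairs, the same shape returned by Python's `http.client.HTTPResponse.getheaders()`.

```python
from typing import List, Tuple

Headers = List[Tuple[str, str]]


def format_exchange(method: str, url: str, request_headers: Headers,
                    status: int, reason: str, response_headers: Headers) -> str:
    """Render one HTTP request/response pair as text for the probe output.

    Hypothetical helper: the '>'/'<' prefixes simply mimic curl -v output.
    """
    lines = [f"> {method} {url}"]
    for name, value in request_headers:      # every request header, unfiltered
        lines.append(f"> {name}: {value}")
    lines.append(f"< {status} {reason}")
    for name, value in response_headers:     # every response header, unfiltered
        lines.append(f"< {name}: {value}")
    return "\n".join(lines)


print(format_exchange("PUT", "https://storage.example.org/atlas/probe.txt",
                      [("Content-Length", "11")], 201, "Created",
                      [("Server", "example-server")]))
```

Dumping every header without filtering matches the "verbose now, trim later" principle: trimming can be added afterwards as a filter on the pairs.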
> Multi-range requests ("vector reads")
Unnecessary to test at this stage.
> Classification criteria (OK, CRITICAL, WARNING, UNKNOWN), Timeout values, warning thresholds (currently 10s)
All OK here.
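The four states correspond to the standard Nagios plugin exit codes (OK=0, WARNING=1, CRITICAL=2, UNKNOWN=3). The sketch below shows one possible rule set for a single test. The 10 s warning threshold is the value quoted above; the 60 s timeout, the rule that a timeout counts as CRITICAL and the function name `classify` are assumptions for illustration only.

```python
# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

WARNING_THRESHOLD_S = 10.0  # warning threshold quoted in the minutes
TIMEOUT_S = 60.0            # hypothetical timeout; actual value not stated here


def classify(ran: bool, succeeded: bool, elapsed_s: float,
             warning_s: float = WARNING_THRESHOLD_S,
             timeout_s: float = TIMEOUT_S) -> int:
    """Map the outcome of one probe test to a Nagios state (hypothetical rules)."""
    if not ran:
        return UNKNOWN   # test skipped, e.g. because a prerequisite step failed
    if not succeeded or elapsed_s >= timeout_s:
        return CRITICAL  # operation failed or exceeded the (assumed) timeout
    if elapsed_s > warning_s:
        return WARNING   # succeeded, but slower than the warning threshold
    return OK
```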
> SAM & visualisation
SAM will provide "standard" visualisation in the future, but the check_mk interface will remain and is often the preferred path for debugging. Check_mk is currently in preview; it will be puppetised and promoted to pre-production, after which a SAM3 visualisation will be available via the "preprod SAM3".
> Allow probe resubmission by sites?
Yes, this is and will remain possible.
> Site-oriented documentation
Agreed that the probe results should include a link to a TF FAQ to help sites understand what is expected and to assist in fixing problems. This doc should contain:
- explanation of the TF statement from the experiments about priority and criticality
- explanation of the tests
- links to docs from the storage providers
- contact point for questions
- FAQ/knowledgebase
- instructions for running the probe locally
Statements from Atlas and LHCb will be supplied.
> Ticketing system - GGUS support unit?
Agreed to create a GGUS Support Unit to handle communication with sites. Action Oliver.
> Validate system using some volunteer sites
KIT volunteered. Sam to put the proposal to the GridPP storage folks. Enrico will test the script against StoRM.
> Code Review
A code review of the probe is required because the proxy used by SAM is a powerful one. To be completed before any production use.
> What to do if PUT fails
Currently the probe's first action is to PUT a file. If this fails, testing is abandoned and all other test results are UNKNOWN. This behaviour should change so that sites whose data is readable over HTTP are still visible. Thus, if the PUT fails, the probe should first check for the existence of a standard file called "<VO_IN_CAPS>_HTTPTFtest.txt" in the root of the VO space and, if that file is absent, attempt to upload a test file via SRM. Note: subsequent discussion indicated that doing a "stat" and "ls" on the top-level directory may also be a good solution.
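The proposed PUT-failure fallback can be sketched as follows. The function names and the callables `http_put`, `exists` and `srm_put` are hypothetical stand-ins for whatever the probe actually uses to talk to the endpoint; only the standard file name pattern comes from these minutes. The alternative "stat"/"ls" check on the top-level directory is not shown.

```python
from typing import Callable, Optional, Tuple


def standard_file_name(vo: str) -> str:
    # "<VO_IN_CAPS>_HTTPTFtest.txt", expected in the root of the VO space
    return f"{vo.upper()}_HTTPTFtest.txt"


def choose_read_target(vo: str, test_file: str,
                       http_put: Callable[[str], bool],
                       exists: Callable[[str], bool],
                       srm_put: Callable[[str], bool]
                       ) -> Tuple[Optional[str], Optional[str]]:
    """Return (method, file_name) for the read tests, or (None, None).

    All three callables are hypothetical hooks supplied by the probe.
    """
    # 1. Normal path: the probe's own HTTP PUT succeeds
    if http_put(test_file):
        return "http-put", test_file
    # 2. PUT failed: look for the pre-placed standard file in the VO root
    std = standard_file_name(vo)
    if exists(std):
        return "standard-file", std
    # 3. Standard file absent: try to upload the test file via SRM instead
    if srm_put(test_file):
        return "srm-put", test_file
    # Nothing readable: the remaining tests report UNKNOWN, as today
    return None, None
```

Passing the storage operations in as callables keeps the decision logic testable without a live storage endpoint.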
> Review of outstanding issues in the "How to support HTTP for WLCG" document
Andrea Ceccanti noted that StoRM does not yet support third-party copy over HTTP; this can be discussed at a future TF meeting if the experiments wish.
> Transfer monitoring - feedback from meeting 15th July
Andrea Ceccanti reported substantial agreement from the StoRM perspective on the proposed monitoring data path: udp→collector→dashboard.