HTTP Deployment Task Force Minutes

Wednesday, 7 October 2015 from 16:00 to 18:00 (Europe/Zurich)

 

Present

Enrico Vianello
Dave Dykstra
Hung-Te Lee
Xavier Mol
Georgios Bitzes
Sam Skipsey
Mario Lassnig
Oliver Keeble
Fabrizio Furano
Cédric Serfon
Christophe Haen
Marian Babik
Andrea Ceccanti

Actions

Georgios – Doc on how to install and run locally.
           Probe updates:
               increased verbosity
               failed-PUT logic

Oliver – GGUS support unit

Cédric – Topology feed → Marian

Marian – Move to pre-production and opening of the firewall

Christophe – LHCb statement for sites

Cédric – ATLAS statement for sites

Sam – Contact GridPP sites and request validation volunteers

Summary

> Automatic integration of topology with SAM/Nagios

LHCb – Already agreed with Stefan that the VO feed will be used.

ATLAS – Cédric to pass a code snippet to Marian which allows extraction of the relevant list from AGIS.

 

> Opening the firewall

This was not discussed in the meeting; further information is added here for the record. As this is a sensitive service (it schedules jobs with powerful credentials), the firewall will only be opened when we go to pre-production. This requires a security audit, puppetisation of the service, some scale testing and valid topology feeds from the VOs.

 

> Functional completeness of probe & ancillary tests for operational debugging

The principle to be adopted is to make the probe as verbose as possible right now and then to trim if necessary. Full request and response headers should be included in the probe output.

A test to show SSL info would be good for debugging purposes.
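
As an illustration of what such an SSL test could report, here is a minimal sketch using only the Python standard library. It is not the TF probe's implementation; the function names are hypothetical, and it assumes the endpoint's CA is in the system trust store, which may not hold for grid CAs.

```python
import socket
import ssl


def summarise_cert(cert):
    """Flatten the dict returned by SSLSocket.getpeercert() into printable lines."""
    def flatten(name):
        # getpeercert() encodes a name as a tuple of RDNs,
        # each RDN being a tuple of (key, value) pairs.
        return ", ".join(f"{k}={v}" for rdn in name for k, v in rdn)

    return [
        "subject: " + flatten(cert.get("subject", ())),
        "issuer: " + flatten(cert.get("issuer", ())),
        "notAfter: " + cert.get("notAfter", "unknown"),
    ]


def fetch_ssl_info(host, port=443, timeout=10):
    """Complete a TLS handshake and report protocol, cipher and certificate."""
    # Assumption: the server certificate validates against the default CA set.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            lines = [f"protocol: {tls.version()}", f"cipher: {tls.cipher()[0]}"]
            return lines + summarise_cert(tls.getpeercert())
```

Calling `fetch_ssl_info("se.example.org")` (a placeholder host) would return the lines to append to the probe output.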

 

> Multi-range requests ("vector reads")

Unnecessary to test at this stage.

 

> Classification criteria (OK, CRITICAL, WARNING, UNKNOWN), Timeout values, warning thresholds (currently 10s)

All OK here.
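
For reference, the classification can be sketched as follows. This is an illustrative assumption about how the four states and the 10s warning threshold might combine, not the probe's actual code; the `timeout_s` default is hypothetical, as the minutes do not record the timeout value. The numeric codes are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(succeeded, elapsed_s, warn_after_s=10.0, timeout_s=60.0):
    """Map one test outcome to a Nagios status.

    succeeded: True/False for a test that ran, None if it could not be run.
    warn_after_s: warning threshold (10s per the minutes).
    timeout_s: hypothetical timeout; the real value is not given in the minutes.
    """
    if succeeded is None:
        return UNKNOWN
    if not succeeded or elapsed_s >= timeout_s:
        return CRITICAL
    if elapsed_s > warn_after_s:
        return WARNING
    return OK
```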

 

> SAM & visualisation

SAM will provide "standard" visualisation in the future, but the check_mk interface will remain and is frequently the preferred path for debugging. Check_mk is currently in preview but will be puppetised and promoted to pre-production. We will then get a SAM3 visualisation via the "preprod SAM3".

 

> Allow probe resubmission by sites?

Yes, this is and will remain possible.

 

> Site-oriented documentation

Agreed that the probe results should incorporate a link to a TF FAQ to help sites understand what is expected and assist in fixing problems.

This doc should contain:

        explanation of TF
        statement from experiments about priority and criticality
        explanation of tests
        links to docs from the storage providers
        contact point for questions
        faq/knowledgebase
        instructions for running probe locally

Statements from ATLAS and LHCb will be supplied.

 

> Ticketing system - GGUS support unit?

Agreed to create a GGUS Support Unit to handle communication with sites. Action Oliver.

 

> Validate system using some volunteer sites

KIT volunteered.
Sam to put the proposal to GridPP storage folks.
Enrico will test the script against StoRM.

 

> Code Review

A code review of the probe is required because the proxy used by SAM is a powerful one. To be completed before any production use.

 

> What to do if PUT fails

Currently the probe's first action is to PUT a file. If this fails, the testing is abandoned and all other test results are UNKNOWN. This behaviour should change so that sites whose data is readable over HTTP remain visible. Thus, if the PUT fails, the probe should first check for the existence of a standard file in the root of the VO space called "<VO_IN_CAPS>_HTTPTFtest.txt" and, if that file is absent, attempt to put a test file via SRM.

Note - subsequent discussion indicated that doing a "stat" and "ls" on the top level directory may also be a good solution.
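
The fallback order agreed above (HTTP PUT, then the standard file, then SRM) can be sketched as below. This is a hedged illustration only: the three callables, the function name and the temporary file name are hypothetical stand-ins for whatever the probe actually uses, and uploading the SRM test file under the standard name is an assumption.

```python
def choose_read_target(vo, http_put, http_exists, srm_put):
    """Pick the file the read tests should use, following the fallback order.

    http_put, http_exists and srm_put are caller-supplied callables that take
    a path and return True on success (hypothetical interfaces).
    Returns (path, source), or (None, "none") if nothing is available.
    """
    probe_file = "httptf_probe_test.txt"  # hypothetical temporary file name
    if http_put(probe_file):
        return probe_file, "http-put"

    # PUT failed: look for the standard file in the root of the VO space.
    standard_file = f"{vo.upper()}_HTTPTFtest.txt"
    if http_exists(standard_file):
        return standard_file, "existing"

    # Standard file absent: try to place a test file via SRM instead.
    # Assumption: it is uploaded under the standard name.
    if srm_put(standard_file):
        return standard_file, "srm-put"

    return None, "none"
```

With a result other than `(None, "none")` the read tests can still run and be reported, while the PUT test itself is reported as failed.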

 

> Review of outstanding issues in the "How to support HTTP for WLCG" document

Andrea Ceccanti noted that StoRM does not yet support third-party copy with HTTP; this can be discussed in a future TF meeting if the experiments wish.

 

> Transfer monitoring - feedback from meeting 15th July

Andrea Ceccanti reported substantial agreement from the StoRM perspective on the proposed monitoring data path: udp→collector→dashboard.