ROC manager update
*************************
- The next ROC managers' meeting is next Tuesday: http://indico.cern.ch/conferenceDisplay.py?confId=14084
Ops meeting update
************************
There are several new patches being rolled out to the PPS.
# gLite 3.1.0 PPS Update08: pre-deployment tests passed and the update has been deployed to the remaining PPS sites. The release contains (patch numbers):
* 1233 R3.1 FTS update (glite-data_R_3_1_35_1)
* 1255 JobWrapper tests - new version with no R-GMA dependencies
* 1381 New version of lcg-tags with better error reporting
* 1382 New version of lcg-info with support for VOViews, sites and services
* 1383 lcg-CE for glite 3.1
* 1384 Updated Torque (2.1.9-4) and Maui (3.2.6p19-4)
* 1393 gLite 3.1 TORQUE_utils (slc4/ia32)
* 1394 gLite 3.1 TORQUE_server (slc4/ia32)
* 1413 glite-yaim-core 4.0.1 for the 3.1 repository
* 1415 glite-yaim-clients 4.0.1 for the 3.1 repository
- Adding sites to the central (RAL) R-GMA repository. What process is generally followed and who is the T1 contact?
- (CERN - site CERN-PROD): the command
glite-wms-job-status -all
which is meant to retrieve all jobs of the user that are in a certain status, is not working at CERN-PROD. As a follow-up to a GGUS ticket (https://gus.fzk.de/ws/ticket_info.php?ticket=27455), the WMS service admins replied by opening a bug in Savannah (https://savannah.cern.ch/bugs/?30989). Have we seen overload of the LB database due to concurrent use of the command glite-wms-job-status -all?
- ATLAS VO Views problems still persist. The code used to generate a list of problematic sites can be found at afs/cern.ch/user/c/campanas/public/VOVIEW.
There is a .txt file with a query you should run on the BDII to gather the relevant info. In addition, there is a python script which parses the output of the query and generates a report in which sites marked with ==> are the problematic ones. What is the UK situation?
- Downtime notification lists were discussed. The CIC portal team are also working with GOCDB to define sub-nodes that affect only specific regions/groups.
- (CE) Pointed out that the one-month deadline for moving from SL3 to SL4 once services are available should allow for postponement if problems are found with SL4 deployments. SA3 agree.
- (UKI) We have observed cases where the gstat page shows a site in maintenance while the GOCDB has no downtime listed (for one example, see the RAL Tier-1 case in GGUS ticket 28520).
- (UKI) Several sites have seen recent SAM sft-job failures relating to downloading from RBs. Errors like "Cannot download X from gsiftp://rb115.cern.ch" where X is usually .BrokerInfo or the tarball of SAM tests are being seen. Is this evident in other EGEE regions and what is behind it?
It seems it is evident in several other regions and the CERN ROC will investigate the CERN RB (ticket should be raised).
- (IT - CNAF) SAM tests: CNAF noted that, following some SAM failures, their site was blocked by FCR, and it then took many hours of passing tests to get back in, even after all tests were fine. They wanted to know if there is a workaround.
- New security test in Validation. The test does the following:
1. reads all the env vars in the WN account where the job runs.
2. for each directory/file specified in any variable, and for each file inside any of those directories:
* the test checks whether the file/dir has write privileges for the Other group (the --------X- bit).
* if a writable file/dir is in $PATH, the test returns an ERROR.
* if a writable file/dir is not in $PATH, the test returns a WARNING.
* if no 'w' privilege was found in any of the files/dirs, the test returns OK.
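The logic of the security test described above can be sketched roughly as follows. This is not the actual SAM test code, only an illustration under stated assumptions: the function names (is_world_writable, candidate_paths, check_environment) are hypothetical, "in $PATH" is taken to mean "the entry was found through the PATH variable", only the direct contents of each directory are examined, and POSIX permission bits are assumed.

```python
import os
import stat


def is_world_writable(path):
    """Return True if the 'other' class has write permission on path."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # missing or unreadable entries are skipped
    return bool(mode & stat.S_IWOTH)


def candidate_paths(value):
    """Yield every existing absolute path named in an environment
    variable value, plus the direct children of any directory."""
    for item in value.split(os.pathsep):
        if not os.path.isabs(item) or not os.path.exists(item):
            continue
        yield item
        if os.path.isdir(item):
            try:
                names = os.listdir(item)
            except OSError:
                continue  # directory not listable by this account
            for name in names:
                yield os.path.join(item, name)


def check_environment(env):
    """Return (status, findings) for a mapping of environment variables.

    status is "ERROR" if a world-writable entry was reached via PATH,
    "WARNING" if one was reached via any other variable, else "OK".
    Assumption: severity is decided by the variable name only.
    """
    severity = {"OK": 0, "WARNING": 1, "ERROR": 2}
    status = "OK"
    findings = []
    for name, value in env.items():
        level = "ERROR" if name == "PATH" else "WARNING"
        for path in candidate_paths(value):
            if is_world_writable(path):
                findings.append((level, name, path))
                if severity[level] > severity[status]:
                    status = level
    return status, findings
```

Calling check_environment(os.environ) on a worker node account would give the overall status plus a list of (level, variable, path) tuples for each offending entry. Note that this sketch does not treat sticky-bit directories such as /tmp specially; whether the real test does is not stated in these notes.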
- The Validation instance of the SAM portal is available here:
https://lcg-sam-val.cern.ch:8443/sam-val/sam.py
Ticket status
***************
1721 - CloseSE for hepgrid5.ph.liv.ac.uk 07/08/2007 AF
1779 - UKI ROC website should not provide documentation 20/08/2007 JC
1860 - CE failure on epbf005.ph.bham.ac.uk (UKI-SOUTHGRID-BHAM-PPS) 12/09/2007 PG
1982 - [Gridpp #21349] Request access to RGMA Registry 12/10/2007
1987 - t2ce02.physics.ox.ac.uk [4] 12/10/2007 PG
2002 - [Gridpp #21377] CE lcg01 and lcgce02.gridpp.rl.ac.uk : Server certificate possibly not installed 16/10/2007 DR
2087 - lcgrb02.gridpp.rl.ac.uk broken? 30/10/2007 DR
2089 - Some problems trying to access data in UK 30/10/2007 GS
2090 - Problem accessing ATLAS data in Edinburgh SE (site: ScotGRID-Edinburgh) 30/10/2007 GS
2123 - Missing gcc32 compiler at RAL-LCG2 05/11/2007 DR