EGEE-SA1-SWE Meeting
→
Europe/Zurich
Sky VRVS Virtual Room (http://www.vrvs.org)
Pacheco, Andreu
Description
Present: Andreu Pacheco (PIC), Manuel Sanchez (USC), Carlos Fernandez (CESGA), Javier Lopez (CESGA), Javi Fontan (CESGA), Fco. Bernabe (CESGA), Miguel Cardenas (CIEMAT), Pablo Rey (CESGA), Ricardo Graciani (UB), Kai Neuffer (INTA), Eduardo Huedo (INTA), Farida Fassi (IFIC), Guillermo Losilla (BIFI-Zaragoza), Alvaro Fernandez (IFIC), Jose Salt (IFIC), Adria Casajus (UB), Carlos Borrego (PIC), Marc Rodriguez (PIC), Juan Saborido (USC), Celso Martinez Rivero (IFCA), Mohammed Kaci (IFIC)
Apologies: Gonzalo Merino (PIC)
-
- 12:00 → 12:15
-
12:05
→
12:15
Operational status from sites 10m VRVS
* From February 1st 2005, all sites must report their operational status by mail every Friday if they want Gonzalo Merino to report their status accurately to the EGEE Operations Meeting.
* Since CERN is asking the ROCs to write Monday's 11 report following a template, it would be useful for the SWE site reports that site managers send to this list on Fridays to follow a similar template. I therefore suggest that these reports follow the template available at: http://services2.pic.org.es/devel/download/txt/SR-template.txt
* Reports should be sent to: egee.swe.roc.contact@pic.es
Speaker: Merino, Gonzalo (PIC)
-
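A quick way to check a weekly report against the suggested structure before mailing it could look like the sketch below. It assumes the template consists of the section headings that appear in the site reports in these minutes; the actual SR-template.txt was not consulted, and the function name `missing_sections` is only illustrative.

```python
# Section headings as they appear in the site reports in these minutes.
# Assumption: the SR template uses the same headings.
REQUIRED_SECTIONS = [
    "Availability of Core Services Hosted in the Site",
    "Non-Functional Time",
    "Scheduled Down Time",
    "Major Operational Issues Encountered During the Reporting Period",
    "Upgrade to Scientific Linux 3 (or equivalent)",
    "Points to Raise at the Operations Meeting",
]


def missing_sections(report_text):
    """Return the required headings that do not appear in the report.

    Matching ignores case and collapses whitespace, so headings survive
    re-wrapping by mail clients.
    """
    normalized = " ".join(report_text.split()).lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in normalized]


if __name__ == "__main__":
    sample = "Non-Functional Time\n===\nNone\nScheduled Down Time\n===\nNone\n"
    for heading in missing_sections(sample):
        print("missing section:", heading)
```

A report that contains every heading yields an empty list, so the check can be run on the draft before it is sent to the list.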
CESGA-EGEE 15m
* Javi Fontan reporting.
Weekly Operations Status Report: CESGA-EGEE
Reporting Period: 7/3/05 to 13/3/05

Major Operational Issues Encountered During the Reporting Period
================================================================
MON box unstable. It froze on Wednesday because some R-GMA reinitialization scripts were buggy. The patch from LCG-rollout was applied. MON also froze this Saturday; we are now studying the causes.

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? Already done
By when will the WNs in the Site be upgraded? Already done
Speaker: Fontan, Javier (CESGA)
-
CIEMAT-LCG2 15m
* Report sent by mail.

Availability of Core Services Hosted in the Site
==================================================
There are no core services running at CIEMAT at this moment

Non-Functional Time
====================
None

Scheduled Down Time
===================
* Scheduled Maintenance During the Reporting Period
None
* Scheduled Maintenance During the Next Reporting Period
None

Major Operational Issues Encountered During the Reporting Period
================================================================
None

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? End of March
By when will the WNs in the Site be upgraded? End of March
Speaker: Calonge, Javier (CIEMAT)
-
CNB-LCG2 15m
* Report sent by mail.
Weekly Operations Status Report: [ CNB ]
Reporting Period: [ 6/03/05 ] to [ 11/03/05 ]

Availability of Core Services Hosted in the Site
==================================================
none

Non-Functional Time
====================
- Total days unavailable: 1

Scheduled Down Time
===================

Major Operational Issues Encountered During the Reporting Period
================================================================
none

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? Pending
By when will the WNs in the Site be upgraded? Pending

Points to Raise at the Operations Meeting
=========================================
none
Speaker: Merino, Angel (CNB)
-
IFCA-LCG2 15m
* Reported by Celso Martinez; Dani Cano at CMS week.
* No major issues.
Speaker: Cano, Dani (IFCA)
-
IFIC-LCG2 15m
* Reported by mail.

Availability of Core Services Hosted in the Site
RB: lcg2rb.ific.uv.es (public use) [OK]
RB: lcg2rb2.ific.uv.es (ATLAS production) [OK]
BDII: lcg2bdii.ific.uv.es (test zone) [OK]
BDII: isn04.ific.iv.es (ATLAS production) [OK]
PROX: lcg2proxy.ific.uv.es [OK]

Non-Functional Time
None

Scheduled Down Time
* Scheduled Maintenance During the Reporting Period
None
* Scheduled Maintenance During the Next Reporting Period
None

Major Operational Issues Encountered During the Reporting Period
None

Upgrade to Scientific Linux 3 (or equivalent)
By when will all service nodes in the site be upgraded? End of March
By when will the WNs in the site be upgraded? End of March

Points to Raise at the Operations Meeting
None
Speaker: Sanchez, Javier (IFIC)
-
INTA-CAB 15m
* Kai Neuffer reporting.
* Problems with APEL after the upgrade. Reported to the support channel.
* A comment was raised on the request from some VOs to install software on shared directories. Ricardo Graciani (LHCb) commented that it is not scalable for every job to install all the software it needs before running. Conclusion: it is up to each site to decide.

Availability of Core Services Hosted in the Site
================================================
No core services hosted in this site.

Non-Functional Time
===================
0 days unavailable (apart from scheduled maintenance).

Scheduled Down Time
===================
* Scheduled Maintenance During the Reporting Period
1 day (LCG-2_3_1 upgrade and validation).
* Scheduled Maintenance During the Next Reporting Period
0 days.

Major Operational Issues Encountered During the Reporting Period
================================================================
Bugs in YAIM already reported for LCG-2_3_0 remain unsolved in LCG-2_3_1.

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
Already done.

Points to Raise at the Operations Meeting
=========================================
Convenience of defining a shared area between WNs for experiment software installation.
Speaker: Huedo, Eduardo (INTA)
-
LIP-LCG2 15m
* Reported by mail.

Availability of Core Services Hosted in the Site
==================================================
RB: rb02.lip.pt [OK]
LCG-BDII: ii02.lip.pt [OK]
RLS: se01.lip.pt [OK]

Non-Functional Time
====================
[OK] all time

Scheduled Down Time
===================
No Scheduled Down Time

Major Operational Issues Encountered During the Reporting Period
================================================================
LIP-LCG2 was upgraded from LCG2.3.0 to LCG2.3.1 on the morning of 11-03; no disruption was caused.
No major issues encountered.

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
All nodes in SLC303

Points to Raise at the Operations Meeting
=========================================
Network monitoring and measurement is being tested at LIP. The Central Monitoring Host (CMH) is being configured, and a client with the sensors (ce01) will be deployed for a first test. After this, more news will be sent to the SWE for deployment at the other sites.
Speaker: David, Mario (LIP)
-
IFAE 15m
* Carlos Borrego reporting.

Availability of Core Services Hosted in the Site
==================================================
RB, BDII: lcgce02.ifae.es, lcgbdii02.ifae.es [OK]

Non-Functional Time
====================
- Total days unavailable: 0 days

Scheduled Down Time
===================

Major Operational Issues Encountered During the Reporting Period
================================================================
none

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? March
By when will the WNs in the Site be upgraded? March

Points to Raise at the Operations Meeting
=========================================
none
Speaker: Borrego, Carlos (PIC)
-
PIC 15m
* Carlos Borrego reporting.
Weekly Operations Status Report: [ PIC ]
Reporting Period: [ 6/03/05 ] to [ 11/03/05 ]

Availability of Core Services Hosted in the Site
==================================================
ce01.ifae.es bdii01.pic.es [OK]

Non-Functional Time
====================
- Total days unavailable: 0

Scheduled Down Time
===================

Major Operational Issues Encountered During the Reporting Period
================================================================
none

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? Done
By when will the WNs in the Site be upgraded? Done

Points to Raise at the Operations Meeting
=========================================
none
Speaker: Borrego, Carlos (PIC)
-
UAM-LCG2 15m
* Missing report.
Speaker: Pardo Navarro, Juan Jose (UAM)
-
UB-LCG2 15m
* Ricardo Graciani reporting.
* Firewall problems during the last few days.
Weekly Operations Status Report: UB
Reporting Period: 07/02/2005 to 13/03/2005

Availability of Core Services Hosted in the Site
================================================
No core services in the site

Non-Functional Time
====================
Remote connection to the site has been cut since Friday. It seems to be solved now.

Scheduled Down Time
===================
None

Major Operational Issues Encountered During the Reporting Period
================================================================
See above.

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? March
By when will the WNs in the Site be upgraded? March

Points to Raise at the Operations Meeting
=========================================
None
Speaker: Graciani, Ricardo (UB)
-
UPV-GRyCAP 15m
* Reported by mail.

Availability of Core Services Hosted in the Site
==================================================
RB, BDII: apis.dsic.upv.es [OK]

Non-Functional Time
====================
- Total days unavailable: 0.5 days (11/04/2005 Morning). The file-count quota of user dteam was exceeded.

Scheduled Down Time
===================
* Scheduled Maintenance During the Reporting Period
- Total days unavailable: 0
* Scheduled Maintenance During the Next Reporting Period
- Estimated number of days for which the site will be unavailable: 0 days

Major Operational Issues Encountered During the Reporting Period
================================================================
none

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? Already done (SL 3.0.4 & FC2)
By when will the WNs in the Site be upgraded? Already done (FC2)
Speaker: Caballer, Miguel (UPV-GRyCAP)
-
USC-LCG2 15m
* Reported by Manuel Sanchez
Weekly Operations Status Report: USC
Reporting Period: 07/03/2005 to 14/03/2005

Availability of Core Services Hosted in the Site
================================================
None Deployed

Non-Functional Time
===================
None

Scheduled Down Time
===================
None

Major Operational Issues Encountered During the Reporting Period
================================================================
Upgraded to LCG 2.3.1
Older gatekeeper logs have been recovered and reprocessed in APEL

Upgrade to Scientific Linux 3 (or equivalent)
=============================================
By when will all service nodes in the Site be upgraded? March
By when will the WNs in the Site be upgraded? March

Points to Raise at the Operations Meeting
=========================================
None
Speaker: Saborido, Juan Jose (USC)
-
Savannah EGEE SA1 SWE open tickets for sites 15m
Speaker: David, Mario (LIP)
-
-
12:10
→
12:15
APEL Status Report 5m VRVS
== Period ==
From March 5 to March 11

== Summary ==
The Flexible Archiver was down on March 5, 6 and 7. This prevented accounting data from being published on those days, even at sites where APEL was working fine.

This week USC found a possible bug in APEL. Instead of reprocessing only the JoinProcessor, they did a complete reprocess (PBSLogProcessor, GkLogProcessor and JoinProcessor). In either case all their data should have been republished: both the reprocessed data and the old data that was not reprocessed but is still in the MON database (through the JoinProcessor reprocess). However, only the reprocessed data (the data stored in the log files) was published. They are still missing the old data that is in the database but no longer in the log files. We have reported this error to the APEL team and they will investigate further.

To add missing records after a GOC failure, a JoinProcessor reprocess (in MON) is enough, provided all log files were correctly processed beforehand.

On the other hand, we found some bugs in the new version of APEL. These bugs were reported to the APEL team and they have fixed them. The new version now seems to work properly. We hope that this new version will be publicly available in the coming weeks.

== Status of each SWE site ==
Site: CESGA
Solved Problems of Last Report: Yes
Missing Accounting Entries: None
Publishing User Name Information: Yes
Actions: None
Other Comments: None

Site: CIEMAT
Solved Problems of Last Report: Yes
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: JoinProcessor Reprocess
Other Comments: None 14/11/04

Site: IFCA
Solved Problems of Last Report: No
Missing Accounting Entries: From 15/01/05
Publishing User Name Information: Yes
Actions: Solved Problems of Last Report. Reprocess
Other Comments: None

Site: IFIC
Solved Problems of Last Report: No
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: Solved Problems of Last Report. JoinProcessor Reprocess
Other Comments: None

Site: INTA
Solved Problems of Last Report: No
Missing Accounting Entries: 5, 6, 7, 8
Publishing User Name Information: Yes
Actions: Solved Problems of Last Report. JoinProcessor Reprocess
Other Comments: None

Site: LIP
Solved Problems of Last Report: Yes
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: JoinProcessor Reprocess
Other Comments: None

Site: pic
Solved Problems of Last Report: Yes
Missing Accounting Entries: None
Publishing User Name Information: Yes
Actions: None
Other Comments: New site. They published data for the first time on March 11. They have data since 08/02/05.

Site: PIC-LCG2
Solved Problems of Last Report: Since 02/03/05 they have been publishing user information, but they have not published the lost user information. They have not reprocessed.
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: Solved Problems of Last Report. JoinProcessor Reprocess
Other Comments: None

Site: UAM
Solved Problems of Last Report: No
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: Solved Problems of Last Report. JoinProcessor Reprocess
Other Comments: None

Site: UB
Solved Problems of Last Report: They have not reprocessed due to a memory problem in R-GMA.
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: JoinProcessor Reprocess
Other Comments: The memory problems in R-GMA can be solved by following the instructions sent by the APEL team.

Site: UPV
Solved Problems of Last Report: Yes
Missing Accounting Entries: 5, 6, 7
Publishing User Name Information: Yes
Actions: JoinProcessor Reprocess
Other Comments: None

Site: USC
Solved Problems of Last Report: They have reprocessed, but they have not published all the lost data due to a possible bug in APEL.
Missing Accounting Entries: None
Publishing User Name Information: Yes
Actions: None
Other Comments: None

== Accounting Support ==
Accounting Web Page: http://www.egee.cesga.es/EGEE-SA1-SWE/accounting/
Accounting FAQ: http://www.egee.cesga.es/EGEE-SA1-SWE/accounting/faq.html
Accounting Statistics: http://www.egee.cesga.es/EGEE-SA1-SWE/accounting/reports/
CPU Normalization: http://www.egee.cesga.es/EGEE-SA1-SWE/accounting/normalization.html
Accounting Guides: http://www.egee.cesga.es/EGEE-SA1-SWE/accounting/guides/
Contact e-mail: egee-admin@cesga.es
Speaker: Rey Mayo, Pablo (CESGA)
-
12:15
→
12:30
Action lists 15m VRVS
Speaker: Pacheco, Andreu (IFAE)
-
5.1.17.4 Andreu Pacheco to update the SWE Execution Plan (PENDING) 15m
* The execution plan will be reviewed in March. This action is postponed until then.
Speaker: Pacheco, Andres (PIC)
-
5.1.17.5 All sites except LIP and IFIC to support SWETEST Virtual Organization (CLOSED) 15m
* Closed action.
* Sites currently supporting SWETEST are: CESGA, PIC, IFIC, IFAE, UPV, LIP, INTA
Speaker: Pacheco, Andres (IFAE)
-
5.1.17.7 CIEMAT, CNB, IFIC, IFAE, UAM and USC to upgrade worker nodes to Scientific Linux 3.0.2 (PENDING) 15m
Speaker: Merino, Gonzalo (PIC)
-
5.1.17.8 CNB to upgrade Apel version 3.4.39-1 (PENDING) 15m
Speaker: Lopez Cacheiro, Javier (CESGA)
-
5.2.21.1 Create new mailing list for SWE site admins (CLOSED) 15m
* Requested on Fri 4 Mar 2005
* List name is project-eu-egee-sa1-swe-admins@cern.ch
* Added all site contacts from SWE as they appear in the GOC database.
* New users added to the swetest VO should be announced on this list.
Speaker: Pacheco, Andreu (PIC)
-
5.3.7.1 Miguel Cardenas to check if forensic analysis after intrusion is mandatory and who can do it (NEW) 15m
On the question of whether forensic analysis is compulsory in EGEE: the answer at this moment is no, according to the "Grid Security Incident Handling and Response Guide" from the "Operational Security Coordination Team" (OSCT) and the comments of David Kelsey and Ian Nelson. The reason for not making the analysis mandatory is that doing so could discourage small sites. However, the OSCT should strongly encourage people to perform forensic analysis.
On the other hand, there is no deontological code for the use of the data received in such cases, so Elio Perez and I have written a draft of this code. The draft is included in this message.
Deontological Code for the use of images of compromised computers:
- Collected data will not be modified by the analysis team. Any work will be carried out on copies.
- Collected data will be used only by the analysis team. Nobody else should have access to this information.
- The analysis team will guarantee the confidentiality of any information stored on disks.
- Collected data should be used only for research purposes. Once the investigation has concluded, any information kept by the analysis team should be destroyed.
- In case the analysis becomes useful for research or educational purposes, only technical information should be used, avoiding any use of user data.
Any suggestions about this draft are welcome.
* We close this ticket and open new ones: the legal implications of forensic analysis, the formal approval of the deontological code, and the steps to add Miguel Cardenas as SWE Security Officer.
Speaker: Cardenas, Miguel (CIEMAT)
-
5.3.7.2 Farida Fassi and Mohammed Kaci to setup the initial user support web site (OPEN) 15m
Hello everybody:
As we promised at our last weekly meeting, the standalone SWE Ticketing System is ready to be tested. The system can be reached at the following URL: http://cg1.ific.uv.es/tstdesk/index.php. I have included there the list of supporter names and supporter group names that you sent me 6 weeks ago, as well as all the changes needed to integrate it with the GGUS one. There is another test ticketing system that is being used for the integration work, and it has the same adaptation structure as the one I am sending to you.
To access the standalone system one must have an account there, so I have created accounts for all the experts whose names I have received from you. Below I will list the login/password for each one. Once you log into the system you can change your password using the "Edit Profile" option located on the left-hand side of the web page. If someone does not appear in the list below, please feel free either to register for an account (using the above URL) or to send me an e-mail.
The system currently has only the experts. To test the whole system and give us good feedback, it is useful to have some normal users. These users will create the tickets, which will then be answered by the experts. So all the people on this mailing list are invited to ask for an account, either as an expert or as a normal user.
* Next actions: we must test the system before making it operational, and Mohammed must get the ticket import/export to GGUS working.
Speaker: Fassi, Farida (IFIC)
-
5.3.7.3 Pablo Rey to submit a new functionality request to APEL developers (CLOSED) 15m
* Ricardo Graciani requested a new functionality to be added to a future release of APEL.
* The functionality would be the ability to reprocess only the data between two dates from the whole repository.
* Currently this can be done by copying the files corresponding to the desired dates into a new directory and reprocessing from there. An easier procedure is required.
* The agreement is that Pablo will submit the request to the developers to be added to their to-do list.
* DONE *
Speaker: Rey Mayo, Pablo (CESGA)
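The manual workaround mentioned in action 5.3.7.3 (copying the files for the desired dates into a new directory and reprocessing from there) could be scripted roughly as follows. This is only a sketch: it assumes each log file carries a YYYYMMDD stamp in its name, which is a hypothetical naming convention, and the helper names `file_date` and `stage_logs` are illustrative. It does not call APEL; the reprocess itself would still be launched on the staging directory by whatever procedure the site already uses.

```python
import re
import shutil
from datetime import date
from pathlib import Path

# Hypothetical naming convention: a YYYYMMDD stamp somewhere in the file name.
DATE_STAMP = re.compile(r"(\d{4})(\d{2})(\d{2})")


def file_date(path):
    """Return the date encoded in the file name, or None if there is none."""
    match = DATE_STAMP.search(path.name)
    if match is None:
        return None
    try:
        year, month, day = (int(g) for g in match.groups())
        return date(year, month, day)
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None


def stage_logs(source_dir, staging_dir, start, end):
    """Copy files dated between start and end (inclusive) into staging_dir.

    Returns the sorted list of copied file names. Originals are not touched.
    """
    source_dir = Path(source_dir)
    staging_dir = Path(staging_dir)
    staging_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.iterdir()):
        if not path.is_file():
            continue
        stamp = file_date(path)
        if stamp is not None and start <= stamp <= end:
            shutil.copy2(path, staging_dir / path.name)
            copied.append(path.name)
    return copied
```

For example, `stage_logs("logs", "reprocess-staging", date(2005, 3, 5), date(2005, 3, 7))` would collect the files for the three days on which the Flexible Archiver was down; both directory names here are placeholders.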
-
5.3.7.4 Mario to deploy gmdat monitoring tools in SWE (NEW) 15m
* Mario has volunteered to test some Crossgrid monitoring tools to evaluate their deployment in EGEE-SWE.
* Network monitoring and measurement is being tested at LIP.
* The Central Monitoring Host (CMH) is being configured, and a client with the sensors (ce01) will be deployed for a first test.
* After this, more news will be sent to the SWE for deployment at the other sites.
Speaker: David, Mario (LIP)
-