VOM(R)S Working Group

History

This Working Group (WG) is a continuation of the LCG User Registration Task Force. It was approved by the GDB on Wed, 04 Apr 2007. The requirement for this ongoing coordination body was presented at the February 2007 GDB (slide 9). Contact: project-voms-wg@cern.ch

A status report on the WG activities was regularly given to the GDB (example: the June 2007 GDB). The WG wrap-up was announced at the July 2009 GDB; the decision was taken by EGEE SA1 and wLCG management.

Mandate

The VOM(R)S Working Group will be responsible for:
  • Bringing together the VOMS/VOMS-ADMIN/VOMRS developers to ensure the products evolve in a coordinated fashion, meeting the requirements of users, VO/ROC Managers, services and security.
  • Maintaining the quality of the CERN HR database link, adapting to new Oracle versions, and reporting problems with related components (e.g. Java, Tomcat) to the relevant gLite deployment bodies for performance and quality.
  • Ensuring thorough product testing and a forum to discuss issues before adoption in production.
  • Agreeing on the priorities for bug fixing and information dissemination to users at large.

Meetings

Dates and Agendas. Input material and minutes are attached to the agendas.

Generic Attributes (GAs)

2007-05-23 notes

Tanya, Lanxin, Remi and Maria held a check-point call. Our understanding is recorded here (to be checked at the 2007-05-30 EMT):
  1. Question: Do we have the necessary and sufficient gLite patches to install voms with OCI connections and GAs? Answer: Not yet. Joachim is struggling to incorporate the OCI connection parameters in the glite-voms-server-config.py script so that they are copied correctly into the voms-admin configuration files (see the illustrative sketch after this list). Update at the 2007-05-30 EMT. Complete description in page VomsOracleImprove. Implementation method in bug #19654.
  2. Question: Do we have a test environment identical to the production environment where we can install voms_2.0.? and vomrs 1.3.1? Answer: Yes, the tomcat, java and OS versions are identical to the production ones. The tests are being run by Lanxin over voms-admin-server-2.0.3-2. Tanya will say in which vomrs version the synchronisation bug will be fixed. Meanwhile, the vomrs server process will have to be restarted.
  3. Question: Can we check again the list of system & software requirements, e.g.:
      • What OS?
      • What version of tomcat?
      • What version of java?
      • What version of oracle?
     Answer: SLC3, java-1.5, tomcat5-5.0.28-11_EGEE and oracle-instantclient-basic-10.2.0.1-1 are the ones we use in production and for the tests. We should ask at the 2007-05-30 EMT whether and when we need to change any of these. vomrs doesn't currently work with tomcat5-5.5, so we need advance notice if we have to move to it. Joachim reminded that the OCI bug 19564 specifies that oracle version 10.2.0.3 is required.
  4. Question: Can we get Andrea's advice on the multiple db errors discovered during Lanxin's stress test? This may be taken offline if too long, but we need to agree how to proceed. Answer: This is being done via Andrea's periodic login on Lanxin's test host. Tanya advised erasing all voms data and making a fresh sync from vomrs to push down all entries, so as to start from scratch while monitoring the voms-admin logs.
  5. Question: What is the deployment plan, based on the above? Answer: On Tanya's advice, when the exact working rpms are clear, we should re-install them from scratch on the voms-test host to simulate the upgrade of the production nodes.
  • EMT discussion on 2007-05-30
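
The point of the OCI issue in question 1 above is that Oracle OCI (thick-client) connections are addressed through a TNS alias resolved by the Oracle instant client, rather than plain host/port parameters, and those settings must end up in the per-VO voms-admin configuration files. The Python sketch below only illustrates that rendering step; the property names and file name are hypothetical stand-ins, not the actual glite-voms-server-config.py logic.

# Illustrative sketch only -- the real work happens inside
# glite-voms-server-config.py; the property names and target file
# below are hypothetical.

ORACLE_OCI_SETTINGS = {
    # An OCI (thick-client) JDBC URL uses a TNS alias instead of
    # host/port/SID, so the Oracle instant client must be installed.
    "hibernate.connection.url": "jdbc:oracle:oci:@VOMS_TNS_ALIAS",
    "hibernate.connection.username": "voms_user",
    "hibernate.connection.password": "changeit",
}

def write_properties(path, settings):
    """Render key = value pairs into a properties-style file."""
    with open(path, "w") as f:
        for key, value in sorted(settings.items()):
            f.write("%s = %s\n" % (key, value))

if __name__ == "__main__":
    write_properties("voms.database.properties", ORACLE_OCI_SETTINGS)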

2007-05-30 notes

None of the vom(r)s developers joined.
    • The answer to Question 1 in the notes above is that Joachim will try to run his script on a clean machine provided by Remi (voms-test.cern.ch). If the problem with the OCI parameters persists, he will contact Andrea, who has access to our test nodes. Then we'll re-discuss it at the EMT.
    • The answer to Question 3 in the notes above, as far as the tomcat version is concerned, is that gLite 3.1 foresees tomcat5-5.5 for all services. It is possible to keep a separate branch while we solve the vom(r)s problems and test voms-admin behaviour on this new version. Nevertheless, we should plan time now for moving forward, as this upgrade is inevitable this coming summer.
    • Joachim's point in Question 3 in the notes above, concerning the Oracle client upgrade (a prerequisite to the use of OCI), reminds us that no testing has been done so far with oracle version 10.2.0.3. Therefore time and effort should be planned for this.
  • EMT discussion on 2007-06-26

2007-06-25 notes

    • Question: Are we still certifying voms-admin-server-2.0.3-2? The vomrs-1.3.1 tests were run over this version. Answer: voms-admin-server-2.0.4-2 is being certified now. The changes introduced in this version only concern the configuration scripts; therefore, in Andrea's opinion, Lanxin doesn't need to re-run the vomrs tests. From now on the certification will be done by Joachim, who is currently learning from Maria A.
    • Question: The required Oracle client version, 10.2.0.1 or 10.2.0.3, is still unclear. Answer: version 10.2.0.1 should be used, as made available by Oracle. Andrea, Joachim and Remi to compare the exact packaging of the 10.2.0.1 versions used on their installations.
    • Question: The OCI parameters in the gLite wrapper still don't work. It seems that the EMT decided to put this aside in order to complete the certification of the MySQL port of voms-admin needed for VDT. How do we accommodate the pressing requests from the VOs, which have been pending for more than a year? Answer: Joachim thinks that the OCI problem is solved now; he will certify and tell us a.s.a.p. A recent EMT decided to certify the MySQL version with higher priority, as requested by VDT. Nevertheless, VDT is not willing to take a version that hasn't been used in production. voms-admin-server-1.2.19 is not in question, since it is indeed the one we are using in production today; but the sooner we certify the required version 2.0.4-2, the faster it will be released and hence installed in production.
    • Question: vomrs doesn't currently work with tomcat5-5.5. Has voms-admin actually been tested with this tomcat version? Answer: This was Maria D.'s misunderstanding. Input from Tanya: "vomrs works with tomcat5-5, but it requires changing the vomrs URL (from vo/ to vo_) because tomcat somehow ignores the context path specified in the xml configuration. I think that the problem is in the tomcat configuration and am wondering if others see/solve this problem." Andrea will test whether any such URL change is needed for voms-admin too.
    • Question: People involved in the Job Priorities WG and in Data Management expressed worries about the impact of VOMS GAs on their applications. Our impression from the VOs and the developers is that GAs will only be used by VO application software. Can this be confirmed by testing? Answer: Vincenzo confirmed that voms core is backwards compatible: there is no change in the FQANs, so no worries for any middleware application (see the sketch below). If people want to use GAs on the production CERN VOMS servers for the 'test' VOs, we have to upgrade the voms core server to version 1.7.16-2.
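
To make Vincenzo's point concrete: GAs travel in the VOMS attribute certificate alongside, not inside, the FQANs, so only software that explicitly asks for them sees them. The minimal Python sketch below extracts GAs from voms-proxy-info output; it assumes GA lines of the form 'attribute : key = value (vo)', which should be verified against the actual client version before relying on it.

# Minimal sketch: pull Generic Attributes out of `voms-proxy-info --all`
# output. Assumes GA lines look like "attribute : nickname = foo (myvo)";
# FQAN lines (e.g. "attribute : /myvo/Role=NULL/Capability=NULL") are
# skipped. Check the format against your VOMS client version.
import re
import subprocess

GA_LINE = re.compile(r"^attribute\s*:\s*(\S+)\s*=\s*(.*?)\s*(?:\((\S+)\))?$")

def generic_attributes():
    out = subprocess.run(
        ["voms-proxy-info", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    attrs = {}
    for line in out.splitlines():
        m = GA_LINE.match(line.strip())
        if m and "/" not in m.group(1):   # "/" marks an FQAN, not a GA
            attrs[m.group(1)] = m.group(2)
    return attrs

if __name__ == "__main__":
    for key, value in generic_attributes().items():
        print(key, "->", value)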

2007-08-22 notes (by Cl. Grandi)


VOMS-admin
==========
# certification progress
Dimitar found bugs in the certification of VOMS-Admin 2.0.4.
AndreaC already fixed some of them and will look to the new ones submitted today in a couple of days. More tests will be done in the next days.
Oliver proposes to reject the patch and release a new one on SL4. OK for AndreaC.
MariaD says experiments have been waiting for Generic Attributes for a long time, so waiting for SL4 may take too long, also because it implies changes in Tomcat (5.5) and Java (5).
AndreaC says that VOMS-Admin 1.2.19 together with the new version of VOMS core support GA.
Maria says experiments didn't test it.
Claudio says CERN is discontinuing support for SL3 in 2 months so releasing an SL3 version now may be a waste of time.
AndreaC says that he didn't see any problems with Tomcat 5.5 and Java 5.
Tanya asks about the matrix of certification.
Oliver says the current cert is with SL3, Tomcat 5, Java5 on both Oracle and MySQL. The proposal is to certify the next patch on SL4 Tomcat 5.5, Java 5 again both Oracle and MySQL.
In conclusion it was decided to go to that combination. The next steps are:
- Maria will ask for SL4 hardware.
- Dimitar will get a new SL4 patch from AndreaC (using oracle instant client 10.2.0.3)
- The OCI driver with the changes in the configuration script by Joachim
Dimitar says that he in principle needs only one day for testing the last things left.
Oliver says that we need to be more conservative and the certified version will probably come in a few weeks.
Oliver asks Nick if it is reasonable to bypass the PPS. Nick agrees.
Ma Lanxin reports problems in the upgrade to Tomcat 5.5.
Oliver says Steve has a recipe but apparently this didn't work for her. She will investigate further.
Once the VOMS-Admin certification is over, Tanya will provide VOMRS 1.3.1 on SL4.
AndreaC says he can give a version to Tanya at the same time he gives it to Dimitar.

# Vulnerability in VOMS-Admin and VOMRS
Maria reports that Romain found a vulnerability in VOMS-Admin and VOMRS that Andrea and Tanya have already fixed; the fix still needs to be tested.
Tanya will give Romain the link to her test site and Romain will be testing it.
Alain asks about the time scale for a fix for VOMS-Admin 1.2.19.
AndreaC says it would take about 1 week while it is easy (done already) in 2.0.X.

# VOMS replication
Maria asks Valerio about VOMS-DB replication tests (read-only replica) at CNAF.
AndreaC says the setup is working as expected. Stress tests were interrupted because of holidays and are being resumed now.
Maria asks to proceed soon to the CERN-CNAF stream tests to which CERN agreed in June.

# 28907 - VOMS 1.7.20.1 failing against older VOMS server
Alain says they probably have a workaround but need a proper fix.
Valerio needs a couple more days of investigation because he has problems reproducing the error.

2007-08-22 slides

Prepared for the TRIUMF GDB of 2007-08-31.

2007-09-05 Notes

Based on the minutes by Joachim Flammer. VOM(R)S WG members present in this EMT meeting: Dimitar, Joachim, Maria D., Remi, Tanya, Vincenzo. Chairperson: John White.

VOMS admin certification
------------------------
The voms-admin 1.2.19 security patch should be released in one month. This was requested by VDT.
 
Python scripts (glite-voms-server-config.py etc.) have been updated by Dimitar; the critical issues have been resolved.
The updated VOMS server is supposed to be released on SLC4. Romain has asked for a system to test the fix for the web-ui vulnerability.
Romain says he has not received a system; Vincenzo thinks one should be available and will crosscheck.

Dimitar will finish the certification of voms-admin this week. voms-core still needs to be certified on SLC4; Dimitar will also do this from his home institute in Sofia.
The metapackage still needs to be set up, and voms needs to be recertified after the log4j change.

Log4j 1.2.14 to be set in ETICS for gLite by Joachim. Joachim will provide a list of packages where the version number needs to be updated
for this change. 

2007-09-19 Notes

Submitted by Joni in email: "The trustmanager 1.8.10.1 and voms admin are built using log4j 1.2.14, and at least trustmanager is still built with the old 1.22 version of bouncycastle. The bouncycastle upgrade is still in the works. I would recommend the newest 1.8.10.1 trustmanager."

Other updates were also circulated in email: the voms-admin-server-2.0.6 tag is imminent; vomrs will be upgraded on top of today's production version for now. Plan in VomrsUpdateLog#19th_September_2007

2007-12-05 Notes

-------- Original Message --------
Subject: request for close follow-up on progress of voms bugs
Date: Wed, 5 Dec 2007 17:32:29 +0100
From: Maria Dimou-Zacharova <Maria.Dimou@cern.ch>
To: project-eu-egee-middleware-emt (EGEE Middleware Engineering Management Team) <project-eu-egee-middleware-emt@cern.ch>
CC: project-voms-wg <project-voms-wg@cern.ch>,   <lcg-service-coordination-meeting@cern.ch>

We are going ahead with the vom(r)s upgrade to voms 2.0.8 and vomrs 
1.3.1e on new hardware on Monday Dec 10th 8-11hrs UTC, to ensure we'll 
have enough time to fix problems before the Xmas shutdown.

This is the list of open voms bugs that need to be fixed, tested, 
certified and moved to production very quickly. This list doesn't 
contain bugs related to voms-admin-1.2.x, VDT requests, the MySQL
port and/or OS flavours other than the ones we run.

Until these bugs are properly fixed, a lot of manual work-arounds are 
being applied that make the service seriously fragile.

This list is not sorted but it roughly corresponds to 'most-recent-first'.
https://savannah.cern.ch/bugs/?31787 https://savannah.cern.ch/bugs/?31790 https://savannah.cern.ch/bugs/?31791 https://savannah.cern.ch/bugs/?22973 https://savannah.cern.ch/bugs/?19770 https://savannah.cern.ch/bugs/?31488 https://savannah.cern.ch/bugs/?30712 https://savannah.cern.ch/bugs/?29656 https://savannah.cern.ch/bugs/?31832 https://savannah.cern.ch/bugs/?31800 https://savannah.cern.ch/bugs/?31702 https://savannah.cern.ch/bugs/?31667 https://savannah.cern.ch/bugs/?31476 https://savannah.cern.ch/bugs/?31068 https://savannah.cern.ch/bugs/?30728 https://savannah.cern.ch/bugs/?30726 https://savannah.cern.ch/bugs/?28493 https://savannah.cern.ch/bugs/?22973 https://savannah.cern.ch/bugs/?26974 https://savannah.cern.ch/bugs/?23282 https://savannah.cern.ch/bugs/?20789 https://savannah.cern.ch/bugs/?20607 https://savannah.cern.ch/bugs/?17247 https://savannah.cern.ch/bugs/?15794

It is worrying that most of these bugs were opened in the last month, together with 11 more bugs that are solved by now; the latter are included here to demonstrate the intense bug detecting, reporting and fixing activity going on. These are the following:

https://savannah.cern.ch/bugs/?29567 https://savannah.cern.ch/bugs/?29566 https://savannah.cern.ch/bugs/?29440 https://savannah.cern.ch/bugs/?29439 https://savannah.cern.ch/bugs/?28997 https://savannah.cern.ch/bugs/?28961 https://savannah.cern.ch/bugs/?28959 https://savannah.cern.ch/bugs/?28909 https://savannah.cern.ch/bugs/?28893 https://savannah.cern.ch/bugs/?28892 https://savannah.cern.ch/bugs/?28888

Given the size of this list, we are kindly asking for a generous voms slot at the Wednesday Dec 12th EMT, when echoes from our upgrade can also be reported.

Thank you maria --

2007-12-12 Notes

-------- Original Message --------
Subject: voms issues for the EMT of 12/12/2007
Date: Wed, 12 Dec 2007 16:23:18 +0100
From: Maria Dimou-Zacharova <Maria.Dimou@cern.ch>
To: project-eu-egee-middleware-emt (EGEE Middleware Engineering Management Team) <project-eu-egee-middleware-emt@cern.ch>
CC: project-voms-wg <project-voms-wg@cern.ch>

Thank you for accepting our request of 05/12/2007 to include a topic on
voms in today's EMT.
Patch https://savannah.cern.ch/patch/?1582 is present, which is good.

Nevertheless, we see none of the 24 bugs we brought to your attention with this note https://twiki.cern.ch/twiki/bin/view/LCG/VomsWG#2007_12_05_Notes among the tracked issues of today's agenda http://indico.cern.ch/conferenceDisplay.py?confId=24869

In the meantime a 25th, critical bug was added, so we are formally asking the EMT chair (Oliver?) to include this list in the tracked issues and to follow up progress with the certifier and all intermediate steps to production at every Wednesday's EMT, until they all turn green:

https://savannah.cern.ch/bugs/?32136 (top priority) https://savannah.cern.ch/bugs/?31787 https://savannah.cern.ch/bugs/?31790 https://savannah.cern.ch/bugs/?31791 https://savannah.cern.ch/bugs/?22973 https://savannah.cern.ch/bugs/?19770 https://savannah.cern.ch/bugs/?31488 https://savannah.cern.ch/bugs/?30712 https://savannah.cern.ch/bugs/?29656 https://savannah.cern.ch/bugs/?31832 https://savannah.cern.ch/bugs/?31800 https://savannah.cern.ch/bugs/?31702 https://savannah.cern.ch/bugs/?31667 https://savannah.cern.ch/bugs/?31476 https://savannah.cern.ch/bugs/?31068 https://savannah.cern.ch/bugs/?30728 https://savannah.cern.ch/bugs/?30726 https://savannah.cern.ch/bugs/?28493 https://savannah.cern.ch/bugs/?22973 https://savannah.cern.ch/bugs/?26974 https://savannah.cern.ch/bugs/?23282 https://savannah.cern.ch/bugs/?20789 https://savannah.cern.ch/bugs/?20607 https://savannah.cern.ch/bugs/?17247 https://savannah.cern.ch/bugs/?15794

Many thanks in advance maria

2007-12-20 Notes

Dimitar's notes used for the 2007-12-10 upgrade are here: https://twiki.cern.ch/twiki/bin/view/EGEE/Glite31VOMS
After negotiation with the EMT, the important bugs were included in the tracked issues and are being discussed every Wednesday at the EMT. voms-admin-2.0.10 and voms-1.8.1 will be certified by the end of January 2008. Pre-production should be equipped with Oracle-based VOs, in order to do more testing before we upgrade production. This was decided at the 2007-12-19 EMT (agenda and links to bugs discussed).

2008-03-05 GDB Notes

A suggestion for planning a simplified voms service set-up at CERN was presented to the March GDB. These slides are the beginning of more required discussions:
  1. between the LCG and Fermilab/OSG management on the future of VOMRS.
  2. within the VOM(R)S WG for development planning without production service disturbance.
C.Grandi reported at the GDB that the voms developers are willing to undertake the voms-admin/ORGDB interface coding once the implementation of all JSPG-required features is done. This is expected in voms-admin-2.5 in late summer 2008. LHCb said that any service simplification is welcome. It must be noted that this migration requires a lot of work from the VOM(R)S WG in order not to disturb the service and to train the VO Admins and users. Nothing can be expected to change in 2008, but the plan should be worked out in detail, provided this is the management's decision once item 1 (above) is done.

VOMS Replication

Most recent first. This meeting involved the VOMS developers, database experts and DBAs, and the CERN VOM(R)S service managers and testers.

Actions: The Oracle CERN-CNAF tests were successful. Thanks to Lanxin Ma for running the vomrs test suite to generate and alter test data on the master, to Eva Dafonte/Barbara Martelli for setting up the Oracle streams, to Andrea Ceccanti for setting up the CNAF test voms server.

Now the questions are (please report your conclusions in email; this twiki will be updated by M.Dimou):

  1. Will BNL wait for the https://savannah.cern.ch/bugs/?32473 fix and resume their home-made replication solution, or do they wish to pursue a proper Oracle voms replica? The latter is the developers' preference, but M.Ernst and D.Duelmann have to sort out the licensing issues, if any. If BNL wants a replica at their premises (M.Ernst to answer this), they have to check their Oracle licensing. Alternatively, voms can be co-hosted on the FTS servers, if the CPU load is still minimal (M.Anjo to report) and if the FTS service managers allow this (M.Dimou to ask G.McCance).
  2. Is CNAF interested in hosting a replica? Is their licensing in order?
  3. The Health-e-Child VO manager K. Skaburskas reported a great need in that community for voms db replication for the MySQL port. Question to the voms developers and Dimitar: can this be tested? If yes, who will do it? Lanxin can no longer help with this and, as far as I understand, no other test suite exists which massively creates/deletes/modifies db entries. L.Ma reported on 2008-02-20 that she did VOMS MySQL replication for EUChinaGrid and it works.
  4. The VO Managers of the 10 VOs hosted on the CERN vom(r)s servers should come forward if they wish replication, saying when and where. Be aware of possible licensing issues and of the usual slowness of sites in adopting new VO configuration data that includes the replicas; hence, if you need a replica, justify it and act now by suggesting replica sites.

Answer by J.Hover to question 1 on 2008-02-19:

I have already worked around the bug by switching over to the underlying library used in Andrea Ceccanti's voms-admin.py client. 
So for the moment our home-made solution is working again.
For my part, I agree that leaving it as a custom, home-made solution isn't optimal. I'd prefer that my VOMSAdmin synchronization code 
become part of the official gLite VOMSAdmin client. That way it would be more than just BNL keeping an eye on it, 
and it would automatically be tested against newer versions of VOMS.
Michael would have to comment on whether BNL would prefer to move to an Oracle-based VOMS system. 
If we were to do that, then the database replication method would be better.

Comment by K.Skaburskas to question 3 on 2008-02-20:

We would like to rely on gLite certified solutions. An ideal approach could, briefly, be as follows:
- one rw master and a number of ro replicas
- master and replicas run "VOMS replication" daemons
- replica instances first contact master and subscribe for updates
- any changes on master are pushed to subscribed replicas

This solution could be made independent of the DB backend and would require minimal configuration on the master/replicas.

If we go for pure DB-level replication, then each infrastructure having its own VO(s) and VOMS must set up the proper replication itself. 
In our case (MySQL) the approach is well known and doable. However, controlling such a setup and adding replicas might become cumbersome. 
So I think that the first approach is friendlier w.r.t. infrastructure deployment/operation. 
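
To make the proposed scheme concrete, here is a toy, in-process Python sketch of its moving parts (a replica subscribes, gets an initial full sync, then receives pushed deltas). It is not gLite code: real daemons would need an authenticated network protocol and would write to the actual VOMS database.

# Toy illustration of the subscribe-and-push replication scheme proposed
# above. Both ends run in one process here; a dict stands in for the DB.

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}                  # stands in for the read-only DB

    def apply(self, change):
        op, key, value = change
        if op == "put":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)

class Master:
    def __init__(self):
        self.data = {}
        self.subscribers = []

    def subscribe(self, replica):
        # A new replica first receives the full state, then live updates.
        for key, value in self.data.items():
            replica.apply(("put", key, value))
        self.subscribers.append(replica)

    def write(self, key, value):
        self.data[key] = value
        self._push(("put", key, value))

    def delete(self, key):
        self.data.pop(key, None)
        self._push(("delete", key, None))

    def _push(self, change):
        for replica in self.subscribers:
            replica.apply(change)

if __name__ == "__main__":
    master, replica = Master(), Replica("cnaf")
    master.write("/dn/some-user", "member")   # before subscription
    master.subscribe(replica)                 # initial full sync
    master.write("/dn/other-user", "member")  # pushed update
    print(replica.data)                       # both entries present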

Comments by M.Dimou to question 4 on 2008-02-21:

We already have 4 physical hosts all sharing the same database
(2 'logical' servers, i.e. lcg-voms.cern.ch and voms.cern.ch,
with a master-slave structure for each and automatic fail-over)
for the CERN-based VOs, basically LHC experiments and paraphernalia.
 
Of course, if we have a power cut or damage to the Oracle servers
we have a problem but, then, many IT services will be in the same situation.

Between the host-to-host tests at CNAF following the 2007-06-12 meeting and this date, there have been technical discussions on these occasions:

  • Meeting @ CNAF on 2007-06-12. Actions:
    • Decide whether the voms replication is needed for:
      • automatic fail-over or
      • load-balancing purposes. (VOMS developers on current provision in the code before surveying VO Admins).
    • Answer design questions like the handling of proxy renewal.
    • Identify the data that should not be shipped (due to security and/or personal data privacy). This is a VOMS-internal discussion which will result in a proposed set of table names and, if necessary, a set of filters on replicated tables to avoid security/privacy issues.
    • CNAF and CERN (persons' names needed) to set up the streams' connections for a test set-up. The time when tests should start should be suggested by the VOMS developers.
    • Name the persons at CNAF who will be doing the test. (VOMS developers @ CNAF?).
    • Concentrate on code testing and documentation during the summer months. Confirm that the current VOMS code handles replicated data (changing behind the back of the application) correctly. Decide when dedicated VOMS test suites should be specified.
    • Set up a cron job at CERN to generate updated data (M.Alandes agreed to do this; a sketch follows below). Do we need a VOMS-internal discussion on the content / data rates / goals for the test?
    • Check the voms logs to decide what consumption evolution should be expected in the next 6 months. This is important for resource-planning purposes. The resource requirements should be defended at the Project Management Board level. (Service manager: R.Mollon.)
    • Understand the Oracle licensing situation at each site interested in holding a replica VOMS server. Licenses are issued per CPU. The CERN db service monitor shows a negligible CPU consumption by VOMS (less than .5 CPU), allowing its co-existence with other applications on a licensed server. Tier1 sites may obtain an Oracle license through CERN. A request for a middleware cluster may be submitted, with a solid justification, every 6 months. (To be answered by any site interested in running a replica voms server, e.g. BNL for the ATLAS VO db.)
D.Duelmann's input information on licensing: we do understand the status quo for licenses at all sites: no specific VOMS licenses were requested during last year's poll. Before going further on this we need to confirm the target number of replica sites and the performance requirements for one replica in terms of (fractions of) a Tier 1 server CPU. If that is established (e.g. for the next 6 months), we need to check if VOMS could be hosted in the existing LFC/FTS Tier 1 DBs, for which licenses exist. If application servers also need to be provided by the sites, it might be good to specify them. Only after this information is written down, together with the reasons for asking for a replicated service and the results from the functional and performance tests, does it make sense to go into a real resource discussion with the sites and the project.
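
On the test-data cron job mentioned in the actions above: its role is simply to keep creating and deleting entries on the master so that the streams always have updates to replicate. The Python sketch below is hypothetical; the create_user/delete_user helpers are stand-ins for whatever admin interface is actually used (e.g. the voms-admin command-line client), not real gLite calls.

# Hypothetical churn generator for replication tests: it registers and
# removes dummy entries so the replication stream has continuous traffic.
import random
import time

def create_user(dn):
    print("would register:", dn)      # stand-in for the real admin call

def delete_user(dn):
    print("would remove:", dn)        # stand-in for the real admin call

def churn(iterations=10, pause_seconds=1):
    live = []
    for i in range(iterations):
        dn = "/DC=ch/DC=cern/CN=replication test user %d" % i
        create_user(dn)
        live.append(dn)
        if len(live) > 3 and random.random() < 0.5:
            delete_user(live.pop(0))  # occasionally delete an older entry
        time.sleep(pause_seconds)

if __name__ == "__main__":
    churn()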

Extract of Dirk Duelmann's workshop notes sent by email on June 19th 2007:
- Concerning VOMS testing with replicated DBs we agreed to start a functional test with the
   VOMS developers/testers  from CNAF and CERN. Initially this test can be done between two
   replicas at CNAF. Once the functionality of VOMS against a r/o replica has been confirmed
   these tests may be extended to WAN replicas if the deployment management sees this
   as priority. At this point we will discuss also the resource and possible license  requirements.
   More detail about the VOMS replication plans can be found on the VOMS group wiki
   https://twiki.cern.ch/twiki/bin/view/LCG/VomsWG#VOMS_Replication 

Extract of Cl. Grandi's EMT notes on August 22nd 2007:

# VOMS replication
Maria asks Valerio about VOMS-DB replication tests (read-only replica) at CNAF.
AndreaC says the setup is working as expected. Stress tests were interrupted because of holidays and are being resumed now.
Maria asks to proceed soon to the CERN-CNAF stream tests to which CERN agreed in June.

Status of August 23rd 2007: BNL withdraws the VOMS db replication request. According to Michael Ernst: "John Hover from the ATLAS Computing Facility at BNL has just completed a VOMS API based approach to query the CERN VOMS server for its contents."

Original replication requirement at the WLCG Workshop BOF on Jan 22nd 2007.

CERN VOM(R)S server infrastructure evolution

We lived for one year in production (May 2006 to May 2007) with 3 hosts (voms101,2,3) and the LinuxHA front-end prod-voms (as explained in the drawing in VomsWlcgHa).

voms.cern.ch (==voms101.cern.ch) is identical to the primary VOMS server lcg-voms.cern.ch (same VOs, same data) and is used for:

  • gridmap file generation (illustrated in the sketch below)
  • voms-proxy attribution.

lcg-voms.cern.ch (==voms102|3) is responsible for:

  • user registration (vomrs)
  • voms-proxy attribution
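
For context, 'gridmap file generation' means periodically querying the VOMS server for the VO member list and mapping each certificate DN to a local (pool) account, which is what tools such as edg-mkgridmap do for the sites. The Python sketch below shows only the final mapping step; the DNs and the .atlas pool-account name are invented examples.

# Minimal sketch of grid-mapfile entries derived from a VOMS member list.
# The DNs and the ".atlas" pool account below are invented examples;
# real sites use a tool such as edg-mkgridmap pointed at the VOMS server.

members = [
    "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=jdoe/CN=123456/CN=Jane Doe",
    "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=rroe/CN=654321/CN=Rick Roe",
]

def grid_mapfile(dns, account=".atlas"):
    # Each line maps a quoted certificate DN to a (pool) account name.
    return "\n".join('"%s" %s' % (dn, account) for dn in dns)

if __name__ == "__main__":
    print(grid_mapfile(members))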

Today, in case of a voms.cern.ch (==voms101) hardware problem, the contract foresees the start of the intervention within 4 working hours and the completion of the repair within 12 working hours. To ensure the gridmap file is refreshed without interruption at all sites during the host's downtime, we would have to point the DNS alias voms.cern.ch to prod-voms.cern.ch. This can never be a smooth and transparent intervention.

This is why we are requesting:

  1. one more machine identical to voms101,2,3 and
  2. a front-end identical to prod-voms to set-up a LinuxHA service for voms.cern.ch

Given that the services which need to frequently remake the gridmap file are still around and likely to stay, that the number of sites keeps increasing, and that the new voms-admin and vomrs with GA support will be more demanding in resources, we believe this request is reasonable for such a visible Grid service. This request was sent to B. Panzer for the CERN IT Service Coordination on May 29th 2007. voms101 presented a disk error around Sep. 5th 2007, so we obtained a host for moving the service transparently. A request to H.Renshall for 3 SLC4 hosts was submitted by M.Dimou on Sep. 10th 2007. They will become voms102, voms103 and prod-voms, with vomrs-1.3.1, voms-admin-server-2.0.5, tomcat5-5.5, OCI and LinuxHA.

Configuration of the present servers:

processor       : 0 (same for processor 1)
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 3000.219
cache size      : 2048 KB
physical id     : 0
siblings        : 1
runqueue        : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm nx lm
bogomips        : 5989.99
----------------------------------------------------------------------------------------------------
        total:    used:    free:  shared: buffers:  cached:
Mem:  4189224960 2857525248 1331699712        0 235302912 2114273280
Swap: 4293509120 71733248 4221775872
MemTotal:      4091040 kB
MemFree:       1300488 kB
MemShared:           0 kB
Buffers:        229788 kB
Cached:        2041020 kB
SwapCached:      23700 kB
Active:        1777164 kB
ActiveAnon:     246400 kB
ActiveCache:   1530764 kB
Inact_dirty:    577776 kB
Inact_laundry:  147596 kB
Inact_clean:     28480 kB
Inact_target:   506200 kB
HighTotal:     3276224 kB
HighFree:      1000728 kB
LowTotal:       814816 kB
LowFree:        299760 kB
SwapTotal:     4192880 kB
SwapFree:      4122828 kB
CommitLimit:   6238400 kB

VOMRS<--->Voms-Admin convergence (Most recent first)

• GDB 2009-01-14 presentation notes
1. The LCG Management Board (MB) will decide which of the following it would like to happen:
1.a. faster development of voms-admin to replace vomrs, or
1.b. remaining with vomrs.
The MB may then have to negotiate with EGEE and/or OSG to achieve this.

In case of 1.a., time must be found for the voms-admin developer, whose schedule has been very much taken up by the EGEE III AuthZ framework in recent months. 
Solution 1.a. is beneficial for all the 100+ registered VOs in the whole Grid world. The JSPG-related part (convergence Phase I, see slides) is mandatory anyhow.
After this, maintaining 2 products living one on top of the other (not even independently) will be hard to justify.

In case of 1.b., the LCG VOs will have nothing to do, but FNAL should be ready to commit to a long life for VOMRS. 
The rest of the Grid world will have to also use VOMRS, or live as 'security policy outlaws' until voms-admin contains the JSPG-related features. 
This is what happens today and it is not nice.
Also, the service maintenance overhead we now have at CERN (multiple hosts and fail-safe technologies) is quite "expensive" given today's shortage of experts.

2. On the 'VOMS db replication' part of the talk, Maarten said we don't need to worry about its completion because ATLAS at BNL have a home-made solution that works. 
The concern is who maintains home-made solutions, and what other Oracle- or MySQL-based VOs can do to find the replication scripts out of the box 
and set up fully operational replicas in a production environment. 

-- Main.dimou - 14 May 2007