BDII DEPLOYMENT SUMMARY
AP:
In our region we used to have a round-robin DNS configuration with two BDIIs. However, we found that one of our servers was too slow to give IS clients reliable query performance. So instead of two single-CPU servers, our BDII service now runs on one SMP blade server:
* 2 Xeon 3.0 GHz, 4GB Memory
We are planning to return to a redundant BDII configuration after our next server procurement during the latter half of this year.
A few sites in our region also run their own BDII services, namely: KEK, Tokyo, KISTI, LCG_KNU and PAKGRID. The majority of the remaining sites use ASGC's BDII services.
DECH:
Top Level BDII Situation in ROC DECH
In the Germany/Switzerland region we have 5 top-level BDIIs in place. As ROC we have repeatedly encouraged our site managers to use a BDII other than the CERN one, and we will discuss the regional situation at one of our next regional meetings in order to improve it.
BDIIs at
--------
DESY-HH
grid-bdii0.desy.de
grid-bdii1.desy.de
DESY-ZN
lcg-bdii.ifh.de
FZK
bdii-fzk.gridka.de
Uni Freiburg
bdii.bfg.uni-freiburg.de
Usage:
------
* CSCS, GSI, ITWM, SCAI
lcg-bdii.cern.ch
* MPPMU, Uni Wuppertal
bdii-fzk.gridka.de (lcg-gridka-bdii.fzk.de)
* RWTH-Aachen, Uni Dortmund, Uni Karlsruhe
grid-bdii.desy.de
CE:
BDII setup in CE:
In CE we put a regional BDII into production on 2.10.2006. All 24 production sites, with ca. 1700 CPUs, are using it.
The regional BDII is at bdii.cyf-kr.edu.pl, which in fact resolves to TWO IP addresses (DNS "A" records: 149.156.9.24, 161.53.0.229), so the BDII services are hosted in Poland and Croatia. This has the advantage of LOAD BALANCING and FAILOVER.
Load balancing works because DNS returns the IP addresses in a different order on each query, so every second query goes to the same machine.
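The alternation described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the DNS server is modelled as rotating its answer list by one position per query, the client always uses the first address returned, and caching by resolvers is ignored. The function name simulate_round_robin is made up for the illustration.

```python
from collections import Counter, deque


def simulate_round_robin(addresses, queries):
    """Return how many queries land on each address.

    Models a DNS server that rotates its answer list by one position per
    query and a client that always contacts the first address returned.
    """
    answer = deque(addresses)
    hits = Counter()
    for _ in range(queries):
        hits[answer[0]] += 1   # the client uses the first address in the answer
        answer.rotate(-1)      # the server rotates the list for the next query
    return dict(hits)


# The two addresses quoted above for bdii.cyf-kr.edu.pl
hits = simulate_round_robin(["149.156.9.24", "161.53.0.229"], 1000)
print(hits)
```

Under these assumptions each of the two machines receives half of the queries; real traffic will deviate from this because of resolver caching.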
Failover works because the LCG GFAL library, when given TWO IP addresses, tries to access the first one and, if that fails, transparently tries THE SECOND one. So if the first host, e.g. the one in Poland, is not available, the other one in Croatia is used.
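The try-the-first-then-the-second behaviour can be sketched in a few lines of Python. This is not the GFAL implementation, only an illustration of the idea; the function names resolve_all and connect_with_failover, the 5 second timeout and the injectable connect parameter are assumptions made for the example.

```python
import socket


def resolve_all(hostname, port):
    """Return every IPv4 address the name resolves to, in DNS answer order."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def connect_with_failover(addresses, port, timeout=5.0,
                          connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next address
    raise ConnectionError(f"no BDII reachable among {addresses}") from last_error
```

A client would call resolve_all("bdii.cyf-kr.edu.pl", 2170) and pass the result to connect_with_failover with the same port (2170 is the usual BDII port). The connect parameter exists only so that the logic can be exercised without a network.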
HW/SW Setup:
- Polish machine: Dual CPU Intel(R) Xeon(TM) CPU 2.40GHz, 1GB RAM, ATA100 IDE HDD. SLC3 3.0.7, running solely top-level BDII.
- Croatian machine:
AMD Opteron(TM) Processor 248 CPU 2.2 GHz 1024 KB L2, SCSI mirrored hard disk (Sun Fire X4100). Scientific Linux SL release 4.3 (Beryllium), running solely top-level BDII.
- Experience
We experienced no major functionality problems with the machines (a few "connection timeouts" per week in total).
Having the second machine allows short maintenance work to be carried out transparently: no notification is needed and sites are not even aware of the maintenance, because the other machine handles all queries.
- Some technical details for single machine
- network peaks at the level of 500 kb/sec incoming traffic (bdii update) and
450 kb/sec outgoing traffic.
- occasionally the load rises to 4-5; the cause has not been investigated yet.
Italy:
We have 6/7 top-level BDIIs in Italy, but only one (egee-bdii.cnaf.infn.it) is used by the Italian sites. This is a DNS alias for 2 machines: egee-bdii-05.cnaf.infn.it and egee-bdii-06.cnaf.infn.it. Both are SunFire V20z with AMD Opteron(tm) Processor 252, 4GB RAM and 60GB HD in RAID 1, and they are monitored with Nagios and Lemon. The memory usage is ~1.5GB and the CPU load is ~45% (for each machine). All Italian WNs point to this BDII (LCG_GFAL_INFOSYS=egee-bdii.cnaf.infn.it:2170).
The BDII site list is autogenerated and available at http://grid-it.cnaf.infn.it/fileadmin/bdii/egee-all-sites.conf (ALL production sites in GOCDB + other Italian sites); FCR is not used.
We experience no functionality problems. In case of scheduled downtime at CNAF, the DNS alias is redirected to a top-level BDII at INFN-PADOVA, which has the same site list but less powerful hardware. We also have other top-level BDIIs, pointed to only by some RBs/WMSs, with different scopes (only Italian sites, certification sites, VO-oriented with FCR, etc.).
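The LCG_GFAL_INFOSYS setting quoted above has the form host:port. As an illustration of how a client-side script might read it, here is a minimal sketch. The function name parse_infosys is made up, and two behaviours are assumptions rather than documented facts: accepting a comma-separated list of endpoints, and falling back to port 2170 when no port is given.

```python
def parse_infosys(value, default_port=2170):
    """Split an LCG_GFAL_INFOSYS-style value into (host, port) pairs.

    Assumptions: entries may be comma-separated, and a missing port
    falls back to default_port.
    """
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty entries such as a trailing comma
        host, _, port = item.partition(":")
        endpoints.append((host, int(port) if port else default_port))
    return endpoints


print(parse_infosys("egee-bdii.cnaf.infn.it:2170"))
```

In a real script the value would typically come from os.environ.get("LCG_GFAL_INFOSYS", "").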
SEE:
In SEE ROC we have set up round-robin DNS for the DNS alias bdii.egee-see.org, and most of the sites in the region use it as their top-level BDII. This alias points to:
bdii.phy.bg.ac.yu
bdii101.grid.ucy.ac.cy
bdii.athena.hellasgrid.gr
SWE:
3 top-level BDIIs:
bdii.pic.es
ii02.lip.pt
bdii-egee.bifi.unizar.es
LIP-Lisbon has one top-level BDII with the following setup:
Machine: Sun x2100
CPU model name: Dual Core AMD Opteron(tm) Processor 175 @ 2211.351 MHz
Memory: 2 GB
NE:
We have two top-level BDIIs, one at PDC and one at SARA, both of which we intend to support for general use.
Here are some details about these BDIIs:
PDC:
Ours is running on a 2.8 GHz Intel P4 processor with an 800 MHz front side bus,
2 GB RAM and an 80 GB 7200 rpm IDE hard disk. The motherboard is a Supermicro P4SCE. We do not have any failover provisions either. As for the future, I don't think the hardware specifications of SweGrid II are finalized yet.
SARA:
At SARA we have a top level BDII running (mu33.matrix.sara.nl). Currently this is a Xeon dual processor system without any failover provisions (in case of failure we will have to move the service to another node).
We are in the process of buying new hardware for different core services, and the top-level BDII will be one of the services hosted on this new hardware. For the reliability of these services we will use both redundant hardware and software solutions (e.g. HA Linux), depending on the kind of service.
UKI:
The top-level BDII is a dual CPU 2.66GHz Xeon with 2GB memory (about 1GB used); periods of very high (user + system) CPU usage are frequently seen. The immediate plan is to deploy a second box and use round-robin DNS, and of course to monitor the situation.
Our observation over the last month or so is that timeouts are affecting the SAM CE tests (also complaints from users).
Sometimes the load is predominantly local (in particular RBs, VO boxes, UIs); sometimes it comes mostly from the Tier-2s. I've attached a file with the output of a script that parses the bdii-fwd.log files for the last 30 days and groups the connections by host and by site. In summary, the most active site and the most active host for each of the last 30 days are:
Connects Most active site Connects Most active host
23472 gridpp.rl.ac.uk 15855 dgc-grid-44.brunel.ac.uk
98441 tier2.hep.manchester.ac.uk 18165 dgc-grid-44.brunel.ac.uk
40805 gridpp.rl.ac.uk 22404 lcgui0360.gridpp.rl.ac.uk
19934 tier2.hep.manchester.ac.uk 13912 lcgrb02.gridpp.rl.ac.uk
15736 gridpp.rl.ac.uk 13666 lcgrb02.gridpp.rl.ac.uk
40002 gridpp.rl.ac.uk 13972 lcgrb02.gridpp.rl.ac.uk
48734 gridpp.rl.ac.uk 16480 lcgui0361.gridpp.rl.ac.uk
44535 gridpp.rl.ac.uk 17820 fal-pygrid-19.lancs.ac.uk
52279 tier2.hep.manchester.ac.uk 22289 fe01.esc.qmul.ac.uk
35565 gridpp.rl.ac.uk 23370 fe01.esc.qmul.ac.uk
140349 tier2.hep.manchester.ac.uk 37503 fal-pygrid-19.lancs.ac.uk
116356 tier2.hep.manchester.ac.uk 35214 fe01.esc.qmul.ac.uk
61043 gridpp.rl.ac.uk 50382 fe01.esc.qmul.ac.uk
53576 gridpp.rl.ac.uk 28610 lcgvo0339.gridpp.rl.ac.uk
18538 gridpp.rl.ac.uk 12250 lcgvo0339.gridpp.rl.ac.uk
21435 gridpp.rl.ac.uk 12543 lcgui0357.gridpp.rl.ac.uk
29208 gridpp.rl.ac.uk 9580 fe01.esc.qmul.ac.uk
25929 gridpp.rl.ac.uk 2985 dgc-grid-44.brunel.ac.uk
37031 gridpp.rl.ac.uk 8366 gfm01.pp.rhul.ac.uk
23084 tier2.hep.manchester.ac.uk 18649 gfm01.pp.rhul.ac.uk
26313 gridpp.rl.ac.uk 17404 gfm01.pp.rhul.ac.uk
40069 gridpp.rl.ac.uk 15918 lcgrb02.gridpp.rl.ac.uk
57020 gridpp.rl.ac.uk 13742 fe01.esc.qmul.ac.uk
61542 gridpp.rl.ac.uk 16961 lcgrb02.gridpp.rl.ac.uk
41385 gridpp.rl.ac.uk 17193 lcgrb02.gridpp.rl.ac.uk
63603 gridpp.rl.ac.uk 20712 svr031.gla.scotgrid.ac.uk
76584 gridpp.rl.ac.uk 24600 lcgrb02.gridpp.rl.ac.uk
60533 gridpp.rl.ac.uk 19646 lcgvo0339.gridpp.rl.ac.uk
55622 gridpp.rl.ac.uk 21346 fe01.esc.qmul.ac.uk
131099 gridpp.rl.ac.uk 56356 lcgvo0339.gridpp.rl.ac.uk
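The grouping behind the table above can be sketched as follows. The actual script and the bdii-fwd.log format are not shown here, so this example starts from a plain list of client host names, one per connection, and does not attempt to parse the log. The rule that a site is the host name without its first label is an assumption, and the sample list is made up for illustration only.

```python
from collections import Counter


def site_of(host):
    """Assumed rule: the site is the host name minus its first label."""
    _, _, domain = host.partition(".")
    return domain or host


def summarise(hosts):
    """Count connections per host and per site from an iterable of host names."""
    by_host = Counter(hosts)
    by_site = Counter()
    for host, count in by_host.items():
        by_site[site_of(host)] += count
    return by_host, by_site


def most_active(counter):
    """Return (name, connections) for the busiest entry."""
    return counter.most_common(1)[0]


# Made-up sample: one entry per connection seen on a given day
sample = (["lcgrb02.gridpp.rl.ac.uk"] * 3
          + ["lcgui0360.gridpp.rl.ac.uk"] * 2
          + ["fe01.esc.qmul.ac.uk"] * 4)
by_host, by_site = summarise(sample)
print(most_active(by_site), most_active(by_host))
```

As in several rows of the table, the most active host in the sample does not belong to the most active site, because a site total is the sum over all of its hosts.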
CERN:
Total number of BDIIs: 12 (cluster gridbdii).
8 top-level BDIIs behind the lcg-bdii.cern.ch alias (subcluster lcg-bdii). All of them are using FCR.
2 site-level BDIIs behind the prod-bdii.cern.ch alias (subcluster prod-bdii). FCR not used.
2 top-level BDIIs behind sam-bdii.cern.ch alias (subcluster sam-bdii).
All of them are using FCR.
Present status: http://lxb2007.cern.ch/lcgbdii/stats_lcg-bdii.html
18/02 : 1579106 18.2
Top-10 connections/host (today): 61902 : lcgvm.triumf.ca
Top-10 connections/domain (today): 326068 : .ch