WLCG-OSG-EGEE Operations meeting

Europe/Zurich
28-R-15 (VRVS (Sunny room))

Nick Thackray
Description
The VRVS "Sunny" room will be available from 15:30 until 18:00 CET.
actionlist
minutes
    • 14:00 - 17:40
      • 16:00
        Feedback on last meeting's minutes 5m
        Minutes
      • 16:05
        Grid-Operator-on-Duty handover 5m
      • From Italy (backup: SouthEast Europe) to Russia (backup: CERN)

      • Tickets:
        Created: 54
        Updated: ?
        Closed: 46
        2nd e-mail: 19
        Quarantine: 26
        Reopened: 0
        Unsolvable: 4

        Discussion points:

      • SFT site history pages show an incomplete entry (missing timestamp). Savannah bug filed: #17980.
  • 16:10
    SC4 weekly report and upcoming activities 10m
  • 16:20
    VO software installation 10m
    Speaker: Alessandra Forti
  • 16:30
    New version of CIC portal 5m
    Speaker: Gilles MATHIEU
  • 16:35
    OPS VO status update 5m
    Speaker: Piotr Nyczyk
  • 16:40
    Action plan for Information System instabilities 5m
    Action 1: Update the BDII to reduce load during an update, implement caching, update the critical BDIIs and submit the new code as a patch.
    Result: This should smooth out the problem.

    Action 2: Create a wiki page to inform large sites how to move the site-level BDII off the CE.
    Result: This should stop the whole site from disappearing and reduce replication failures.

    Action 3: Update YAIM so that it can set up a site-level BDII.
    Result: Makes it easier for a site to set up a site-level BDII.

    Action 4: Modify the use of the information providers so that the batch system is queried from the site-level BDII.
    Result: Eliminates the problem of the CE disappearing.

    Speaker: Laurence Field
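
    The caching idea in Action 1 can be sketched roughly as follows. This is only an illustration, not the BDII code: `CachedSource`, `fetch` and `ttl_seconds` are hypothetical names, and the choice to keep serving the last good value when a refresh fails is an assumption about how caching could keep an entry from disappearing.

```python
import time


class CachedSource:
    """Cache the result of a slow query and reuse it for ttl_seconds.

    Hypothetical helper for illustration; not part of any BDII release.
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the real source
        self._ttl = ttl_seconds      # how long a cached value stays fresh
        self._clock = clock          # injectable clock, useful for testing
        self._value = None
        self._stamp = None           # time of the last successful fetch

    def get(self):
        now = self._clock()
        fresh = self._stamp is not None and (now - self._stamp) < self._ttl
        if not fresh:
            try:
                self._value = self._fetch()
                self._stamp = now
            except Exception:
                # Assumption: if a refresh fails, keep serving the last
                # good value rather than dropping the entry.
                if self._stamp is None:
                    raise
        return self._value
```

    With a time-to-live of a few minutes, repeated queries inside that window do not reach the underlying source at all, which is one way the load during an update could be reduced.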
  • 16:45
    New version of FCR tool 10m
    Speaker: Judit Novak
  • 16:55
    Issues to discuss from reports 25m

    Reports were not received from:
  • WLCG T1 sites: BNL; NDGF; TRIUMF
  • EGEE ROCs: South East Europe; UKI
  • VOs: ATLAS; ALICE; BioMed
  • 1. Problem with ranking last week: some sites were reporting free slots when they had none, and many jobs were queued at these sites (for example CERN, Nikhef, cyfronet-lcg, ..). Also, at CERN, for example, there are 2 CEs pointing to LSF, but they do not report the same information. (LHCb VO)
  • 2. Would it be possible for all Tier-1 sites to follow some convention for naming their SRM endpoints for tape and disk? (LHCb VO)
  • 3. If a glite-CE and an lcg-CE share the same batch system, should the VO software tags also be shared (since essentially the software is available on WNs executing jobs, no matter which CE was used to install it)? (South East ROC)
  • 4. A new Alice site does not provide an SE. Is it OK that this site does not contain an SE? Are there any concerns? (Asia Pacific ROC)
  • 5. Proposal: make reference files available at a published (wiki) path on all sites (T1, T2) to allow quick functional tests by dteam/ops members. (DECH ROC)
  • 6. SFTs should not require a special queue in order to always run. It can be mandated that sites provide some solution so that SFTs always run, but NOT that this mechanism must be a queue. Acceptable mechanisms are:
    a) NONE: for large sites, simply giving the ops VO a high priority is sufficient, because worker nodes turn over frequently enough that no further action is needed;
    b) standing reservations for OPS users;
    c) a special queue, if a site chooses to go this route.

    If we use a queue for everything, the number of queues will soon exceed the number of worker nodes for most small sites... (NIKHEF)
  • 7. A new site is being certified to join the PPS. In PPS there is no way to run the SFTs on a site which is not certified yet, and this is a problem when adding a new site. (SouthWest Europe ROC)
  • 8. Sites would like to know the difference between the SFTs in Production and in PreProduction. Will they all eventually be controlled by the SAM framework? (SouthWest Europe ROC)
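
    The second part of issue 1 (two CEs in front of the same batch system publishing different information) could be checked with something like the sketch below. The record layout, the field names `ce`, `batch_system` and `free_slots`, and the host names in the usage note are all hypothetical; how the values would be read from the information system is not shown.

```python
from collections import defaultdict


def inconsistent_free_slots(records):
    """Return batch systems whose CEs publish different free-slot counts.

    records: iterable of dicts with the (assumed) keys
             'ce', 'batch_system' and 'free_slots'.
    Result:  {batch_system: {ce: free_slots, ...}} for mismatches only.
    """
    by_batch = defaultdict(dict)
    for rec in records:
        by_batch[rec["batch_system"]][rec["ce"]] = rec["free_slots"]

    # A batch system is flagged when its CEs disagree on the count.
    return {
        batch: ces
        for batch, ces in by_batch.items()
        if len(set(ces.values())) > 1
    }
```

    For example, two records for `ce01.example.org` and `ce02.example.org` that both name the batch system `lsf-main` but report 120 and 0 free slots would be returned as a mismatch, while a batch system with a single CE is never flagged.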
  • 17:20
    Review of action items 15m
    actionlist
  • 17:35
    AOB 5m