Mr Igor Sfiligoi (University of California San Diego)
For several years, OSG has operated a glideinWMS factory at UCSD that serves several scientific communities, including CMS analysis, HCC and GLOW. This setup worked well, but it had become a single point of failure, so OSG recently added a second instance at Indiana University that serves the same user communities. CMS has been operating a glidein factory at Fermilab dedicated to reprocessing activities, with similar results. To increase the availability of the system for analysis, MC and reprocessing jobs, CMS recently decided to host another glidein factory at CERN. There is a large overlap between this new factory and the three factories in the US, and CMS accounts for a significant fraction of the glideins going through the OSG factories. CMS and OSG therefore formed a common operations team that runs all of the above factories. The reasoning behind this arrangement is that most operational issues stem from Grid-related problems and are very similar across all factory instances, so solving a problem in one instance very often solves it for all of them. This talk presents our operational experience in addressing both the social and the technical issues of running multiple instances of a glideinWMS factory with operations staff spanning multiple time zones on two continents.
Frank Wuerthwein (Univ. of California San Diego (US)), Ignas Butenas (Vilnius University (LT)), Mr Igor Sfiligoi (University of California San Diego), Mr Jeffrey Michael Dost (University of California San Diego), Dr Jose Hernandez Calama (Centro de Investigaciones Energ. Medioambientales y Tecn. - (ES)), José Flix, Marian Zvada (KIT - Karlsruhe Institute of Technology (DE)), Peter Kreuzer (Rheinisch-Westfaelische Tech. Hoch. (DE)), Rob Quick (OSG - Indiana University), Scott Werner Teige (Indiana University (US))