10–14 Oct 2016
San Francisco Marriott Marquis
America/Los_Angeles timezone

A Multipurpose Computing Center with Distributed Resources

11 Oct 2016, 15:30
1h 15m
San Francisco Marriott Marquis

Poster Track 6: Infrastructures Posters A / Break

Speaker

Jiri Chudoba (Acad. of Sciences of the Czech Rep. (CZ))

Description

The Computing Center of the Institute of Physics (CC IoP) of the Czech Academy of Sciences serves a broad spectrum of users with various computing needs. It runs a WLCG Tier-2 center for the ALICE and ATLAS experiments; the same set of services is used by the astroparticle physics projects the Pierre Auger Observatory (PAO) and the Cherenkov Telescope Array (CTA). The OSG stack is installed for the NOvA experiment. Other groups of users access the local batch system directly. Storage capacity is distributed over several locations. The DPM servers used by ATLAS and the PAO are all in the same server room, but several xrootd servers for the ALICE experiment are operated at the Nuclear Physics Institute in Rez, about 10 km away. The storage capacity for ATLAS and the PAO is extended by resources of CESNET, the Czech National Grid Initiative representative. Those resources are located in Plzen and Jihlava, more than 100 km away from the CC IoP. Both distant sites use a hierarchical storage solution based on disks and tapes. They run one common dCache instance, which is published in the CC IoP BDII. ATLAS users can access these resources with the standard ATLAS tools in the same way as the local storage, without noticing the geographical distribution.
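
The transparency for ATLAS users comes from the remote dCache instance being advertised through the site BDII like any local storage element. As a minimal sketch of how such an advertised storage element can be looked up, the Python snippet below queries a BDII over LDAP using the GLUE 1.3 schema; the hostname is a placeholder, not the actual CC IoP endpoint.

```python
from ldap3 import Server, Connection, ALL

# Hypothetical BDII endpoint; BDIIs conventionally listen on port 2170.
server = Server('bdii.example.cz', port=2170, get_info=ALL)
conn = Connection(server, auto_bind=True)  # anonymous bind is standard for BDII

# List all storage elements published under the GLUE 1.3 schema,
# e.g. a common dCache instance alongside a local DPM.
conn.search(
    search_base='o=grid',
    search_filter='(objectClass=GlueSE)',
    attributes=['GlueSEUniqueID', 'GlueSEImplementationName'],
)
for entry in conn.entries:
    print(entry.GlueSEUniqueID, entry.GlueSEImplementationName)
```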

The computing clusters LUNA and EXMAG, dedicated mostly to users from the Solid State Physics departments, offer resources for parallel computing. They are part of the Czech NGI infrastructure MetaCentrum, whose distributed batch system is based on Torque with a custom scheduler. The clusters are installed remotely by the MetaCentrum team, and a local contact helps only when needed. Users from the IoP have exclusive access to only a part of these two clusters and benefit from higher priorities on the rest (1500 cores in total), which can also be used by any MetaCentrum user. IoP researchers can also use distant resources located in several towns of the Czech Republic, with a capacity of more than 12000 cores in total.
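
For illustration, a parallel job on such a Torque-based system is typically submitted through a PBS script. The sketch below generates and submits one from Python; the queue name, resource limits, and executable are hypothetical and not taken from the MetaCentrum configuration.

```python
import subprocess
import textwrap

# Hypothetical Torque/PBS job script: 2 nodes with 8 cores each.
# Queue name, walltime, and the solver executable are placeholders.
script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N parallel_test
    #PBS -q default
    #PBS -l nodes=2:ppn=8
    #PBS -l walltime=04:00:00
    cd "$PBS_O_WORKDIR"
    mpirun ./solver
""")

with open('job.pbs', 'w') as f:
    f.write(script)

# qsub prints the job identifier on success.
result = subprocess.run(['qsub', 'job.pbs'],
                        capture_output=True, text=True, check=True)
print('submitted job', result.stdout.strip())
```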

This contribution will describe the installation and maintenance procedures, the transition from CFEngine to Puppet, the monitoring infrastructure based on tools such as Nagios, Munin, and Ganglia, and the organization of user support via Request Tracker. We will share our experience with log file processing using the ELK stack. A description of the network infrastructure and its load will also be given.
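
As an example of the kind of question an ELK deployment answers, the following Python sketch counts recent error-level log records in Elasticsearch; the endpoint, index pattern, and field names are illustrative assumptions, not our production setup.

```python
import json
import requests

# Hypothetical Elasticsearch endpoint and Logstash index pattern;
# the "severity" field name is an assumption about the log schema.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"severity": "error"}},
                {"range": {"@timestamp": {"gte": "now-1h"}}},
            ]
        }
    }
}

resp = requests.get(
    "http://elk.example.cz:9200/logstash-*/_count",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
resp.raise_for_status()
print("errors in the last hour:", resp.json()["count"])
```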

Primary Keyword (Mandatory): Computing facilities
Secondary Keyword (Optional): Monitoring
Tertiary Keyword (Optional): Distributed data handling

Author

Jiri Chudoba (Acad. of Sciences of the Czech Rep. (CZ))

Co-authors

Alexandr Mikula (Acad. of Sciences of the Czech Rep. (CZ))
Dagmar Adamova (Acad. of Sciences of the Czech Rep. (CZ))
Jan Svec (Acad. of Sciences of the Czech Rep. (CZ))
Martin Adam (Acad. of Sciences of the Czech Rep. (CZ))
Petr Vokac (Czech Technical University (CZ))
Tomas Kouba (Acad. of Sciences of the Czech Rep. (CZ))
Václav Říkal
Jana Uhlirova (Institute of Physics of the CAS)

Presentation materials