HEPiX Spring 2015 Workshop
23–27 Mar 2015
Physics Department, Oxford University
Europe/London timezone

Contribution List

86 contributions
  1. Prof. John Wheater (Oxford University)
    23/03/2015, 09:00
    Miscellaneous
  2. Peter Gronbech (University of Oxford (GB))
    23/03/2015, 09:10
    Miscellaneous
    Workshop Logistics
  3. Johan Henrik Guldmyr (Helsinki Institute of Physics (FI))
    23/03/2015, 09:15
    Site reports
    - CSC general HPC updates
    - New Haswell hardware for supercluster and supercomputer
    - Slurm/Lustre
    - taito-shell.csc.fi: a Slurm/sshd/iptables-based interactive shell load balancer, which replaced the large-memory (1 TB) interactive nodes
    - DDN SFA12K
    - Using the ELK stack (Elasticsearch, Logstash, Kibana) to sort through dCache logs
    - Search through auditd logs
    - Has anybody used shred? root...
  4. Paul Kuipers (Nikhef)
    23/03/2015, 09:30
    Site reports
    Spring 2015 site report
  5. Jingyan Shi (IHEP)
    23/03/2015, 09:45
    Site reports
The status of the IHEP site, the improvements we have made, and our plans for this year.
  6. Martin Adam (UJF Rez), Václav Říkal
    23/03/2015, 10:00
    Site reports
We will give an overview of the site and share experience with these topics: migration of virtualized servers to a new infrastructure, migration from CFEngine to Puppet and Spacewalk as the new systems management solution, and procurement of new hardware (worker nodes and storage servers).
  7. Giuseppe Misurelli
    23/03/2015, 10:15
    Site reports
    Update on INFN-T1
  8. Walter Schon
    23/03/2015, 10:55
    Site reports
  9. William Strecker-Kellogg (Brookhaven National Lab)
    23/03/2015, 11:10
    Site reports
    Brookhaven National Lab (BNL) will present the site report for the RHIC-ATLAS Computing Facility (RACF), covering developments over the past 6 months.
  10. James Botts (LBNL)
    23/03/2015, 11:25
    Site reports
PDSF, the Parallel Distributed Systems Facility, has served high energy physics in continuous operation at NERSC since 1996. It is currently a Tier-1 site for STAR, Tier-2 for ALICE and Tier-3 for ATLAS. This site report will describe recent updates to the system and upcoming modifications. PDSF will move this year from its current site to a new building on the LBNL campus and...
  11. Garhan Attebury (University of Nebraska (US))
    23/03/2015, 11:40
    Site reports
    Site report covering the status of T2_US_Nebraska and changes / updates since the Fall 2014 meeting.
  12. Sandy Philpott (JLAB)
    23/03/2015, 11:55
    Site reports
    Current high performance and experimental physics computing environment updates: core exchanges between USQCD and Experimental Physics clusters for load balancing, job efficiency, and 12GeV data challenges; Nvidia K80 GPU experiences and updated Intel MIC environment; update on locally developed workflow tools and write-through to tape cache filesystem; status of LTO6 integration into our MSS;...
  13. Mr Peter van der Reest (DESY)
    23/03/2015, 12:10
    Site reports
    DESY site report
  14. Mr Julien Carpentier (CCIN2P3)
    23/03/2015, 12:25
    Site reports
We will present the latest status of the IN2P3 Computer Center. Emphasis will be placed on the infrastructure and system areas.
  15. Lisa Gerhardt (LBNL), Mr Yushu Yao (LBNL)
    23/03/2015, 14:00
    End-User IT Services & Operating Systems
    SciDB is an open-source analytical database for scalable complex analytics on very large array or multi-structured data from a variety of sources, programmable from Python and R. It runs on HPC, commodity hardware grids, or in a cloud and can manage and analyze terabytes of array-structured data and do complex analytics in-database. We present an overall description of the SciDB framework and...
  16. Mr Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
    23/03/2015, 14:25
    End-User IT Services & Operating Systems
The HEP Software Foundation (HSF) is a one-year-old initiative to foster collaboration in software development in the HEP community and related scientific communities. Launched by a kick-off meeting at CERN in April 2014, the first year has been spent better defining what HSF should be. An HSF workshop was held in January at SLAC and HSF is now entering its "implementation phase". This talk...
  17. Mr Andreas Wagner (CERN)
    23/03/2015, 14:50
    End-User IT Services & Operating Systems
    • Status of CERN Web Services
      • Overview
      • Web Site Life Cycle Management
      • Web Analytics
    • CERN’s Enterprise Social Networking System
      • Motivation & purpose
      • Feature overview: microblogging, profiles, social networking, suggestion systems and discussion forums
    • CERN Search...
  18. Thomas Baron (CERN)
    23/03/2015, 15:15
    End-User IT Services & Operating Systems
A lot of visible and behind-the-scenes actions have been taken in recent months to prepare CERN conferencing services (Indico, Vidyo, the webcast and conference room services) for challenges to come. These services will be described in terms of features and usage statistics. We will present their integration with the CERN layered cloud infrastructure and with other IT base services. We will...
  19. Connie Sieh (FNAL)
    23/03/2015, 16:05
    End-User IT Services & Operating Systems
    Current Status of Scientific Linux
  20. Dr Arne Wiebalck (CERN)
    23/03/2015, 16:30
    End-User IT Services & Operating Systems
    In this talk we will present a brief status update on CERN's work on CentOS 7, the uptake by the various IT services, and the interaction with the upstream CentOS community.
  21. Mr Emyr James (Wellcome Trust, Sanger Institute)
    23/03/2015, 16:55
    End-User IT Services & Operating Systems
The Wellcome Trust Sanger Institute is a charitably funded genomic research centre. A leader in the Human Genome Project, it is now focused on understanding the role of genetics in health and disease. Large amounts of data are produced at the institute by next-generation sequencing machines. The data are then stored, processed and analysed on the institute's computing cluster. The main compute...
  22. Wayne Salter (CERN)
    23/03/2015, 17:20
    IT Facilities & Business Continuity
    Many of you are aware of the power incident we had on the 16th October during the last HEPiX workshop. I will give a detailed explanation of what happened, the impact on IT services as well as the actions taken to recover from the incident. I will also note some improvements that will be implemented as a result of this incident. I will then go on to discuss other operations incidents that we...
  23. Jose Flix Molina (Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
    24/03/2015, 09:00
    Site reports
We will review the status of the PIC Tier-1 as of Spring 2015: the typical site report given at HEPiX.
  24. Dr Sean Brisbane (University of Oxford)
    24/03/2015, 09:15
    Site reports
    A site report from the University of Oxford focusing on the integration challenges between the various systems.
  25. Dr Arne Wiebalck (CERN)
    24/03/2015, 09:30
    Site reports
    News from CERN since the Lincoln meeting.
  26. Tina Friedrich (Diamond Light Source Ltd)
    24/03/2015, 09:45
    Site reports
    Diamond Light Source site report
  27. Sang Un Ahn (KiSTi Korea Institute of Science & Technology Information (KR))
    24/03/2015, 10:00
    Site reports
The status of the KISTI-GSDC Tier-1 site will be presented, including a brief history of the site, a system summary (configuration management), PBS batch issues, Tier-1 operations and future plans.
  28. Martin Bly (STFC-RAL)
    24/03/2015, 10:15
    Site reports
    Latest updates for the RAL Tier-1.
  29. Nils Hoimyr (CERN)
    24/03/2015, 10:55
    End-User IT Services & Operating Systems
An update will be given on the status of collaborative tools for software developers: version control services (Git and SVN), issue tracking (JIRA), integration (Jenkins) and documentation (TWiki). The presentation will focus on collaborative aspects for software developers and report on progress since the fall meeting.
  30. Mr Dirk Jahnke-Zumbusch (DESY)
    24/03/2015, 11:15
    End-User IT Services & Operating Systems
After more than ten years of operation, the game is over for Exchange 2003 at DESY. Zimbra has now been put into production, and data from both Exchange 2003 and the UNIX mail service are being migrated and consolidated gradually. The architecture of the Zimbra mail service, the migration procedures and some experiences will be presented. Finally we will look at some integration aspects of...
  31. Nils Hoimyr (CERN)
    24/03/2015, 11:40
    End-User IT Services & Operating Systems
Status of LHC@home, volunteer computing at CERN and for the LHC experiments. The presenter will give an update on the volunteer computing strategy for HEP and different scenarios for the use of volunteer cloud computing or other lightweight cloud infrastructures to run experiment code under CernVM on available computing resources. Furthermore, the current status of the CERN BOINC server...
  32. Rennie S. Scott (FNAL), Connie Sieh (Fermilab)
    24/03/2015, 12:05
    Site reports
    Site report from Fermilab
  33. Adam Lukasz Krajewski (Warsaw University of Technology (PL))
    24/03/2015, 12:20
    Security & Networking
    Following an incident with a slow database replication between CERN's data centers, we discovered that even a very low rate packet loss in the network (order of 0.001%) can induce significant penalties to long distance single stream TCP transfers. We explore the behaviour of multiple TCP congestion control algorithms in a controlled loss and delay environment in order to understand...
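The throughput penalty described in this abstract is commonly estimated with the Mathis et al. model for a single TCP stream under random loss; the sketch below is illustrative (function name and numbers are ours, not taken from the talk):

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Upper bound on single-stream TCP throughput (bytes/s) from the
    Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)), C ~ sqrt(3/2)."""
    C = math.sqrt(3.0 / 2.0)
    return (mss_bytes / rtt_s) * C / math.sqrt(loss_rate)

# A long-distance link: 1460-byte MSS, 150 ms RTT, 0.001% random loss.
rate = mathis_throughput(1460, 0.150, 1e-5)
print(f"{rate * 8 / 1e6:.1f} Mbit/s")  # only ~30 Mbit/s despite the tiny loss rate
```

Because the bound scales as 1/sqrt(p), even a loss rate of 0.001% caps a 150 ms single stream at tens of Mbit/s, which is consistent with the slow replication the abstract describes.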
  34. Mr Romain Wartel (CERN)
    24/03/2015, 14:00
    Security & Networking
This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromise in the academic community, including lessons learnt, reveals the inner mechanisms of the underground economy to expose how our resources are exploited by organised crime groups, and gives recommendations to protect ourselves. By showing how these attacks are both...
  35. Ian Peter Collier (STFC - Rutherford Appleton Lab. (GB))
    24/03/2015, 14:25
    Security & Networking
    Report on the initial activities of the WLCG Cloud Traceability Working Group
  36. Linda Ann Cornwall (STFC - Rutherford Appleton Lab. (GB))
    24/03/2015, 14:50
    Security & Networking
The European Grid Infrastructure (EGI) and the Worldwide LHC Computing Grid (WLCG) infrastructure largely overlap and share the majority of security activities. A lot of security-related activity goes on behind the scenes in such a large-scale distributed computing infrastructure. Security incident prevention takes up the larger share of the effort, and is carried out via...
  37. David Crooks (University of Glasgow (GB))
    24/03/2015, 15:15
    Security & Networking
    OSSEC, the popular HIDS (Host Intrusion Detection System), has been widely used for a number of years. More recently, tools like Elasticsearch, Logstash and Kibana (ELK) have become popular in visualising and working with data such as that aggregated by OSSEC. We report on a recent implementation of OSSEC, coupled to an ELK instance, at the Glasgow site of the UKI-SCOTGRID distributed Tier-2....
  38. Marian Babik (CERN)
    24/03/2015, 15:50
    Security & Networking
    WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The WLCG Network and Transfer Metrics working group was established to ensure sites and experiments can better understand and fix networking issues....
  39. Dave Kelsey (STFC - Rutherford Appleton Lab. (GB))
    24/03/2015, 16:15
    Security & Networking
    This talk will present an update from the HEPiX IPv6 Working Group. This will include details of recent testing activities and plans for the deployment of dual-stack data services and monitoring on (at least some of) the WLCG infrastructure.
  40. Ulf Bobson Severin Tigerstedt (Helsinki Institute of Physics (FI))
    24/03/2015, 16:40
    Security & Networking
A look back on testing IPv6 and different versions of dCache as it has evolved from 2.6 to 2.12, and from barely working to working well.
  41. Francesco Prelz (Università degli Studi e INFN Milano (IT))
    24/03/2015, 16:55
    Security & Networking
    Probably the most prominent change that IPv6 introduces in the semantics of internet protocol applications is the need to *always* deal with multiple addresses (possibly both IPv4 and IPv6) associated to each network endpoint. A quick overview of how and where addresses are categorised, ordered and preferred is presented, both from the system administrator and the developer viewpoint. A few...
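The multiple-address semantics described above can be observed directly from the resolver; a minimal sketch (the helper name is ours) listing every address a hostname maps to, in the order the system prefers them:

```python
import socket

def endpoint_addresses(host, port):
    """Resolve all addresses (IPv4 and IPv6) for an endpoint, in the
    order the resolver prefers them (RFC 6724 rules on most systems)."""
    seen = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        seen.append((label, sockaddr[0]))
    return seen

# A robust client must try each returned address in turn instead of
# assuming a single address per hostname.
for label, addr in endpoint_addresses("localhost", 80):
    print(label, addr)
```

On a dual-stack host this typically prints both an IPv6 and an IPv4 entry for the same name, which is exactly the "always deal with multiple addresses" point the abstract makes.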
  42. Kacper Surdy (CERN)
    24/03/2015, 17:20
    Storage & Filesystems
    There are terabytes of data stored in a relational database (Oracle) at CERN which in fact does not need a relational model. Moreover, using a relational database management system very often brings a significant overhead in terms of resource utilization. The problem is notably observable for warehouse-type data sets. At the same time running analytical workloads on such data sets requires...
  43. Dave Kelsey (STFC - Rutherford Appleton Lab. (GB)), Dr Shawn McKee (University of Michigan ATLAS Group)
    24/03/2015, 18:00
  44. Julien Leduc (CERN)
    25/03/2015, 09:00
    IT Facilities & Business Continuity
CERN Computer Center (CC) is a large building that integrates several kilometers of fibers, copper cables and pipes, and several complex installations (UPSes, water cooling, heat exchangers...). This evolving building is a large theater with numerous actors:
    - contractors, performing construction work, building maintenance or hardware replacement
    - engineers and technicians, debugging...
  45. Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
    25/03/2015, 09:25
    Storage & Filesystems
Recent advances in both hard disks and system-on-a-chip (SoC) designs have enabled the development of a novel form of hard disk: a disk that includes a network interface and an additional ARM processor not involved in low-level disk operations. This setup allows those disks to run an operating system and to communicate with other nodes autonomously over wired Ethernet. No additional hardware or...
  46. Stefan Dietrich (DESY)
    25/03/2015, 09:50
    Storage & Filesystems
    PETRA III is DESY's largest ring accelerator and the most brilliant storage-ring-based X-ray radiation source in the world. With its recent extension, new and faster detectors are used for the data acquisition. They exceed previous detectors in terms of data rate and volume; this is highly demanding for the underlying storage system. This talk will present the challenges we faced, the new...
  47. Yves Kemp (Deutsches Elektronen-Synchrotron (DE))
    25/03/2015, 10:15
    Storage & Filesystems
The presentation will cover:
    - History and current status of the BeeGFS project (formerly known as FhGFS, originating from Fraunhofer)
    - Design and technology decisions made by the BeeGFS developers
    - BeeGFS setup and operational experience as an InfiniBand-based high-performance cluster file system serving as scratch space for the DESY HPC system
    - Discussion of future usage scenarios and...
  48. Alastair Dewhurst (STFC - Rutherford Appleton Lab. (GB))
    25/03/2015, 11:05
    Storage & Filesystems
    RAL is currently exploring the possibilities offered by Ceph. This talk will describe two of these projects. The first project aims to provide large scale, high throughput storage for experimental data. This will initially be used by the WLCG VOs. A prototype cluster built from old hardware has been in testing since October 2014. The WLCG VOs will continue to need to access their data via...
  49. Herve Rousseau (CERN)
    25/03/2015, 11:30
    Storage & Filesystems
    Ceph has become over time a key component of CERN’s Agile Infrastructure by providing storage for the Openstack service. In this talk, we will briefly introduce Ceph’s concepts, our current cluster and the services we provide such as NFS filers, Object Store for the Atlas experiment and Xroot-to-Ceph gateways. We will then talk about our experience running Ceph with some real-world...
  50. Dr Ofer Rind (BROOKHAVEN NATIONAL LABORATORY)
    25/03/2015, 11:55
    Storage & Filesystems
    We review various functionality, performance, and stability tests performed at the RHIC and ATLAS Computing Facility (RACF) at Brookhaven National Laboratory (BNL) in 2014-2015. Tests were run on all three (object storage, block storage and file system) levels of Ceph, using a range of hardware platforms and networking solutions, including 10/40 Gbps Ethernet and IPoIB/4X FDR Infiniband. We...
  51. Mr John Spray (Red Hat, Inc.)
    25/03/2015, 12:20
    Storage & Filesystems
    The Ceph storage system is an open source, highly scalable, resilient data storage service providing object, block and file interfaces. This presentation will introduce what is new in the latest Ceph release, codenamed *Hammer*, and describe the ongoing development activities around CephFS, the Ceph filesystem. An intermediate level of familiarity with large scale storage systems will be assumed.
  52. Dr Arne Wiebalck (CERN)
    25/03/2015, 13:00
  53. Peter Love (Lancaster University (GB))
    25/03/2015, 14:00
    Computing & Batch Services
This contribution describes the usage and benchmarking of a commercial data centre running OpenStack. Different cloud provisioning tools are described, highlighting the pros and cons of each system. A comparison is made between this facility and a standard grid T2 site in terms of job throughput and availability. Usage of the centre's local object store is also described.
  54. Dr Tony Wong (Brookhaven National Laboratory)
    25/03/2015, 14:25
    Computing & Batch Services
The RHIC-ATLAS Computing Facility (RACF) at BNL has traditionally evaluated hardware on-site, with physical access to the systems. The effort to request evaluation hardware, shipping, set-up and testing has consumed an increasing amount of time, and the process has become less productive over the years. To regain past productivity and shorten the evaluation process, BNL has started a pilot...
  55. Gang Qin (University of Glasgow (GB))
    25/03/2015, 14:50
    Computing & Batch Services
    Modern Linux Kernels include a feature set that enables the control and monitoring of system resources, called Cgroups. Cgroups have been enabled on a production HTCondor pool sited at the Glasgow site of the UKI-SCOTGRID distributed Tier-2. A system has been put in place to collect and aggregate metrics extracted from Cgroups on all worker nodes within the Condor pool. From this...
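A per-job metric collector of the kind described above can be sketched by walking the cgroup filesystem; the paths below assume a cgroup-v1 memory controller with an `htcondor` hierarchy and are illustrative, not the actual Glasgow setup:

```python
from pathlib import Path

# Hypothetical layout: cgroup v1 memory controller; a batch system such
# as HTCondor places each job in its own child cgroup under this root.
CGROUP_ROOT = Path("/sys/fs/cgroup/memory/htcondor")

def job_memory_metrics(root=CGROUP_ROOT):
    """Collect per-job memory usage (bytes) by walking the job cgroups.

    Returns an empty dict when the hierarchy is absent, so the collector
    degrades gracefully on nodes without cgroups enabled."""
    metrics = {}
    for job_dir in (root.iterdir() if root.is_dir() else []):
        usage_file = job_dir / "memory.usage_in_bytes"
        if usage_file.is_file():
            metrics[job_dir.name] = int(usage_file.read_text())
    return metrics
```

A cron job or daemon feeding these per-node dicts into a central store is one plausible shape for the "collect and aggregate" system the abstract mentions.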
  56. Manfred Alef (Karlsruhe Institute of Technology (KIT))
    25/03/2015, 15:15
    Computing & Batch Services
    In this talk we will provide information about the current status of the preliminary work to relaunch the HEPiX Benchmarking Working Group which will develop the next release of the HEP CPU benchmark.
  57. Dr Michele Michelotto (INFN Padua & CMS)
    25/03/2015, 16:05
    Computing & Batch Services
The WLCG community has requested a fast benchmark to quickly assess the performance of a worker node. A good candidate is a Python script used in LHCb.
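The LHCb script itself is not reproduced here, but a fast benchmark of this style is essentially a fixed CPU-bound loop whose rate is reported; a toy sketch (names and workload are ours):

```python
import random
import time

def fast_cpu_score(iterations=1_000_000):
    """Toy benchmark in the spirit of a fast Python CPU probe: time a
    fixed CPU-bound loop and report iterations per second, so a worker
    node can be assessed in seconds rather than hours."""
    random.seed(42)                # fixed workload for reproducibility
    start = time.process_time()
    total = 0.0
    for _ in range(iterations):
        total += random.random() ** 0.5
    elapsed = time.process_time() - start
    return iterations / elapsed    # higher means a faster node

print(f"score: {fast_cpu_score():.0f} iterations/s")
```

Such a probe trades accuracy for speed: it correlates with, but does not replace, a full HEP-SPEC-style benchmark run.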
  58. Dr Lucia Morganti (INFN)
    25/03/2015, 16:30
    Computing & Batch Services
    Systems on Chip (SoCs), originally targeted for mobile and embedded technology, are becoming attractive for HEP and HPC scientific communities, given their low cost, huge worldwide shipments, low power consumption and increasing processing power - mostly associated with their GPUs. A variety of development boards are currently available, making it foreseeable to use these power-efficient...
  59. Liviu Valsan (CERN)
    25/03/2015, 16:55
    Computing & Batch Services
    x86 is the uncontested leader for server platforms in terms of market share and is currently the architecture of choice for High Energy Physics applications. But as more and more importance is given to power efficiency, physical density and total cost of ownership we are seeing new processor architectures emerging and some existing ones becoming more open. With the introduction of AArch64,...
  60. Mr David Power (Boston Ltd.)
    25/03/2015, 17:20
    Computing & Batch Services
The talk will cover Xeon Haswell, ARM and Open Compute platforms.
  61. William Strecker-Kellogg (Brookhaven National Lab)
    26/03/2015, 09:00
    Basic IT Services
    It's simple enough to instantiate a new process in an existing environment; it can be much more challenging to foster acceptance of such a process in IT environments and cultures that are traditionally stagnant and resistant to change, and to maintain and optimize that process to ensure it continues to realize optimal benefit. To enhance our computing facility, we've already taken...
  62. Alberto Rodriguez Peon (Universidad de Oviedo (ES))
    26/03/2015, 09:25
    Basic IT Services
    CERN’s experience of migrating a large site to a Puppet-based and more dynamic Configuration Service will be presented. The presentation will review some of the challenges encountered along the way and describe future plans for how to scale the service and improve the overall automation of operations on the site.
  63. Stefan Dietrich (DESY)
    26/03/2015, 09:50
    Basic IT Services
Marionette Collective, also known as MCollective, is a framework for server orchestration, monitoring and parallel job execution. MCollective uses modern publish-subscribe middleware for a scalable and fast execution environment. It is a powerful tool in combination with Puppet, thanks to their good integration. However, it can be a challenging task to configure and deploy...
  64. James Adams (STFC RAL)
    26/03/2015, 10:15
    Basic IT Services
The Quattor community has been maintaining Quattor for over ten years and, having recently held our 19th community workshop, the pace of development continues to increase. This talk will demonstrate why Quattor is more than just a configuration management system, report on recent developments and provide some notable updates and experiences from sites.
  65. Peter Love (Lancaster University (GB))
    26/03/2015, 11:05
    Basic IT Services
    The dominant monitoring system used in distributed computing consists of visually rich time-series graphs and notification systems for alerting operators when metrics fall outside of accepted values. For large systems this can quickly become overwhelming. In this contribution a different approach is described using the sonification of monitoring messages with an architecture which fits easily...
  66. Francisco Valentin Vinagrero (CERN)
    26/03/2015, 11:30
    Basic IT Services
IP-based voice telephony (VoIP) and the SIP protocol are clear examples of disruptive technologies that have revolutionised a previously settled market. In particular, open-source solutions now have the ascendancy in the traditional Private Branch eXchange (PBX) market. We present a possible architecture for the modernisation of CERN's fixed telephony network, highlighting the technical...
  67. Andrei Dumitru (CERN)
    26/03/2015, 11:55
    Basic IT Services
CERN has a great number of applications that rely on a database for their daily operations. From physics-related databases to the administrative sector, there is a high demand for a database system appropriate to the users' needs and requirements. This presentation gives a summary of the current state of the database services at CERN, the work done during LS1 and some insights into the...
  68. Daniel Gruber (U)
    26/03/2015, 12:20
    Computing & Batch Services
    - Introduction
    - DRMAA2 in a nutshell
    - The C interface: data types, monitoring sessions, job sessions, working with jobs, job templates, error handling and dealing with enhancements
    - Getting started with DRMAA2
    - Example applications: job monitoring applications and simple multi-clustering
  69. George Ryall (STFC - Rutherford Appleton Lab.)
    26/03/2015, 14:00
    Grid, Cloud & Virtualisation
The STFC Scientific Computing Department has been developing an OpenNebula-based cloud underpinned by Ceph block storage. I will describe some of our use cases and our set-up, and give a demonstration of our development VM-on-demand service. I will go on to explore some of the problems we have overcome to reach this point. Finally, I will present the work we are doing to use spare capacity on...
  70. Bruno Bompastor (CERN)
    26/03/2015, 14:25
    Grid, Cloud & Virtualisation
    This is a report on the current status and future plans of CERN’s OpenStack-based Cloud Infrastructure.
  71. Alexander Dibbo
    26/03/2015, 14:50
    Grid, Cloud & Virtualisation
    The Scientific Computing Department at the STFC has been developing a Ceph block storage backed OpenNebula cloud. We have carried out a quantitative evaluation of the performance characteristics of virtual machines which have been instantiated with a variety of different storage configurations (using both Ceph and local disks). I will describe our motivations for this testing, our methodology...
  72. Andrew McNab (University of Manchester (GB))
    26/03/2015, 15:15
    Grid, Cloud & Virtualisation
    The Vacuum model provides a method for managing the lifecycle of virtual machines based on their observed success or failure in finding work to do for their experiment. In contrast to centrally managed grid job submission and cloud VM instantiation systems, the Vacuum model gives resource providers direct control over which experiments' VMs or jobs are created and in what proportion. This...
  73. John Hover (Brookhaven National Laboratory (BNL))
    26/03/2015, 16:05
    Grid, Cloud & Virtualisation
    Beginning in September 2014, the RACF at Brookhaven National Lab has been collaborating with Amazon's scientific computing group in a pilot project. The goal of this project is to demonstrate the usage of Amazon AWS (EC2, S3, etc.) for real-world ATLAS production. This will prove the practical and economic feasibility of ATLAS beginning to leverage commercial cloud computing to optimize...
  74. Mr Dario Rivera (Amazon Web Services)
    26/03/2015, 16:30
    Grid, Cloud & Virtualisation
On the heels of the BNL RACF group's proof of concept on AWS, this session will share best practices for some of the most common AWS services used by big science, such as EC2, VPC, S3, and complex hybrid networking and routing. We will also provide an overview of the AWS Scientific Computing Group, which was created to help global scientific collaborations develop an ecosystem...
  75. Bruno Bompastor (CERN)
    26/03/2015, 16:55
    Grid, Cloud & Virtualisation
Heat, the OpenStack orchestration service, is being deployed at CERN. We will present the overall architecture and features included in the project, our deployment challenges and future plans.
  76. Mr Levente Hajdu (Brookhaven National Laboratory)
    26/03/2015, 17:20
    Grid, Cloud & Virtualisation
In statistically hungry science domains, data-taking deluges can be both a blessing and a curse. They allow the winnowing out of statistical errors from known measurements and open the door to new scientific opportunities as the physics program matures, but they are also a testament to the efficiency of the experiment and accelerator and to the skill of its operators. However, the data samples need to be...
  77. Jerome Belleman (CERN)
    27/03/2015, 09:00
    Computing & Batch Services
    The CERN Batch System comprises 4000 worker nodes, 60 queues and offers a service for various types of large user communities. In light of the developments driven by the Agile Infrastructure and the more demanding processing requirements, it is faced with increasingly challenging scalability and flexibility needs. This production cluster currently runs IBM/Platform LSF. Over the last...
  78. Manfred Alef (Karlsruhe Institute of Technology (KIT))
    27/03/2015, 09:25
    Computing & Batch Services
The Grid Computing Centre Karlsruhe (GridKa) has been using the Grid Engine batch system since 2011. In this presentation I will talk about our experiences with this batch system, including multi-core job support, and first experiences with cgroups.
  79. Erik Mattias Wadenstein (University of Umeå (SE))
    27/03/2015, 09:50
    Computing & Batch Services
An update on the current status of SLURM usage in the Nordics, as well as recent developments in improving support for LHC-type jobs, including tuning for efficient scheduling of multicore grid jobs. An overview of some remaining challenges will also be given, together with a discussion of how to address them.
  80. Mr Michel Jouvin (Laboratoire de l'Accelerateur Lineaire (FR))
    27/03/2015, 10:15
    Computing & Batch Services
I propose to give a summary of the Condor workshop held at CERN in mid-December.
  81. Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
    27/03/2015, 11:05
    Computing & Batch Services
    After running Torque/Maui for many years, the RAL Tier-1 migrated to HTCondor during 2013 in order to benefit from improved reliability, scalability and additional functionality unavailable in Torque. This talk will discuss the deployment of HTCondor at RAL, our experiences and the evolution of our pool over the past two years, as well as our future plans.
  82. Jerome Belleman (CERN)
    27/03/2015, 11:30
    Computing & Batch Services
While we are taking measures to address the limitations discussed earlier in our IBM/Platform LSF cluster, we have been working on setting up a new batch system based on HTCondor. There has been some progress with the pilot service which we described at the last HEPiX. We also continued investigating some of the more advanced functions which will lead up to the production state of the new CERN...
  83. Andrew David Lahiff (STFC - Rutherford Appleton Lab. (GB))
    27/03/2015, 11:55
    Computing & Batch Services
    With the increasing interest in HTCondor in Europe, an important question for sites considering migrating to HTCondor is how well it integrates with the standard grid middleware, in particular integration with the information system and APEL accounting. Also, with the increasing interest and usage of private clouds, how easily a batch system can be integrated with a private cloud is another...
  84. Stephen Jones (Liverpool University)
    27/03/2015, 12:20
    Computing & Batch Services
    This talk describes DrainBoss, which is a proportional integral (PI) controller with conditional logic that strives to maintain the correct ratio between single-core and multi-core jobs in an ARC/HTCondor cluster. DrainBoss can be used instead of the HTCondor DEFRAG Daemon.
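DrainBoss's actual code is not shown here, but the PI idea it names can be sketched in a few lines; the gains, sign convention and the toy "plant" below are illustrative assumptions, not the real controller:

```python
def pi_step(target_ratio, measured_ratio, integral, kp=1.0, ki=0.1):
    """One proportional-integral update: returns (control, integral).
    A positive control value would mean 'drain more slots toward
    multi-core' in a DrainBoss-like scheme; gains are illustrative."""
    error = target_ratio - measured_ratio
    integral += error                      # accumulate steady-state error
    control = kp * error + ki * integral
    return control, integral

# Drive a toy cluster model toward a 0.25 multi-core job fraction.
ratio, integral = 0.0, 0.0
for _ in range(50):
    control, integral = pi_step(0.25, ratio, integral)
    ratio += 0.2 * control                 # toy plant: ratio responds to control
print(f"final ratio: {ratio:.3f}")
```

The integral term is what lets the controller hold the ratio at the setpoint despite a constant disturbance (e.g. a steady influx of single-core jobs), which a purely proportional drain policy cannot do.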
  85. Mr Romain Wartel (CERN)
    27/03/2015, 12:45
    Miscellaneous
  86. Dr Helge Meinhard (CERN)
    27/03/2015, 12:50
    Miscellaneous
    Usual summary and conclusions