Frederique Chollet
(Centre National de la Recherche Scientifique (FR))
5/19/14, 9:20 AM
Miscellaneous
Logistics
Jose Flix Molina
(Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
5/19/14, 9:30 AM
Site reports
Spring 2014 PIC Tier 1 site report covering recent updates, improvements in energy efficiency by means of free-cooling techniques, and preparation for new challenges.
Brian Paul Bockelman
(University of Nebraska (US))
5/19/14, 9:45 AM
Site reports
The Holland Computing Center at the University of Nebraska-Lincoln hosts the state's research computing resources. There are several grid-enabled clusters available to HEP for opportunistic computing and a CMS Tier-2 site.
In this presentation, we will cover the recent updates to site networking and the CMS Tier-2 cluster. Particular attention will be paid to:
1. The recent rollout of...
Jingyan Shi
(IHEP)
5/19/14, 10:00 AM
Site reports
The site report will summarize the status of the IHEP site, including the local cluster and the EGI site. It will also cover the improvements we have made to distributed computing.
Mr
Michel Jouvin
(Universite de Paris-Sud 11 (FR))
5/19/14, 10:15 AM
Site reports
This site report will cover the GRIF grid site and LAL internal computing.
Manfred Alef
(Karlsruhe Institute of Technology (KIT))
5/19/14, 11:00 AM
Site reports
KIT Site Report
Dr
Keith Chadwick
(Fermilab)
5/19/14, 11:20 AM
Site reports
Fermilab Site Report - Spring 2014 HEPiX.
Dr
Ofer Rind
(BROOKHAVEN NATIONAL LABORATORY)
5/19/14, 11:40 AM
Site reports
A summary of developments at BNL's RHIC/ATLAS Computing Facility since the last HEPiX meeting.
Mr
Alan Silverman
(CERN (retired))
5/19/14, 1:30 PM
End-user IT Services & Operating Systems
SL was announced to the world at the Spring 2004 HEPiX meeting in Edinburgh so it seems a good moment to review its origins and how it became the preferred Linux of most HEP sites.
Karanbir Singh
5/19/14, 1:55 PM
End-user IT Services & Operating Systems
Connie Sieh
(Fermilab)
5/19/14, 2:20 PM
End-user IT Services & Operating Systems
Status of Scientific Linux and Futures
Jarek Polok
(CERN)
5/19/14, 2:45 PM
End-user IT Services & Operating Systems
CERN has been maintaining and deploying Scientific Linux CERN since 2004.
In January 2014, CentOS and Red Hat announced that they were joining forces to
provide a common platform for open source community project needs.
How is this merger affecting plans for the future CERN Linux version?
Alvaro Gonzalez Alvarez
(CERN)
5/19/14, 4:10 PM
End-user IT Services & Operating Systems
The current efforts on the issue tracking and version control services at CERN will be presented. Special attention will be given to the new central git service, the integration between issue tracking and version control, and future service deployments.
Sebastien Dellabella
(CERN)
5/19/14, 4:35 PM
End-user IT Services & Operating Systems
In this presentation we will talk about Windows 8 and how we are integrating it at CERN: the issues we met and how we solved them for our user community.
We will focus on issues, customization and deployment.
If you have to start a Windows 8 pilot project in your organization, you must be there.
Christopher John Walker
(University of London (GB))
5/19/14, 5:15 PM
Site reports
Site report from Queen Mary University of London.
Yves Kemp
(Deutsches Elektronen-Synchrotron (DE))
5/20/14, 9:40 AM
Site reports
DESY site report for Spring 2014 HEPiX workshop
Tomoaki Nakamura
(University of Tokyo (JP))
5/20/14, 10:00 AM
Site reports
The Tokyo Tier-2, located at the International Center for Elementary Particle Physics (ICEPP) at the University of Tokyo, was established as a regional analysis center in Japan for the ATLAS experiment. Official operation with WLCG started in 2007, after several years of development since 2002. In December 2012, we replaced almost all hardware as the third system upgrade to...
Szabolcs Hernath
(Hungarian Academy of Sciences (HU))
5/20/14, 11:25 AM
Site reports
As newcomers to the HEPiX community, WIGNER Datacenter, the newly established scientific computing facility of the WIGNER Research Centre for Physics in Budapest, would like to give an introduction to its background, construction, mission and model of operation. Featuring a long-term sustainable, energy-efficient and high-availability infrastructure, WIGNER Datacenter aims to provide a full...
Sandy Philpott
(JLAB)
5/20/14, 11:40 AM
Site reports
The JLab talk will cover our current high performance and experimental physics computing status, including node-sharing between clusters for the 12GeV data challenges, Puppet configuration management plans, our latest GPU and MIC environment, workflow tools, LTO6 integration into the mass storage system, initial results of XFS on Linux testing, and plans for a Lustre 2.5 update and LMDS...
Shawn Mc Kee
(University of Michigan (US))
5/20/14, 11:55 AM
Site reports
I will present an update on our site since the last report and cover our work with dCache, perfSONAR-PS and VMware, as well as our experience with Cobbler and CFengine3 as our node provisioning system. There will also be an overview of our recent networking changes, including the status of our new 100G connection to the WAN. I will conclude with a summary of what has worked and what problems we encountered...
Sven Sternberger
(D)
5/20/14, 12:10 PM
Basic IT Services
The talk will discuss the problems that arise
from managing and distributing secrets such as
root passwords, keytabs and certificates
at a large site.
Secrets are needed when installing and administering
compute and storage systems.
They should be accessible to authorized admins and
from the system they belong to. There should be a way to audit the
information to enforce...
Mr
Michel Jouvin
(Universite de Paris-Sud 11 (FR))
5/20/14, 2:00 PM
Computing & Batch Services
WLCG GDB organized a meeting last March about batch systems. With an audience mostly from grid sites, it has been a successful review of the main batch systems used in the community by sites with concrete experience. This presentation will summarize what was presented and the main conclusions of this meeting.
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))
5/20/14, 2:25 PM
Computing & Batch Services
It’s been almost a year since we first started running ATLAS and CMS production jobs at RAL using HTCondor, and 6 months since we fully migrated from Torque/Maui. This talk will discuss our experience so far and future plans.
Janos Daniel Pek
(CERN)
5/20/14, 2:50 PM
Computing & Batch Services
The CERN Batch System comprises 4000 worker nodes. Sixty queues offer a service for various types of large user communities. In light of the recent developments driven by the Agile Infrastructure and the more demanding processing requirements, the Batch System will be faced with increasingly challenging scalability and flexibility needs. Last year the CERN Batch Team started to evaluate...
Brian Paul Bockelman
(University of Nebraska (US))
5/20/14, 3:15 PM
Computing & Batch Services
HTCondor is a well-known platform for distributed high-throughput computing and often resembles the Swiss Army knife of computing: there's a bit of something for everyone. With a user manual weighing in at about 1,100 printed pages, it's no wonder that sysadmins can overlook some of the most exciting features.
This presentation will be dedicated to uncovering the hidden gems for...
Mr
Daniel Gruber
(Univa)
5/20/14, 4:10 PM
Computing & Batch Services
Current Linux distributions include support for a new kernel enhancement called control groups (cgroups).
This talk is about how Univa Grid Engine integrates the Linux cgroup subsystems for better
resource isolation, utilization, and limitation in the job execution and resource allocation context. Example configurations and use cases for today's NUMA compute nodes are discussed.
Suzanne Poulat
(Centre de calcul IN2P3)
5/20/14, 4:35 PM
Computing & Batch Services
After 20 years using a home-made batch system named BQS (Batch Queuing System), CC-IN2P3 decided to move to Grid Engine in order to offer the scalability and robustness needed for multi-experiment production, both HEP and non-HEP.
The site migrated from BQS to Oracle Grid Engine in 2011, then switched to Univa’s version after only two years, in June 2013.
The talk presents the assessment of...
Nathalie Rauschmayr
(CERN)
5/20/14, 5:00 PM
Computing & Batch Services
Nowadays, the Worldwide LHC Computing Grid consists of multi- and many-core CPUs. A lot of work is being undertaken by the experiments and the HEP community to use these resources more efficiently. As a result, the parallelization of applications has been the main goal so far, in order to allow parallel execution of jobs. However, experiments must also consider how to schedule multicore...
Marek Elias
(Institute of Physics ASCR (FZU))
5/21/14, 9:00 AM
Security & Networking
At FZU we are continuing with the deployment of IPv6 in our testbed as well
as in the production network. We are currently running several subclusters
of worker nodes and our DPM storage system on dual stack.
Production data transfers from DPM to dual-stack worker nodes using
lcg-cp are currently running via IPv6. We present our experience with
this deployment, new Nagios sensors needed in this...
Christopher John Walker
(University of London (GB))
5/21/14, 9:25 AM
Security & Networking
IPv6 rollout at UK sites varies from one site where nearly all services are dual stack (Imperial) to others without any IPv6 addresses. The current rollout status will be presented. In addition, results of IPv6 connectivity testing using perfSONAR will be discussed.
Shawn Mc Kee
(University of Michigan (US))
5/21/14, 9:50 AM
Security & Networking
As reported at the last HEPiX meeting, the WLCG has been supporting the deployment of perfSONAR-PS Toolkit instances at all WLCG sites over the last year. The WLCG perfSONAR-PS Deployment Task Force wrapped up its work in April 2014.
The perfSONAR network monitoring framework was evaluated and agreed as a proper solution to cover the WLCG network monitoring use cases: it allows...
Eileen Kuhn
(KIT - Karlsruhe Institute of Technology (DE))
5/21/14, 10:15 AM
Security & Networking
Batch system monitoring and related system monitoring tools allow tracking data streams at different levels. With the introduction of federated data access to the workflows of WLCG it is becoming increasingly important for data centers to understand specific data flows regarding storage element accesses, firewall configurations, or the scheduling of workflows themselves. For this purpose a...
Vincent Brillault
(CERN)
5/21/14, 11:10 AM
Security & Networking
The Emergency suspension list (also known as central banning list) is finally getting deployed in WLCG, allowing quick automated responses to incidents. This short presentation will present the goal of this new feature, the technology behind the system, and details of the current deployment.
Joel Surget
(CEA/Saclay)
5/21/14, 11:25 AM
Security & Networking
In 2013/2014, the CEA decided to dramatically change the security of its Windows PCs and the way they are managed. I’ll explain the new security philosophy, which is based on two levels:
- Lateral security
- Escalation security
I’ll explain the problems this raises for end users and for the IT team.
Vincent Brillault
(CERN)
5/21/14, 11:50 AM
Security & Networking
This presentation provides an update on the security landscape since the last meeting. It describes the main vectors of compromise in the academic community and presents interesting recent attacks. It also covers security risk management in general, as well as the security aspects of the current hot topics in computing, for example identity federation and virtualisation.
Olof Barring
(CERN)
5/21/14, 2:00 PM
IT Facilities & Business Continuity
The Open Compute Project, OCP (http://www.opencompute.org/), was launched by Facebook in 2011 with the objective of building efficient computing infrastructures at the lowest possible cost. The technologies are released as open hardware designs, with the goal of developing servers and data centers following the model traditionally associated with open source software projects. In order to try out the...
Mr
Michel Jouvin
(Universite de Paris-Sud 11 (FR))
5/21/14, 2:25 PM
IT Facilities & Business Continuity
As presented at past HEPiX meetings, eight labs in the Orsay region/university started a project two years ago to build a new datacenter to replace the existing inefficient computing rooms. This project was delivered on time and has been in production since last October.
This presentation will summarize the needs that motivated the project, the design choices, the building phase experience and give an...
Szabolcs Hernath
(Hungarian Academy of Sciences (HU))
5/21/14, 2:50 PM
IT Facilities & Business Continuity
In this talk we would like to give a summary of the experiences of the first year of operation of the WIGNER Datacenter. We will discuss the topics of infrastructure operations, facility management, energy efficiency and value-added hosting services, with a special focus on the CERN@WIGNER project, the hosting of the external capacity of CERN Tier-0 resources. We will highlight some of the...
Andrea Chierici
(INFN-CNAF)
5/21/14, 3:15 PM
IT Facilities & Business Continuity
In March we had a major cooling problem in our computing center and had to shut the center down completely.
We learnt a lot from this problem and would like to share the experience with the community.
Mr
Peter van der Reest
(DESY),
Yves Kemp
(DESY)
5/21/14, 4:10 PM
IT Facilities & Business Continuity
A collection of themes and thoughts on business continuity (BC), covering, among others, measures, procedures and dependencies.
Dr
Tony Wong
(Brookhaven National Laboratory)
5/21/14, 4:35 PM
Computing & Batch Services
The RACF has evaluated the Intel Ivy Bridge and AMD Opteron CPUs before an anticipated purchase of Linux servers for its RHIC and USATLAS programs in 2014. Price-performance considerations are no longer sufficient, as we must consider long-term power, cooling and space capacities in the data center. This presentation describes how these long-term considerations are increasingly altering...
Manfred Alef
(Karlsruhe Institute of Technology (KIT))
5/21/14, 5:00 PM
Computing & Batch Services
The HEPiX Benchmarking Working Group is preparing for the deployment of a successor of the widely used HS06 benchmark.
- Why we are looking for a replacement of HS06
- Summary of discussions at GDB
- Requirements
- Benchmark candidates
- Volunteers
Andrea Chierici
(INFN-CNAF)
5/21/14, 5:25 PM
Computing & Batch Services
At INFN-T1 we are facing the problem of the TCO of computing nodes, which account for the larger part of our electricity bill.
Intel recently introduced the Avoton SoC, targeted at the microserver, entry communication infrastructure and cloud storage markets. We benchmarked this CPU and evaluated the possible adoption of this technology in our computing farm.
Max Fischer
(KIT - Karlsruhe Institute of Technology (DE))
5/22/14, 9:00 AM
Storage & Filesystems
Modern data processing solutions increasingly rely on data locality to achieve high data access rates and scalability. In contrast, common HEP system architectures emphasize uniform resource pools with minimal locality, allowing even for cross-site data access. The concept for the new High Performance Data Analysis (HPDA) Tier3 at KIT aims at introducing data locality to HEP batch systems....
Dr
Daniel van der Ster
(CERN)
5/22/14, 9:25 AM
Storage & Filesystems
Ceph was introduced at CERN in early 2013 as a potential solution to new use-cases (e.g. cloud block storage) while also providing a path toward a consolidated storage backend for other services including AFS, NFS, etc...
This talk will present the outcome of the past year of testing and production experience with Ceph. We will present our real operations experience and lessons learned, and...
George Ryall
(STFC)
5/22/14, 9:50 AM
Storage & Filesystems
We are trialling the use of Ceph both as a file-system and as a cloud storage back end. I will present our experiences so far.
German Cancio Melia
(CERN)
5/22/14, 10:15 AM
Storage & Filesystems
CERN stores over 100PB of data on tape via CASTOR and TSM. This talk will present the current status of the CERN tape infrastructure, with a particular focus on tape performance and efficiency and the status of the large media repacking exercise.
Mr
Peter van der Reest
(DESY)
5/22/14, 11:10 AM
Storage & Filesystems
DESY IT has implemented a cloud storage service on the basis of dCache.
The talk will describe architecture and service concepts.
German Cancio Melia
(CERN)
5/22/14, 11:35 AM
Storage & Filesystems
In this talk, we will provide an update on the activities of the bit-level preservation WG, notably the ongoing work on a set of recommendations and on a model for estimating long-term (10-20-30 years) archiving cost outlooks.
Mr
Sylvain Reynaud
(CNRS)
5/22/14, 12:00 PM
Basic IT Services
Many of us need tools for service monitoring adapted to our site specificities, or tools to do custom processing on user data. Regardless of the use-cases, we have to develop (or get developed) applications that aggregate, process and format data from heterogeneous data sources.
Lavoisier (http://software.in2p3.fr/lavoisier) is a framework that enables building such applications by...
Pedro Andrade
(CERN)
5/22/14, 2:00 PM
Basic IT Services
The Agile Infrastructure monitoring team is working on new solutions to modernise and improve how monitoring and analytics are done at CERN. We will give an update on these activities, in particular the recent progress on testing and adopting different open source technologies (e.g. Hadoop, Elasticsearch, Flume, Kibana) for the various monitoring architecture layers. We will report on the...
Ben Jones
(CERN)
5/22/14, 2:25 PM
Basic IT Services
As the Agile Infrastructure project scaled from being a development effort of a few people that could sit in one meeting room to a production service for the CERN computer centre, what changes were needed to our service, tools and workflow? We will look at the technical challenges of scaling the Puppet infrastructure, scaling the development
effort of Puppet code, and also the procedural changes...
Stefano Zilli
(CERN)
5/22/14, 2:50 PM
Basic IT Services
The CERN private cloud has been in production since July 2013 and has grown steadily to 60000 cores, hosting more than 5500 Virtual Machines for 370 users and 140 shared projects. New features have been made available this year like block storage and IPv6. This presentation will provide an overview of the current status of the infrastructure and of the plans for the next developments and...
Jerome Belleman
(CERN)
5/22/14, 3:15 PM
Basic IT Services
As the Agile Infrastructure is moving forwards at CERN, more and more services are migrating to it. New tools are put in place to get the most out of its strengths while we learn lessons from the problems we hit when converting services from Quattor. At CERN, a number of services have made some significant progress in the migration to the new infrastructure; the batch service, several...
Mr
Frederic Schaer
(CEA)
5/22/14, 4:10 PM
Basic IT Services
The IRFU site, a member of the GRIF WLCG T2 site, decided to move from Quattor to Puppet in 2012. The migration was almost complete in early April 2014.
This talk will focus mainly on the goals, how we achieved them, the manpower that was required, what we gained with Puppet and the new challenges that we must now face as a T2 with this management tool.
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))
5/22/14, 4:35 PM
Basic IT Services
A report on the status of the Quattor toolset, with particular emphasis on recent developments in both the user and development communities.
Ben Jones
(CERN)
5/22/14, 5:00 PM
Basic IT Services
A year ago we began working with other sites to see how we could best share knowledge and effort amongst sites migrating to Puppet. This talk will present a reminder of the working group, its formation and mandate, and how Puppet had been developing already amongst earlier adopters. We will discuss how Puppet module development occurs in the wider Puppet community, and what conventions the...
Larry Pezzaglia
(LBNL)
5/22/14, 5:25 PM
Basic IT Services
This talk will provide a case study of cluster consolidation at NERSC.
In 2012, NERSC began deployment of "Mendel", a 500+ node,
InfiniBand-attached, Linux "meta-cluster" which transparently
expands NERSC production clusters and services in a scalable and
maintainable fashion. The success of the software automation
infrastructure behind the Mendel multi-clustering model...
Mr
Andrey SHEVEL
(University of Information Technology, Mechanics, and Optics)
5/23/14, 9:00 AM
Grid, Cloud & Virtualisation
In many cases where Big Data is involved, the data must be transferred from one point of a computer network to another. Quite often those points are far away from each other. The transfer time is a significant factor when transferring Big Data. During this time the characteristics of the data link may change drastically, including interruptions of channel operation...
Steven Timm
(Fermilab)
5/23/14, 9:25 AM
Grid, Cloud & Virtualisation
The FermiCloud project exists to provide on-demand computing and data movement services to the various experiments at Fermilab. We face a dynamically changing demand for compute resources and data movement, which we meet by enabling users to run on our own site, remote grid sites, and cloud sites. We also instantiate on-demand data movement and web caching services to support this remote...
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))
5/23/14, 9:50 AM
Grid, Cloud & Virtualisation
The RAL Tier 1 is now working on deploying a production-quality private cloud, to meet the emerging needs of both the Tier 1 and STFC's Scientific Computing Department. This talk will describe the work so far and the roadmap for the coming year. We will also discuss other virtualisation developments.
Dr
Domenico Giordano
(CERN)
5/23/14, 10:15 AM
Grid, Cloud & Virtualisation
Helix Nebula – the Science Cloud is a European public-private-partnership between leading scientific research organisations (notably CERN, EMBL and ESA) and European IT cloud providers. Its goal is to establish a Cloud Computing Infrastructure for the European Research Area and the Space Agencies, serving as a platform for innovation and evolution of a federated cloud framework for...
Andrew McNab
(University of Manchester (GB))
5/23/14, 11:10 AM
Grid, Cloud & Virtualisation
We present experiences with running ATLAS and LHCb production jobs in virtual machines at Manchester and other sites in the UK using Vac. Vac is a self-contained VM management system in which individual hypervisor hosts act as VM factories to provide VMs contextualized for experiments, and offers an alternative to conventional CE/Batch systems and Cloud interfaces to resources. In the Vacuum...