Brian Paul Bockelman
(University of Nebraska (US))
13/10/2014, 09:00
Sean Crosby
(University of Melbourne (AU))
13/10/2014, 09:15
Site reports
An update on the ATLAS Tier 2 and distributed Tier 3 of HEP groups in Australia. The talk will cover our integration of cloud resources, Ceph filesystems and third-party storage into our setup.
Peter Gronbech
(University of Oxford (GB))
13/10/2014, 09:30
Site reports
Site report from the University of Oxford Physics department.
Erik Mattias Wadenstein
(University of Umeå (SE)),
Ulf Tigerstedt
(CSC Oy)
13/10/2014, 09:45
Site reports
Site report for NDGF-T1, mainly focusing on dCache.
Sandy Philpott
(JLAB)
13/10/2014, 10:00
Site reports
An overview of JLab's latest developments since our spring meeting: computing and storage for 12 GeV physics, a Lustre update, the OpenZFS plan, load balancing between HPC and data analysis, facilities changes in the Data Center, ...
Andreas Petzold
(KIT - Karlsruhe Institute of Technology (DE))
13/10/2014, 10:15
Site reports
News about GridKa Tier-1 and other KIT IT projects and infrastructure.
Ajit Mohapatra
(University of Wisconsin (US)),
Tapas Sarangi
(University of Wisconsin (US))
13/10/2014, 11:15
Site reports
As a major WLCG/OSG T2 site, the University of Wisconsin Madison CMS T2 has provided very productive and reliable services for CMS Monte Carlo production/processing and for large-scale global CMS physics analysis, using high-throughput computing (HTCondor), a highly available storage system (Hadoop), efficient data access (Xrootd/AAA), and scalable distributed software systems (CVMFS). An update...
Pat Riehecky
(Fermilab)
13/10/2014, 13:30
End-User IT Services & Operating Systems
This presentation will provide an update on the current status of Scientific Linux, describe some possible future goals, and offer users a chance to provide feedback on its direction.
Thomas Oulevey
(CERN)
13/10/2014, 14:00
End-User IT Services & Operating Systems
CERN has maintained and deployed Scientific Linux CERN since 2004.
In January 2014, CentOS and Red Hat announced they were joining forces to provide a common platform for the needs of open-source community projects.
CERN decided to evaluate CentOS 7 as a candidate for its next version and to see how well it fits CERN's needs.
An updated report will be provided, as agreed at HEPiX Spring 2014.
Garhan Attebury
(University of Nebraska (US))
13/10/2014, 14:30
End-User IT Services & Operating Systems
Seven years have passed since the initial EL 5 release, yet it is still in active use at many sites. Its successor EL 6 is also showing its age, with its 4th birthday just around the corner. While both will remain under Red Hat support for many years to come, it never hurts to prepare for the future.
This talk will detail the experiences at T2_US_Nebraska in transitioning towards EL 7...
Michail Salichos
(CERN)
13/10/2014, 15:00
End-User IT Services & Operating Systems
FTS3 is the service responsible for globally distributing the majority of the LHC data across the WLCG infrastructure. It is a file-transfer scheduler that scales horizontally and is easy to install and configure. In this talk we would like to draw attention to the FTS3 features that could attract wider communities and administrators, including several new user-friendly features. We...
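As a rough illustration of the ease of use mentioned above, here is a minimal sketch that submits a transfer with the FTS3 command-line clients driven from Python; the endpoint and file URLs are placeholders, not values from the talk.

```python
# Minimal sketch (not the talk's material): submit a single file transfer
# to an FTS3 endpoint with the fts3 command-line clients and check its
# state. The endpoint and file URLs are placeholders.
import subprocess

FTS_ENDPOINT = "https://fts3.example.org:8446"  # hypothetical FTS3 server

def submit_transfer(source_url, dest_url):
    """Submit one transfer and return the FTS3 job ID printed by the CLI."""
    out = subprocess.check_output(
        ["fts-transfer-submit", "-s", FTS_ENDPOINT, source_url, dest_url])
    return out.decode().strip()

def job_status(job_id):
    """Query the job state (SUBMITTED, ACTIVE, FINISHED, FAILED, ...)."""
    out = subprocess.check_output(
        ["fts-transfer-status", "-s", FTS_ENDPOINT, job_id])
    return out.decode().strip()

if __name__ == "__main__":
    jid = submit_transfer("gsiftp://site-a.example.org/data/file1",
                          "gsiftp://site-b.example.org/data/file1")
    print(jid, job_status(jid))
```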
Borja Aparicio Cotarelo
(CERN)
13/10/2014, 16:00
End-User IT Services & Operating Systems
The current efforts around the Issue Tracking and Version Control services at CERN will be presented. Their main design and structure will be shown, giving special attention to the new requirements from the community of users in terms of collaboration and integration tools, and to how we address this challenge in the definition of new services based on GitLab for collaboration and code review and...
Andrea Chierici
(INFN-CNAF)
14/10/2014, 09:00
End-User IT Services & Operating Systems
CNAF T1 monitoring and alarming systems produce tons of data describing the state, performance and usage of our resources. Collecting this kind of information centrally benefits both resource administrators and our user community when processing information and generating reporting graphs. We built the “Monviso reporting portal”, which consumes a set of key metrics, graphing them based on two...
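For illustration only, a minimal sketch of the pattern such a reporting portal automates: fetch one key metric from a hypothetical monitoring API and render it as a graph. The URL and JSON field names are assumptions, not Monviso's actual interface.

```python
# Illustration only: fetch one key metric from a hypothetical monitoring
# API and render it headlessly to a PNG, the kind of graphing a reporting
# portal automates. URL and JSON field names are assumptions.
import requests
import matplotlib
matplotlib.use("Agg")  # no display needed on a portal backend
import matplotlib.pyplot as plt

resp = requests.get("https://monitoring.example.org/api/metrics",
                    params={"metric": "farm_cpu_usage", "hours": 24})
points = resp.json()["points"]  # assumed shape: [{"t": ..., "v": ...}, ...]

plt.plot([p["t"] for p in points], [p["v"] for p in points])
plt.xlabel("time")
plt.ylabel("CPU usage [%]")
plt.title("farm_cpu_usage, last 24 h")
plt.savefig("farm_cpu_usage.png")
```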
Mr Frederic Schaer
(CEA)
14/10/2014, 09:45
Site reports
In this site report, we will describe what has changed at CEA/IRFU and what has been of interest since HEPiX@Annecy, six months ago.
Andreas Haupt
(Deutsches Elektronen-Synchrotron (DE))
14/10/2014, 10:00
Site reports
News from DESY since the Annecy meeting.
James Pryor
(BNL)
14/10/2014, 10:15
Site reports
A summary of developments at BNL's RHIC/ATLAS Computing Facility since the last HEPiX meeting.
Shawn McKee
(University of Michigan (US))
14/10/2014, 11:00
Site reports
I will present an update on our site since the last report and cover our work with dCache, perfSONAR-PS and VMWare. I will also report on our recent hardware purchases for 2014 as well as the status of our new networking configuration and 100G connection to the WAN. I conclude with a summary of what has worked and what problems we encountered and indicate directions for future work.
Dr Helge Meinhard
(CERN)
14/10/2014, 11:30
Grid, Cloud & Virtualisation
LHC@home was brought back to CERN-IT in 2011 with two projects, SixTrack and Test4Theory, the latter using virtualization with CernVM. Thanks to this development, there is increased interest in volunteer computing at CERN, notably since native virtualization support has been added to the BOINC middleware. Pilot projects with applications from the LHC experiment collaborations running on...
Michele Michelotto
(Universita e INFN (IT))
14/10/2014, 13:30
Computing & Batch Services
The traditional architecture for High Energy Physics is x86-64, but there is interest in the community in processors that are more efficient in terms of computing power per watt. I will show my measurements on ARM and Avoton processors. I will conclude with some measurements on candidates for the fast benchmark requested by the physics community, mostly to measure the performance of machines in the cloud.
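As a toy illustration of what a "fast benchmark" probe looks like, the sketch below times a short, fixed CPU-bound kernel and turns the elapsed time into a score; the real candidate benchmarks evaluated in the talk are not reproduced here.

```python
# Toy probe, for illustration only: time a short, fixed CPU-bound kernel
# and turn the elapsed time into a score. The real candidate benchmarks
# evaluated in the talk are not reproduced here.
import time

def cpu_kernel(n=2_000_000):
    """Fixed floating-point workload; returns a value so it can't be elided."""
    acc = 0.0
    for i in range(1, n):
        acc += (i % 7) * 0.5 / (i % 11 + 1)
    return acc

start = time.perf_counter()
cpu_kernel()
elapsed = time.perf_counter() - start
print("score: %.2f (arbitrary units, higher is faster)" % (1.0 / elapsed))
```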
Jerome Belleman
(CERN)
14/10/2014, 14:00
Computing & Batch Services
The CERN Batch System comprises 4000 worker nodes and 60 queues, and offers a service for various types of large user communities. In light of the developments driven by the Agile Infrastructure and of more demanding processing requirements, it will be faced with increasingly challenging scalability and flexibility needs. At the last HEPiX, we presented the results of our evaluation of SLURM,...
Samir Cury Siqueira
(California Institute of Technology (US))
14/10/2014, 14:30
Computing & Batch Services
Hardware benchmarks are often relative to the target application. At CMS sites, new technologies, mostly processors, need to be evaluated on a yearly basis. A framework was developed at the Caltech CMS Tier-2 to benchmark compute nodes with one of the most CPU-intensive CMS workflows: the Tier-0 reconstruction. The benchmark is a CMS job that reports the results to a central database...
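A conceptual sketch of that reporting step follows, with a stand-in workload and a hypothetical results endpoint; the real framework runs the Tier-0 reconstruction and has its own database schema.

```python
# Conceptual sketch of the reporting step: run a benchmark workload, time
# it, and push the result to a central database over HTTP. The endpoint,
# JSON schema and the stand-in workload are assumptions, not the Caltech
# framework itself (which runs the Tier-0 reconstruction).
import socket
import time
import requests

def run_workload():
    time.sleep(1)  # stand-in for the CPU-intensive CMS workflow

start = time.perf_counter()
run_workload()
elapsed = time.perf_counter() - start

requests.post("https://t2-benchmarks.example.org/api/results", json={
    "host": socket.getfqdn(),       # which compute node was measured
    "workflow": "t0-reco",
    "wall_time_s": elapsed,
})
```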
Todd Tannenbaum
(Univ of Wisconsin-Madison, Wisconsin, USA)
14/10/2014, 15:00
Computing & Batch Services
The goal of the HTCondor team is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Increasingly, the work performed by the HTCondor developers is being driven by its partnership with the High Energy Physics (HEP) community. This presentation will provide an...
Brian Paul Bockelman
(University of Nebraska (US))
14/10/2014, 15:20
Computing & Batch Services
One of the most critical components delivered by the Open Science Grid (OSG) software team is the compute element, or the OSG-CE. At the core of the CE itself is the gatekeeper software for translating grid pilot jobs into local batch system jobs. OSG is in the process of migrating from the Globus gatekeeper to the HTCondor-CE, supported by the HTCondor team.
The HTCondor-CE provides an...
James Frey
14/10/2014, 16:10
Computing & Batch Services
An important use of HTCondor is as a scalable, reliable interface for jobs destined for other scheduling systems. These include Grid interfaces to batch systems (Globus, CREAM, ARC) and Cloud services (EC2, OpenStack, GCE). The High Energy Physics community has been a major user of this functionality and has driven its development. This talk will provide an overview of HTCondor's Grid...
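A minimal sketch of the Cloud side of this functionality: an HTCondor grid-universe submit description that starts an EC2 instance, written out and submitted from Python. The region URL, AMI ID and credential file paths are placeholders.

```python
# Minimal sketch: an HTCondor grid-universe submit description that starts
# an EC2 instance, written out and handed to condor_submit. Region URL,
# AMI ID and credential file paths are placeholders.
import subprocess

submit = """\
universe              = grid
grid_resource         = ec2 https://ec2.us-east-1.amazonaws.com/
executable            = my-ec2-job              # used as a label for EC2 jobs
ec2_access_key_id     = /home/user/ec2.access   # file holding the access key
ec2_secret_access_key = /home/user/ec2.secret   # file holding the secret key
ec2_ami_id            = ami-00000000            # placeholder image
ec2_instance_type     = m1.small
queue
"""

with open("ec2_job.sub", "w") as f:
    f.write(submit)
subprocess.check_call(["condor_submit", "ec2_job.sub"])
```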
Mr Alexandr Zaytsev
(Brookhaven National Laboratory (US))
14/10/2014, 16:30
Security & Networking
Infiniband is a long-established and rapidly developing networking technology that currently dominates the field of low-latency, high-throughput interconnects for HPC systems in general, and for those on the TOP500 list in particular. Over the last 4 years, the successful use of Infiniband networking technology combined with the additional IP-over-IB protocol and Infiniband...
Dave Kelsey
(STFC - Rutherford Appleton Lab. (GB))
15/10/2014, 09:00
Security & Networking
This talk will present an update on the recent activities of the HEPiX IPv6 Working Group including our plans for moving to dual-stack services on WLCG.
Joe Metzger
(LBL)
15/10/2014, 09:30
Security & Networking
The ESnet Extension to Europe (EEX) project is building out the ESnet backbone into Europe. The goal of the project is to provide dedicated transatlantic network services that support U.S. DOE funded science.
The EEX physical infrastructure build will be substantially completed before the end of December. Initial services will be provided to BNL, FERMI and CERN while the infrastructure is...
Bob Cowles
(BrightLite Information Security)
15/10/2014, 10:00
Security & Networking
After several years' investigation of trends in Identity Management (IdM), the eXtreme Scale Identity Management (XSIM) project has concluded that there is little reason for resource providers to provide IdM functions for research collaborations, or even for many groups within the institution. An improved user experience and decreased cost can be achieved with "a small amount of programming."
Robert Quick
(Indiana University)
15/10/2014, 11:00
Security & Networking
OSG Operations and Software will soon be configuring our operational infrastructure and middleware components with IPv6 network-stack capabilities in addition to the existing IPv4 stack. For OSG services this means network interfaces will have at least one IPv6 address on which they listen, in addition to whatever IPv4 addresses they are already listening on. For middleware components we...
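A small sketch of the dual-stack pattern this implies: a single AF_INET6 listening socket that also accepts IPv4-mapped connections (the port number is arbitrary). Setting IPV6_V6ONLY explicitly matters because some operating systems default to serving only one family.

```python
# Sketch of the dual-stack pattern described above: one AF_INET6 listening
# socket that accepts both native IPv6 and IPv4-mapped connections. The
# port number is arbitrary.
import socket

sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# Explicitly allow IPv4-mapped addresses; some OSes default to v6-only.
sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
sock.bind(("::", 8080))     # "::" = all interfaces, IPv6 and mapped IPv4
sock.listen(5)
conn, addr = sock.accept()  # works for clients of either address family
print("connection from", addr)
```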
Dr Stefan Lueders
(CERN)
15/10/2014, 11:30
Security & Networking
Computer security is as important as ever, both outside the HEP community and within it. This presentation will give the usual overview of recent issues reported or made public since the last HEPiX workshop (like the ripples of "Heartbleed"). It will discuss trends (identity federation and virtualisation) and potential mitigations of new security threats.
Dr Tony Wong
(Brookhaven National Laboratory)
15/10/2014, 13:30
IT Facilities & Business Continuity
We describe a cost-effective indirect UPS monitoring system that was recently implemented in parts of BNL's RACF complex. This solution was needed to address the lack of a centralized monitoring solution, and it is integrated with an event notification mechanism and overall facility management.
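A hedged sketch of what such indirect monitoring can look like: poll the standard UPS-MIB (RFC 1628) battery status over SNMP and hand anomalies to a notification hook. The host, community string and the hook itself are placeholders, not the RACF implementation.

```python
# Hedged sketch of indirect UPS monitoring: poll the standard UPS-MIB
# (RFC 1628) battery status over SNMP with the net-snmp CLI and hand
# anomalies to a notification hook. Host, community string and the hook
# are placeholders, not the RACF implementation.
import subprocess

UPS_HOST = "ups1.example.org"
BATTERY_STATUS_OID = "1.3.6.1.2.1.33.1.2.1.0"  # upsBatteryStatus

def battery_status():
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", "public", "-Ovq", UPS_HOST,
         BATTERY_STATUS_OID])
    return int(out.decode())  # 2 = batteryNormal in the UPS-MIB

def notify(msg):
    print("ALERT:", msg)  # stand-in for the site's event notification system

if battery_status() != 2:
    notify("%s: battery status is not normal" % UPS_HOST)
```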
Wayne Salter
(CERN)
15/10/2014, 14:00
IT Facilities & Business Continuity
Following a tender for a CERN remote Tier-0 centre issued at the end of 2011 and awarded to the Wigner Data Centre in May 2012, operations commenced at the beginning of 2013. This talk will give a brief introduction to the history of this project and its scope. It will then summarise the initial experience gained to date and highlight a number of issues that have been encountered;...
Dr Dimitri Bourilkov
(University of Florida (US))
15/10/2014, 14:30
Storage & Filesystems
Design, performance, scalability, operational experience, monitoring, different modes of access and expansion plans for the Lustre filesystems deployed for high-performance computing at the University of Florida are described. Currently we are running storage systems of 1.7 petabytes for the CMS Tier2 center and 2.0 petabytes for the university-wide HPC center.
Luca Mascetti
(CERN)
15/10/2014, 15:00
Storage & Filesystems
In this contribution we report on our experience operating EOS, the CERN-IT high-performance disk-only storage solution, in multiple computer centres. EOS is one of the first production services exploiting CERN's new facility in Budapest, using its stochastic geo-location of data replicas. Currently EOS holds more than 100PB of raw disk space for the four big experiments (ALICE, ATLAS,...
Jeffrey Dost
(UCSD)
15/10/2014, 16:00
Storage & Filesystems
We have developed an XRootD extension to Hadoop at UCSD that allows a site to free up significant local storage space by taking advantage of the file redundancy already provided by the XRootD federation. Rather than failing when a corrupt portion of a file is accessed, the hdfs-xrootd-fallback system retrieves the segment from another site using XRootD, thus serving the original file to the end...
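A conceptual sketch of the fallback logic (not the hdfs-xrootd-fallback code itself): try the local HDFS read, and on failure fetch the same byte range through the federation. The redirector and helper functions are hypothetical.

```python
# Conceptual sketch of the fallback idea (not the hdfs-xrootd-fallback code
# itself): try the local HDFS read, and if it fails fetch the same byte
# range through the XRootD federation. Redirector and helpers are made up.
import subprocess

FEDERATION = "root://redirector.example.org/"  # placeholder redirector

def read_block(path, offset, length):
    try:
        return read_from_hdfs(path, offset, length)   # normal local read
    except IOError:
        # Corrupt or missing block: serve the bytes via the federation.
        return read_via_xrootd(FEDERATION + path, offset, length)

def read_from_hdfs(path, offset, length):
    raise IOError("simulated corrupt block")          # demo stub

def read_via_xrootd(url, offset, length):
    tmp = "/tmp/xrootd-fallback.tmp"                  # demo only
    subprocess.check_call(["xrdcp", "-f", url, tmp])  # fetch remote copy
    with open(tmp, "rb") as f:
        f.seek(offset)
        return f.read(length)

print(read_block("/store/data/file1", 0, 1024))
```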
Luca Mascetti
(CERN)
15/10/2014, 16:30
Storage & Filesystems
CERNBox is a cloud synchronization service for end users: it allows them to sync and share files on all major platforms (Linux, Windows, MacOSX, Android, iOS). The very successful beta phase of the service demonstrated high demand in the community for such an easily accessible cloud storage solution. Integration of the CERNBox service with the EOS storage backend is the next step towards providing sync...
Dr Arne Wiebalck
(CERN)
15/10/2014, 17:00
Grid, Cloud & Virtualisation
This is a summary of our efforts to address the issue of providing sufficient IO capacity to VMs running in our OpenStack cloud.
Brian Behlendorf
(LLNL)
16/10/2014, 09:00
Storage & Filesystems
OpenZFS is a storage platform that encompasses the functionality of a traditional filesystem and volume manager. It is highly scalable, provides robust data protection, supports advanced features like snapshots and clones, and is easy to administer. These features make it an appealing choice for HPC sites like LLNL, which uses it for all production Lustre filesystems.
This contribution...
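For readers new to OpenZFS, the snapshot and clone features mentioned above look like this when driven from Python via the standard zfs CLI; pool and dataset names are examples.

```python
# Quick look at the snapshot/clone features mentioned above, driving the
# standard zfs CLI from Python. Pool and dataset names are examples.
import subprocess

def zfs(*args):
    subprocess.check_call(["zfs"] + list(args))

zfs("snapshot", "tank/work@before-upgrade")   # cheap point-in-time copy
zfs("clone", "tank/work@before-upgrade", "tank/work-test")  # writable clone
zfs("list", "-t", "snapshot")                 # show existing snapshots
```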
Liviu Valsan
(CERN)
16/10/2014, 09:30
Storage & Filesystems
Flash storage is slowly becoming more prevalent in the High Energy Physics community. When deploying Solid State Drives (SSDs) it is important to understand their capabilities and limitations, so as to choose the product best adapted to the use case at hand. Benchmarking results from synthetic and real-world workloads on a wide array of Solid State Drives will be presented. The new...
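As one example of a synthetic workload of the kind referred to above, the sketch below runs a 4 KiB random-read test at queue depth 32 with the common fio tool; the target file is a placeholder, and this is not the talk's exact methodology.

```python
# One example of a synthetic flash workload: 4 KiB random reads at queue
# depth 32 with direct I/O, using the common fio tool. The target file is
# a placeholder and this is not the talk's exact methodology.
import subprocess

subprocess.check_call([
    "fio",
    "--name=randread-qd32",
    "--filename=/mnt/ssd/fio.test",  # placeholder test file on the SSD
    "--rw=randread",                 # random reads
    "--bs=4k",                       # 4 KiB blocks
    "--iodepth=32",                  # queue depth 32
    "--ioengine=libaio",             # asynchronous I/O on Linux
    "--direct=1",                    # bypass the page cache
    "--size=1G",
    "--runtime=60",
    "--time_based",
])
```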
Mr Alexandr Zaytsev
(Brookhaven National Laboratory (US))
16/10/2014, 10:00
Storage & Filesystems
Ceph-based storage solutions have become increasingly popular within the HEP/NP community over the last few years. With the current status of the Ceph project, both the object storage and block storage layers are production-ready on a large scale, and the Ceph file system layer (CephFS) is rapidly getting to that state as well. This contribution contains a thorough review of various...
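A minimal sketch of the two production-ready layers mentioned, using the standard ceph and rbd CLIs; pool and image names are examples only.

```python
# Minimal sketch of the two production-ready layers named above: create a
# pool (object storage) and an RBD image (block storage) with the standard
# ceph/rbd CLIs. Names and sizes are examples only.
import subprocess

def run(*cmd):
    subprocess.check_call(list(cmd))

run("ceph", "osd", "pool", "create", "hepdata", "128")       # 128 PGs
run("rbd", "create", "hepdata/vm-disk0", "--size", "10240")  # 10 GiB image
run("rbd", "map", "hepdata/vm-disk0")  # expose it as a kernel block device
```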
Andrea Manzi
(CERN)
16/10/2014, 11:00
Storage & Filesystems
In this contribution we give a set of hints for performance tuning of the upcoming DPM releases, and we show what one can achieve by looking at different graphs taken from the DPM nightly performance tests. Our focus is on the HTTP/WebDAV and Xrootd protocols and the newer "dmlite" software framework, and some of these hints may also benefit older, legacy protocol...
Ruben Domingo Gaspar Aparicio
(CERN)
16/10/2014, 11:20
Storage & Filesystems
The CERN IT-DB group is migrating its storage platform, mainly NetApp NAS systems running in 7-mode but also SAN arrays, to a set of NetApp C-mode clusters. The largest one comprises 14 controllers and will hold a range of critical databases, from administration to accelerator control and experiment control databases. This talk shows our setup: network, monitoring, use of features like transparent...
Tony Quan
(LBL)
16/10/2014, 11:40
Storage & Filesystems
The PDSF cluster at NERSC has been providing a data-intensive computing resource for experimental high-energy particle and nuclear physics experiments (currently ALICE, ATLAS, STAR, IceCube, MAJORANA) since 1996. Storage is implemented as a GPFS cluster built out of a variety of commodity hardware (Dell, RAID Inc., Supermicro storage and servers). Recently we increased its capacity by 500TB by...
Ray Spence
(LBNL)
16/10/2014, 13:30
Basic IT Services
Developing Nagios code to suspend checks during planned outages.
NERSC currently supports more than 13,000 computation nodes spread over six supercomputing or clustered systems. These systems cumulatively access more than 13.5PB of disk space via thousands of network interfaces. This environment enables scientists...
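A minimal sketch of the core mechanism for suspending checks during a planned outage: writing a SCHEDULE_HOST_DOWNTIME external command into the Nagios command file. The command-file path and host name are site-specific.

```python
# Sketch of the core mechanism for suspending checks during a planned
# outage: write a SCHEDULE_HOST_DOWNTIME external command into Nagios's
# command file. Command-file path and host name are site-specific.
import time

CMD_FILE = "/var/spool/nagios/cmd/nagios.cmd"  # typical, but site-specific

def schedule_downtime(host, start, end, author, comment):
    now = int(time.time())
    # fixed=1, trigger_id=0, duration=0 -> a fixed downtime window
    line = ("[%d] SCHEDULE_HOST_DOWNTIME;%s;%d;%d;1;0;0;%s;%s\n"
            % (now, host, int(start), int(end), author, comment))
    with open(CMD_FILE, "w") as f:   # the command file is a named pipe
        f.write(line)

outage_start = time.time() + 3600    # planned outage starts in one hour
schedule_downtime("pdsf-io1", outage_start, outage_start + 7200,
                  "nagiosadmin", "planned network maintenance")
```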
Ben Jones
(CERN)
16/10/2014, 13:50
Basic IT Services
A status update on the Puppet-based Configuration Service at CERN will be presented, giving a general overview and discussing our current plans for the next 6 months. The presentation will also highlight the work being done to secure the Puppet infrastructure, making it appropriate for use by a large number of administratively distinct user groups.
Mr Ben Meekhof
(University of Michigan)
16/10/2014, 14:15
Basic IT Services
CFEngine is a highly flexible configuration management framework. It also has a steep learning curve, which can sometimes make decisions about how to deploy and use it difficult. At AGLT2 we manage a variety of different systems with CFEngine, and we have an effective version-controlled workflow for developing, testing, and deploying changes to our configuration. The talk will...
Timothy Michael Skirvin
(Fermi National Accelerator Lab. (US))
16/10/2014, 14:40
Basic IT Services
USCMS-T1's work to globally deploy Puppet as our configuration management tool is well into the "long tail" phase, and has changed in fairly significant ways since its inception. This talk will discuss what has worked, how the Puppet tool itself has changed over the project, and our first thoughts as to what we expect to be doing in the next year (hint: starting again is rather likely!).
Ruben Domingo Gaspar Aparicio
(CERN)
16/10/2014, 15:05
Basic IT Services
Inspired by various database-as-a-service (DBaaS) providers, the database group at CERN has developed a platform that allows the CERN user community to run a database instance with database administrator privileges, providing a full toolkit that allows the instance owner to perform backups and point-in-time recoveries, monitor specific database metrics, start/stop the instance and...
Wayne Salter
(CERN)
16/10/2014, 16:00
IT Facilities & Business Continuity
The presentation describes options for joint activities around the procurement of equipment and services by public labs, possibly with funding from the European Commission. The presentation is intended to inform the community and to check whether there is interest.
James Pryor
(BNL)
16/10/2014, 16:20
Basic IT Services
In 2010, the RACF at BNL began investigating Agile/DevOps practices and methodologies to be able to do more with less time and effort. We chose Puppet in 2010, and by spring of 2011 we had converted about half of our configuration shell scripts into Puppet code on a handful of machines. Today we have scaled Puppet 3.x to support our entire facility and host a common Puppet code base that is...
Aris Angelogiannopoulos
(Ministere des affaires etrangeres et europeennes (FR))
16/10/2014, 16:45
Basic IT Services
This presentation describes the implementation and use cases of the Ermis service. Ermis is a RESTful service to manage the configuration of DNS load balancers. It enables direct creation and deletion of DNS delegated zones using a SOAP interface provided by the Network group, thus simplifying the procedure needed for supporting new services. It is written in Python as a Django application....
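A hedged sketch of the pattern Ermis implements: a Django view that takes a REST call and creates a delegated zone through a SOAP backend. The WSDL URL and SOAP operation name are invented for illustration; suds is one common Python SOAP client.

```python
# Hedged sketch of the pattern Ermis implements: a Django view that takes
# a REST call and creates a DNS delegated zone through a SOAP backend.
# The WSDL URL and SOAP operation name are invented for illustration;
# suds is one common Python SOAP client.
from django.http import JsonResponse
from django.views.decorators.http import require_POST
from suds.client import Client

WSDL = "https://network.example.org/dns/soap?wsdl"  # placeholder WSDL

@require_POST
def create_zone(request):
    zone = request.POST["zone"]             # e.g. "myservice.example.org"
    soap = Client(WSDL)
    soap.service.createDelegatedZone(zone)  # assumed operation name
    return JsonResponse({"zone": zone, "status": "created"})
```

A matching urlconf entry routing POST requests to this view would complete the REST side of the sketch.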
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))
17/10/2014, 09:00
Grid, Cloud & Virtualisation
Update on the RAL Tier 1 cloud deployment and cloud computing activities.
Laurence Field
(CERN)
17/10/2014, 09:30
Grid, Cloud & Virtualisation
The adoption of cloud technologies by the LHC experiments is currently focused on IaaS, more specifically the ability to dynamically create virtual machines on demand. This talk provides an overview of how this alternative approach to resource provisioning fits into the existing workflows used by the experiments. It shows that in order to fully exploit this approach, solutions are required in...
Dr Edward Karavakis
(CERN)
17/10/2014, 10:00
Grid, Cloud & Virtualisation
The WLCG monitoring system provides a solid and reliable solution that has supported LHC computing activities and WLCG operations during the first years of LHC data-taking. The current challenge consists of ensuring that the WLCG monitoring infrastructure copes with the constant increase in monitoring data volume and complexity (new data-transfer protocols, new dynamic types of resource...
Dr Arne Wiebalck
(CERN)
17/10/2014, 10:30
Grid, Cloud & Virtualisation
This is a report on the current status of CERN's OpenStack-based Cloud Infrastructure.