HEPiX Fall 2011 Workshop
Timezone: Canada/Pacific
Hosted by TRIUMF, SFU and the University of Victoria at the Harbour Center - Downtown Vancouver
515 West Hastings Street
Vancouver, BC
Canada V6B 5K3
Michel Jouvin (LAL / IN2P3), Sandy Philpott (JLAB), Steven McDonald (TRIUMF)
Description
HEPiX meetings bring together IT system support engineers from the High Energy Physics (HEP) laboratories, institutes, and universities, such as BNL, CERN, DESY, FNAL, IN2P3, INFN, JLAB, NIKHEF, RAL, SLAC, TRIUMF and others.
Meetings have been held regularly since 1991, and are an excellent source of information for IT specialists in scientific high-performance and data-intensive computing disciplines. We welcome participation from related scientific domains for the cross-fertilization of ideas.
The hepix.org website provides links to information from previous meetings.
Welcome
Site Reports
PDSF Site Report
PDSF is a networked distributed computing cluster designed primarily to meet the detector simulation and data analysis requirements of Physics, Astrophysics and Nuclear Science collaborations. Located at NERSC and benefiting from excellent network and storage infrastructure, the cluster changes constantly to keep up with the computing requirements of physics experiments such as ATLAS, ALICE and Daya Bay. Over the past year, as is usual for us, we replaced a large fraction of the hardware (both computing and storage), moved towards diskless installs with xCAT, and deployed and supported XRootD and CVMFS in addition to GPFS. We also considered migrating to a new batch system, and we will share the motivation behind our decision to continue with Univa Grid Engine.
Speaker: Iwona Sakrejda
10:30 Coffee Break
Site Reports
Conveners: Mr Alan Silverman (CERN), Philippe Olivero (CC-IN2P3)
ATLAS Great Lakes Tier-2 Site Report
We will report on the ATLAS Great Lakes Tier-2 (AGLT2), one of five US ATLAS Tier-2 sites, providing a brief overview of our experiences planning, deploying, testing and maintaining our infrastructure to support the ATLAS distributed computing model. AGLT2 is one of the larger WLCG Tier-2s worldwide, with 2.2 PB of dCache storage and 4500 job slots, so we face a number of challenges in monitoring, managing and maintaining our site. Many of those challenges are related to storage, data management and I/O capabilities. As part of this report we will focus on our recent work in updating, configuring and monitoring our storage systems. In addition to describing new hardware such as SSDs and multi-10GE storage nodes, we will report on tools such as pCache and LSM (Local Site Mover) and a new "site-aware" dCache configuration, which have helped remove some bottlenecks in our infrastructure. Because AGLT2 uses a central syslog host, we are able to track, via LSM logging, how all our worker nodes stage files in and out. We have constructed a system based upon a custom-built MySQL database which tracks our local resources and merges in information from the central syslog host and the dCache billing DB, allowing us to better understand and optimize our site's storage system behavior. The last part of our report will show some results from using this new system.
Speaker: Dr Shawn McKee (University of Michigan ATLAS Group)
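As a rough illustration of the kind of query such a resource-tracking database makes possible, here is a minimal Python sketch assuming a hypothetical schema with an lsm_transfers table (filled from the central syslog host) and a billing table imported from the dCache billing DB; table and column names are illustrative and are not AGLT2's actual schema.

    # Illustrative only: join per-node LSM transfer records (from syslog)
    # with dCache billing entries to find the slowest pools. Hypothetical schema.
    import MySQLdb

    conn = MySQLdb.connect(host="localhost", user="monitor",
                           passwd="secret", db="sitemon")
    cur = conn.cursor()
    cur.execute("""
        SELECT b.pool, COUNT(*) AS transfers, AVG(l.duration_s) AS avg_seconds
        FROM lsm_transfers l
        JOIN billing b ON b.pnfsid = l.pnfsid
        WHERE l.finished > NOW() - INTERVAL 1 DAY
        GROUP BY b.pool
        ORDER BY avg_seconds DESC
        LIMIT 10
    """)
    for pool, transfers, avg_seconds in cur.fetchall():
        print("%-20s %6d transfers  %6.1f s average" % (pool, transfers, avg_seconds))
    conn.close()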
IT Infrastructure
CERN Computing Facilities Evolution Update
There are a number of projects currently underway to improve and extend the CERN computing facilities which have been reported at previous HEPiX meetings. An update will be given on the current status of these projects, and particular emphasis will be placed on efficiency improvements that have been made in the CERN Computer Centre and the resulting energy, and hence cost, savings.
Speaker: Wayne Salter (CERN)
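As a back-of-the-envelope illustration of how an efficiency improvement turns into energy and cost savings, the sketch below applies a PUE reduction to a fixed IT load; all numbers are hypothetical and are not CERN figures.

    # Hypothetical figures only; not CERN data.
    it_load_kw = 2500.0                 # average IT load in kW
    pue_before, pue_after = 1.7, 1.5    # assumed PUE before/after improvements
    price_per_kwh = 0.10                # assumed electricity tariff

    hours_per_year = 24 * 365
    saved_kwh = it_load_kw * (pue_before - pue_after) * hours_per_year
    print("Energy saved per year: %.0f MWh" % (saved_kwh / 1000.0))
    print("Cost saved per year:   %.0f (in tariff units)" % (saved_kwh * price_per_kwh))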
Deska: Maintaining your computing center
The proposed talk discusses the Deska project [1], our attempt at delivering an inventory database whose goal is to provide a central source of machine-readable information about one's computing center. We mention the motivation behind the project, describe the design choices we have made, and discuss how the Deska system could help reduce maintenance effort at other sites.
Speaker: Jan Kundrat
15:30 Coffee Break
IT Infrastructure
Convener: Dr Helge Meinhard (CERN)
Secure file storage and transfer
SINDES, the Secure INformation DElivery System, is a tool aimed at ensuring an adequate level of privacy when storing and delivering confidential files. Initially written at CERN in 2005, SINDES is now being rewritten in order to improve its user interface, flexibility and maintainability: access control granularity, logging, file modifications, history, machine upload, unattended installations and support for different operating systems are examples of points improved in the new version, which is based on Kerberos authentication.
Speaker: Veronique Lefebure (CERN)
Usage of OCS Inventory for Hardware and Software Inventory at CERN
CERN has started to use OCS Inventory for the hardware and software inventory of the SLC nodes on site, and plans to do the same for the MacOS nodes. I will report on the motivation for this, the setup used and the experience gained.
Speaker: Matthias Schroeder (CERN)
Computing
Convener: Dr Michele Michelotto (Univ. + INFN)
AMD's New 16-core Opteron Processors
An overview of the architecture and power efficiency features of the latest 16-core processors from AMD, including benchmark results for the HEP-SPEC suite showing the performance improvements over the current 12-core and older 6-, quad- and dual-core processors. AMD's newest Opteron processors feature the “Bulldozer” x86 core-pair compute module, which is especially well suited for modern C++ and object-based language workloads.
Speaker: David (John) Cownie (AMD)
Site Reports
Prague Tier2 site report
The main computing and storage facilities for LHC computing in the Czech Republic are located at the Prague Tier-2 site. We have participated in grid activities since the beginning of the European DataGrid. Recent years have seen significant growth in our computing and storage capacities. In this talk we will present the current state of our site, its history, and plans for the near future.
Speaker: Jan Svec (Acad. of Sciences of the Czech Rep. (CZ))
10:30 Coffee Break
Site Reports
Conveners: Mr Alan Silverman (CERN), Philippe Olivero (CC-IN2P3)
GRIF, LAL (Orsay) and IRFU (Saclay) site report
Site report of GRIF/LAL and GRIF/Irfu.
Speakers: Michel Jouvin (Universite de Paris-Sud 11 (FR)), Pierrick Micout (CEA)
Computing
Convener: Dr Michele Michelotto (Univ. + INFN)
Open discussion on HEP-SPEC06
The HEP-SPEC06 benchmark was designed by a working group formed during the HEPiX meeting at JLab. HS06 is now the standard for measuring computing power in HEP and in other scientific areas that make use of grid computing. The goal of this discussion is to understand how the HEPiX community sees the future of HS06.
Speaker: Dr Michele Michelotto (INFN Padua & CMS)
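For context, HS06 is based on the all_cpp subset of SPEC CPU2006 run with one copy per core, and the final score is usually described as the SPEC-rate-style geometric mean of the per-benchmark ratios. A minimal sketch of that aggregation step only, with made-up ratios:

    # Aggregation step of an HS06-style score: geometric mean of the
    # per-benchmark SPEC ratios. The ratios below are invented for illustration.
    import math

    ratios = {
        "444.namd":      95.2,
        "447.dealII":   112.7,
        "450.soplex":    88.4,
        "453.povray":   104.9,
        "471.omnetpp":   63.1,
        "473.astar":     70.8,
        "483.xalancbmk": 91.5,
    }
    score = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))
    print("HS06-style score: %.1f" % score)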
IT Infrastructure
Convener: Dr Helge Meinhard (CERN)
Configuration Management at GSI
GSI has been successfully using Cfengine for configuration management for almost a decade. Even though Cfengine is powerful as well as reliable, we have started to test the configuration management system Chef as a successor or complement to Cfengine, to implement features we have been lacking so far.
Speaker: Mr Christopher Huhn (GSI Darmstadt)
15:30 Coffee Break
IT Infrastructure
Convener: Dr Helge Meinhard (CERN)
Hardware failures at CERN
A detailed study of approximately 4000 vendor interventions for hardware failures experienced in the CERN IT computing facility in 2010-2011 will be presented. The rates of part replacements are compared for the different components; as expected, disk failures dominate, with an approximately 1% quarterly replacement rate. When plotting the variation with age, a higher rate is seen in the first year after deployment, whereas there is no significant sign of wear-out at the end of the three-year warranty.
Speaker: Wayne Salter (CERN)
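To put a roughly 1% quarterly disk replacement rate in perspective, a short calculation of what it implies over a year, using a hypothetical fleet size:

    # The ~1% quarterly rate is quoted in the abstract; the fleet size is hypothetical.
    disks = 60000
    quarterly_rate = 0.01

    annual_rate = 1 - (1 - quarterly_rate) ** 4    # about 3.9% per year
    print("Expected replacements per quarter: %.0f" % (disks * quarterly_rate))
    print("Expected replacements per year:    %.0f" % (disks * annual_rate))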
TSM Monitoring at CERN
The TSM server network at CERN, with its 17 TSM servers in production, 30 drives, ~1300 client nodes and ~4 PB of data, often requires an overwhelming amount of effort to be managed properly by the few TSM administrators. Hence the need for a central monitoring system able to cope with the increasing number of servers, client nodes and volumes. We will discuss our approach to this issue, focusing on TSMMS, a TSM Monitoring System developed in-house, able to give an effective view of the needs and status of the network and of the individual servers, as well as statistics and usage reports. By avoiding repetitive, error-prone manual checks, TSM admins are able to manage the whole TSM system just by looking at the periodic reports and taking appropriate action. TSMMS scales seamlessly as the network grows, thus saving the cost of additional administrative personnel.
Speaker: Dr Giuseppe Lo Presti (CERN)
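A minimal sketch of the kind of periodic check a monitoring layer such as TSMMS might automate, assuming the standard dsmadmc administrative client is configured on the monitoring host; the server stanzas, credentials and the choice of query are placeholders and are not taken from the talk.

    # Illustrative only: poll each TSM server for storage pool status using the
    # dsmadmc administrative CLI. Server names and credentials are placeholders.
    import subprocess

    TSM_SERVERS = ["tsm01", "tsm02", "tsm03"]

    for server in TSM_SERVERS:
        cmd = ["dsmadmc", "-se=%s" % server, "-id=monitor",
               "-password=secret", "-dataonly=yes", "query stgpool"]
        output = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
        print("=== %s ===" % server)
        print(output)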
Grid, Cloud & Virtualisation
CloudMan and VMIC projects overview
CERN is developing a set of tools to improve its cloud computing infrastructure and make it more agile. Currently there are two important active projects: CloudMan, developed in collaboration with the BARC institute, and VMIC, developed in collaboration with ASGC. The CloudMan project consists of the development of an enterprise graphical management tool for IT resources, while VMIC is a software tool used to manage, in a trustworthy way, the virtual machine images provided by different sites. This presentation will give an overview of these two projects and their current status, and will explain how they fit into the new cloud computing model at CERN.
Speaker: Belmiro Daniel Rodrigues Moreira (CERN)
lxcloud infrastructure - status and lessons learned
In December 2010 CERN moved part of its batch resources into a cloud-like infrastructure, and has been running some of these batch resources in a fully virtualized infrastructure since then. This presentation will give an overview of the lessons learned from this exercise, the performance and results, impressions of the operational overhead, and problems seen since the deployment of the infrastructure. On the development path, first experiences with SLC6 are shown. In addition, CERN has opened some resources to special users from ATLAS and LHCb via OpenNebula's EC2 interface, which they have been using to test CernVM-generated images that connect directly to the experiment frameworks to get their payload. First results from these tests are shown as well.
Speaker: Belmiro Daniel Rodrigues Moreira (CERN)
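As an illustration of what using such an EC2-style interface looks like from the user side, here is a hedged sketch with the boto library against an EC2-compatible endpoint (for example OpenNebula's econe server); the endpoint, port, credentials and image ID are placeholders, not the actual CERN service.

    # Illustrative only: start an instance through an EC2-compatible API using boto.
    # Endpoint, credentials and AMI identifier below are placeholders.
    import boto
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name="lxcloud", endpoint="ec2.example.org")
    conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                            aws_secret_access_key="SECRET_KEY",
                            is_secure=False, port=4567, path="/", region=region)

    reservation = conn.run_instances("ami-00000001", instance_type="m1.small")
    for instance in reservation.instances:
        print("Started %s (state: %s)" % (instance.id, instance.state))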
10:20 Coffee Break
Grid, Cloud & Virtualisation
Conveners: Ian Gable (University of Victoria), Dr John Gordon (Particle Physics-Rutherford Appleton Laboratory-STFC - Science), Dr Keith Chadwick (Fermilab), Tony Cass (CERN)
"A year in the life of Eucalyptus"
Funded by the American Recovery and Reinvestment Act (Recovery Act) through the U.S. Department of Energy (DOE), the Magellan project was charged with the task of evaluating whether cloud computing could meet the specialized needs of scientists. Split between two DOE centers, the National Energy Research Scientific Computing Center (NERSC) in California and the Argonne Leadership Computing Facility (ALCF) in Illinois, Magellan built a testbed based on midrange hardware spanning both sites. One of the many services offered within this project was Eucalyptus, an open-source implementation of Amazon’s popular EC2 cloud platform. Eucalyptus’ interfaces are designed to replicate the APIs used on EC2, and it implements many of EC2's capabilities, including Elastic Block Store, S3, Elastic IPs, etc. We have run Eucalyptus services on the NERSC portion of Magellan for the past year, and we will share high (and low) lights from both the admin and the user perspective.
Speaker: Iwona Sakrejda
OpenStack: The OpenSource Cloud’s Application in High Energy Physics
OpenStack’s mission is “To produce the ubiquitous Open Source cloud computing platform that will meet the needs of public and private cloud providers regardless of size, by being simple to implement and massively scalable.” This talk will review the implications of this vision for meeting the storage and compute needs of data-intensive research projects, then examine OpenStack’s potential as a largely common, hardware-agnostic platform for the federation of resources across sites and organizations.
Speaker: Neil Johnston (Piston Cloud Computing)
Network & Security
Network connectivity for WLCG: the LHCONE
LHCONE (LHC Open Network Environment) is the network that will provide dedicated bandwidth for LHC data transfers to Tier-2s and Tier-3s.
Speaker: Edoardo Martelli (CERN)
perfSONAR or: How I Learned to Stop Worrying and Love Network Performance Verification
Scientific innovation produced by Virtual Organizations (VOs) such as the LHC demands high-capacity and highly available network technologies to link remote data creation, storage and processing facilities. Research and Education (R&E) networks are a vital cog in this supply chain and offer advanced capabilities to these distributed scientific projects. Network operations staff spend countless hours monitoring and assuring internal performance and traffic management needs, all to benefit local user communities. Often the "big picture" of end-to-end performance is forgotten, or cast aside, due to the relative complexity of multi-domain operational scenarios and the lack of human and technological resources. Software designed to monitor and share network information between domains, developed by the perfSONAR-PS project, is available to help with end-to-end performance concerns. This framework, in use within the USATLAS project since 2007 and emerging in other collaborations including the Italian and Canadian ATLAS clouds, has been beneficial in identifying complex network faults while imposing minimal operational overhead on local administrators.
Speaker: Jason Zurawski (Internet2)
Experience with IPv6 deployment at FZU ASCR in Prague
We are facing the exhaustion of IPv4 addresses, and the transition to IPv6 is becoming more and more urgent. In this contribution we describe our current problems with IPv4 and our particular motivation for the transition to IPv6. We present our current IPv6 setup and the installation of core network services such as DNS and DHCPv6. We describe our PXE installation testbed and the results of our experiments with installing an operating system via PXE over IPv6. We have tested the native PXE implementations of our current hardware as well as the open-source network bootloader gPXE. Updating the CRLs of certification authorities is a service used by all components of gLite; we have prepared a web service for testing the availability of the revocation lists of the certification authorities in the lcg-CA bundle.
Speaker: Mr Marek Elias (Institute of Physics AS CR, v. v. i. (FZU))
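A minimal sketch of a CRL availability check of the kind described above, fetching each CRL URL and reporting the outcome; the URL list is a placeholder rather than the actual contents of the lcg-CA bundle.

    # Illustrative only: check that CRL URLs are reachable and report the result.
    # The URLs below are placeholders for those extracted from the CA bundle.
    import urllib2

    CRL_URLS = [
        "http://crl.example-ca.org/ca.crl",
        "http://crl.another-ca.example.net/root.crl",
    ]

    for url in CRL_URLS:
        try:
            response = urllib2.urlopen(url, timeout=10)
            print("OK      %s (HTTP %d, %d bytes)"
                  % (url, response.getcode(), len(response.read())))
        except Exception as exc:
            print("FAILED  %s (%s)" % (url, exc))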
15:15 Coffee Break
Network & Security
IPv6 deployment at CERN
Description of the CERN IPv6 deployment project: service definition, features and implementation plan.
Speaker: Edoardo Martelli (CERN)
Report from the HEPiX IPv6 Working Group
This new working group was formed earlier in 2011. There have been several meetings, sub-topics have been planned and work is now well underway. This talk will present the current status and plans for the future.
Speaker: Dr David Kelsey (STFC - Science & Technology Facilities Council (GB))
HEPiX Board (closed)
Storage & File Systems
Conveners: Andrei Maslennikov (CASPUR), Mr Peter van der Reest (Deutsches Elektronen-Synchrotron DESY)
Storage Status and Experiences at TRIUMF
The ATLAS Tier-1 data centre at TRIUMF provides highly efficient and scalable storage components to support LHC data analysis and production. This contribution will describe and review the storage infrastructure and configuration currently deployed at the Tier-1 data centre at TRIUMF, for both disk and tape, and share past experiences. A brief outlook on test beds and future expansion will also be presented.
Speaker: Simon Liu (TRIUMF (CA))
EMI, the second year
The European Middleware Initiative is rapidly approaching the halfway point of its project lifetime. Nearly all objectives of the first year of EMI-Data have been achieved and the feedback from the first EMI review has been very positive. Internet standards such as WebDAV and NFS 4.1/pNFS have been integrated into the EMI set of storage elements, the existing accounting record has been extended to cover storage, and the synchronization of catalogues and storage elements has been designed and implemented within gLite. Furthermore, the close collaboration between EMI and EGI resulted in very positive feedback from EGI and the subsequent creation of a set of new objectives focusing on EGI acceptance of the EMI software distribution. This presentation will briefly describe the goals achieved in the first year, but will primarily focus on the work EMI-Data faces for the rest of the project's lifetime, including the design of the common EMI data client libraries, the new gLite File Transfer Service (FTS3), and our efforts in consolidating http(s) and WebDAV.
Speaker: Dr Patrick Fuhrmann (DESY)
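To illustrate the kind of standards-based access mentioned above, here is a sketch of a WebDAV directory listing (a depth-1 PROPFIND) against an HTTP-enabled storage element; the host, port and path are placeholders, and real endpoints normally require X.509 or other authentication, which is omitted here.

    # Illustrative only: list a directory on a WebDAV-enabled storage element.
    # Host, port and path are placeholders; authentication is omitted.
    import httplib

    conn = httplib.HTTPConnection("webdav.example.org", 2880)
    body = ('<?xml version="1.0"?>'
            '<propfind xmlns="DAV:"><prop><getcontentlength/></prop></propfind>')
    conn.request("PROPFIND", "/vo/atlas/user/",
                 body, {"Depth": "1", "Content-Type": "application/xml"})
    response = conn.getresponse()
    print("%d %s" % (response.status, response.reason))   # expect 207 Multi-Status
    print(response.read())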
10:30 Coffee Break
Storage & File Systems
Conveners: Andrei Maslennikov (CASPUR), Mr Peter van der Reest (Deutsches Elektronen-Synchrotron DESY)
A highly distributed, petascale migration from dCache to HDFS
The University of Wisconsin CMS Tier-2 center serves nearly a petabyte of storage and tens of thousands of hours of computation each day to the global CMS community. After seven years, the storage cluster had grown to 250 commodity servers running both the dCache distributed filesystem and the Condor batch scheduler. This multipurpose, commodity approach had quickly and efficiently scaled to meet growing analysis and production demands. By 2010, when alternatives to dCache became available in the CMS community, the center was ready to test alternatives that might be a better fit for its hybrid model. HDFS had become widely accepted in the web world and was designed to run in a similarly mixed storage and execution environment. In early evaluations, it performed as well as dCache while also reducing the operational burden. So, in the spring of 2011, the center successfully migrated all of its production data to HDFS with only a few hours downtime. This migration was one of the largest to date within the CMS community. A unique and highly distributed mechanism was developed to complete the migration while maximizing availability of data to the thousands of jobs that run at Wisconsin each day. This talk presents the migration technique and evaluates its strengths, weaknesses and wider applicability as peers within the CMS community embark on their own migrations.
Speaker: Mr William Maier (University of Wisconsin (US))
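The migration mechanism itself is the subject of the talk; purely as a rough illustration of the general idea (many workers, each copying its own shard of the namespace from dCache into HDFS), here is a hedged sketch. The door, paths, shard file and commands are placeholders and do not describe the Wisconsin tool.

    # Illustrative only: one worker copies the files listed in its shard from a
    # dCache (dcap) door into HDFS. Door, paths and shard file are placeholders.
    import os
    import subprocess

    DCAP_DOOR = "dcap://dcache.example.edu:22125"
    SHARD = "/var/tmp/migration-shard-042.txt"    # one file path per line

    for path in open(SHARD):
        path = path.strip()
        local = "/tmp/" + os.path.basename(path)
        subprocess.check_call(["dccp", DCAP_DOOR + path, local])
        subprocess.check_call(["hadoop", "fs", "-put", local, path])
        os.remove(local)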
CASTOR and EOS status and plans
[Still to be confirmed] The Data and Storage Services (DSS) group at CERN develops and operates two storage solutions for the CERN Physics data, targeting both Tier0 central data recording and preservation, and user-space physics analysis. In this talk we present the current status of the two systems, CASTOR and EOS, and the foreseen evolution in the medium term.
Speaker: Dr Giuseppe Lo Presti (CERN)
CVMFS Production Status Update
CernVM-FS has matured very quickly into a production-quality tool for distributing VO software to grid sites, and is now in production use at a number of sites. This talk will recap the technology behind CVMFS and discuss the production status of the infrastructure.
Speaker: Ian Collier (UK Tier1 Centre)
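As a small operational illustration, a sketch that checks CVMFS health on a worker node using the standard cvmfs_config utility; the repository names are placeholders for whatever a site actually mounts.

    # Illustrative only: probe the configured CVMFS repositories and check that
    # the expected mount points are present. Repository names are placeholders.
    import os
    import subprocess

    rc = subprocess.call(["cvmfs_config", "probe"])
    print("cvmfs_config probe exit code: %d" % rc)

    for repo in ["atlas.cern.ch", "lhcb.cern.ch"]:
        print("/cvmfs/%s mounted: %s" % (repo, os.path.isdir("/cvmfs/" + repo)))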
Computing
Convener: Dr Michele Michelotto (Univ. + INFN)
TACC 10 PFLOP System
To be defined.
Speaker: Mr Roger Goff (DELL)
Network & Security
Computer Security update
This presentation provides an update on the security landscape since the last meeting. It describes the main vectors of compromise in the academic community and presents interesting recent attacks. It also covers security risk management in general, as well as the security aspects of the current hot topics in computing, for example identity federation and virtualisation.
Speaker: Mr Romain Wartel (CERN)
IPv6 - If Networking is Ready, Is Cyber Security Ready?
The coming of IPv6 represents the introduction of a new protocol stack, rich in features and, if the past is any guide, an interesting set of challenges for cyber security. The talk will cover both current recommendations for IPv6 configuration and open issues requiring further discussion and investigation.
Speaker: Bob Cowles (SLAC)
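One very small example of the kind of check the title invites: verifying which of your services already answer over IPv6 (and therefore need IPv6-aware filtering as well); the hostnames and ports are placeholders.

    # Illustrative only: check whether services answer over IPv6.
    # Hostnames and ports below are placeholders.
    import socket

    SERVICES = [("www.example.org", 80), ("mail.example.org", 25)]

    for host, port in SERVICES:
        try:
            family, socktype, proto, _, addr = socket.getaddrinfo(
                host, port, socket.AF_INET6, socket.SOCK_STREAM)[0]
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(5)
            sock.connect(addr)
            sock.close()
            print("%s:%d answers over IPv6 at %s" % (host, port, addr[0]))
        except socket.error as exc:
            print("%s:%d not reachable over IPv6 (%s)" % (host, port, exc))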
19:00 Conference Dinner (20th Anniversary)
HEPiX Past and Future
Networking Retrospective
At the inauguration of HEPiX in 1991, mainframes (and HEPVM) were on their way out, with their bus & tag cables, channels with 3270 emulators and channel-attached Ethernets. DEC/VMS and DECnet were still major players in the scientific world. Mainframes, and to a lesser extent VMS hosts, were being replaced by Unix hosts with native TCP stacks running on thinnet and thicknet shared media; the phone system was still a separate entity, and wireless networking was very much a niche. The wide area network consisted of a multitude of networks and protocols such as DECnet, SNA, XNS, the Coloured Books, Bitnet/EARN and the emerging (soon to die off) OSI. All of these were soon to "pass like tears in rain"* and be displaced by TCP/IP and what we know as the Internet today. We will look back at the evolution of the local area network, home networking and the wide area network over the last 30-40 years, in particular noting the state and changes since HEPiX was formed 20 years ago.
* Blade Runner
Speaker: Mr Les Cottrell (SLAC)
Hepi-X-Perience
This is a personal retrospective view of 18 years of membership in the HEPiX community. Starting in 1993, it was associated with my career as a computer system engineer, the progression of high-performance computing, and paradigm shifts. The talk highlights some personal and community aspects of this time by recalling personal projects and events.
Speaker: Thomas Finnern (DESY)
10:30 Coffee Break
HEPiX Past and Future
Convener: Mr Alan Silverman (CERN)
20 years of AFS service
Almost 20 years ago, the AFS service was born at CERN alongside a paradigm shift away from mainframe computing towards clusters. The scalable and manageable networked file system offered easy, ubiquitous access to files and greatly contributed to making this shift a success. Take a look back, with a smile rather than raised eyebrows, at how pre-Linux, pre-iPad, megabyte-and-megahertz technology faced the headwind of technological evolution and adapted over decades: AFS did a good job and continues to deliver, not always best in class but usually flexible.
Speaker: Rainer Toebbicke (CERN)
An overview of computing hardware changes from 1991 to 2011
An overview of computing hardware changes from 1991 to 2011 is given from a TRIUMF perspective. Aspects discussed are Moore’s law from the speed, power consumption and cost perspectives, as well as how networks and commoditization have influenced hardware. Speculation on the near- and distant-future nature of computing hardware is provided.
Speaker: Mr Corrie Kost (TRIUMF)