Johan Henrik Guldmyr
(Helsinki Institute of Physics (FI))
3/23/15, 9:15 AM
Site reports
- CSC General HPC updates
- New Haswell Hardware for supercluster and supercomputer
- Slurm/Lustre
- taito-shell.csc.fi - a slurm/sshd/iptables-based interactive shell load balancer. Replaced large memory (1TB) interactive nodes.
- DDN SFA12k
- Using ELK stack (Elasticsearch Logstash Kibana)
- sort through dCache logs
- Search through auditd logs - anybody used shred? root...
Jingyan Shi
(IHEP)
3/23/15, 9:45 AM
Site reports
The status of the IHEP site, the improvements we have made, and our plans for this year.
Martin Adam
(UJF Rez),
Václav Říkal
3/23/15, 10:00 AM
Site reports
We will give an overview of the site and share our experience with these topics:
migration of virtualized servers to a new infrastructure, migration from cfengine
to Puppet and Spacewalk as the new systems management solution, and procurement
of new hardware (worker nodes and storage servers).
William Strecker-Kellogg
(Brookhaven National Lab)
3/23/15, 11:10 AM
Site reports
Brookhaven National Lab (BNL) will present the site report for the RHIC-ATLAS Computing Facility (RACF), covering developments over the past 6 months.
James Botts
(LBNL)
3/23/15, 11:25 AM
Site reports
PDSF, the Parallel Distributed Systems Facility, has been serving high energy physics and has been in continuous operation at NERSC since 1996. It is currently a tier-1 site for STAR, tier-2 for ALICE and tier-3 for ATLAS. This site report will describe recent updates to the system and upcoming modifications. PDSF will move this year from its current site to a new building on the LBNL campus and...
Garhan Attebury
(University of Nebraska (US))
3/23/15, 11:40 AM
Site reports
Site report covering the status of T2_US_Nebraska and changes / updates since the Fall 2014 meeting.
Sandy Philpott
(JLAB)
3/23/15, 11:55 AM
Site reports
Current high performance and experimental physics computing environment updates: core exchanges between USQCD and Experimental Physics clusters for load balancing, job efficiency, and 12GeV data challenges; Nvidia K80 GPU experiences and updated Intel MIC environment; update on locally developed workflow tools and write-through to tape cache filesystem; status of LTO6 integration into our MSS;...
Mr
Julien Carpentier
(CCIN2P3)
3/23/15, 12:25 PM
Site reports
We will present the latest status of the IN2P3 Computing Center. Emphasis will be placed on the infrastructure and systems areas.
Lisa Gerhardt
(LBNL), Mr
Yushu Yao
(LBNL)
3/23/15, 2:00 PM
End-User IT Services & Operating Systems
SciDB is an open-source analytical database for scalable complex analytics on very large array or multi-structured data from a variety of sources, programmable from Python and R. It runs on HPC, commodity hardware grids, or in a cloud and can manage and analyze terabytes of array-structured data and do complex analytics in-database.
We present an overall description of the SciDB framework and...
Mr
Michel Jouvin
(Laboratoire de l'Accelerateur Lineaire (FR))
3/23/15, 2:25 PM
End-User IT Services & Operating Systems
The HEP Software Foundation (HSF) is a one-year-old initiative to foster collaboration in software development in the HEP community and related scientific communities. Launched by a kick-off meeting at CERN in April 2014, it has spent its first year better defining what HSF should be. An HSF workshop was held in January at SLAC and HSF is now entering its "implementation phase". This talk...
Mr
Andreas Wagner
(CERN)
3/23/15, 2:50 PM
End-User IT Services & Operating Systems
- Status of CERN Web Services
- Overview
- Web Site Life Cycle Management
- Web Analytics
- CERN’s Enterprise Social Networking System
- Motivation & purpose
- Feature overview: microblogging, profiles, social networking, suggestion systems and discussion forums
- CERN Search...
Thomas Baron
(CERN)
3/23/15, 3:15 PM
End-User IT Services & Operating Systems
A lot of visible and behind-the-scene actions have been taken in recent months to prepare CERN conferencing services (Indico, Vidyo, the webcast and conference rooms services) for challenges to come. These services will be described in terms of features and usage statistics. We will present their integration to the CERN layered cloud infrastructure, and with other IT base services. We will...
Connie Sieh
(FNAL)
3/23/15, 4:05 PM
End-User IT Services & Operating Systems
Current Status of Scientific Linux
Dr
Arne Wiebalck
(CERN)
3/23/15, 4:30 PM
End-User IT Services & Operating Systems
In this talk we will present a brief status update on CERN's work on CentOS 7, the uptake by the various IT services, and the interaction with the upstream CentOS community.
Mr
Emyr James
(Wellcome Trust, Sanger Institute)
3/23/15, 4:55 PM
End-User IT Services & Operating Systems
The Wellcome Trust Sanger Institute is a charitably funded genomic research centre. A leader in the Human Genome Project, it is now focused on understanding the role of genetics in health and disease. Large amounts of data are produced at the institute by next-generation sequencing machines. The data is then stored, processed and analysed on the institute's computing cluster.
The main compute...
Wayne Salter
(CERN)
3/23/15, 5:20 PM
IT Facilities & Business Continuity
Many of you are aware of the power incident we had on the 16th October during the last HEPiX workshop. I will give a detailed explanation of what happened, the impact on IT services as well as the actions taken to recover from the incident. I will also note some improvements that will be implemented as a result of this incident. I will then go on to discuss other operations incidents that we...
Jose Flix Molina
(Centro de Investigaciones Energ. Medioambientales y Tecn. (ES))
3/24/15, 9:00 AM
Site reports
We will review the status of the PIC Tier-1 as of Spring 2015, in the form of the usual site report presented at HEPiX.
Dr
Sean Brisbane
(University of Oxford)
3/24/15, 9:15 AM
Site reports
A site report from the University of Oxford focusing on the integration challenges between the various systems.
Tina Friedrich
(Diamond Light Source Ltd)
3/24/15, 9:45 AM
Site reports
Diamond Light Source site report
Sang Un Ahn
(KISTI Korea Institute of Science & Technology Information (KR))
3/24/15, 10:00 AM
Site reports
The status of the KISTI-GSDC Tier-1 site will be presented, including a brief history of the KISTI-GSDC site, a system summary (configuration management), PBS batch issues, Tier-1 operations and future plans.
Nils Hoimyr
(CERN)
3/24/15, 10:55 AM
End-User IT Services & Operating Systems
An update will be given on the status of collaborative tools for software developers: Version Control Services (Git and SVN), Issue Tracking (JIRA), Integration (Jenkins) and documentation (TWiki).
The presentation will focus on collaborative aspects for software developers and report on progress since the fall meeting.
Mr
Dirk Jahnke-Zumbusch
(DESY)
3/24/15, 11:15 AM
End-User IT Services & Operating Systems
After more than ten years of operations the game is over
for Exchange 2003 at DESY. Now Zimbra has been set into
production and data from both Exchange 2003 and the UNIX
mail service is being migrated and consolidated gradually.
The architecture of the Zimbra mail service, the migration
procedures and some experiences will be presented. Finally
we will look at some integration aspects of...
Nils Hoimyr
(CERN)
3/24/15, 11:40 AM
End-User IT Services & Operating Systems
Status of LHC@home, volunteer computing at CERN and for the LHC experiments. The presenter will give an update on the volunteer computing strategy for HEP and different scenarios for the use of volunteer cloud computing or other lightweight cloud infrastructures to run experiment code under CernVM on available computing resources. Furthermore, the current status of the CERN BOINC server...
Rennie S. Scott
(FNAL),
Connie Sieh
(Fermilab)
3/24/15, 12:05 PM
Site reports
Site report from Fermilab
Adam Lukasz Krajewski
(Warsaw University of Technology (PL))
3/24/15, 12:20 PM
Security & Networking
Following an incident with a slow database replication between CERN's
data centers, we discovered that even a very low rate packet loss in the
network (order of 0.001%) can induce significant penalties to long
distance single stream TCP transfers. We explore the behaviour of
multiple TCP congestion control algorithms in a controlled loss and
delay environment in order to understand...
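As a rough illustration of why such a small loss rate matters, the sketch below evaluates the well-known Mathis approximation for loss-based (Reno-style) TCP, throughput ≤ (MSS / RTT) · C / sqrt(p) with C = sqrt(3/2). This model is not taken from the talk, and the MSS and round-trip times used here are assumed values chosen only for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput, in bits/s.

    Uses the Mathis et al. approximation for Reno-style congestion control:
        rate <= (MSS / RTT) * C / sqrt(p),  with C = sqrt(3/2)
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    loss = 0.001 / 100  # 0.001 % packet loss, the order of magnitude quoted above
    mss = 1460          # assumed MSS in bytes (typical for a 1500-byte MTU)
    for rtt_ms in (1, 10, 25):  # assumed round-trip times
        bps = mathis_throughput_bps(mss, rtt_ms / 1000.0, loss)
        print(f"RTT {rtt_ms:>3} ms -> at most ~{bps / 1e6:.0f} Mbit/s per stream")
```

Under this model the bound scales as 1/RTT and 1/sqrt(p), which is why a loss rate that is harmless inside one data center can throttle a long-distance single stream. Other congestion control algorithms respond to loss differently, so the numbers are indicative only.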
Mr
Romain Wartel
(CERN)
3/24/15, 2:00 PM
Security & Networking
This presentation gives an overview of the current computer security landscape. It describes the main vectors of compromise in the academic community, including lessons learnt, reveals inner mechanisms of the underground economy to expose how our resources are exploited by organised crime groups, and offers recommendations to protect ourselves. By showing how these attacks are both...
Ian Peter Collier
(STFC - Rutherford Appleton Lab. (GB))
3/24/15, 2:25 PM
Security & Networking
Report on the initial activities of the WLCG Cloud Traceability Working Group
Linda Ann Cornwall
(STFC - Rutherford Appleton Lab. (GB))
3/24/15, 2:50 PM
Security & Networking
The European Grid Infrastructure (EGI) and Worldwide LHC Computing Grid (WLCG) infrastructures largely overlap and share the majority of security activities. A lot of security-related activity goes on behind the scenes concerning such a large-scale distributed computing infrastructure. Security incident prevention takes up the larger amount of effort, and this is carried out via...
David Crooks
(University of Glasgow (GB))
3/24/15, 3:15 PM
Security & Networking
OSSEC, the popular HIDS (Host Intrusion Detection System), has been widely used for a number of years. More recently, tools like Elasticsearch, Logstash and Kibana (ELK) have become popular in visualising and working with data such as that aggregated by OSSEC. We report on a recent implementation of OSSEC, coupled to an ELK instance, at the Glasgow
site of the UKI-SCOTGRID distributed Tier-2....
Marian Babik
(CERN)
3/24/15, 3:50 PM
Security & Networking
WLCG relies on the network as a critical part of its infrastructure and therefore needs to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion and traffic routing. The WLCG Network and Transfer Metrics working group was established to ensure sites and experiments can better understand and fix networking issues....
Dave Kelsey
(STFC - Rutherford Appleton Lab. (GB))
3/24/15, 4:15 PM
Security & Networking
This talk will present an update from the HEPiX IPv6 Working Group. This will include details of recent testing activities and plans for the deployment of dual-stack data services and monitoring on (at least some of) the WLCG infrastructure.
Ulf Bobson Severin Tigerstedt
(Helsinki Institute of Physics (FI))
3/24/15, 4:40 PM
Security & Networking
A look back at testing IPv6 with different versions of dCache as it evolved from 2.6 to 2.12, and from barely working to working well.
Francesco Prelz
(Università degli Studi e INFN Milano (IT))
3/24/15, 4:55 PM
Security & Networking
Probably the most prominent change that IPv6 introduces in the semantics of internet protocol applications is the need to *always* deal with multiple addresses (possibly both IPv4 and IPv6) associated to each network endpoint. A quick overview of how and where addresses are categorised, ordered and preferred is presented, both from the system administrator and the developer viewpoint. A few...
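From the developer's side, the multiple-address situation can be sketched in Python as follows. This is a minimal illustration, not material from the talk: the helper names `candidate_addresses` and `connect_first` are hypothetical, and the ordering of results is whatever the system resolver returns.

```python
import socket


def candidate_addresses(host, port):
    """Return (family, sockaddr) pairs for host, in the resolver's preferred order."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_first(host, port, timeout=5.0):
    """Try every address of host in order and return the first connected socket."""
    last_error = None
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_UNSPEC, type=socket.SOCK_STREAM
    )
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:  # e.g. address family not supported on this host
            last_error = exc
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:  # unreachable or refused: fall through to next address
            last_error = exc
            sock.close()
    if last_error is None:
        last_error = OSError(f"no usable address for {host!r}")
    raise last_error
```

The point of the loop is that an application should never assume a single address per endpoint: an IPv6 address may be listed first and fail, with an IPv4 address still reachable. Python's standard `socket.create_connection` performs a similar loop internally.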
Kacper Surdy
(CERN)
3/24/15, 5:20 PM
Storage & Filesystems
There are terabytes of data stored in a relational database (Oracle) at CERN which in fact does not need a relational model. Moreover, using a relational database management system very often brings a significant overhead in terms of resource utilization. The problem is notably observable for warehouse-type data sets. At the same time running analytical workloads on such data sets requires...
Dave Kelsey
(STFC - Rutherford Appleton Lab. (GB)), Dr
Shawn McKee
(University of Michigan ATLAS Group)
3/24/15, 6:00 PM
Julien Leduc
(CERN)
3/25/15, 9:00 AM
IT Facilities & Business Continuity
CERN Computer Center (CC) is a large building that integrates several kilometers of fibers, copper cables, pipes and several complex installations (UPSes, water cooling, heat exchangers...).
This evolving building is a large theater with numerous actors:
- contractors, performing construction work, building maintenance or hardware replacement
- engineers and technicians, debugging...
Yves Kemp
(Deutsches Elektronen-Synchrotron (DE))
3/25/15, 9:25 AM
Storage & Filesystems
Recent advances in both hard disks and system-on-a-chip (SoC) designs have enabled the development of a novel form of hard disk: a disk that includes a network interface and an additional ARM processor, not involved in low-level disk operations. This setup allows those disks to run an operating system and to communicate with other nodes autonomously using wired Ethernet. No additional hardware or...
Stefan Dietrich
(DESY)
3/25/15, 9:50 AM
Storage & Filesystems
PETRA III is DESY's largest ring accelerator and the most brilliant storage-ring-based X-ray radiation source in the world.
With its recent extension, new and faster detectors are used for the data acquisition.
They exceed previous detectors in terms of data rate and volume; this is highly demanding for the underlying storage system.
This talk will present the challenges we faced, the new...
Yves Kemp
(Deutsches Elektronen-Synchrotron (DE))
3/25/15, 10:15 AM
Storage & Filesystems
The presentation will cover:
- History and current status of the BeeGFS project (formerly known as FhGFS, originating from Fraunhofer)
- Design and technology decisions made by BeeGFS developers
- BeeGFS setup and operational experience as an InfiniBand-based high-performance cluster file system serving as scratch space for the DESY HPC system
- Discussion of future usage scenarios and...
Alastair Dewhurst
(STFC - Rutherford Appleton Lab. (GB))
3/25/15, 11:05 AM
Storage & Filesystems
RAL is currently exploring the possibilities offered by Ceph. This talk will describe two of these projects. The first project aims to provide large scale, high throughput storage for experimental data. This will initially be used by the WLCG VOs. A prototype cluster built from old hardware has been in testing since October 2014. The WLCG VOs will continue to need to access their data via...
Herve Rousseau
(CERN)
3/25/15, 11:30 AM
Storage & Filesystems
Ceph has over time become a key component of CERN’s Agile Infrastructure by providing storage for the OpenStack service.
In this talk, we will briefly introduce Ceph’s concepts, our current cluster and the services we provide such as NFS filers, Object Store for the ATLAS experiment and Xroot-to-Ceph gateways.
We will then talk about our experience running Ceph with some real-world...
Dr
Ofer Rind
(BROOKHAVEN NATIONAL LABORATORY)
3/25/15, 11:55 AM
Storage & Filesystems
We review various functionality, performance, and stability tests performed at the RHIC and ATLAS Computing Facility (RACF) at Brookhaven National Laboratory (BNL) in 2014-2015. Tests were run on all three (object storage, block storage and file system) levels of Ceph, using a range of hardware platforms and networking solutions, including 10/40 Gbps Ethernet and IPoIB/4X FDR Infiniband. We...
Mr
John Spray
(Red Hat, Inc.)
3/25/15, 12:20 PM
Storage & Filesystems
The Ceph storage system is an open source, highly scalable, resilient data storage service providing object, block and file interfaces. This presentation will introduce what is new in the latest Ceph release, codenamed *Hammer*, and describe the ongoing development activities around CephFS, the Ceph filesystem.
An intermediate level of familiarity with large scale storage systems will be assumed.
Dr
Arne Wiebalck
(CERN)
3/25/15, 1:00 PM
Peter Love
(Lancaster University (GB))
3/25/15, 2:00 PM
Computing & Batch Services
This contribution describes the usage and benchmarking of a commercial data centre running OpenStack. Different cloud provisioning tools are described, highlighting the pros and cons of each system. A comparison is made between this facility and a standard grid T2 site in terms of job throughput and availability. Usage of the centre’s local object store is also described.
Dr
Tony Wong
(Brookhaven National Laboratory)
3/25/15, 2:25 PM
Computing & Batch Services
The RHIC-ATLAS Computing Facility (RACF) at BNL has traditionally evaluated hardware on-site, with physical access to the systems. The effort to request evaluation hardware, shipping, set-up and testing has consumed an increasing amount of time and the process has become less productive over the years. To regain past productivity and shorten the evaluation process, BNL has started a pilot...
Gang Qin
(University of Glasgow (GB))
3/25/15, 2:50 PM
Computing & Batch Services
Modern Linux Kernels include a feature set that enables the
control and monitoring of system resources, called Cgroups. Cgroups
have been enabled on a production HTCondor pool sited at the Glasgow
site of the UKI-SCOTGRID distributed Tier-2. A system has been put in
place to collect and aggregate metrics extracted from Cgroups on all
worker nodes within the Condor pool. From this...
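A minimal sketch of the collection step is shown below. It is not the Glasgow implementation: the function names are hypothetical, and it assumes the cgroup v1 `memory.stat` format (one `key value` pair per line), with the file contents already read into a string so that no particular cgroup mount path is assumed.

```python
from collections import defaultdict


def parse_flat_keyed(text):
    """Parse 'key value' lines, the layout used by cgroup files such as memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def aggregate(per_job_stats, keys=("rss", "cache")):
    """Sum the selected counters over all job cgroups on a worker node."""
    totals = defaultdict(int)
    for stats in per_job_stats:
        for key in keys:
            totals[key] += stats.get(key, 0)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical contents of two per-job memory.stat files (values in bytes).
    job_a = parse_flat_keyed("cache 4096\nrss 8192\nmapped_file 0\n")
    job_b = parse_flat_keyed("cache 1024\nrss 2048\n")
    print(aggregate([job_a, job_b]))
```

In a real deployment the per-job strings would be read from each job's cgroup directory on the worker node and the aggregated values shipped to a central metrics store.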
Manfred Alef
(Karlsruhe Institute of Technology (KIT))
3/25/15, 3:15 PM
Computing & Batch Services
In this talk we will provide information about the current status of the preliminary work to relaunch the HEPiX Benchmarking Working Group which will develop the next release of the HEP CPU benchmark.
Dr
Michele Michelotto
(INFN Padua & CMS)
3/25/15, 4:05 PM
Computing & Batch Services
The WLCG community has requested a fast benchmark to quickly assess the performance of a worker node. A good candidate is a Python script used in LHCb.
Dr
Lucia Morganti
(INFN)
3/25/15, 4:30 PM
Computing & Batch Services
Systems on Chip (SoCs), originally targeted for mobile and embedded technology, are becoming attractive for HEP and HPC scientific communities, given their low cost, huge worldwide shipments, low power consumption and increasing processing power - mostly associated with their GPUs.
A variety of development boards are currently available, making it foreseeable to use these power-efficient...
Liviu Valsan
(CERN)
3/25/15, 4:55 PM
Computing & Batch Services
x86 is the uncontested leader for server platforms in terms of market share and is currently the architecture of choice for High Energy Physics applications. But as more and more importance is given to power efficiency, physical density and total cost of ownership we are seeing new processor architectures emerging and some existing ones becoming more open. With the introduction of AArch64,...
Mr
David Power
(Boston Ltd.)
3/25/15, 5:20 PM
Computing & Batch Services
The talk's coverage will include Xeon Haswell, ARM and Open Compute Platforms
William Strecker-Kellogg
(Brookhaven National Lab)
3/26/15, 9:00 AM
Basic IT Services
It's simple enough to instantiate a new process in an existing
environment; it can be much more challenging to foster acceptance of
such a process in IT environments and cultures that are traditionally
stagnant and resistant to change, and to maintain and optimize that
process to ensure it continues to realize optimal benefit. To enhance
our computing facility, we've already taken...
Alberto Rodriguez Peon
(Universidad de Oviedo (ES))
3/26/15, 9:25 AM
Basic IT Services
CERN’s experience of migrating a large site to a Puppet-based and more dynamic Configuration Service will be presented. The presentation will review some of the challenges encountered along the way and describe future plans for how to scale the service and improve the overall automation of operations on the site.
Stefan Dietrich
(DESY)
3/26/15, 9:50 AM
Basic IT Services
Marionette Collective, also known as MCollective, is a framework for building server orchestration, monitoring, and parallel job execution.
MCollective uses a modern "Publish Subscribe Middleware" for a scalable and fast execution environment.
It is a powerful tool in combination with Puppet, due to the good integration.
However it can be a challenging task to configure and deploy...
James Adams
(STFC RAL)
3/26/15, 10:15 AM
Basic IT Services
The Quattor community has been maintaining Quattor for over ten years and, having recently held our 19th community workshop, we see the pace of development continuing to increase.
This talk will demonstrate why Quattor is more than just a configuration management system, report on recent developments and provide some notable updates and experiences from sites.
Peter Love
(Lancaster University (GB))
3/26/15, 11:05 AM
Basic IT Services
The dominant monitoring system used in distributed computing consists of visually rich time-series graphs and notification systems for alerting operators when metrics fall outside of accepted values. For large systems this can quickly become overwhelming. In this contribution a different approach is described using the sonification of monitoring messages with an architecture which fits easily...
Francisco Valentin Vinagrero
(CERN)
3/26/15, 11:30 AM
Basic IT Services
IP-based voice telephony (VoIP) and the SIP protocol are clear examples of disruptive technologies that have revolutionised a previously settled market. In particular, open-source solutions now have the ascendancy in the traditional Private Branch eXchange (PBX) market.
We present a possible architecture for the modernisation of CERN's fixed telephony network, highlighting the technical...
Andrei Dumitru
(CERN)
3/26/15, 11:55 AM
Basic IT Services
CERN has a great number of applications that rely on a database for their daily operations. From physics-related databases to the administrative sector, there is a high demand for a database system appropriate to the users' needs and requirements. This presentation gives a summary of the current state of the Database Services at CERN, the work done during LS1 and some insights into the...
Daniel Gruber
(U)
3/26/15, 12:20 PM
Computing & Batch Services
- Introduction - DRMAA2 in a Nutshell
- The C Interface - Data Types, Monitoring Sessions, Job Sessions,
Working with Jobs, Job Templates, Error Handling and Dealing with
Enhancements
- Getting started with DRMAA2
- Example Applications - Job Monitoring Applications and Simple
Multi-Clustering
George Ryall
(STFC - Rutherford Appleton Lab.)
3/26/15, 2:00 PM
Grid, Cloud & Virtualisation
The STFC Scientific Computing Department has been developing an OpenNebula-based cloud underpinned by Ceph block storage. I will describe some of our use cases and our setup, and give a demonstration of our development VM-on-demand service. I will go on to explore some of the problems we have overcome to reach this point. Finally, I will present the work we are doing to use spare capacity on...
Bruno Bompastor
(CERN)
3/26/15, 2:25 PM
Grid, Cloud & Virtualisation
This is a report on the current status and future plans of CERN’s OpenStack-based Cloud Infrastructure.
Alexander Dibbo
(urn:Google)
3/26/15, 2:50 PM
Grid, Cloud & Virtualisation
The Scientific Computing Department at the STFC has been developing a Ceph block storage backed OpenNebula cloud. We have carried out a quantitative evaluation of the performance characteristics of virtual machines which have been instantiated with a variety of different storage configurations (using both Ceph and local disks). I will describe our motivations for this testing, our methodology...
Andrew McNab
(University of Manchester (GB))
3/26/15, 3:15 PM
Grid, Cloud & Virtualisation
The Vacuum model provides a method for managing the lifecycle of virtual machines based on their observed success or failure in finding work to do for their experiment. In contrast to centrally managed grid job
submission and cloud VM instantiation systems, the Vacuum model gives resource providers direct control over which experiments' VMs or jobs are created and in what proportion. This...
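The idea of letting observed outcomes drive VM creation can be sketched as below. This is a simplified illustration under stated assumptions, not the logic of any actual Vacuum implementation: the `VmOutcome` record, the `should_start_vm` function and the 600-second backoff window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmOutcome:
    experiment: str     # which experiment the VM was created for
    finished_at: float  # time the VM shut down, in seconds
    found_work: bool    # whether the VM found work to do before shutting down


def should_start_vm(experiment, outcomes, now, backoff_s=600.0):
    """Decide locally whether to create another VM for this experiment.

    A new VM is started unless a VM for the same experiment recently
    shut down without finding work (within the assumed backoff window).
    """
    for outcome in outcomes:
        recent = now - outcome.finished_at < backoff_s
        if outcome.experiment == experiment and not outcome.found_work and recent:
            return False
    return True


if __name__ == "__main__":
    history = [
        VmOutcome("atlas", finished_at=100.0, found_work=False),
        VmOutcome("lhcb", finished_at=100.0, found_work=True),
    ]
    for name in ("atlas", "lhcb"):
        print(name, should_start_vm(name, history, now=200.0))
```

Because the decision uses only what the site itself has observed, no central service needs to tell the resource provider when an experiment has work available.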
John Hover
(Brookhaven National Laboratory (BNL))
3/26/15, 4:05 PM
Grid, Cloud & Virtualisation
Beginning in September 2014, the RACF at Brookhaven National Lab has been collaborating with Amazon's scientific computing group in a pilot project. The goal of this project is to demonstrate the usage of Amazon AWS (EC2, S3, etc.) for real-world ATLAS production. This will prove the practical and economic feasibility of ATLAS beginning to leverage commercial cloud computing to optimize...
Mr
Dario Rivera
(Amazon Web Services)
3/26/15, 4:30 PM
Grid, Cloud & Virtualisation
On the heels of discussing the BNL RACF Group's Proof of Concept on AWS, this session will share best practices on some of the most common AWS services used by Big Science, such as EC2, VPC, S3, and complex hybrid networking and routing. We will also provide an overview of the AWS Scientific Computing Group, which was created to help global scientific collaborations develop an ecosystem...
Bruno Bompastor
(CERN)
3/26/15, 4:55 PM
Grid, Cloud & Virtualisation
Heat, the OpenStack orchestration service, is being deployed at CERN. We will present the overall architecture and features included in the project, our deployment challenges and future plans.
Mr
Levente Hajdu
(Brookhaven National Laboratory)
3/26/15, 5:20 PM
Grid, Cloud & Virtualisation
In statistically hungry science domains, data deluges from data taking can be both a blessing and a curse. They allow statistical errors to be winnowed out of known measurements and open the door to new scientific opportunities as the physics program matures, and they are also a testament to the efficiency of the experiment and accelerator and the skill of their operators. However, the data samples need to be...
Jerome Belleman
(CERN)
3/27/15, 9:00 AM
Computing & Batch Services
The CERN Batch System comprises 4000 worker nodes, 60 queues and offers
a service for various types of large user communities. In light of the
developments driven by the Agile Infrastructure and the more demanding
processing requirements, it is faced with increasingly challenging scalability
and flexibility needs.
This production cluster currently runs IBM/Platform LSF. Over the last...
Manfred Alef
(Karlsruhe Institute of Technology (KIT))
3/27/15, 9:25 AM
Computing & Batch Services
The Grid Computing Centre Karlsruhe (GridKa) has been using the Grid Engine batch system since 2011. In this presentation I will talk about our experiences with this batch system, including multi-core job support, and first experiences with cgroups.
Erik Mattias Wadenstein
(University of Umeå (SE))
3/27/15, 9:50 AM
Computing & Batch Services
An update on the current status of SLURM usage in the Nordics, as well as recent developments in improving support for LHC type jobs including tuning for efficient scheduling of multicore grid jobs. Also an overview of some remaining challenges will be given together with discussion on how to address them.
Mr
Michel Jouvin
(Laboratoire de l'Accelerateur Lineaire (FR))
3/27/15, 10:15 AM
Computing & Batch Services
I propose to give a summary of the Condor workshop, held at CERN mid-December.
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))
3/27/15, 11:05 AM
Computing & Batch Services
After running Torque/Maui for many years, the RAL Tier-1 migrated to HTCondor during 2013 in order to benefit from improved reliability, scalability and additional functionality unavailable in Torque. This talk will discuss the deployment of HTCondor at RAL, our experiences and the evolution of our pool over the past two years, as well as our future plans.
Jerome Belleman
(CERN)
3/27/15, 11:30 AM
Computing & Batch Services
While we are taking measures to face the limitations discussed earlier on
in our IBM/Platform LSF cluster, we have been working on setting up a new
batch system based on HTCondor. There has been some progress with the pilot
service which we described last HEPiX. We also went on investigating some
of the more advanced functions which will lead up to the production state
of the new CERN...
Andrew David Lahiff
(STFC - Rutherford Appleton Lab. (GB))
3/27/15, 11:55 AM
Computing & Batch Services
With the increasing interest in HTCondor in Europe, an important question for sites considering migrating to HTCondor is how well it integrates with the standard grid middleware, in particular integration with the information system and APEL accounting. Also, with the increasing interest and usage of private clouds, how easily a batch system can be integrated with a private cloud is another...
Stephen Jones
(Liverpool University)
3/27/15, 12:20 PM
Computing & Batch Services
This talk describes DrainBoss, which is a proportional integral (PI) controller with conditional logic that strives to maintain the correct ratio between single-core and multi-core jobs in an ARC/HTCondor cluster. DrainBoss can be used instead of the HTCondor DEFRAG Daemon.
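For readers unfamiliar with the control technique, a generic proportional integral controller with a simple clamping condition can be sketched as follows. This is not DrainBoss code: the class and function names, the gains, the setpoint and the interpretation of the output as a number of nodes to drain are all assumptions made for illustration.

```python
class PIController:
    """Textbook PI controller: output = kp * error + ki * integral(error)."""

    def __init__(self, kp, ki, setpoint):
        self.kp = kp
        self.ki = ki
        self.setpoint = setpoint
        self.integral = 0.0

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def nodes_to_drain(controller, multicore_fraction, max_nodes):
    """Map the controller output to a node count, clamped to [0, max_nodes].

    multicore_fraction is the measured share of multi-core jobs (0..1);
    the setpoint held by the controller is the desired share.
    """
    output = controller.update(multicore_fraction)
    return max(0, min(max_nodes, round(output)))


if __name__ == "__main__":
    # Assumed gains and target: aim for half of the running jobs to be multi-core.
    controller = PIController(kp=10.0, ki=2.0, setpoint=0.5)
    for measured in (0.3, 0.3, 0.45, 0.6):
        print(measured, nodes_to_drain(controller, measured, max_nodes=5))
```

The proportional term reacts to the current shortfall of multi-core jobs, while the integral term keeps increasing the draining effort as long as the shortfall persists. The clamp stands in for the kind of conditional logic a production controller needs to stay within sensible limits.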