Conveners
Track 8 Session: #1 (GPU and other accelerators)
- Niko Neufeld (CERN)
Track 8 Session: #2 (Vectorization, NUMA and distribution)
- Danilo Piparo (CERN)
Track 8 Session: #3 (Use of HPC for HEP)
- Niko Neufeld (CERN)
Description
Performance increase and optimization exploiting hardware features
Dr
Junichi Kanzaki
(KEK)
13/04/2015, 16:30
Track8: Performance increase and optimization exploiting hardware features
oral presentation
A fast event generation system for physics processes has been developed using graphics processing units (GPUs).
The system is based on the Monte Carlo integration and event generation programs BASES/SPRING, which were originally developed in FORTRAN.
They were rewritten on the CUDA platform provided by NVIDIA in order to port them to GPUs.
Since the Monte Carlo integration...
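As an illustration of the approach described above, the sketch below shows a minimal CUDA Monte Carlo integrator: one cuRAND stream per thread, with partial sums combined by an atomic add. It is a toy with a placeholder integrand, not the BASES/SPRING code itself.

// Minimal sketch of Monte Carlo integration in CUDA: one cuRAND stream per
// thread, partial sums combined with an atomic add. The integrand f() is a
// toy placeholder -- this is not the BASES/SPRING code.
#include <cstdio>
#include <curand_kernel.h>

__device__ float f(float x) { return x * x; }     // integrate x^2 on [0,1]

__global__ void mc_integrate(float *sum, int samples_per_thread, unsigned seed) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState st;
    curand_init(seed, tid, 0, &st);               // independent RNG stream per thread
    float local = 0.f;
    for (int i = 0; i < samples_per_thread; ++i)
        local += f(curand_uniform(&st));          // uniform draw in (0,1]
    atomicAdd(sum, local);                        // crude reduction, kept short for clarity
}

int main() {
    const int blocks = 256, threads = 256, per_thread = 1000;
    float *d_sum, h_sum = 0.f;
    cudaMalloc(&d_sum, sizeof(float));
    cudaMemcpy(d_sum, &h_sum, sizeof(float), cudaMemcpyHostToDevice);
    mc_integrate<<<blocks, threads>>>(d_sum, per_thread, 1234u);
    cudaMemcpy(&h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost);
    float n = float(blocks) * threads * per_thread;
    printf("integral estimate: %f (exact 1/3)\n", h_sum / n);
    cudaFree(d_sum);
}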
Dr
Sami Kama
(Southern Methodist University (US))
13/04/2015, 16:45
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The growing size and complexity of events produced at the high luminosities expected in 2015 at the Large Hadron Collider demands much more computing power for the online event selection and for the offline data reconstruction than in the previous data taking period. In recent years, the explosive performance growth of low-cost, massively parallel processors like Graphical Processing Units...
Richard Calland
13/04/2015, 17:00
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The Tokai-to-Kamioka (T2K) experiment is a second generation long baseline neutrino experiment, which uses a near detector to constrain systematic uncertainties for oscillation measurements with its far detector. Event-by-event reweighting of Monte Carlo (MC) events is applied to model systematic effects and construct PDFs describing predicted event distributions. However, when analysing...
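Event-by-event reweighting maps naturally onto a GPU, with one thread computing one event's weight as the ratio of a varied to a nominal probability. The sketch below is a minimal illustration of that pattern; the Event struct and the response function are invented placeholders, not the T2K parameterization.

// Hedged sketch of event-by-event reweighting on a GPU: one thread per MC
// event. The "response" below is a made-up placeholder model.
#include <cstdio>
#include <cuda_runtime.h>

struct Event { double energy, nominal_xsec; };

// Hypothetical systematic response, not the T2K parameterization.
__device__ double varied_xsec(Event e, double syst) {
    return e.nominal_xsec * (1.0 + syst * e.energy);
}

__global__ void reweight(const Event *ev, double *w, int n, double syst) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) w[i] = varied_xsec(ev[i], syst) / ev[i].nominal_xsec;
}

int main() {
    const int n = 1 << 20;                        // one weight per MC event
    Event *ev; double *w;
    cudaMallocManaged(&ev, n * sizeof(Event));
    cudaMallocManaged(&w, n * sizeof(double));
    for (int i = 0; i < n; ++i) ev[i] = {0.6, 1.0};
    reweight<<<(n + 255) / 256, 256>>>(ev, w, n, 0.05);
    cudaDeviceSynchronize();
    printf("w[0] = %f\n", w[0]);                  // 1 + 0.05 * 0.6 = 1.03
    cudaFree(ev); cudaFree(w);
}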
David Michael Rohr
(Johann-Wolfgang-Goethe Univ. (DE))
13/04/2015, 17:15
Track8: Performance increase and optimization exploiting hardware features
oral presentation
ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, which is today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute...
Philippe Canal
(Fermi National Accelerator Lab. (US))
13/04/2015, 17:30
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The recent prevalence of many-core and accelerated processor architectures opens opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The Geant Vector Prototype has been designed both to exploit the vector capability of mainstream CPUs and to take advantage of coprocessors, including NVIDIA's GPUs and the Intel Xeon Phi. The...
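The SIMD/SIMT duality can be illustrated by expressing the same elementwise computation both ways. The toy sketch below (not GeantV code) maps one array element to one GPU thread on the SIMT side and asks the compiler for vector code on the CPU side.

// Toy illustration of SIMT vs. SIMD for the same axpy computation; a sketch
// under simple assumptions, not the Geant Vector Prototype.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void axpy_simt(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;      // one element per GPU thread
    if (i < n) y[i] = a * x[i] + y[i];
}

void axpy_simd(int n, float a, const float *x, float *y) {
    #pragma omp simd                                    // hint: vectorize on the CPU
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.f; y[i] = 2.f; }
    axpy_simt<<<(n + 255) / 256, 256>>>(n, 3.f, x, y);  // SIMT path on the GPU
    cudaDeviceSynchronize();
    axpy_simd(n, 3.f, x, y);                            // SIMD path on the same data
    printf("y[0] = %f\n", y[0]);                        // 3*1 + (3*1 + 2) = 8
    cudaFree(x); cudaFree(y);
}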
Michele Martinelli
(INFN Rome)
13/04/2015, 17:45
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The computing nodes of modern hybrid HPC systems are built using the CPU+GPU paradigm.
When this class of systems is scaled to large size, the efficiency of the network connecting the GPU mesh and supporting the internode traffic is a critical factor. The adoption of a low-latency, high-performance dedicated network architecture, exploiting peculiar characteristics of CPU and GPU hardware,...
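Within a single node, CUDA's peer-to-peer API already gives a flavor of such CPU-bypassing data paths between GPUs. The sketch below is a minimal illustration of that idea on two GPUs; it is not the API of the dedicated network hardware discussed in the talk.

// Sketch of direct GPU-to-GPU transfer with CUDA peer-to-peer access,
// avoiding a staging copy through host RAM. Assumes two GPUs in one node.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int can01 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);        // can GPU 0 access GPU 1's memory?
    if (!can01) { printf("no peer access between GPU 0 and 1\n"); return 1; }

    size_t bytes = 64 << 20;
    float *buf0, *buf1;
    cudaSetDevice(0);
    cudaMalloc(&buf0, bytes);
    cudaDeviceEnablePeerAccess(1, 0);             // map GPU 1's memory into GPU 0's space
    cudaSetDevice(1);
    cudaMalloc(&buf1, bytes);

    cudaMemcpyPeer(buf0, 0, buf1, 1, bytes);      // device-to-device, no host staging
    cudaDeviceSynchronize();
    printf("peer copy done\n");
    return 0;
}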
Mr
Steffen Baehr
(Karlsruhe Institute of Technology)
13/04/2015, 18:00
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The impending upgrade of the Belle experiment is expected to increase the generated data set by a factor of 50.
This means that for the planned pixel detector, which is the closest to the interaction point, the data rates are going to increase to over 20 GB/s.
Combined with data generated by the other detectors, this rate is too large to be efficiently sent out to offline processing.
This is...
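A common data-reduction strategy in this setting is region-of-interest (ROI) filtering: only pixel hits inside regions pointed to by online tracking are kept. The sketch below illustrates the idea in plain C++; the Hit and Roi structs are invented for illustration, and the real system performs this selection in FPGA logic.

// Conceptual sketch of ROI-based pixel data reduction; structs are invented.
#include <cstdio>
#include <vector>

struct Hit { int row, col; };
struct Roi { int row_min, row_max, col_min, col_max; };

// Keep only pixel hits that fall inside at least one ROI supplied by tracking.
std::vector<Hit> reduce(const std::vector<Hit> &hits, const std::vector<Roi> &rois) {
    std::vector<Hit> kept;
    for (const Hit &h : hits)
        for (const Roi &r : rois)
            if (h.row >= r.row_min && h.row <= r.row_max &&
                h.col >= r.col_min && h.col <= r.col_max) {
                kept.push_back(h);
                break;                            // one matching ROI is enough
            }
    return kept;
}

int main() {
    std::vector<Hit> hits = {{10, 10}, {500, 500}};
    std::vector<Roi> rois = {{0, 100, 0, 100}};
    printf("kept %zu of %zu hits\n", reduce(hits, rois).size(), hits.size()); // 1 of 2
    return 0;
}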
Srikanth Sridharan
(CERN)
13/04/2015, 18:15
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The proposed upgrade for the LHCb experiment at the Large Hadron Collider (LHC) at CERN envisages a system of 500 data sources, each generating data at 100 Gbps, the acquisition and processing of which is a challenge even for state-of-the-art FPGAs. This challenge splits into two parts: the Data Acquisition (DAQ) part and the algorithm acceleration part, the latter not necessarily immediately following the former....
Rainer Schwemmer
(CERN)
14/04/2015, 16:30
Track8: Performance increase and optimization exploiting hardware features
oral presentation
For Run 2 of the LHC, LHCb is exchanging a significant part of its event filter farm with new compute nodes. For the evaluation of the best performing solution, we have developed a method to convert our high level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late...
Daniel Hugo Campora Perez
(CERN)
14/04/2015, 16:45
Track8: Performance increase and optimization exploiting hardware features
oral presentation
During the data taking process at the LHC at CERN, millions of collisions are recorded every second by the LHCb detector. The LHCb "Online" computing farm, counting around 15000 cores, is dedicated to the reconstruction of the events in real time, in order to filter those with interesting physics. The ones kept are later analysed "Offline" in a more precise fashion on the Grid. This imposes...
Ms
Bowen Kan
(Institute of High Energy Physics, Chinese Academy of Sciences)
14/04/2015, 17:00
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The scheduler is one of the most important components of a high-performance cluster. This paper introduces a self-adaptive dispatching system (SAPS) based on Torque/Maui which increases the resource utilization of the cluster effectively and guarantees the high reliability of the computing platform. It provides great convenience for users to run various tasks on the computing platform. First of all,...
Mr
Giulio Eulisse
(Fermi National Accelerator Lab. (US))
14/04/2015, 17:15
Track8: Performance increase and optimization exploiting hardware features
oral presentation
Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have done a wide survey of current and emerging architectures becoming available on the market including x86-64 variants, ARMv7...
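On Linux/x86-64 machines, a performance-per-watt figure can be estimated from the RAPL energy counters exposed in sysfs. The sketch below illustrates the principle with a placeholder workload; the sysfs path is an assumption that varies by machine (and may require elevated permissions), and counter wrap-around is ignored.

// Hedged sketch of a performance-per-watt measurement via RAPL on Linux.
// The workload is a placeholder; the sysfs path is an assumption.
#include <cstdio>
#include <fstream>

long long read_energy_uj() {                      // package energy in microjoules
    std::ifstream f("/sys/class/powercap/intel-rapl:0/energy_uj");
    long long uj = 0;
    f >> uj;
    return uj;
}

int main() {
    long long e0 = read_energy_uj();
    volatile double x = 0;                        // placeholder workload
    for (long i = 0; i < 400000000L; ++i) x += 1e-9;
    long long e1 = read_energy_uj();
    printf("iterations per joule ~ %f\n", 4e8 / ((e1 - e0) / 1e6));
    return 0;
}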
Jakob Blomer
(CERN)
14/04/2015, 17:30
Track8: Performance increase and optimization exploiting hardware features
oral presentation
Most high-energy physics analysis jobs are embarrassingly parallel except for the final merging of the output objects, which are typically histograms. Currently, the merging of output histograms scales badly. The running time for distributed merging depends not only on the overall number of bins but also on the number of partial histogram output files. That means, while the time to analyze data...
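One way to improve on a sequential one-by-one merge is to combine partial histograms pairwise in a binary tree, so the merge depth grows logarithmically with the number of output files. The sketch below illustrates the idea with toy fixed-size histograms rather than the ROOT classes.

// Minimal sketch of pairwise tree merging of partial histograms; each round's
// merges are independent and could run in parallel. Toy types, not ROOT.
#include <cstdio>
#include <vector>

using Hist = std::vector<double>;                 // one bin count per entry

Hist merge(const Hist &a, const Hist &b) {        // assumes identical binning
    Hist out(a);
    for (size_t i = 0; i < out.size(); ++i) out[i] += b[i];
    return out;
}

// Reduce N partial histograms in O(log N) rounds instead of N-1 serial merges.
Hist tree_merge(std::vector<Hist> parts) {
    while (parts.size() > 1) {
        std::vector<Hist> next;
        for (size_t i = 0; i + 1 < parts.size(); i += 2)
            next.push_back(merge(parts[i], parts[i + 1]));
        if (parts.size() % 2) next.push_back(parts.back());
        parts.swap(next);
    }
    return parts.front();
}

int main() {
    std::vector<Hist> parts(8, Hist(100, 1.0));   // 8 partial histograms, 100 bins each
    printf("bin 0 after merge: %f\n", tree_merge(parts)[0]);  // 8.0
    return 0;
}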
Max Fischer
(KIT - Karlsruhe Institute of Technology (DE))
14/04/2015, 17:45
Track8: Performance increase and optimization exploiting hardware features
oral presentation
With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructural needs.
Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation.
This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume...
Mr
Pawel Szostek
(CERN)
14/04/2015, 18:00
Track8: Performance increase and optimization exploiting hardware features
oral presentation
As Moore's Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. Areas of development include core counts, execution ports, vector units, uncore architecture and, finally, instruction sets. This increasing complexity leads us to a point where access to shared memory is the major limiting factor, making feeding the cores...
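A STREAM-style triad loop makes that limit easy to observe: once the working set is far larger than the caches, throughput is set by DRAM bandwidth rather than by core count. The sketch below is a minimal single-threaded illustration with arbitrary sizes.

// STREAM-style triad microbenchmark illustrating the memory-bandwidth limit.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const size_t n = 1 << 25;                     // three 256 MB arrays, far beyond cache
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < n; ++i) c[i] = a[i] + 3.0 * b[i];  // 2 reads + 1 write per element
    auto t1 = std::chrono::steady_clock::now();
    double s = std::chrono::duration<double>(t1 - t0).count();
    printf("triad bandwidth ~ %.1f GB/s\n", 3.0 * n * sizeof(double) / s / 1e9);
    return 0;
}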
Christopher Hollowell
(Brookhaven National Laboratory)
14/04/2015, 18:15
Track8: Performance increase and optimization exploiting hardware features
oral presentation
Non-uniform memory access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to another CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware...
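With libnuma, an application can make this placement explicit by allocating memory on a chosen node and keeping the accessing thread on the same node, so accesses stay local. The sketch below is a minimal illustration (link with -lnuma); error handling is trimmed for brevity.

// Minimal sketch of NUMA-aware placement with libnuma (compile with -lnuma).
#include <cstdio>
#include <numa.h>

int main() {
    if (numa_available() < 0) { printf("no NUMA support\n"); return 1; }
    int node = 0;                                 // place memory on node 0
    size_t bytes = 256UL << 20;
    char *buf = (char *)numa_alloc_onnode(bytes, node);
    numa_run_on_node(node);                       // keep this thread on the same node
    for (size_t i = 0; i < bytes; i += 4096)
        buf[i] = 1;                               // touch pages: all accesses stay local
    numa_free(buf, bytes);
    return 0;
}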
Vakho Tsulaia
(Lawrence Berkeley National Lab. (US))
16/04/2015, 11:00
Track8: Performance increase and optimization exploiting hardware features
oral presentation
High performance computing facilities present unique challenges and opportunities for HENP event processing. The massive scale of many HPC systems means that fractionally small utilizations can yield large returns in processing throughput. Parallel applications which can dynamically and efficiently fill any scheduling opportunities the resource presents benefit both the facility (maximal...
Sergey Panitkin
(Brookhaven National Laboratory (US))
16/04/2015, 11:15
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment.
While PanDA currently uses more than 100,000 cores at well over 100 Grid sites with a peak performance of 0.3 petaFLOPS, the next LHC data-taking run will require more resources than Grid computing can possibly...
Dr
David Chamont
(LLR - École polytechnique)
16/04/2015, 11:30
Track8: Performance increase and optimization exploiting hardware features
oral presentation
The Matrix Element Method (MEM) is a well-known, powerful approach in particle physics to extract maximal information from the events arising from the LHC pp collisions. Compared to other methods, which require training, the MEM allows direct comparisons between a theory and the observation. Since the phase space to be explored has a high dimensionality, the MEM is much more CPU-time consuming at the...
Taylor Childers
(Argonne National Laboratory (US))
16/04/2015, 11:45
Track8: Performance increase and optimization exploiting hardware features
oral presentation
Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap. Two approaches to meeting this demand include targeting larger compute resources and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest...