Conveners
Track 8 Session: #1 (GPU and other accelerators)
- Niko Neufeld (CERN)
Track 8 Session: #2 (Vectorization, NUMA and distribution)
- Danilo Piparo (CERN)
Track 8 Session: #3 (Use of HPC for HEP)
- Niko Neufeld (CERN)
Description
Performance increase and optimization exploiting hardware features
- Dr Junichi Kanzaki (KEK), 13/04/2015 16:30, oral presentation: A fast event generation system for physics processes has been developed using graphics processing units (GPUs). The system is based on the Monte Carlo integration and event generation programs BASES/SPRING, which were originally developed in FORTRAN and were rewritten on the CUDA platform provided by NVIDIA in order to run on GPUs. Since the Monte Carlo integration...
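The BASES/SPRING pair follows a classic two-step pattern: first a Monte Carlo integration of the differential cross section, then accept-reject event generation using information gathered during integration. The actual FORTRAN/CUDA code is not reproduced here; the following is a minimal plain-Python sketch of that two-step idea, with all function names ours rather than from BASES/SPRING:

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=1):
    """'BASES' step (sketch): Monte Carlo estimate of the integral of f over [a, b].

    Also records the largest sampled value of f, which the generation
    step needs for accept-reject sampling.
    """
    rng = random.Random(seed)
    total = 0.0
    fmax = 0.0
    for _ in range(n):
        x = a + (b - a) * rng.random()
        fx = f(x)
        total += fx
        fmax = max(fmax, fx)
    return (b - a) * total / n, fmax

def generate_events(f, a, b, fmax, n_events, seed=2):
    """'SPRING' step (sketch): accept-reject event generation.

    A candidate x is kept with probability f(x)/fmax, so accepted
    events are distributed according to f.
    """
    rng = random.Random(seed)
    events = []
    while len(events) < n_events:
        x = a + (b - a) * rng.random()
        if rng.random() * fmax <= f(x):
            events.append(x)
    return events

# Illustrative integrand standing in for a differential cross section.
integral, fmax = mc_integrate(lambda x: x * x, 0.0, 1.0)
events = generate_events(lambda x: x * x, 0.0, 1.0, fmax, 1000)
```

On a GPU the inner sampling loop is what gets parallelised: each thread draws and evaluates independent phase-space points, which is why this workload maps well onto CUDA.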
- Dr Sami Kama (Southern Methodist University (US)), 13/04/2015 16:45, oral presentation: The growing size and complexity of events produced at the high luminosities expected in 2015 at the Large Hadron Collider demand much more computing power for online event selection and offline data reconstruction than in the previous data-taking period. In recent years, the explosive performance growth of low-cost, massively parallel processors like Graphical Processing Units...
- Richard Calland, 13/04/2015 17:00, oral presentation: The Tokai-to-Kamioka (T2K) experiment is a second-generation long-baseline neutrino experiment, which uses a near detector to constrain systematic uncertainties for oscillation measurements with its far detector. Event-by-event reweighting of Monte Carlo (MC) events is applied to model systematic effects and construct PDFs describing predicted event distributions. However, when analysing...
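Event-by-event reweighting assigns each MC event a weight equal to the ratio of its probability under varied systematic parameters to its probability under the nominal parameters, so one sample can populate many PDF hypotheses without regenerating events. A minimal sketch of the mechanism, with an illustrative Gaussian model in place of T2K's actual neutrino interaction model:

```python
import math

def gauss(x, mu, sigma):
    """Normal density; stands in for the true per-event probability model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def reweight(events, nominal, varied):
    """Per-event weights: density under varied parameters over density under nominal.

    events: observable values from the nominal MC sample.
    nominal, varied: (mu, sigma) tuples of the illustrative model.
    """
    return [gauss(x, *varied) / gauss(x, *nominal) for x in events]

def weighted_histogram(events, weights, edges):
    """Fill a weighted histogram: the PDF estimate built from reweighted events."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(events, weights):
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += w
                break
    return counts
```

The cost of this scheme grows with the number of events times the number of parameter variations probed, which is exactly the loop a GPU can process in parallel.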
- David Michael Rohr (Johann-Wolfgang-Goethe Univ. (DE)), 13/04/2015 17:15, oral presentation: ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute...
- Philippe Canal (Fermi National Accelerator Lab. (US)), 13/04/2015 17:30, oral presentation: The recent prevalence of many-core and accelerated processor architectures opens opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The Geant Vector Prototype has been designed both to exploit the vector capability of mainstream CPUs and to take advantage of coprocessors, including NVIDIA GPUs and the Intel Xeon Phi. The...
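The SIMD idea behind such vector prototypes is to process a whole basket of tracks with one array operation instead of looping over tracks one at a time. A minimal NumPy sketch of the contrast (the kernel is an illustrative straight-line propagation, not GeantV code; NumPy is assumed to be available):

```python
import numpy as np

def propagate_scalar(x, v, dt, steps):
    """One track at a time: the scalar pattern a compiler must auto-vectorize."""
    out = []
    for xi, vi in zip(x, v):
        for _ in range(steps):
            xi += vi * dt
        out.append(xi)
    return out

def propagate_vector(x, v, dt, steps):
    """Whole-basket (SIMD-style) formulation: one array operation per step
    updates the positions of all tracks at once."""
    x = np.asarray(x, dtype=np.float64).copy()
    v = np.asarray(v, dtype=np.float64)
    for _ in range(steps):
        x += v * dt
    return x
```

The same structure-of-arrays layout that lets NumPy (or a vectorising compiler) use CPU vector units is also what maps naturally onto SIMT execution on a GPU.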
- Michele Martinelli (INFN Rome), 13/04/2015 17:45, oral presentation: The computing nodes of modern hybrid HPC systems are built using the CPU+GPU paradigm. When this class of systems is scaled to large size, the efficiency of the network connecting the GPU mesh and supporting the inter-node traffic is a critical factor. The adoption of a low-latency, high-performance dedicated network architecture, exploiting peculiar characteristics of CPU and GPU hardware,...
- Mr Steffen Baehr (Karlsruhe Institute of Technology), 13/04/2015 18:00, oral presentation: The impending upgrade of the Belle experiment is expected to increase the generated data set by a factor of 50. This means that for the planned pixel detector, which is the closest to the interaction point, the data rates are going to increase to over 20 GB/s. Combined with the data generated by the other detectors, this rate is too large to be efficiently sent out to offline processing. This is...
- Srikanth Sridharan (CERN), 13/04/2015 18:15, oral presentation: The proposed upgrade of the LHCb experiment at the Large Hadron Collider at CERN envisages a system of 500 data sources, each generating data at 100 Gbps, the acquisition and processing of which is a challenge even for state-of-the-art FPGAs. This challenge splits into two parts: the Data Acquisition (DAQ) part and the algorithm acceleration part, the latter not necessarily immediately following the former....
- Rainer Schwemmer (CERN), 14/04/2015 16:30, oral presentation: For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. To evaluate the best-performing solution, we have developed a method to convert our high-level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late...
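The core of such a self-optimising benchmark is a loop that runs the workload under each candidate configuration and keeps the fastest one. A minimal sketch of that sweep, with a synthetic cost function standing in for the real HLT application (the parameter names and cost model are illustrative, not LHCb's):

```python
import itertools

def run_benchmark(config, cost_model):
    """Run (here: simulate) the workload under one configuration; return its runtime."""
    return cost_model(config)

def self_optimise(param_grid, cost_model):
    """Exhaustively explore the configuration space and keep the fastest setting."""
    best_cfg, best_time = None, float("inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = run_benchmark(cfg, cost_model)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Illustrative cost model: runtime improves with SMT enabled and with more workers.
grid = {"smt": [0, 1], "workers": [8, 16, 32]}
model = lambda c: 100.0 / (c["workers"] ** 0.5) * (0.8 if c["smt"] else 1.0)
best, best_time = self_optimise(grid, model)
```

In the real benchmark each `run_benchmark` call is a full boot-and-measure cycle of the trigger application, so pruning the search space matters far more than it does in this toy version.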
- Daniel Hugo Campora Perez (CERN), 14/04/2015 16:45, oral presentation: During data taking at the LHC at CERN, millions of collisions are recorded every second by the LHCb detector. The LHCb "Online" computing farm, counting around 15000 cores, is dedicated to the reconstruction of the events in real time, in order to filter those with interesting physics. The ones kept are later analysed "Offline" in a more precise fashion on the Grid. This imposes...
- Ms Bowen Kan (Institute of High Energy Physics, Chinese Academy of Sciences), 14/04/2015 17:00, oral presentation: The scheduler is one of the most important components of a high-performance cluster. This paper introduces a self-adaptive dispatching system (SAPS) based on Torque/Maui which effectively increases the resource utilization of the cluster and guarantees the high reliability of the computing platform. It provides great convenience for users running various tasks on the computing platform. First of all,...
- Mr Giulio Eulisse (Fermi National Accelerator Lab. (US)), 14/04/2015 17:15, oral presentation: Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have conducted a wide survey of current and emerging architectures becoming available on the market, including x86-64 variants, ARMv7...
- Jakob Blomer (CERN), 14/04/2015 17:30, oral presentation: Most high-energy physics analysis jobs are embarrassingly parallel except for the final merging of the output objects, which are typically histograms. Currently, the merging of output histograms scales badly. The running time for distributed merging depends not only on the overall number of bins but also on the number of partial histogram output files. That means, while the time to analyze data...
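The scaling problem described here comes from merging partial histograms one file at a time on a single node; a pairwise (tree) reduction instead needs only a logarithmic number of rounds, each of which can run in parallel. A minimal sketch with histograms as plain bin-content lists (the real objects would be ROOT histograms, not shown here):

```python
def merge_two(h1, h2):
    """Merge two histograms with identical binning by adding bin contents."""
    return [a + b for a, b in zip(h1, h2)]

def merge_sequential(hists):
    """One-by-one merging: a single merge node touches every partial file in turn,
    so the wall time grows linearly with the number of files."""
    total = hists[0]
    for h in hists[1:]:
        total = merge_two(total, h)
    return total

def merge_tree(hists):
    """Pairwise (tree) merging: log2(n) rounds, and within each round the
    pair merges are independent and could run on different workers."""
    while len(hists) > 1:
        nxt = [merge_two(hists[i], hists[i + 1]) for i in range(0, len(hists) - 1, 2)]
        if len(hists) % 2:          # odd one out carries over to the next round
            nxt.append(hists[-1])
        hists = nxt
    return hists[0]
```

Both strategies produce identical totals; the difference is purely in how the work distributes, which is what makes the number of partial files, not just the number of bins, drive the merge time.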
- Max Fischer (KIT - Karlsruhe Institute of Technology (DE)), 14/04/2015 17:45, oral presentation: With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructure needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume...
- Mr Pawel Szostek (CERN), 14/04/2015 18:00, oral presentation: As Moore's Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. Areas of development include core count, execution ports, vector units, uncore architecture and, finally, instruction sets. This increasing complexity leads us to a place where access to the shared memory is the major limiting factor, making feeding the cores...
- Christopher Hollowell (Brookhaven National Laboratory), 14/04/2015 18:15, oral presentation: Non-uniform memory access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPUs' (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware...
- Vakho Tsulaia (Lawrence Berkeley National Lab. (US)), 16/04/2015 11:00, oral presentation: High performance computing facilities present unique challenges and opportunities for HENP event processing. The massive scale of many HPC systems means that fractionally small utilizations can yield large returns in processing throughput. Parallel applications which can dynamically and efficiently fill any scheduling opportunities the resource presents benefit both the facility (maximal...
- Sergey Panitkin (Brookhaven National Laboratory (US)), 16/04/2015 11:15, oral presentation: The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently uses more than 100,000 cores at well over 100 Grid sites with a peak performance of 0.3 petaFLOPS, the next LHC data-taking run will require more resources than Grid computing can possibly...
- Dr David Chamont (LLR, École polytechnique), 16/04/2015 11:30, oral presentation: The Matrix Element Method (MEM) is a well-known, powerful approach in particle physics to extract maximal information from the events arising from LHC pp collisions. Compared to other methods requiring training, the MEM allows direct comparisons between a theory and the observation. Since the phase space has a higher dimensionality to explore, the MEM is much more CPU-time consuming at the...
- Taylor Childers (Argonne National Laboratory (US)), 16/04/2015 11:45, oral presentation: Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap. Two approaches to meeting this demand include targeting larger compute resources, and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest...