Conveners
Track 8 Session: #1 (GPU and other accelerators)
- Niko Neufeld (CERN)
Track 8 Session: #2 (Vectorization, NUMA and distribution)
- Danilo Piparo (CERN)
Track 8 Session: #3 (Use of HPC for HEP)
- Niko Neufeld (CERN)
Description
Performance increase and optimization exploiting hardware features
- Dr Junichi Kanzaki (KEK), 13/04/2015 16:30, oral presentation: A fast event generation system for physics processes has been developed using graphics processing units (GPUs). The system is based on the Monte Carlo integration and event generation programs BASES/SPRING, which were originally developed in FORTRAN and were rewritten on the CUDA platform provided by NVIDIA in order to run on GPUs. Since the Monte Carlo integration...
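The BASES/SPRING pair follows a classic two-step pattern: first a Monte Carlo integration of the differential cross section, then accept-reject event generation using information gathered during integration. The actual FORTRAN/CUDA code is not reproduced here; the following is a minimal plain-Python sketch of that two-step idea, with all function names ours rather than from BASES/SPRING:

```python
import random

def mc_integrate(f, a, b, n=100_000, seed=1):
    """'BASES' step (sketch): Monte Carlo estimate of the integral of f over [a, b].

    Also records the largest sampled value of f, which the generation
    step needs for accept-reject sampling.
    """
    rng = random.Random(seed)
    total = 0.0
    fmax = 0.0
    for _ in range(n):
        x = a + (b - a) * rng.random()
        fx = f(x)
        total += fx
        fmax = max(fmax, fx)
    return (b - a) * total / n, fmax

def generate_events(f, a, b, fmax, n_events, seed=2):
    """'SPRING' step (sketch): accept-reject event generation.

    A candidate x is kept with probability f(x)/fmax, so accepted
    events are distributed according to f.
    """
    rng = random.Random(seed)
    events = []
    while len(events) < n_events:
        x = a + (b - a) * rng.random()
        if rng.random() * fmax <= f(x):
            events.append(x)
    return events

# Illustrative integrand standing in for a differential cross section.
integral, fmax = mc_integrate(lambda x: x * x, 0.0, 1.0)
events = generate_events(lambda x: x * x, 0.0, 1.0, fmax, 1000)
```

On a GPU the inner sampling loop is what gets parallelised: each thread draws and evaluates independent phase-space points, which is why this workload maps well onto CUDA.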
- Dr Sami Kama (Southern Methodist University (US)), 13/04/2015 16:45, oral presentation: The growing size and complexity of events produced at the high luminosities expected in 2015 at the Large Hadron Collider demand much more computing power for online event selection and offline data reconstruction than in the previous data-taking period. In recent years, the explosive performance growth of low-cost, massively parallel processors like Graphical Processing Units...
- Richard Calland, 13/04/2015 17:00, oral presentation: The Tokai-to-Kamioka (T2K) experiment is a second-generation long-baseline neutrino experiment, which uses a near detector to constrain systematic uncertainties for oscillation measurements with its far detector. Event-by-event reweighting of Monte Carlo (MC) events is applied to model systematic effects and construct PDFs describing predicted event distributions. However, when analysing...
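Event-by-event reweighting assigns each MC event a weight equal to the ratio of its probability under varied systematic parameters to its probability under the nominal parameters, so one sample can populate many PDF hypotheses without regenerating events. A minimal sketch of the mechanism, with an illustrative Gaussian model in place of T2K's actual neutrino interaction model:

```python
import math

def gauss(x, mu, sigma):
    """Normal density; stands in for the true per-event probability model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def reweight(events, nominal, varied):
    """Per-event weights: density under varied parameters over density under nominal.

    events: observable values from the nominal MC sample.
    nominal, varied: (mu, sigma) tuples of the illustrative model.
    """
    return [gauss(x, *varied) / gauss(x, *nominal) for x in events]

def weighted_histogram(events, weights, edges):
    """Fill a weighted histogram: the PDF estimate built from reweighted events."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(events, weights):
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += w
                break
    return counts
```

The cost of this scheme grows with the number of events times the number of parameter variations probed, which is exactly the loop a GPU can process in parallel.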
- David Michael Rohr (Johann-Wolfgang-Goethe Univ. (DE)), 13/04/2015 17:15, oral presentation: ALICE (A Large Ion Collider Experiment) is one of the four major experiments at the Large Hadron Collider (LHC) at CERN, today the most powerful particle accelerator worldwide. The High Level Trigger (HLT) is an online compute farm of about 200 nodes, which reconstructs events measured by the ALICE detector in real time. The HLT uses a custom online data-transport framework to distribute...
- Philippe Canal (Fermi National Accelerator Lab. (US)), 13/04/2015 17:30, oral presentation: The recent prevalence of many-core and accelerated processor architectures opens opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The Geant Vector Prototype has been designed both to exploit the vector capability of mainstream CPUs and to take advantage of coprocessors, including NVIDIA GPUs and the Intel Xeon Phi. The...
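The SIMD idea behind such vector prototypes is to process a whole basket of tracks with one array operation instead of looping over tracks one at a time. A minimal NumPy sketch of the contrast (the kernel is an illustrative straight-line propagation, not GeantV code; NumPy is assumed to be available):

```python
import numpy as np

def propagate_scalar(x, v, dt, steps):
    """One track at a time: the scalar pattern a compiler must auto-vectorize."""
    out = []
    for xi, vi in zip(x, v):
        for _ in range(steps):
            xi += vi * dt
        out.append(xi)
    return out

def propagate_vector(x, v, dt, steps):
    """Whole-basket (SIMD-style) formulation: one array operation per step
    updates the positions of all tracks at once."""
    x = np.asarray(x, dtype=np.float64).copy()
    v = np.asarray(v, dtype=np.float64)
    for _ in range(steps):
        x += v * dt
    return x
```

The same structure-of-arrays layout that lets NumPy (or a vectorising compiler) use CPU vector units is also what maps naturally onto SIMT execution on a GPU.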
- Michele Martinelli (INFN Rome), 13/04/2015 17:45, oral presentation: The computing nodes of modern hybrid HPC systems are built using the CPU+GPU paradigm. When this class of systems is scaled to large size, the efficiency of the network connecting the GPU mesh and supporting the inter-node traffic is a critical factor. The adoption of a low-latency, high-performance dedicated network architecture, exploiting peculiar characteristics of CPU and GPU hardware,...
- Mr Steffen Baehr (Karlsruhe Institute of Technology), 13/04/2015 18:00, oral presentation: The impending upgrade of the Belle experiment is expected to increase the generated data set by a factor of 50. This means that for the planned pixel detector, which is the closest to the interaction point, the data rates are going to increase to over 20 GB/s. Combined with the data generated by the other detectors, this rate is too large to be efficiently sent out to offline processing. This is...
- Srikanth Sridharan (CERN), 13/04/2015 18:15, oral presentation: The proposed upgrade of the LHCb experiment at the Large Hadron Collider at CERN envisages a system of 500 data sources, each generating data at 100 Gbps, the acquisition and processing of which is a challenge even for state-of-the-art FPGAs. This challenge splits into two parts: the Data Acquisition (DAQ) part and the algorithm acceleration part, the latter not necessarily immediately following the former....
- Rainer Schwemmer (CERN), 14/04/2015 16:30, oral presentation: For Run 2 of the LHC, LHCb is replacing a significant part of its event filter farm with new compute nodes. To evaluate the best-performing solution, we have developed a method to convert our high-level trigger application into a stand-alone, bootable benchmark image. With additional instrumentation we turned it into a self-optimising benchmark which explores techniques such as late...
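The core of such a self-optimising benchmark is a loop that runs the workload under each candidate configuration and keeps the fastest one. A minimal sketch of that sweep, with a synthetic cost function standing in for the real HLT application (the parameter names and cost model are illustrative, not LHCb's):

```python
import itertools

def run_benchmark(config, cost_model):
    """Run (here: simulate) the workload under one configuration; return its runtime."""
    return cost_model(config)

def self_optimise(param_grid, cost_model):
    """Exhaustively explore the configuration space and keep the fastest setting."""
    best_cfg, best_time = None, float("inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        t = run_benchmark(cfg, cost_model)
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg, best_time

# Illustrative cost model: runtime improves with SMT enabled and with more workers.
grid = {"smt": [0, 1], "workers": [8, 16, 32]}
model = lambda c: 100.0 / (c["workers"] ** 0.5) * (0.8 if c["smt"] else 1.0)
best, best_time = self_optimise(grid, model)
```

In the real benchmark each `run_benchmark` call is a full boot-and-measure cycle of the trigger application, so pruning the search space matters far more than it does in this toy version.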
- Daniel Hugo Campora Perez (CERN), 14/04/2015 16:45, oral presentation: During data taking at the LHC at CERN, millions of collisions are recorded every second by the LHCb detector. The LHCb "Online" computing farm, counting around 15000 cores, is dedicated to the reconstruction of the events in real time, in order to filter those with interesting physics. The ones kept are later analysed "Offline" in a more precise fashion on the Grid. This imposes...
- Ms Bowen Kan (Institute of High Energy Physics, Chinese Academy of Sciences), 14/04/2015 17:00, oral presentation: The scheduler is one of the most important components of a high-performance cluster. This paper introduces a self-adaptive dispatching system (SAPS) based on Torque/Maui which effectively increases the resource utilization of the cluster and guarantees the high reliability of the computing platform. It provides great convenience for users running various tasks on the computing platform. First of all,...
- Mr Giulio Eulisse (Fermi National Accelerator Lab. (US)), 14/04/2015 17:15, oral presentation: Power consumption will be a key constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics (HEP). This makes performance-per-watt a crucial metric for selecting cost-efficient computing solutions. For this paper, we have conducted a wide survey of current and emerging architectures becoming available on the market, including x86-64 variants, ARMv7...
- Jakob Blomer (CERN), 14/04/2015 17:30, oral presentation: Most high-energy physics analysis jobs are embarrassingly parallel except for the final merging of the output objects, which are typically histograms. Currently, the merging of output histograms scales badly. The running time for distributed merging depends not only on the overall number of bins but also on the number of partial histogram output files. That means, while the time to analyze data...
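The scaling problem described here comes from merging partial histograms one file at a time on a single node; a pairwise (tree) reduction instead needs only a logarithmic number of rounds, each of which can run in parallel. A minimal sketch with histograms as plain bin-content lists (the real objects would be ROOT histograms, not shown here):

```python
def merge_two(h1, h2):
    """Merge two histograms with identical binning by adding bin contents."""
    return [a + b for a, b in zip(h1, h2)]

def merge_sequential(hists):
    """One-by-one merging: a single merge node touches every partial file in turn,
    so the wall time grows linearly with the number of files."""
    total = hists[0]
    for h in hists[1:]:
        total = merge_two(total, h)
    return total

def merge_tree(hists):
    """Pairwise (tree) merging: log2(n) rounds, and within each round the
    pair merges are independent and could run on different workers."""
    while len(hists) > 1:
        nxt = [merge_two(hists[i], hists[i + 1]) for i in range(0, len(hists) - 1, 2)]
        if len(hists) % 2:          # odd one out carries over to the next round
            nxt.append(hists[-1])
        hists = nxt
    return hists[0]
```

Both strategies produce identical totals; the difference is purely in how the work distributes, which is what makes the number of partial files, not just the number of bins, drive the merge time.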
- Max Fischer (KIT - Karlsruhe Institute of Technology (DE)), 14/04/2015 17:45, oral presentation: With the second run period of the LHC, high energy physics collaborations will have to face increasing computing infrastructure needs. Opportunistic resources are expected to absorb many computationally expensive tasks, such as Monte Carlo event simulation. This leaves dedicated HEP infrastructure with an increased load of analysis tasks that in turn will need to process an increased volume...
- Mr Pawel Szostek (CERN), 14/04/2015 18:00, oral presentation: As Moore's Law drives the silicon industry towards higher transistor counts, processor designs are becoming more and more complex. Areas of development include core count, execution ports, vector units, uncore architecture and, finally, instruction sets. This increasing complexity leads us to a place where access to the shared memory is the major limiting factor, making feeding the cores...
- Christopher Hollowell (Brookhaven National Laboratory), 14/04/2015 18:15, oral presentation: Non-uniform memory access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPUs' (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware...
- Vakho Tsulaia (Lawrence Berkeley National Lab. (US)), 16/04/2015 11:00, oral presentation: High performance computing facilities present unique challenges and opportunities for HENP event processing. The massive scale of many HPC systems means that fractionally small utilizations can yield large returns in processing throughput. Parallel applications which can dynamically and efficiently fill any scheduling opportunities the resource presents benefit both the facility (maximal...
- Sergey Panitkin (Brookhaven National Laboratory (US)), 16/04/2015 11:15, oral presentation: The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently uses more than 100,000 cores at well over 100 Grid sites with a peak performance of 0.3 petaFLOPS, the next LHC data-taking run will require more resources than Grid computing can possibly...
- Dr David Chamont (LLR, École polytechnique), 16/04/2015 11:30, oral presentation: The Matrix Element Method (MEM) is a well-known, powerful approach in particle physics to extract maximal information from the events arising from LHC pp collisions. Compared to other methods requiring training, the MEM allows direct comparisons between a theory and the observation. Since the phase space has a higher dimensionality to explore, the MEM is much more CPU-time consuming at the...
- Taylor Childers (Argonne National Laboratory (US)), 16/04/2015 11:45, oral presentation: Demand for Grid resources is expected to double during LHC Run II as compared to Run I; the capacity of the Grid, however, will not double. The HEP community must consider how to bridge this computing gap. Two approaches to meeting this demand include targeting larger compute resources, and using the available compute resources as efficiently as possible. Argonne's Mira, the fifth fastest...