2nd CERN Advanced Performance Tuning workshop

Start: 2013-11-21T09:00:00+01:00
End: 2013-11-22T18:00:00+01:00
Location: CERN

21 Nov 2013, 09:00 → 22 Nov 2013, 18:00 Europe/Zurich

593/R-011 (CERN)

Description

During this two-day workshop focused on Advanced Performance Tuning, the latest trends in tuning will be presented by industry leaders ARM, Calxeda, Google and Intel.

The participants will have a unique opportunity to interact with architects, software experts and members of teams responsible for mainstream tools such as linux-perf and Intel VTune Amplifier.

A detailed agenda will be announced in the upcoming days and will include lectures as well as an opportunity for hands-on labs. The speakers include:

Stas Bratanov (Intel), VTune team
Andres S. Charif-Rubial (Versailles Exascale Computing lab), tuning expert
Mike Chynoweth (Intel), PBA/xIF team, tuning expert
Maria Dimakopoulou (Google), performance engineer
Stephane Eranian (Google), senior engineer for linux-perf, libpfm, perfmon2 author
Al Grant (ARM), performance expert
Vincenzo Innocente (CERN), performance and software expert
David Levinthal (Google, remote), GooDA author and tuning expert
Andrzej Nowak (CERN openlab)
Robert Richter (Calxeda), performance expert: linux-perf, ARM and AMD systems
Mike Williams (ARM), debug and PMU architect
Ahmad Yasin (Intel), PMU architect

If you would like the experts to look at your code in a specific tool, please prepare an SLC6-compatible package in advance or have your own system ready. CERN openlab will provide dual-socket Ivy Bridge systems; to get advance access, please get in touch with Pawel Szostek.

Organizer: Andrzej Nowak, CERN openlab CTO office / PCC

To request a place, please click the "Apply" link below. Because space is limited, participation is subject to confirmation.

Thursday, 21 November
- 09:00 → 10:15
  Talks
  - 09:00
    
    Introduction 30m
    
    Speaker: Andrzej Nowak (CERN openlab)
    
    Slides
  - 09:30
    
    Physics Software and Tuning Challenges + Discussion 45m
    
    Speaker: Vincenzo Innocente (CERN)
    
    Slides
- 10:15 → 10:45
  
  Break: Coffee break
- 10:45 → 12:10
  Talks
  - 10:45
    
    An update on perf_events 30m
    
    In this talk, we give an overview of the latest developments in the Linux kernel performance monitoring interface, perf_events, and related tools such as perf. In particular, we describe load/store sampling support, event grouping, multi-event profiling, energy consumption measurement, uncore counters and Haswell processor support.
    
    Speaker: Stephane Eranian (Google)
    
    Slides
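    Event grouping matters because all members of a group are scheduled onto the PMU together, so their ratios stay consistent even when counters are multiplexed. As a minimal sketch (not code from the talk), the Python function below decodes the buffer that a single read() on a group leader returns when `read_format` includes `PERF_FORMAT_GROUP`. The flag values and field layout follow the perf_event_open(2) man page. The helper name `parse_group_read` is hypothetical, the sketch does not open any events itself, and it ignores read_format flags added by later kernels.

    ```python
    import struct

    # read_format flag values, as defined in <linux/perf_event.h>.
    PERF_FORMAT_TOTAL_TIME_ENABLED = 1 << 0
    PERF_FORMAT_TOTAL_TIME_RUNNING = 1 << 1
    PERF_FORMAT_ID = 1 << 2
    PERF_FORMAT_GROUP = 1 << 3


    def parse_group_read(buf: bytes, read_format: int) -> dict:
        """Decode the buffer from read() on a perf_events group leader.

        Layout (native-endian u64 words) with PERF_FORMAT_GROUP set:
          nr, [time_enabled], [time_running], then nr x (value, [id])
        Bracketed fields are present only if the matching flag is set.
        """
        if not read_format & PERF_FORMAT_GROUP:
            raise ValueError("this sketch only handles PERF_FORMAT_GROUP reads")
        if len(buf) % 8:
            raise ValueError("buffer length must be a multiple of 8 bytes")
        words = struct.unpack(f"={len(buf) // 8}Q", buf)
        pos = 0
        nr = words[pos]
        pos += 1
        enabled = running = None
        if read_format & PERF_FORMAT_TOTAL_TIME_ENABLED:
            enabled = words[pos]
            pos += 1
        if read_format & PERF_FORMAT_TOTAL_TIME_RUNNING:
            running = words[pos]
            pos += 1
        values, ids = [], []
        for _ in range(nr):
            values.append(words[pos])
            pos += 1
            if read_format & PERF_FORMAT_ID:
                ids.append(words[pos])
                pos += 1
        # The whole group is scheduled at once, so a single enabled/running
        # ratio applies to every member when counters are multiplexed.
        if enabled is not None and running:
            scaled = [v * enabled / running for v in values]
        else:
            scaled = [float(v) for v in values]
        return {"nr": nr, "values": values, "ids": ids, "scaled": scaled}
    ```

    Take a two-event group packed as nr=2, time_enabled=1000 and time_running=500, with raw values 300 and 600. The sketch scales these to 600.0 and 1200.0. A ratio such as instructions per cycle comes out the same from raw or scaled values, and that consistency is the reason for reading the events as one group.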
  - 11:15
    
    Improving perf_events measurement correctness 10m
    
    In this presentation, we talk about a correctness issue in the Performance Monitoring Unit (PMU) of recent Intel processors with Hyper-Threading enabled. The issue causes cross-thread corruption when hyperthreads measure incompatible events on sibling counters, so certain event combinations may produce unreliable results. We present an innovative approach that avoids the problem by introducing cross-thread dynamic event scheduling, and we conclude with the results of the implemented protocol and the challenges it raises.
    
    Speaker: Maria Dimakopoulou (Google)
    
    Slides
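    The following Python model illustrates cross-thread event scheduling. It is deliberately simplified and is neither the kernel's implementation nor the protocol presented in the talk. It makes two assumptions. First, a corrupting event on counter k of one hyperthread is only safe if the sibling's counter k is idle. Second, a normal event may share index k with the sibling as long as the sibling's event there is not corrupting. The three state names, the `schedule_events` helper and the greedy placement are all hypothetical. Real PMUs also restrict some events to particular counters, and the model ignores that.

    ```python
    from typing import Dict, List, Optional, Tuple

    # Hypothetical per-counter states, as seen from the sibling hyperthread.
    UNUSED, SHARED, EXCLUSIVE = "unused", "shared", "exclusive"


    def schedule_events(
        events: List[Tuple[str, bool]], sibling_state: List[str]
    ) -> Optional[Tuple[Dict[str, int], List[str]]]:
        """Assign this thread's events to counter indices (simplified model).

        events: (name, corrupting) pairs for this hyperthread.
        sibling_state: state of the same-index counter on the sibling thread.
        Returns (assignment, this thread's counter states), or None if some
        event cannot be placed without risking cross-thread corruption.
        """
        free = list(range(len(sibling_state)))
        assignment: Dict[str, int] = {}
        my_state = [UNUSED] * len(sibling_state)
        # Corrupting events are the most constrained, so place them first.
        for name, corrupting in sorted(events, key=lambda e: not e[1]):
            if corrupting:
                # Must not disturb anything the sibling counts on index k.
                ok = [k for k in free if sibling_state[k] == UNUSED]
            else:
                # Must not sit next to a sibling corrupting event.
                ok = [k for k in free if sibling_state[k] != EXCLUSIVE]
                # Prefer indices the sibling already shares, keeping idle
                # indices available for the sibling's corrupting events.
                ok.sort(key=lambda k: sibling_state[k] != SHARED)
            if not ok:
                return None
            k = ok[0]
            free.remove(k)
            assignment[name] = k
            my_state[k] = EXCLUSIVE if corrupting else SHARED
        return assignment, my_state
    ```

    The sketch places corrupting events first as a design choice. They need a counter whose sibling is idle, while normal events can also use counters the sibling is sharing. Handling the stricter requests first therefore avoids spending idle counters on events that did not need them.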
  - 11:25
    
    RAS and memory error reporting with perf 30m
    
    Strategies for RAS (reliability, availability and serviceability) are necessary for enterprise systems in order to increase data integrity and system uptime. The current Linux kernel implementations for collecting hardware errors are architecture-dependent or even vendor-specific, so a new approach is needed to unify hardware error reporting across architectures. The talk shows how the perf event subsystem can be used for this. It also gives details about perf persistent events, which keep running in the system after the process that created them has terminated.
    
    Speaker: Robert Richter (Calxeda)
    
    Slides
  - 11:55
    
    AMD IBS and northbridge counters in perf 15m
    
    Speaker: Robert Richter (Calxeda)
    
    Slides
- 12:10 → 14:00
  
  Break: Lunch (on your own)
- 14:00 → 15:30
  Talks
  - 14:00
    
    MAQAO: an analysis and optimization toolchain 30m
    
    MAQAO (Modular Assembly Quality Analyzer and Optimizer) analyzes binary code and provides application developers with reports to help them optimize it. The tool combines static code quality evaluation with dynamic profiling and characterization, based on its ability to reconstruct low-level and high-level structures such as basic blocks, loops, functions and call sites. Another main feature of MAQAO is its extensibility: users can easily write their own plug-ins in the embedded scripting language, Lua, which allows fast prototyping of new tools based on MAQAO. We will present the three currently released modules. The first is the profiler module (PERFEVAL), whose aim is to detect function and loop hotspots. The second is the code quality analyzer (CQA), which evaluates the code generated by a given compiler. Finally, we will present MIL, our binary instrumentation language.
    
    Speaker: Andres S. Charif-Rubial (Exascale Computing Research Laboratory, Versailles)
    
    Slides
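    A common way to reconstruct basic blocks from a binary is the textbook "leader" rule. Under this rule, a new block starts at the first instruction, at every branch target, and at every instruction that follows a branch or return. The Python sketch below applies that rule to a hypothetical toy list of `(address, kind, target)` tuples. It illustrates the general technique only and is not MAQAO's implementation. MAQAO works on real disassembled machine code and also recovers loops, functions and call sites.

    ```python
    from typing import List, Optional, Tuple

    # Hypothetical toy instruction format: (address, kind, branch_target).
    # kind is "op", "jmp" (unconditional), "cjmp" (conditional) or "ret".
    Instr = Tuple[int, str, Optional[int]]


    def basic_blocks(instrs: List[Instr]) -> List[List[int]]:
        """Split an address-ordered instruction list into basic blocks."""
        if not instrs:
            return []
        addrs = [addr for addr, _, _ in instrs]
        known = set(addrs)
        leaders = {addrs[0]}  # rule 1: the first instruction
        for i, (_, kind, target) in enumerate(instrs):
            if kind in ("jmp", "cjmp") and target in known:
                leaders.add(target)  # rule 2: every branch target
            if kind != "op" and i + 1 < len(instrs):
                leaders.add(addrs[i + 1])  # rule 3: instruction after a branch/ret
        blocks: List[List[int]] = []
        current: List[int] = []
        for addr in addrs:
            # A leader closes the block being built and opens a new one.
            if addr in leaders and current:
                blocks.append(current)
                current = []
            current.append(addr)
        blocks.append(current)
        return blocks
    ```

    Consider a toy sequence with a conditional branch at address 4 to address 16 and a backward jump at address 12 to address 0. The leaders are 0, 8 and 16, which gives the blocks [0, 4], [8, 12] and [16]. The backward edge from 12 to 0 is the kind of edge a loop detector would then look for.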
  - 14:30
    
    Intel VTune Amplifier: A Bridge to Performance, Parallelism, and Power (introduction to hardware collection) 20m
    
    We’re going to present Intel VTune Amplifier XE as Intel’s flagship performance analysis product and focus on a few aspects of hardware-assisted software analysis, including performance, power and threading efficiency, using the example of an N-body application running on both CPU and GPU. In the course of this case study, we will also identify opportunities for further improvement of the tool and ask the audience to share their opinions. Because of time constraints, the talk is expected to cover the CPU analysis, while the GPU side will be left for offline study.
    
    Speaker: Stas Bratanov (Intel)
    
    Slides
  - 14:50
    
    Top-Down Analysis – Never Lost With Perf Counters 40m
    
    Optimizing an application’s performance for a given microarchitecture has become painfully difficult. Increasing microarchitecture complexity, workload diversity and the unmanageable volume of data produced by performance tools all add to the optimization challenge. At the same time, resource and time constraints are getting tighter in recently emerged segments. This calls for accurate and prompt analysis methods, adding further to the difficulty. Top-Down Analysis is a practical method for quickly identifying true bottlenecks in out-of-order processors. It uses designated performance counters in a structured, hierarchical approach to quickly and, more importantly, correctly identify dominant performance bottlenecks. The method has been adopted by multiple production tools, including VTune. Feedback from average VTune users suggests that the analysis is made easier by the simplified hierarchy, which avoids the steep learning curve associated with microarchitecture details. Characterization results are reported for the SPEC CPU2006 benchmarks as well as key enterprise workloads. We will walk through field case studies in which the method guided software optimizations, as well as an architectural exploration study for the most recent generations of Intel Core products.
    
    Speaker: Ahmad Yasin (Intel)
    
    Slides
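    The first level of the Top-Down hierarchy divides the available issue slots into four categories. The Python sketch below assumes the level-1 formulas commonly published for 4-wide Intel Core microarchitectures of that generation (Sandy Bridge through Haswell). Event names and pipeline width differ on other processors, and the helper names `top_down_level1` and `dominant` are hypothetical.

    ```python
    from typing import Dict


    def top_down_level1(counts: Dict[str, int], width: int = 4) -> Dict[str, float]:
        """Level-1 Top-Down breakdown as fractions of total issue slots."""
        # Total slots: pipeline width times unhalted core cycles.
        slots = width * counts["CPU_CLK_UNHALTED.THREAD"]
        if slots <= 0:
            raise ValueError("need a positive unhalted cycle count")
        # Slots where the front end delivered no uop to an available slot.
        frontend = counts["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
        # Uops issued but never retired, plus slots lost while recovering.
        bad_spec = (
            counts["UOPS_ISSUED.ANY"]
            - counts["UOPS_RETIRED.RETIRE_SLOTS"]
            + width * counts["INT_MISC.RECOVERY_CYCLES"]
        ) / slots
        # Slots that retired useful uops.
        retiring = counts["UOPS_RETIRED.RETIRE_SLOTS"] / slots
        return {
            "frontend_bound": frontend,
            "bad_speculation": bad_spec,
            "retiring": retiring,
            # Backend Bound is whatever remains of the slots.
            "backend_bound": 1.0 - (frontend + bad_spec + retiring),
        }


    def dominant(breakdown: Dict[str, float]) -> str:
        """Name of the largest level-1 category, i.e. where to drill down next."""
        return max(breakdown, key=breakdown.get)
    ```

    As an example, suppose a run records 1000 unhalted cycles (4000 slots), 800 undelivered uops, 2200 issued uops, 2000 retire slots and 50 recovery cycles. The sketch reports 20% frontend bound, 10% bad speculation, 50% retiring and 20% backend bound, so Retiring dominates. Because Backend Bound is derived as the remainder, the four fractions always sum to one.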
- 15:30 → 16:45
  
  Hands-on: Hands-on and discussions
- 17:00 → 18:30
  
  Break: ALICE experiment underground visit (speakers)
- 19:00 → 22:30
  
  Break: Optional dinner
Friday, 22 November
- 08:00 → 09:30
  Talks
  - 08:00

    Intel Architecture and GooDA 1h 30m
    
    Speaker: David Levinthal (Google)
    
    document
    
    Slides
    
    DL-cycle_accounting_and_gooda.pdf
    
    DL-CycleAccountingandPerformanceAnalysis.pdf
    
    DL-Micro-architecture.pdf
- 09:30 → 10:00
  
  Break: Coffee Break
- 10:00 → 11:40
  Talks
  - 10:00
    
    ARM in the server space 15m
    
    Speaker: Robert Richter (Calxeda)
    
    Slides
  - 10:15
    
    Introduction to ARMv8 20m
    
    ARM's business model of designing and licensing low-power IP building blocks has been phenomenally successful in transforming the mobile industry. The introduction of the ARMv8 architecture opens up low-power 64-bit computing in the same way. This talk will introduce the ARMv8 architecture and describe the debug and performance monitoring capabilities built into it, giving an opportunity to learn more about ARM and ARMv8.
    
    Speaker: Michael Williams (ARM)
    
    Slides
  - 10:35
    
    Software profiling on ARM 20m
    
    This presentation will give a brief overview of ARM's software profiling tools. This will mainly cover the Streamline tool for sampling-based profiling and software instrumentation, but also look at hardware trace.
    
    Speaker: Al Grant (ARM)
    
    Slides
  - 10:55
    
    Utilizing Performance Bottleneck Analyzer to Debug Issues on Intel’s Future SOCs 45m
    
    The presentation will cover the methodologies of the PBA (Performance Bottleneck Analyzer) toolset, which Intel engineers have maintained for more than seven years to analyze workloads on future architectures. Using examples from software vendors, we will show how PBA was able to find and fix issues on Intel’s future SoCs that could not be identified with any other methodology or toolset. PBA recreates very long flows of execution on the processor and then combines knowledge of processor events, power state and static assembly analysis to find and prioritize bottlenecks on Intel’s latest architectures. Filters are then applied to the data set to highlight issues that affect user experience or power, or that cause slow transactions, so that developers concentrate on the issues that actually matter for their problem. We will also showcase all of these functionalities using field examples from major software vendors. The talk will focus on how the collaborative framework has been used to share methodologies across multiple software vendor accounts and disciplines. We will also present some new capabilities that collect more performance monitoring data with less overhead than before and synchronize the run with multiple types of media.
    
    Speaker: Michael W Chynoweth (Intel)
    
    Slides
- 11:40 → 12:30
  
  Hands-on: Discussions and hands-on
- 12:30 → 14:00
  
  Break: Lunch (on your own)
