May 12 – 18, 2019
Split, Croatia
Europe/Zagreb timezone

Academic programme

High Throughput Distributed Processing of Future HEP Data

Introduction

  • The challenges of HEP data processing in post-upgrade scenarios.
  • Scientific software as the key to achieving the deliverables of the (HL-)LHC Physics Programme.
  • Parallelism, performance and programming models for exploiting resources on a single box or on a cluster.
  • The central role of data management, input and output.
  • Evolution of hardware and platforms and the requirements it places on data analysis and tools.

Track 1: Technologies and Platforms

(4h lectures + 4h exercises)

"Introduction to efficient computing" by Andrzej Nowak

  • The evolution of computing hardware and what it means in practice
  • The seven dimensions of performance
  • Controlling and benchmarking your computer and software
  • Software that scales with the hardware
  • Advanced performance tuning in hardware

"Intermediate concepts in efficient computing" by Andrzej Nowak

  • Memory architectures, hardware caching and NUMA
  • Scaling out: Big Data – Big Hardware
  • The role of compilers and VMs
  • A brief look at accelerators and heterogeneity

"Data-oriented design" by Andrzej Nowak

  • Hardware vectorization in detail – theory vs. practice
  • Software design for vectorization and smooth data flow
  • How can compilers and other tools help?
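The layout idea behind data-oriented design can be shown by contrasting an array of structures (one object per particle) with a structure of arrays (one contiguous array per field). This is a minimal Python sketch, and the `Particle` type and its momentum fields are hypothetical. CPython does not emit SIMD instructions itself, so only the data layout carries over to C++. There, a loop streaming over contiguous, homogeneous arrays is the kind of loop compilers can auto-vectorize.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical record: one element of an array of structures (AoS)."""
    px: float
    py: float
    pz: float


def to_soa(particles):
    """Convert AoS to a structure of arrays (SoA): one contiguous double array per field."""
    return {
        field: array("d", (getattr(p, field) for p in particles))
        for field in ("px", "py", "pz")
    }


def transverse_momentum_sq(soa):
    """pT^2 = px^2 + py^2, computed by streaming over two homogeneous arrays."""
    return array("d", (x * x + y * y for x, y in zip(soa["px"], soa["py"])))
```

A loop that reads only `px` and `py` never touches `pz`. In the SoA layout, every byte brought into cache is therefore useful.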

"Summary and future technologies overview" by Andrzej Nowak

  • Teaching program summary and wrap-up
  • Next-generation memory technologies and interconnect
  • Rack-sized data centres and future computing evolution
  • Software technologies – forecasts

Track 2: Parallel and Optimised Scientific Software Development

(6h  lectures + 6h exercises)

"Computational challenges of Run III and HL-LHC" by Danilo Piparo

  • HEP data processing: from acquisition to analysis
  • The upgrades of the LHC detectors and of the accelerators
  • Upgrades: challenges of the new dataset and implications for scientific software
  • Commonalities and differences with other disciplines such as genomics, plasma physics, astronomy

"Scientific programming: a modern approach" by Danilo Piparo

  • Introduction: Amdahl's law, performance and correctness of codebases
  • Modern C++: new constructs and their advantages
  • Exploiting modern architectures using Python
  • Near the hardware: the role of compilers
  • Understanding the differences and commonalities of data structures, metrics for their classification, concrete examples
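Amdahl's law appears in the introduction bullet above. It bounds the speedup achievable when only a fraction p of a program's runtime can be parallelized across N workers: S(N) = 1 / ((1 - p) + p / N). The following is a minimal Python sketch of the formula. The function name is illustrative and does not come from the lectures.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup predicted by Amdahl's law.

    parallel_fraction: share p of the runtime that can be parallelized (0 <= p <= 1)
    workers: number N of parallel workers (N >= 1)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


if __name__ == "__main__":
    # With p = 0.9 the speedup saturates below 1 / (1 - p) = 10.
    for n in (1, 4, 16, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial term (1 - p) never shrinks as workers are added, so it dominates the runtime once p / N becomes small.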

"Expressing parallelism pragmatically" by Danilo Piparo

  • Trivial asynchronous execution
  • Task and data decomposition
  • Threads and the thread pool model
  • In-depth comparison of threads and processes, with guidelines for choosing the best option
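Task decomposition on a thread pool can be sketched with Python's standard `concurrent.futures` module. The index range is cut into independent chunks, and each chunk becomes a task. A fixed pool of threads executes the tasks, and the partial results are then combined. The helper names and the sum-of-squares workload are hypothetical. In CPython, the global interpreter lock prevents CPU-bound threads from executing Python bytecode in parallel. For this kind of work, `ProcessPoolExecutor` is therefore the usual near drop-in replacement, because the worker functions below are module-level and can be pickled. That trade-off is exactly the thread-versus-process question raised above.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, n_tasks: int):
    """Split the index range [0, n) into at most n_tasks contiguous chunks."""
    step = max(1, -(-n // n_tasks))  # ceiling division
    return [(start, min(start + step, n)) for start in range(0, n, step)]


def partial_sum_of_squares(bounds):
    """Independent task: sum of i*i over one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n: int, n_workers: int = 4) -> int:
    # Each chunk is submitted as a task to a fixed-size pool of worker
    # threads; the partial results are combined once all tasks finish.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares, chunk_bounds(n, n_workers))
        return sum(partials)
```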

"Protection of resources and thread safety" by Danilo Piparo

  • The problem of synchronization
  • Useful design principles
  • Replication, atomics, transactions and locks
  • Lock-free programming techniques
  • Functional programming style and elements of map-reduce
  • Third party libraries and high level solutions
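Two of the protection strategies listed above, locks and replication, can be contrasted in a short Python sketch. It uses only the standard `threading` module. The Python standard library has no user-level atomic integers, so atomics are not shown. The lock version keeps one shared counter and serializes every update. The replication version gives each thread a private partial result and merges the partial results once at the end, which is the reduce step of a map-reduce pattern. The function names and the counting workload are hypothetical.

```python
import threading


def count_with_lock(n_threads: int, increments: int) -> int:
    """Shared counter protected by a lock: correct, but every update is serialized."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(increments):
            with lock:  # critical section: one thread updates at a time
                total += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_replication(n_threads: int, increments: int) -> int:
    """Replication: each thread owns a private slot, merged once at the end."""
    partials = [0] * n_threads

    def work(slot: int):
        local = 0
        for _ in range(increments):
            local += 1  # thread-private variable: no synchronization needed
        partials[slot] = local  # each thread writes only its own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # join() guarantees every slot is written before the reduce
    return sum(partials)  # reduce step
```

Both functions return n_threads × increments. The replicated version needs no lock because no two threads ever write the same location.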

"Optimizing existing large codebases" by Sebastien Ponce

  • How to measure performance: key indicators, tools and their pros and cons
  • The nightmare of thread safety
  • Data structures for performant computation in modern C++

"Practical vectorization" by Sebastien Ponce

  • Measuring vectorization level
  • What to expect from vectorization
  • Preparing code for vectorization
  • Vectorization techniques in C++: intrinsics, libraries, autovectorization

Track 3: Effective I/O for Scientific Applications

(2h lectures + 2h exercises)

"Data storage and preservation" by Sebastien Ponce

  • Storage devices and their characteristics
  • Risks of data loss and corruption
  • Data safety (redundancy, parity, erasure coding)
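The parity idea behind data safety can be shown with XOR. Storing the bytewise XOR of k equal-length data blocks lets any single lost block be rebuilt from the surviving blocks and the parity. This is a minimal Python sketch with hypothetical function names. It tolerates exactly one loss. Erasure codes such as Reed-Solomon generalize the idea to several lost blocks, at the cost of more arithmetic.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_block(blocks):
    """Single parity block over equal-length data blocks (RAID-5 style)."""
    return reduce(xor_blocks, blocks)


def recover_missing(surviving_blocks, parity: bytes) -> bytes:
    """Rebuild the one lost data block as the XOR of the parity and all survivors."""
    return reduce(xor_blocks, surviving_blocks, parity)
```

Because x ^ x = 0, XOR-ing the parity with every surviving block cancels those blocks and leaves only the missing one.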

"Key ingredients to achieve effective I/O" by Sebastien Ponce

  • Asynchronous I/O
  • I/O optimizations
  • Caching
  • Influence of data structures on I/O efficiency