3–9 Jun 2018
MEDILS in Split, Croatia
Europe/Zagreb timezone

Academic programme

High Throughput Distributed Processing of Future HEP Data

Introduction

  • The challenges of HEP data processing in the post-upgrade scenarios.
  • Scientific software as the key to achieving the deliverables of the (HL-)LHC Physics Programme.
  • Parallelism, performance and programming models for exploitation of resources on a single box or on a cluster.
  • The central role of data management, input and output.
  • Evolution of hardware and platforms and their requirements on data analysis and tools.

Track 1: Technologies and Platforms

(4h lectures + 4h exercises)

"Introduction to Efficient Computing" by Andrzej Nowak

  • The evolution of computing hardware and what it means in practice
  • The seven dimensions of performance
  • Controlling and benchmarking your computer and software
  • Software that scales with the hardware
  • Advanced performance tuning in hardware

"Intermediate Concepts in Efficient Computing" by Andrzej Nowak

  • Memory architectures, hardware caching and NUMA
  • Scaling out: Big Data – Big Hardware
  • The role of compilers and VMs
  • A brief look at accelerators and heterogeneity

"Data Oriented Design" by Andrzej Nowak

  • Hardware vectorization in detail – theory vs. practice
  • Software design for vectorization and smooth data flow
  • How can compilers and other tools help?

"Summary and Future Technologies Overview" by Andrzej Nowak

  • Teaching program summary and wrap-up
  • Next-generation memory technologies and interconnect
  • Rack-sized data centres and future computing evolution
  • Software technologies – forecasts

Track 2: Parallel and optimised scientific software development

(5h lectures + 6h exercises)

"Computational Challenges of Run III and HL-LHC" by Danilo Piparo

  • HEP data processing: from acquisition to analysis
  • The upgrades of the LHC detectors and of the accelerators
  • Upgrades: challenges of the new dataset and implications for scientific software
  • Commonalities and differences with other disciplines such as genomics, plasma physics, astronomy

"Scientific programming: a modern approach" by Danilo Piparo

  • Introduction: Amdahl's law, performance and correctness of codebases
  • Modern C++: new constructs and their advantages
  • Exploiting modern architectures using Python
  • Near the hardware: the role of compilers
  • Understanding the differences and commonalities of data structures, metrics for their classification, concrete examples

"Expressing Parallelism Pragmatically" by Danilo Piparo

  • Trivial asynchronous execution
  • Task and data decomposition
  • Threads and the thread pool model
  • In-depth comparison of threads and processes, guidelines for choosing the best option

"Protection of Resources and Thread Safety" by Danilo Piparo

  • The problem of synchronization
  • Useful design principles
  • Replication, atomics, transactions and locks
  • Lock-free programming techniques
  • Functional programming style and elements of map-reduce
  • Third-party libraries and high-level solutions

"Optimizing existing large codebase" by Sebastien Ponce

  • How to measure performance: key indicators, tools and their pros and cons
  • The nightmare of thread safety
  • Data structures for performant computation in modern C++
  • What to expect from vectorization of existing code

Track 3: Effective I/O for Scientific Applications

(3h lectures + 2h exercises)

"Many ways to store data" by Sebastien Ponce

  • Storage devices and their specific characteristics
  • Data federation
  • Parallelizing file storage
  • Introduction to the Map/Reduce pattern

"Preserving Data" by Sebastien Ponce

  • Risks of data loss and corruption
  • Data consistency (checksumming)
  • Data safety (redundancy, parity, erasure coding)

"Key Ingredients to achieve effective I/O" by Sebastien Ponce

  • Asynchronous I/O
  • I/O optimizations
  • Caching
  • Influence of data structures on I/O efficiency