3-9 June 2018
MEDILS in Split, Croatia
Europe/Zagreb timezone

Academic Programme

High Throughput Distributed Processing of Future HEP Data


  • The challenges of HEP data processing in the post-upgrade scenarios
  • Scientific software as the key to achieving the deliverables of the (HL-)LHC Physics Programme
  • Parallelism, performance and programming models for exploitation of resources on a single box or on a cluster.
  • The central role of data management, input and output.
  • Evolution of hardware and platforms and their requirements on data analysis and tools.

Track 1: Technologies and Platforms

(4h lectures + 3h exercises)

Introduction to Efficient Computing

  • The evolution of computing hardware and what it means in practice
  • The seven dimensions of performance
  • Controlling and benchmarking your computer and software
  • Software that scales with the hardware
  • Advanced performance tuning in hardware

Intermediate Concepts in Efficient Computing

  • Memory architectures, hardware caching and NUMA
  • Scaling out: Big Data – Big Hardware
  • The role of compilers and VMs
  • A brief look at accelerators and heterogeneity

Data Oriented Design

  • Hardware vectorization in detail – theory vs. practice
  • Software design for vectorization and smooth data flow
  • How can compilers and other tools help?

Summary and Future Technologies Overview

  • Teaching programme summary and wrap-up
  • Next-generation memory technologies and interconnect
  • Rack-sized data centres and future computing evolution
  • Software technologies – forecasts

Track 2: Parallel and optimised scientific software development

(5h lectures + 4h exercises)

The Challenges of LHC Run III and HL-LHC 

  • HEP data processing: from acquisition to analysis
  • The upgrades of the LHC detectors and of the accelerators
  • Upgrades: challenges of the new dataset and implications for scientific software
  • Commonalities and differences with other disciplines such as genomics, plasma physics, astronomy

Scientific software programming: a modern approach

  • Introduction: Amdahl's law, performance and correctness of codebases
  • Modern C++: new constructs and their advantages
  • Exploit modern architectures using Python
  • Near the hardware: the role of compilers
  • Data structures: differences and commonalities, metrics for their classification, concrete examples

Expressing Parallelism Pragmatically

  • Trivial asynchronous execution
  • Task and data decomposition
  • Threads and the thread pool model
  • In-depth comparison of threads and processes, with guidelines for choosing the best option

Protection of Resources and Thread Safety

  • The problem of synchronization
  • Useful design principles
  • Replication, atomics, transactions and locks
  • Lock-free programming techniques
  • Functional programming style and elements of map-reduce
  • Third-party libraries and high-level solutions

Optimisation of an existing, production-grade large codebase

  • How to measure performance: key indicators, tools and their pros and cons
  • The nightmare of thread safety
  • Data structures for performant computation in modern C++
  • What to expect from vectorization of existing code

Track 3: Effective I/O for Scientific Applications

(3h lectures + 2h exercises)

Many ways to store data

  • Storage devices and their specificities
  • Data federation
  • Parallelizing file storage
  • Introduction to the Map/Reduce pattern

Preserving Data

  • Risks of data loss and corruption
  • Data consistency (checksumming)
  • Data safety (redundancy, parity, erasure coding)

Key Ingredients to achieve effective I/O

  • Asynchronous I/O
  • I/O optimizations
  • Caching
  • Influence of data structures on I/O efficiency