Thematic CERN School of Computing 2018

Name: Thematic CERN School of Computing 2018
Start: 2018-06-03T11:00:00+02:00
End: 2018-06-09T12:00:00+02:00
Location: MEDILS in Split, Croatia

Europe/Zagreb timezone

Contact

Computing.School@cern.ch

Academic programme

High Throughput Distributed Processing of Future HEP Data

Introduction

The challenges of HEP data processing in the post-upgrade scenarios.
Scientific software as the key to achieving the deliverables of the (HL-)LHC Physics Programme.
Parallelism, performance and programming models for exploiting resources on a single machine or on a cluster.
The central role of data management, input and output.
Evolution of hardware and platforms, and the requirements they place on data analysis and tools.

Track 1: Technologies and Platforms

(4h lectures + 4h exercises)

"Introduction to Efficient Computing" by Andrzej Nowak

The evolution of computing hardware and what it means in practice
The seven dimensions of performance
Controlling and benchmarking your computer and software
Software that scales with the hardware
Advanced performance tuning in hardware

"Intermediate Concepts in Efficient Computing" by Andrzej Nowak

Memory architectures, hardware caching and NUMA
Scaling out: Big Data – Big Hardware
The role of compilers and VMs
A brief look at accelerators and heterogeneity

"Data Oriented Design" by Andrzej Nowak

Hardware vectorization in detail – theory vs. practice
Software design for vectorization and smooth data flow
How can compilers and other tools help?

"Summary and Future Technologies Overview" by Andrzej Nowak

Teaching programme summary and wrap-up
Next-generation memory technologies and interconnect
Rack-sized data centres and future computing evolution
Software technologies – forecasts

Track 2: Parallel and Optimised Scientific Software Development

(5h lectures + 6h exercises)

"Computational Challenges of Run III and HL-LHC" by Danilo Piparo

HEP data processing: from acquisition to analysis
The upgrades of the LHC detectors and of the accelerators
Upgrades: challenges of the new dataset and implications for scientific software
Commonalities and differences with other disciplines such as genomics, plasma physics and astronomy

"Scientific programming: a modern approach" by Danilo Piparo

Introduction: Amdahl's law, performance and correctness of codebases
Modern C++: new constructs and their advantages
Exploiting modern architectures with Python
Near the hardware: the role of compilers
Data structures: their differences and commonalities, metrics for their classification, concrete examples
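For reference, Amdahl's law, listed in the outline above, bounds the speedup of a program in which a fraction p of the serial run time can be parallelised. The symbols S, p and N below are notation chosen for this note (S is the speedup, N the number of workers), not taken from the lecture material:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}, \qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

As a worked example, with p = 0.95 and N = 8 the bound is 1 / (0.05 + 0.11875) ≈ 5.93, and no number of workers can push it beyond 1 / 0.05 = 20: the serial 5% dominates.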

"Expressing Parallelism Pragmatically" by Danilo Piparo

Trivial asynchronous execution
Task and data decomposition
Threads and the thread pool model
In-depth comparison of threads and processes, with guidelines for choosing the best option
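A minimal sketch of data decomposition on top of a thread pool, using Python's standard `concurrent.futures` module. The helper names (`process_chunk`, `split`, `parallel_sum_of_squares`) and the sum-of-squares workload are hypothetical stand-ins for real per-event work, not material from the lecture:

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk work: the sum of squares of the values."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Data decomposition: cut the input into roughly equal contiguous chunks."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, n_workers=4):
    chunks = split(data, n_workers)
    # The pool keeps n_workers threads alive and hands them tasks,
    # instead of creating a new thread per chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Combine the independent partial results serially.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```

In CPython the global interpreter lock prevents CPU-bound threads from executing Python bytecode in parallel, which is one concrete case where the threads-versus-processes choice matters: `ProcessPoolExecutor` exposes the same `map` interface but runs each task in a separate process.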

"Protection of Resources and Thread Safety" by Danilo Piparo

The problem of synchronization
Useful design principles
Replication, atomics, transactions and locks
Lock-free programming techniques
Functional programming style and elements of map-reduce
Third party libraries and high level solutions
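The contrast between locking and replication can be sketched with Python's `threading` module. Both hypothetical functions below compute the same sum: the first protects a shared accumulator with a lock on every update, while the second gives each thread a private partial result and merges them in a final reduce, so the hot loop touches no shared state. Python's standard library has no user-level atomics, so that variant is not shown:

```python
import threading


def sum_with_lock(items, n_threads=4):
    """Shared accumulator guarded by a lock: correct, but every update contends."""
    total = 0
    lock = threading.Lock()

    def worker(part):
        nonlocal total
        for x in part:
            with lock:  # serialises every single increment
                total += x

    threads = [threading.Thread(target=worker, args=(items[i::n_threads],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_with_replication(items, n_threads=4):
    """Replication: one private partial sum per thread, merged at the end."""
    partials = [0] * n_threads

    def worker(i):
        local = 0
        for x in items[i::n_threads]:
            local += x  # no shared state in the hot loop
        partials[i] = local  # each thread writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # the reduce step
```

The replicated version is also the shape of a map-reduce: an independent "map" per thread followed by a single combining "reduce".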

"Optimizing existing large codebases" by Sebastien Ponce

How to measure performance: key indicators, tools and their pros and cons
The nightmare of thread safety
Data structures for performant computation in modern C++
What to expect from vectorization of existing code

Track 3: Effective I/O for Scientific Applications

(3h lectures + 2h exercises)

"Many ways to store data" by Sebastien Ponce

Storage devices and their specificities
Data federation
Parallelizing file storage
Introduction to the Map/Reduce pattern
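The three phases of the Map/Reduce pattern can be sketched on the classic word-count problem. This is a hypothetical, single-process illustration: in a real framework the map calls run in parallel on the nodes that hold the data, and the shuffle moves records across the network.

```python
from collections import defaultdict
from functools import reduce


def map_phase(record):
    """Map: emit a (key, 1) pair for every word in one input record."""
    return [(word, 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Reduce: combine all values that share one key."""
    return reduce(lambda a, b: a + b, values, 0)


def map_reduce(records):
    pairs = [pair for record in records for pair in map_phase(record)]
    return {key: reduce_phase(values) for key, values in shuffle(pairs).items()}
```

For example, `map_reduce(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`.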

"Preserving Data" by Sebastien Ponce

Risks of data loss and corruption
Data consistency (checksumming)
Data safety (redundancy, parity, erasure coding)
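The difference between consistency and safety can be sketched in a few lines: a checksum (here CRC-32 from Python's `zlib`) only detects corruption, while XOR parity over equal-length blocks lets one lost block be rebuilt from the survivors. The helper names are hypothetical; erasure codes such as Reed-Solomon generalise the parity idea to tolerate several lost blocks.

```python
import zlib
from functools import reduce


def checksum(block: bytes) -> int:
    """Consistency: CRC-32 detects (but cannot repair) corruption of a block."""
    return zlib.crc32(block)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR; assumes both blocks have the same length."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(blocks):
    """Safety: XOR of all data blocks, stored on a separate device."""
    return reduce(xor_blocks, blocks)


def rebuild(surviving, parity_block):
    """Recover the single missing block by XOR-ing the parity with the survivors."""
    return reduce(xor_blocks, surviving, parity_block)
```

With `blocks = [b"abcd", b"efgh", b"ijkl"]`, losing `blocks[1]` is survivable: `rebuild([blocks[0], blocks[2]], parity(blocks))` gives back `b"efgh"`, and comparing its checksum with the stored one confirms the repair.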

"Key Ingredients to achieve effective I/O" by Sebastien Ponce

Asynchronous I/O
I/O optimizations
Caching
Influence of data structures on I/O efficiency
