Efficient Parallel Processing of Future Scientific Data
Introduction
- Future scientific data processing: challenges in high-energy physics (HEP) and other sciences, their commonalities and differences.
- The prime role of software in modern big science.
- Parallelism and asynchrony: computation and I/O.
- Evolution of hardware and platforms, and its consequences for data analysis procedures and tools.
Track 1: Technologies and Platforms
Introduction to Efficient Computing
- The evolution of computing hardware and what it means in practice
- The seven dimensions of performance
- Controlling and benchmarking your computer and software (a minimal timing sketch follows this list)
- Software that scales with the hardware
- Advanced performance tuning in hardware
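To make the benchmarking topic concrete, here is a minimal timing sketch in C++ using std::chrono; the workload, vector size and repetition count are placeholders chosen only for illustration, not part of the course material.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        // Placeholder workload: repeatedly sum a large vector.
        std::vector<double> data(1 << 20, 1.0);
        double sum = 0.0;

        auto start = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 100; ++rep)
            for (double x : data) sum += x;
        auto stop = std::chrono::steady_clock::now();

        std::chrono::duration<double> elapsed = stop - start;
        // Print the sum as well, so the compiler cannot discard the loop.
        std::printf("sum = %f, elapsed = %f s\n", sum, elapsed.count());
    }

Such micro-benchmarks are only meaningful when run several times on a quiet machine with a fixed compiler optimization level.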
Intermediate Concepts in Efficient Computing
- Memory architectures, hardware caching and NUMA
- Scaling out: Big Data – Big Hardware
- The role of compilers and VMs
- A brief look at accelerators and heterogeneity
Data Oriented Design
- Hardware vectorization in detail – theory vs. practice
- Software design for vectorization and smooth data flow (an AoS vs. SoA sketch follows this list)
- How can compilers and other tools help?
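As a minimal sketch of designing data layouts for vectorization, the snippet below contrasts an array-of-structures with a structure-of-arrays layout; the particle fields are purely illustrative. In the SoA form the hot loop reads contiguous memory, which compilers can usually auto-vectorize.

    #include <cstddef>
    #include <vector>

    // Array of structures: the fields of each particle are interleaved in memory.
    struct ParticleAoS { double x, y, weight; };
    using ParticlesAoS = std::vector<ParticleAoS>;

    // Structure of arrays: each field is stored contiguously.
    struct ParticlesSoA {
        std::vector<double> x, y, weight;
    };

    // With the SoA layout this loop streams through contiguous arrays,
    // a straightforward candidate for compiler auto-vectorization.
    double weighted_sum_x(const ParticlesSoA& p) {
        double sum = 0.0;
        for (std::size_t i = 0; i < p.x.size(); ++i)
            sum += p.weight[i] * p.x[i];
        return sum;
    }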
Summary and Future Technologies Overview
- Teaching program summary and wrap-up
- Next-generation memory technologies and interconnects
- Rack-sized datacenters and future computing evolution
- Software technologies – forecasts
Track 2: Programming for Concurrency and Correctness
Scientific software programming: a modern approach
- Introduction: Amdahl's law, performance and correctness of codebases (a worked example of Amdahl's law follows this list)
- Modern C++: new constructs, their advantages
- Exploiting modern architectures with Python
- Near the hardware: the role of compilers
- Understanding the differences and commonalities of data structures, with metrics for their classification and concrete examples
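As a small worked example of Amdahl's law mentioned above: if a fraction p of a program can be parallelized, the speedup on N workers is bounded by S(N) = 1 / ((1 - p) + p/N). The snippet below evaluates this bound; the chosen values of p and N are arbitrary illustrations.

    #include <cstdio>

    // Amdahl's law: upper bound on the speedup with n workers
    // when a fraction p of the work can be parallelized.
    double amdahl_speedup(double p, double n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main() {
        // Example: a 95% parallel code on 16 cores is limited to about 9.1x.
        std::printf("S(16) = %.2f\n", amdahl_speedup(0.95, 16.0));
    }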
Expressing Parallelism Pragmatically
- Trivial asynchronous execution (illustrated in the sketch after this list)
- Task and data decomposition
- Threads and the thread pool model
- In-depth comparison of threads and processes, with guidelines for choosing the best option
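A minimal sketch of trivial asynchronous execution, as referenced above, using std::async to run two independent reductions concurrently; the data and the tasks are placeholders.

    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<int> a(1000, 1), b(1000, 2);

        // Launch two independent tasks; the runtime decides which threads run them.
        auto fa = std::async(std::launch::async,
                             [&] { return std::accumulate(a.begin(), a.end(), 0); });
        auto fb = std::async(std::launch::async,
                             [&] { return std::accumulate(b.begin(), b.end(), 0); });

        // get() blocks until each task has finished and returns its result.
        std::printf("sum a = %d, sum b = %d\n", fa.get(), fb.get());
    }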
Protection of Resources and Thread Safety
- The problem of synchronization
- Useful design principles
- Replication, atomics, transactions and locks (see the sketch after this list)
- Lock-free programming techniques
- Functional programming style and elements of map-reduce
- Third party libraries and high level solutions
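To make the locking and atomics discussion concrete, the sketch below increments a shared counter in two of the ways covered in this lecture: via std::atomic and via a std::mutex; the thread count and iteration count are arbitrary.

    #include <atomic>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    std::atomic<long> atomic_counter{0};   // lock-free shared counter
    long plain_counter = 0;                // protected by counter_mutex
    std::mutex counter_mutex;

    void work() {
        for (int i = 0; i < 100000; ++i) {
            atomic_counter.fetch_add(1, std::memory_order_relaxed);
            std::lock_guard<std::mutex> lock(counter_mutex);
            ++plain_counter;
        }
    }

    int main() {
        std::vector<std::thread> threads;
        for (int t = 0; t < 4; ++t) threads.emplace_back(work);
        for (auto& th : threads) th.join();
        std::printf("atomic: %ld, mutex: %ld\n",
                    atomic_counter.load(), plain_counter);
    }

Both counters end up at the same value; the difference lies in contention and overhead, which is the kind of trade-off this lecture examines.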
Ensuring Correctness of a Parallel Scientific Application
- Correctness and reproducibility of a scientific result
- Stability of results and testing: regression, physics performance, tradeoffs
- Enforcing the avoidance of thread-unsafe constructs: focus on static analysis
- Algorithms for detecting synchronization pathologies: focus on the DRD and Helgrind tools (a small example follows this list)
- Elements of the GNU debugger: introduction and specific usage in the multithreaded case
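As a tiny example of the defects targeted by the DRD and Helgrind lectures, the program below contains a deliberate data race on a shared counter; running it under valgrind --tool=helgrind or valgrind --tool=drd reports the conflicting accesses. The program itself is only an illustration, not course code.

    #include <cstdio>
    #include <thread>

    int shared = 0;   // accessed by both threads without any synchronization

    void racy() {
        for (int i = 0; i < 100000; ++i)
            ++shared;   // deliberate data race: no atomic, no lock
    }

    int main() {
        std::thread t1(racy), t2(racy);
        t1.join();
        t2.join();
        // The printed value is typically below 200000 and varies from run to run.
        std::printf("shared = %d\n", shared);
    }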
Track 3: Effective I/O for Scientific Applications
Structuring data for efficient I/O
- Pros and cons of row-wise, column-wise and mixed formats
- Compression: how its efficiency depends on variable types and on the data format
- Data addressing: limitations of the hierarchical approach, use of flat namespaces
- Stateful vs stateless interfaces for namespaces and I/O
Many ways to store data
- Storage devices and their specific characteristics
- Data federation
- Parallelizing file storage
- Introduction to the Map/Reduce pattern (illustrated below)
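A minimal, single-process illustration of the Map/Reduce pattern: each record is transformed ("mapped") independently and the partial results are then combined by an associative reduction. std::transform_reduce (C++17) expresses exactly this split; the squaring function and the input values are placeholders.

    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        std::vector<double> records = {1.0, 2.0, 3.0, 4.0};

        double result = std::transform_reduce(
            records.begin(), records.end(), 0.0,
            [](double a, double b) { return a + b; },   // reduce: associative sum
            [](double x) { return x * x; });            // map: square each record

        std::printf("sum of squares = %f\n", result);   // 30
    }

In a real Map/Reduce deployment the map and reduce steps run on many nodes over partitioned data; the snippet only shows the structure of the computation.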
Preserving Data
- Risks of data loss and corruption
- Data consistency (checksumming; see the sketch after this list)
- Data safety (redundancy, parity, erasure coding)
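To make the checksumming item concrete, the sketch below computes a simple Fletcher-style running checksum over a byte buffer and compares it with the stored value on read-back; the function and data are illustrative only, and production systems normally rely on CRC32, Adler-32 or cryptographic hashes.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Toy Fletcher-style checksum over bytes (illustration only).
    std::uint32_t toy_checksum(const std::vector<std::uint8_t>& data) {
        std::uint32_t sum1 = 0, sum2 = 0;
        for (std::uint8_t byte : data) {
            sum1 = (sum1 + byte) % 65535;
            sum2 = (sum2 + sum1) % 65535;
        }
        return (sum2 << 16) | sum1;
    }

    int main() {
        std::vector<std::uint8_t> block = {'d', 'a', 't', 'a'};
        std::uint32_t stored = toy_checksum(block);   // checksum kept alongside the data

        block[1] ^= 0x01;                             // simulate a single-bit corruption
        bool ok = (toy_checksum(block) == stored);    // verification on read-back
        std::printf("data %s\n", ok ? "consistent" : "corrupted");
    }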
Key Ingredients to achieve effective I/O
- Asynchronous I/O (illustrated in the sketch after this list)
- I/O optimizations
- Caching
- Influence of data structures on I/O efficiency
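As a minimal sketch of one simple form of asynchronous I/O, prefetching the next block on a helper thread while the current block is processed, the snippet below uses std::async for the background read; the file name, block size and the "computation" are placeholders.

    #include <cstddef>
    #include <cstdio>
    #include <fstream>
    #include <future>
    #include <vector>

    // Read one block from the stream; an empty vector signals end of file.
    std::vector<char> read_block(std::ifstream& in, std::size_t size) {
        std::vector<char> buf(size);
        in.read(buf.data(), static_cast<std::streamsize>(size));
        buf.resize(static_cast<std::size_t>(in.gcount()));
        return buf;
    }

    int main() {
        const std::size_t block_size = 1 << 20;            // placeholder: 1 MiB blocks
        std::ifstream in("input.dat", std::ios::binary);   // placeholder file name

        long long total = 0;
        auto pending = std::async(std::launch::async, read_block,
                                  std::ref(in), block_size);
        while (true) {
            std::vector<char> block = pending.get();       // wait for the prefetched block
            if (block.empty()) break;
            // Start reading the next block while this one is being processed.
            pending = std::async(std::launch::async, read_block,
                                 std::ref(in), block_size);
            for (char c : block) total += c;               // placeholder computation
        }
        std::printf("total = %lld\n", total);
    }

Operating systems and I/O libraries offer richer asynchronous interfaces; the point here is only the overlap of reading and computing.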