12–16 Sept 2022
Europe/Zurich timezone

Dask Tutorial

16 Sept 2022, 14:00
1h

Speaker

Doug Davis

Description

Dask provides a foundation to natively scale Python libraries and applications. Dask collection libraries like dask.array and dask.dataframe mimic the ubiquitous APIs of NumPy and pandas to parallelize and/or distribute NumPy-like and pandas-like workflows. The dask.delayed collection supports parallelization of custom algorithms. In this tutorial we will introduce the core Dask collections, the concepts behind them (partitioned objects represented by task graphs), and Dask's distributed execution engine, which is compatible with common HEP batch compute systems. Finally, we will introduce recently developed Dask collections that support partitioned and distributed representations of Awkward Arrays and boost-histogram objects.
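
As a rough illustration of the ideas named in the description (partitioned collections and lazily built task graphs), here is a minimal sketch using dask.array and dask.delayed. It is not taken from the tutorial materials; the array shape, chunk sizes, and the helper functions `square` and `add` are assumptions chosen only for illustration, and everything runs on Dask's default local scheduler rather than a distributed cluster.

```python
import dask
import dask.array as da

# A NumPy-like array of ones, partitioned into 250x250 chunks
# (shape and chunk size are illustrative assumptions).
x = da.ones((1000, 1000), chunks=(250, 250))

# Operations are lazy: this builds a task graph, nothing is computed yet.
total = x.sum()

# compute() executes the task graph and returns a concrete result.
result = total.compute()


# dask.delayed wraps ordinary Python functions so that calls to them
# become tasks in a graph (hypothetical helpers, for illustration only).
@dask.delayed
def square(v):
    return v * v


@dask.delayed
def add(a, b):
    return a + b


# Calling the wrapped functions builds a small graph:
# two independent square tasks feeding one add task.
lazy = add(square(3), square(4))
value = lazy.compute()
```

The two `square` tasks do not depend on each other, so a scheduler is free to run them in parallel; the same graph could be sent to a distributed scheduler without changing the code that builds it.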

Author

Doug Davis
