13–17 May 2024
DESY
Europe/Zurich timezone

Cloud Data Lake Technologies

15 May 2024, 12:15
20m
Hoersaal (DESY)

Talk HSF

Speaker

Benjamin Galewsky (Univ. Illinois at Urbana Champaign (US))

Description

Cloud data lake technologies have been used successfully in industry to analyze exabyte-scale datasets. The technologies that underlie this architecture are:

  • Object Store
  • Parquet file format
  • Kubernetes
  • Distributed SQL

We will describe our work using the Trino distributed SQL engine to join selected event data with inference results. We will show how this architecture can eliminate the need to maintain analysis-specific copies of datasets.
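
As a rough illustration of the kind of join described above, the sketch below matches selected events to inference results on a shared event key. It is not the authors' implementation: Python's built-in sqlite3 module stands in for Trino so the example runs locally, and the table names (selected_events, inference_results), column names (run, event, jet_pt, score), and sample values are all hypothetical. In a data lake deployment the two tables would typically be Parquet datasets in an object store, queried in place rather than copied.

```python
import sqlite3

# Stand-in for a distributed SQL session. sqlite3 is used only so the
# sketch is runnable; the talk's actual engine is Trino.
conn = sqlite3.connect(":memory:")

# Hypothetical schemas and sample rows, invented for illustration.
conn.executescript("""
CREATE TABLE selected_events (run INTEGER, event INTEGER, jet_pt REAL);
CREATE TABLE inference_results (run INTEGER, event INTEGER, score REAL);
INSERT INTO selected_events VALUES (1, 100, 45.2), (1, 101, 30.1), (2, 200, 88.7);
INSERT INTO inference_results VALUES (1, 100, 0.91), (2, 200, 0.12), (2, 201, 0.55);
""")

# Join the two tables on the (run, event) key instead of writing a merged,
# analysis-specific copy of the dataset.
JOIN_SQL = """
SELECT e.run, e.event, e.jet_pt, r.score
FROM selected_events AS e
JOIN inference_results AS r
  ON e.run = r.run AND e.event = r.event
ORDER BY e.run, e.event
"""

rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

Only events present in both tables appear in the result, so the inference output can be stored once and attached to any event selection at query time.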

Requested talk length 20

Author

Benjamin Galewsky (Univ. Illinois at Urbana Champaign (US))

Co-author

Nick Manganelli (University of Colorado Boulder (US))
