10–15 Mar 2019
Steinmatte conference center
Europe/Zurich timezone

Setting up technical requirements for an astroparticle data life cycle

Not scheduled
20m

Hotel Allalin, Saas Fee, Switzerland https://allalin.ch/conference/
Poster Track 1: Computing Technology for Physics Research Poster Session

Speaker

Victoria Tokareva (Karlsruhe Institute of Technology (KIT))

Description

Increasing data rates open up new opportunities for astroparticle physics by improving the precision of data analysis and by enabling advanced analysis techniques that demand relatively large data volumes, e.g. deep learning. One way to increase statistics is to combine data from different experimental setups for joint analysis. Moreover, such data integration provides an opportunity to carry out multi-messenger astroparticle studies and to search for hidden patterns within the data.

A data life cycle (DLC), i.e. a data processing pipeline that is clearly defined and maximally automated at each step, from receiving data to obtaining the final results of the analysis, facilitates and speeds up data processing and makes the calculations more reliable and reproducible at each stage. The German-Russian Astroparticle Data Life Cycle Initiative (GRADLCI) aims to develop such a DLC for the combined analysis of data from the KASCADE-Grande (Karlsruhe, Germany) and Tunka-133 (Tunka valley, Russia) experiments. Important features of an astroparticle DLC include scalability for handling large amounts of data, heterogeneous data integration, and the use of parallel and distributed computing at every possible stage of data processing. These features impose specific technical requirements on the hardware and software environment of the computing cluster in which the analysis is performed.

In the talk we discuss the plans for DLC organization that are being implemented in GRADLCI. These include accelerating the KASCADE-Grande database through end-to-end indexing; developing a database for TAIGA/Tunka-133; organizing fast search within distributed data with the help of a proxy server and a metadata database; developing access infrastructure and interfaces that let scientists and the general public interact with the data; performing a joint data analysis; configuring the DLC for safe handling of client analysis requests on the server side; and maximally automating distributed run-wise simulations. The talk also addresses the choice of a distributed data storage system and of virtualization tools.
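
As an illustration of the proxy-plus-metadata idea mentioned above, the following is a minimal sketch, not the GRADLCI implementation. It assumes a small metadata index with one row per data file (experiment, storage site, path, covered time range) and shows how a proxy could use it to decide which storage sites to contact for a given time window. The table schema, the site names `site_a` and `site_b`, the file paths, and the functions `build_metadata_db` and `route_query` are all hypothetical and chosen only for this example.

```python
import sqlite3


def build_metadata_db(records):
    """Create an in-memory metadata index with one row per data file.

    The schema is an assumption made for this sketch only.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        "experiment TEXT, site TEXT, path TEXT, "
        "t_start INTEGER, t_end INTEGER)"
    )
    db.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", records)
    # Index the time range so that window lookups do not scan every row.
    db.execute("CREATE INDEX idx_time ON files (t_start, t_end)")
    db.commit()
    return db


def route_query(db, t_from, t_to, experiment=None):
    """Return {site: [paths]} for files overlapping [t_from, t_to].

    A proxy could forward the user's request only to the listed sites.
    """
    sql = "SELECT site, path FROM files WHERE t_start <= ? AND t_end >= ?"
    params = [t_to, t_from]
    if experiment is not None:
        sql += " AND experiment = ?"
        params.append(experiment)
    sql += " ORDER BY site, path"
    plan = {}
    for site, path in db.execute(sql, params):
        plan.setdefault(site, []).append(path)
    return plan


# Hypothetical metadata records (times are arbitrary integer timestamps).
RECORDS = [
    ("KASCADE-Grande", "site_a", "kg/run_0001.dat", 1000, 1999),
    ("KASCADE-Grande", "site_a", "kg/run_0002.dat", 2000, 2999),
    ("Tunka-133", "site_b", "tunka/run_0001.dat", 1500, 2499),
    ("Tunka-133", "site_b", "tunka/run_0002.dat", 3000, 3999),
]

db = build_metadata_db(RECORDS)
plan = route_query(db, 1800, 2100)
```

In this sketch the proxy never touches the data files themselves: it consults only the metadata index and returns a per-site list of files, so the heavy reads can then run in parallel at the sites that actually hold the data.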

Primary authors

Victoria Tokareva (Karlsruhe Institute of Technology (KIT)); Andreas Haungs; Dmitriy Kostunin (KIT)
