Jul 5 – 12, 2017
Venice, Italy
Europe/Zurich timezone

CosmoHub and SciPIC: Massive cosmological data analysis, distribution and generation using a Big Data platform

Jul 6, 2017, 3:45 PM
15m
Room Amici (Palazzo del Casinò)

Parallel Talk
Detector R&D and Data Handling
Detectors and data handling

Speaker

Dr Jorge Carretero (IFAE-PIC)

Description

Galaxy surveys require the support of massive datasets to achieve precise estimates of cosmological parameters. The CosmoHub platform and the SciPIC pipeline have been developed at the Port d'Informació Científica (PIC) to provide this support, achieving nearly interactive performance when processing multi-terabyte datasets. Cosmology projects currently supported include ESA's Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe (PAU) survey and the Marenostrum Institut de Ciències de l'Espai Simulations (MICE). Support for additional projects can be added as needed.

CosmoHub (https://cosmohub.pic.es) is a web portal for interactive analysis of massive cosmological data. It enables users to explore and distribute data interactively without any SQL knowledge. It is built on top of Apache Hive, a component of the Apache Hadoop ecosystem that facilitates reading, writing, and managing large datasets. More than two billion objects are available, drawn from public and private data, both observed and simulated. Over the last three years, more than 400 users have produced about 1500 custom catalogs occupying 2 TB in compressed format. All of these datasets can be explored interactively with an integrated visualization tool. In the current implementation, an interactive analysis of billion-object datasets completes in less than 25 seconds.

The SciPIC scientific pipeline has been developed to generate mock galaxy catalogs efficiently, taking a dark matter halo population as input. It runs on the Hadoop platform using Apache Spark, an open-source cluster-computing framework. The pipeline is currently being calibrated to populate the full-sky Flagship dark matter halo catalog produced by the University of Zürich, which contains about 44 billion dark matter haloes in a box size of 3.78 Gpc/h. The resulting mock galaxy catalog is saved directly in the CosmoHub platform.
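
As an illustration of how a portal can spare users from writing SQL, the following minimal sketch turns form-style selections (a table, some columns, a few numeric cuts) into a HiveQL query string. It is not CosmoHub's actual code: the function `build_query`, the table `mock_galaxies` and the column names are all hypothetical, and a real portal would also need to handle authentication, job submission and result delivery.

```python
# Hypothetical sketch: build a HiveQL SELECT from form-style selections.
# Table and column names are invented for illustration only.
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_OPS = {"<", "<=", ">", ">=", "=", "!="}


def _ident(name):
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_query(table, columns, filters=(), limit=None):
    """Build a SELECT statement from a table, columns and numeric filters.

    filters is an iterable of (column, operator, number) tuples.
    """
    if not columns:
        raise ValueError("select at least one column")
    sql = "SELECT " + ", ".join(_ident(c) for c in columns)
    sql += " FROM " + _ident(table)
    clauses = []
    for column, op, value in filters:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        # Only plain numbers are accepted, so no quoting is needed.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("filter values must be numeric")
        clauses.append(f"{_ident(column)} {op} {value!r}")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


query = build_query(
    "mock_galaxies",
    ["ra", "dec", "z"],
    filters=[("z", "<", 1.5), ("mag_i", "<=", 24)],
    limit=1000,
)
print(query)
```

Restricting identifiers and operators to a whitelist keeps the generated statement well-formed even though the inputs come from a user interface rather than from someone writing SQL by hand.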

Experimental Collaboration

Euclid, PAU, DES, MICE

Primary authors

Dr Jorge Carretero (IFAE-PIC)
Mr Pau Tallada (CIEMAT-PIC)
Mr Jordi Casals (CIEMAT-PIC)
Mr Marc Caubet (CIEMAT-PIC)
Dr Francisco Castander (Institut de Ciències de l’Espai, IEEC-CSIC)
Dr Linda Blot (Institut de Ciències de l’Espai, IEEC-CSIC)
Mr Alex Alarcón (Institut de Ciències de l’Espai, IEEC-CSIC)

Co-authors

Mr Santi Serrano (Institut de Ciències de l’Espai, IEEC-CSIC)
Dr Pablo Fosalba (Institut de Ciències de l’Espai, IEEC-CSIC)
Dr Carles Acosta (IFAE-PIC)
Dr Nadia Tonello (IFAE-PIC)
Mr Francesc Torradeflot (IFAE-PIC)
Dr Christian Neissner (IFAE-PIC)
Prof. Manuel Delfino (UAB, IFAE-PIC)
