Speaker
Description
Galaxy surveys depend on massive datasets to obtain precise estimates of cosmological parameters. The CosmoHub platform and the SciPIC pipeline have been developed at the Port d'Informació Científica (PIC) to support this work, achieving nearly interactive performance when processing multi-terabyte datasets. Cosmology projects currently supported include ESA's Euclid space mission, the Dark Energy Survey (DES), the Physics of the Accelerating Universe (PAU) survey and the Marenostrum Institut de Ciències de l'Espai Simulations (MICE). Support for additional projects can be added as needed.

CosmoHub (https://cosmohub.pic.es) is a web portal for interactive analysis of massive cosmological data. It enables users to interactively explore and distribute data without any SQL knowledge. It is built on top of Apache Hive, part of the Apache Hadoop ecosystem, which facilitates reading, writing, and managing large datasets. More than two billion objects are available, drawn from public and private data, both observed and simulated. Over the last three years, more than 400 users have produced about 1500 custom catalogs occupying 2 TB in compressed format. All of these datasets can be explored interactively using an integrated visualization tool. With the current implementation, an interactive analysis of a billion-object dataset completes in less than 25 seconds.

The SciPIC scientific pipeline has been developed to efficiently generate mock galaxy catalogs using a dark matter halo population as input. It runs on top of the Hadoop platform using Apache Spark, an open-source cluster-computing framework. The pipeline is currently being calibrated to populate the full-sky Flagship dark matter halo catalog produced by the University of Zürich, which contains about 44 billion dark matter haloes in a box of size 3.78 Gpc/h. The resulting mock galaxy catalog is saved directly in the CosmoHub platform.
| Experimental Collaboration | Euclid, PAU, DES, MICE |
|---|---|