Description
The Einstein Telescope (ET) is in the early stages of developing its
computing infrastructure. At present, the only officially provided
service is the distribution of data for the Mock Data Challenges, using
the Open Science Data Federation (OSDF) together with CVMFS-for-data,
while GitLab is used for code management. The data distribution
infrastructure is expected to evolve into a Data Lake managed with
Rucio, but the data processing infrastructure and tools are still
undefined. This exploratory phase allows for a detailed evaluation of
different solutions.
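From a user's perspective, CVMFS-for-data exposes the Mock Data Challenge files as an ordinary read-only POSIX tree, with the OSDF fetching and caching the underlying objects on demand. A minimal sketch of what access could look like from inside a job, assuming a purely hypothetical mount point (`/cvmfs/et-mdc.example.org`, not an official ET path) and only standard-library Python:

```python
import glob
import os

# Hypothetical CVMFS-for-data mount point; the real ET namespace may differ.
MDC_ROOT = "/cvmfs/et-mdc.example.org/mock-data-challenge-1"

# Frame files appear as plain read-only files; the Open Science Data
# Federation fetches and caches the bytes transparently on first access.
frames = sorted(glob.glob(os.path.join(MDC_ROOT, "*.gwf")))

for path in frames[:5]:
    # os.stat works as on any POSIX filesystem; actually reading the file
    # triggers the data transfer through the local CVMFS/OSDF cache.
    size_mb = os.stat(path).st_size / 1e6
    print(f"{os.path.basename(path)}  {size_mb:.1f} MB")
```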
Drawing on the experience of the 2nd-generation gravitational-wave
experiments LIGO and Virgo, which started with modest computational
needs and gradually moved to distributed computing models based on
HTCondor, ET aims to build on these foundations. For their offline data
analyses, LIGO and Virgo adopted the LHC grid computing model through a
common computing infrastructure, IGWN (International Gravitational-Wave
Observatory Network), incorporating systems such as glideinWMS, which
runs on top of HTCondor, to handle high-throughput computing (HTC)
workloads. Despite this, challenges such as the reliance on shared file
systems have limited the migration to grid-based workflows, and only
20% of jobs currently run on the IGWN grid.
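To make the shared-filesystem issue concrete: a grid-compatible HTCondor job has to declare and transfer its inputs explicitly rather than assume they are visible on every worker node. The sketch below uses the HTCondor Python bindings for illustration only; the executable name and file layout are invented, not taken from IGWN workflows.

```python
import htcondor

# Submit description for a grid-friendly job: all inputs are transferred
# explicitly instead of being read from a shared filesystem.
# "analyse.sh" and the segment files are hypothetical placeholders.
job = htcondor.Submit({
    "executable": "analyse.sh",
    "arguments": "segment_$(ProcId).gwf",
    "transfer_input_files": "data/segment_$(ProcId).gwf",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
    "request_cpus": "1",
    "request_memory": "2GB",
    "output": "logs/analyse_$(ProcId).out",
    "error": "logs/analyse_$(ProcId).err",
    "log": "logs/analyse.log",
})

# Queue ten such jobs on the local schedd; with glideinWMS the matching
# execute nodes are provisioned dynamically on grid sites.
schedd = htcondor.Schedd()
result = schedd.submit(job, count=10)
print("submitted cluster", result.cluster())
```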
For ET, the plan is to adapt and evolve the IGWN grid computing model,
ensuring that workflows are grid-compatible from the start. This
includes exploring Snakemake, a framework for reproducible data
analysis, as a complement to HTCondor. Snakemake can dispatch jobs to
diverse computing resources, including grid sites, Slurm clusters, and
cloud-based infrastructures. This approach aims to ensure flexibility,
scalability, and reproducibility in ET’s data-processing workflows
while overcoming past limitations.
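As an illustration of how such a workflow is expressed, the following is a minimal Snakefile sketch; the file names, the `estimate_psd` command, and the resource values are invented for the example. With Snakemake 8+, the same workflow can be dispatched to different back ends, e.g. `snakemake --jobs 50 --executor slurm` with the Slurm executor plugin installed, without changing the rules themselves.

```
# Snakefile: each processing step is a rule; Snakemake builds the job
# dependency graph from the declared inputs and outputs.
SEGMENTS = ["0001", "0002", "0003"]

# Target rule: request one PSD estimate per data segment.
rule all:
    input:
        expand("results/psd_{seg}.txt", seg=SEGMENTS)

# Per-segment analysis step; the declared resources are forwarded to the
# chosen executor (Slurm, HTCondor, cloud, ...) without changing the rule.
rule estimate_psd:
    input:
        "data/segment_{seg}.gwf"
    output:
        "results/psd_{seg}.txt"
    resources:
        mem_mb=2000,
        runtime=60
    shell:
        "estimate_psd --input {input} --output {output}"
```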
Speaker release: Yes