Description
CERNBox, CERN’s cloud collaboration platform, currently serves more than 27,000 users worldwide and manages over 4.1 billion files across multiple petabytes of data. Behind this service, EOS HPM (EOS Home-Project-Media) provides the large-scale storage infrastructure that enables reliable access to both personal and project spaces.
This presentation reviews the current infrastructure, available resources, and the recent evolution of EOS HPM in production. We discuss the progressive upgrade path from EOS 5.2.x to 5.4.0, the transition to an MQ-less (Pub/Sub) architecture, and a major refurbishment campaign involving the decommissioning of disk servers in the Meyrin Data Centre (MDC) and the deployment of new nodes in the Prévessin Data Centre (PDC).
Operational improvements are also presented, including the redesign of the redirector architecture, the deployment of EOSBACKUP, the evolution of the quota model, monitoring enhancements, and improved observability.
Finally, we review operational incidents such as OOM events, SSD failures, and redirector outages, as well as the lessons learned from them.