Description
The Bookkeeping application is the central logbook and state-tracking system of the ALICE experiment at CERN's Large Hadron Collider, serving detector operations, data taking, and analysis workflows across Run 3 and the forthcoming Long Shutdown 3 (LS3) and Run 4. While its initial design addressed requirements anticipated before Run 3, operational experience and extended use by both synchronous and asynchronous teams have revealed a considerably broader and more demanding set of needs. As a result, Bookkeeping has evolved from a traditional digital logbook into a mission-critical coordination service. It now integrates information from multiple software and hardware components, supports automated and manual logging, and produces standardized End-of-Shift reports as well as run- and fill-level statistics. A key operational challenge that emerged during Run 3 is the high-concurrency pattern at the start of each run, when more than 20,000 requests are issued within seconds as ALICE Online-Offline software components register their processes and configuration states. Challenges of this kind have driven substantial architectural evolution, including a transition from predominantly HTTP-based interactions to a multi-tiered client ecosystem comprising authenticated gRPC servers, C++ wrapper libraries for native components, and a Kafka-based event stream used both to consume and to produce operational messages, enabling more responsive, live interactions between Bookkeeping and software services. This contribution documents the changes introduced during Run 3 and those planned for integration during LS3, detailing how shifting usage patterns, expanded metadata requirements, and new performance constraints shaped the application's functional and technical development. We conclude with the design decisions and ongoing improvements intended to keep Bookkeeping robust, scalable, and extensible.