Description
The LHCb experiment offers an excellent environment to study a broad variety of modern physics topics. The data recorded by LHCb during the major physics campaigns at the LHC (Run 1 and Run 2) has led to over 600 scientific publications, making it increasingly important to preserve analysis workflows so that the results can be reused and reinterpreted. LHCb encourages preservation of data and analysis workflows from the point where the data is read out by the detector to the final results shown in publications, with options to produce Ntuples in a way that preserves data provenance and with extensive use of workflow management systems such as Snakemake.
Such valuable and complex data merits careful thought on how to preserve it and provide open access to a broader community of researchers. In accordance with the CERN Open Data Policy, LHCb announced the release of the full Run 1 dataset gathered from proton-proton collisions, amounting to approximately 800 terabytes, which was made public on the CERN Open Data portal in 2023. However, due to the large amount of data collected during Run 2, it is no longer feasible to make the reconstructed data accessible to the public in the same way. This prompted the development of an innovative approach to publishing open data: a dedicated LHCb Ntupling Service that allows third-party users to query the data collected by LHCb and request custom samples in the form of Ntuples. Each Ntuple can be customized in the web interface of the Ntupling Service application, using standard LHCb tools to save measured or derived quantities of interest. Requesting and subsequently analyzing an Ntuple requires no specific knowledge of the LHCb software stack.
The LHCb Ntupling Service was developed as a collaborative effort by LHCb and the CERN Open Data team from the CERN Department of Information Technology. The service consists of two parts. The web interface frontend allows users to create and review Ntuple production requests. The backend application processes the user requests, stores them in GitLab repositories, offers vetting capabilities to the LHCb Open Data team, and automatically dispatches approved requests to the LHCb Ntuple production systems. The produced Ntuples are then collected and delivered back to the users through the frontend web interface.
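The request lifecycle described above (creation, vetting, dispatch, delivery) can be pictured as a small state machine. The following is only an illustrative sketch, not the actual implementation of the Ntupling Service: the class `NtupleRequest`, the `Status` values, the `advance` method, and the example variable names are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    """Hypothetical stages of an Ntuple production request."""
    SUBMITTED = auto()      # created by a user in the web interface
    APPROVED = auto()       # vetted by the LHCb Open Data team
    REJECTED = auto()       # vetting did not pass
    IN_PRODUCTION = auto()  # dispatched to the Ntuple production systems
    DELIVERED = auto()      # Ntuple made available in the web interface


# Assumed transition table: each status maps to the statuses it may move to.
ALLOWED = {
    Status.SUBMITTED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.IN_PRODUCTION},
    Status.IN_PRODUCTION: {Status.DELIVERED},
    Status.REJECTED: set(),
    Status.DELIVERED: set(),
}


@dataclass
class NtupleRequest:
    """Hypothetical record of one user request and its status history."""
    request_id: str
    variables: list
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the initial status so the history is complete.
        self.history.append(self.status)

    def advance(self, new_status):
        """Move to new_status, refusing transitions not listed in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status.name} to {new_status.name}"
            )
        self.status = new_status
        self.history.append(new_status)


# Example: a request passes vetting and is produced and delivered.
request = NtupleRequest("req-001", ["B_M", "B_PT"])  # placeholder names
for step in (Status.APPROVED, Status.IN_PRODUCTION, Status.DELIVERED):
    request.advance(step)
print([s.name for s in request.history])
```

Keeping the allowed transitions in a single table makes the vetting step explicit: in this sketch a request cannot reach production without first being approved.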