CHEP 2016 Conference, San Francisco, October 8-14, 2016

Name: CHEP 2016 Conference, San Francisco, October 8-14, 2016
Start: 2016-10-10T08:00:00-07:00
End: 2016-10-14T18:00:00-07:00
Location: San Francisco Marriott Marquis

America/Los_Angeles timezone

LHCb trigger streams optimization

11 Oct 2016, 14:15

15m

GG C3 (San Francisco Marriott Marquis)

Oral, Track 4: Data Handling

Nikita Kazeev (Yandex School of Data Analysis (RU))

The LHCb experiment stores around 10^11 collision events per year. A typical physics analysis deals with a final sample of up to 10^7 events. Event preselection algorithms (lines) are used for data reduction. They are run centrally and check whether an event is useful for a particular physics analysis. The lines are grouped into streams. An event is copied to every stream that contains at least one of the lines that selected it, so the same event may be stored more than once. Because the storage format allows only sequential access, analysis jobs read every event in a stream and discard the ones they don’t need.

The efficiency of this scheme depends heavily on the composition of the streams. By putting similar lines together and balancing the stream sizes, it is possible to reduce the overhead. There is an additional constraint: some lines are meant to be used together, so they must go to the same stream. The total number of streams is also limited by the file management infrastructure.
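The dependence on stream composition can be illustrated with a minimal sketch. It assumes one plausible cost, which is not necessarily the cost function used in this work: each line is read by one analysis job, and that job must read every event in the line's stream. The names `EVENTS`, `stream_sizes`, and `total_read_cost` and the toy data are hypothetical.

```python
from typing import Dict, List, Set

# Toy data (hypothetical): each event is the set of lines that selected it.
EVENTS: List[Set[str]] = [{"A"}, {"A", "B"}, {"C"}, {"C"}, {"B"}]


def stream_sizes(event_lines: List[Set[str]],
                 line_to_stream: Dict[str, int],
                 n_streams: int) -> List[int]:
    """Count the events stored in each stream.

    An event is written once to every stream that holds at least one of
    its lines, so it can be duplicated across streams but not within one.
    """
    sizes = [0] * n_streams
    for lines in event_lines:
        for stream in {line_to_stream[line] for line in lines}:
            sizes[stream] += 1
    return sizes


def total_read_cost(event_lines: List[Set[str]],
                    line_to_stream: Dict[str, int],
                    n_streams: int) -> int:
    """Assumed cost: one job per line, each reading its whole stream."""
    sizes = stream_sizes(event_lines, line_to_stream, n_streams)
    return sum(sizes[stream] for stream in line_to_stream.values())


# Lines A and B share an event, so grouping them avoids a duplicate copy.
similar_together = {"A": 0, "B": 0, "C": 1}
similar_apart = {"A": 0, "B": 1, "C": 0}
print(total_read_cost(EVENTS, similar_together, 2))  # 8
print(total_read_cost(EVENTS, similar_apart, 2))     # 10
```

In this toy case, placing the overlapping lines A and B in one stream yields stream sizes of 3 and 2 events; separating them yields 4 and 2, because the shared event is stored twice and the unrelated line C inflates the stream read by the job for A.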

We developed a method for finding an optimal stream composition. It supports different cost functions, takes the number of streams as an input parameter, and accommodates the grouping constraint. It has been implemented using Theano [1], and the results are being incorporated into the streaming [2] of the LHCb Turbo [3] output, with a projected decrease in analysis job I/O time of 20-50%.
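To make the inputs concrete (a cost function, a fixed number of streams, and groups of lines that must stay together), here is a minimal sketch of a greedy local search. This is a deliberately simple stand-in and not the Theano-based method described above. The cost function is the same assumption as one job per line reading its whole stream, and all names (`read_cost`, `local_search`, the toy events) are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def read_cost(event_lines: List[Set[str]],
              group_of: dict,
              group_to_stream: List[int],
              n_streams: int) -> int:
    """Assumed cost: each line's job reads every event in its stream."""
    sizes = [0] * n_streams
    for lines in event_lines:
        for stream in {group_to_stream[group_of[line]] for line in lines}:
            sizes[stream] += 1
    return sum(sizes[group_to_stream[group_of[line]]] for line in group_of)


def local_search(event_lines: List[Set[str]],
                 groups: Sequence[Sequence[str]],
                 n_streams: int) -> Tuple[List[int], int]:
    """Move whole groups between streams while the cost strictly drops.

    Lines in the same group always move together, which enforces the
    grouping constraint. The result is a local optimum only.
    """
    group_of = {line: g for g, members in enumerate(groups) for line in members}
    assignment = [g % n_streams for g in range(len(groups))]  # round-robin start
    best = read_cost(event_lines, group_of, assignment, n_streams)
    improved = True
    while improved:
        improved = False
        for g in range(len(groups)):
            for stream in range(n_streams):
                if stream == assignment[g]:
                    continue
                previous = assignment[g]
                assignment[g] = stream
                cost = read_cost(event_lines, group_of, assignment, n_streams)
                if cost < best:
                    best, improved = cost, True   # keep the move
                else:
                    assignment[g] = previous      # undo the move
    return assignment, best


events = [{"A"}, {"A", "B"}, {"C"}, {"C"}, {"B"}]
# Every line is its own group: the search puts A and B together.
print(local_search(events, [["A"], ["B"], ["C"]], 2))  # ([1, 1, 0], 8)
# A and C are constrained to share a stream: the cost stays at 10.
print(local_search(events, [["A", "C"], ["B"]], 2))    # ([0, 1], 10)
```

The second call shows the price of the grouping constraint in this toy case: forcing A and C into one stream rules out the cheaper composition found by the first call.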

[1] The Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions
[2] Henry Schreiner et al., Separate file streams, https://gitlab.cern.ch/hschrein/Hlt2StreamStudy
[3] Sean Benson et al., The LHCb Turbo Stream, CHEP-2015

Primary Keyword (Mandatory)	Distributed data handling
Secondary Keyword (Optional)	Distributed workload management
Tertiary Keyword (Optional)	Data processing workflows and frameworks/pipelines

Alexander Panin (Yandex School of Data Analysis (RU))
Andrey Ustyuzhanin (Yandex School of Data Analysis (RU))
Mr Artem Redkin (Yandex Data Factory)
Denis Derkach (Yandex School of Data Analysis (RU))
Mr Ilya Trofimov (Yandex Data Factory)
Mika Anton Vesterinen (Ruprecht-Karls-Universitaet Heidelberg (DE))
Nikita Kazeev (Yandex School of Data Analysis (RU))
Radoslav Neychev (Yandex School of Data Analysis (RU))

Highlights-337.pdf

Oral-337.pdf
