Speaker
Mr Matthew Leigh (University of Geneva)
Description
The BERT pretraining paradigm has proven highly effective in many domains, including natural language processing, image processing, and biology. To apply the BERT paradigm, the data must be described as a set of tokens, and each token must be labelled. To date, the BERT paradigm has not been explored in the context of HEP. The samples that form the data used in HEP can be described as a set of particles (tokens), where each particle is represented as a continuous vector. We explore different approaches to discretising/labelling particles so that BERT pretraining can be performed, and we demonstrate the utility of the resulting pretrained models on common downstream HEP tasks.
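As one concrete illustration of the discretisation step, the sketch below assigns each continuous particle vector to a codebook index that serves as its token label, then applies BERT-style masking. K-means is only one candidate labelling scheme (the abstract does not specify which approaches are explored), and the feature set, array shapes, and masking rate are all illustrative assumptions.

    # Minimal sketch: discretise continuous particle vectors into tokens
    # for BERT-style masked pretraining. K-means is an illustrative
    # choice; the abstract does not commit to a specific scheme.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Toy data: 1000 "jets", each a set of 16 particles with 3 continuous
    # features (e.g. pT, eta, phi). Real HEP samples would replace this.
    jets = rng.normal(size=(1000, 16, 3))

    # Fit a codebook over all particles; each cluster index is a token id.
    n_tokens = 512
    codebook = KMeans(n_clusters=n_tokens, random_state=0)
    tokens = codebook.fit_predict(jets.reshape(-1, 3)).reshape(1000, 16)

    # BERT-style masking: hide 15% of particle tokens; the pretraining
    # task is to predict the original token id at each masked position.
    mask = rng.random(tokens.shape) < 0.15
    inputs = np.where(mask, n_tokens, tokens)  # n_tokens acts as [MASK] id
    targets = np.where(mask, tokens, -100)     # -100 = ignore in the loss

A set-based transformer encoder trained to predict targets from inputs would then realise the BERT-style pretraining objective described above, with the pretrained encoder reused for downstream HEP tasks.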
Authors
Johnny Raine (Université de Genève (CH))
Lukas Alexander Heinrich (Technische Universität München (DE))
Prof. Margarita Osadchy (University of Haifa)
Mr Matthew Leigh (University of Geneva)
Michael Kagan (SLAC National Accelerator Laboratory (US))
Samuel Byrne Klein (Université de Genève (CH))
Tobias Golling (Université de Genève (CH))