IML Machine Learning Working Group - Parallelized/Distributed Machine Learning

40/S2-C01 - Salle Curie (CERN)

    • 3:00 PM 3:10 PM
      News and group updates 10m
      Speakers: Lorenzo Moneta (CERN), Michele Floris (CERN), Paul Seyfert (Universita & INFN, Milano-Bicocca (IT)), Dr Sergei Gleyzer (University of Florida (US)), Steven Randolph Schramm (Universite de Geneve (CH))
    • 3:10 PM 3:30 PM
      Internally-Parallelized Boosted Decision Trees 20m
      Speaker: Andrew Mathew Carnes (University of Florida (US))
    • 3:30 PM 3:50 PM
      Rapid development platforms for machine learning 20m
      Speaker: Dr Andrew Lowe (Hungarian Academy of Sciences (HU))
    • 3:50 PM 3:55 PM
      Distributed Deep Learning using Apache Spark and Keras (see materials) 5m

      Data parallelism is an inherently different methodology for optimizing parameters. The general idea is to reduce the training time by having n workers optimize a central model by processing n different shards (partitions) of the dataset in parallel. In this setting, we distribute n model replicas over n processing nodes, i.e., every node (or process) holds one model replica. The workers then train their local replicas using their assigned data shards. However, it is possible to coordinate the workers in such a way that, together, they optimize a single objective during training and, as a result, reduce the wall-clock training time. There are several approaches to achieving this; these are discussed in greater detail in the materials below.
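
      The coordination scheme described above can be illustrated with a minimal, self-contained sketch. This is an assumption-laden toy (plain NumPy on a synthetic least-squares problem, simulating the workers sequentially rather than with Spark and Keras as in the talk): each of n workers computes a gradient on its own shard, and the averaged gradients update the single central model, so all replicas optimize one shared objective.

      ```python
      # Toy synchronous data-parallel SGD sketch (NumPy only; the n "workers"
      # are simulated in a loop, not distributed over real processing nodes).
      import numpy as np

      rng = np.random.default_rng(0)

      # Synthetic regression data: y = X @ w_true + noise
      w_true = np.array([2.0, -3.0])
      X = rng.normal(size=(1000, 2))
      y = X @ w_true + 0.01 * rng.normal(size=1000)

      # Partition the dataset into n equally sized shards, one per worker.
      n_workers = 4
      shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

      def shard_gradient(w, X_s, y_s):
          """Mean-squared-error gradient computed locally on one data shard."""
          residual = X_s @ w - y_s
          return 2.0 * X_s.T @ residual / len(y_s)

      w = np.zeros(2)   # the central model all replicas synchronize to
      lr = 0.1
      for step in range(200):
          # Every worker evaluates its local gradient on its shard; averaging
          # the shard gradients recovers the gradient of the single global
          # objective, which is what keeps the replicas coordinated.
          grads = [shard_gradient(w, X_s, y_s) for X_s, y_s in shards]
          w -= lr * np.mean(grads, axis=0)

      print(w)  # converges close to w_true = [2, -3]
      ```

      Because the shards are equally sized, the average of the local gradients equals the full-batch gradient, so this synchronous variant behaves exactly like single-node training; the asynchronous approaches mentioned in the materials relax this lock-step averaging to reduce wall-clock time further.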

      Speaker: Joeri Hermans (Maastricht University (NL))
    • 3:55 PM 4:25 PM
      Parallelization in Machine Learning with Multiple Processes 30m
      Speakers: Gerardo Gutierrez (ITM), Omar Andres Zapata Mesa (University of Antioquia & Metropolitan Institute of Technology)
    • 4:25 PM 4:26 PM