STFC's National CDT Training Event 2019

Europe/London
Edinburgh

Description

Use this agenda to register for the parallel training sessions at STFC's National CDT Training Event. Please choose your preferred options by reading the details on the timetable:

https://indico.cern.ch/event/770601/timetable/#20181120.detailed

and contributions list:

https://indico.cern.ch/event/770601/contributions/

and then select your options via the registration link:

https://indico.cern.ch/event/770601/registrations/

Please note that due to space restrictions the maximum number of people per event is 20, so it is first come, first served.

  • Tuesday 20 November
    • Tuesday Morning Training Session
      • 1
        Classical unsupervised learning with scikit-learn

        Unsupervised learning is the subfield of machine learning in which the computer is trained on unlabelled data and must pick out important features of the data on its own. In classical machine learning this boils down to two main approaches: dimensionality reduction and clustering. We will look at both concepts and apply them practically to a dataset to test how good the computer's intuition is.

        Speaker: John Armstrong
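
        As a minimal, hedged sketch of the two approaches named above (the iris dataset and all parameter choices here are illustrative, not part of the session material):

```python
# Sketch of the two classical unsupervised approaches: dimensionality
# reduction (PCA) followed by clustering (k-means), with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)   # 150 samples x 4 features; labels ignored

# Dimensionality reduction: project onto the 2 main axes of variance
reduced = PCA(n_components=2).fit_transform(X)

# Clustering: let k-means pick out 3 groups on its own
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

print(reduced.shape)                  # (150, 2)
print(sorted(set(labels.tolist())))   # [0, 1, 2]
```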
      • 2
        Getting started with Jupyter notebooks and machine learning

        Machine learning is one of the biggest buzzwords in data science, with many applications in both academia and industry. In this tutorial, Pivigo's community manager and data scientist will take you through the basics of using Jupyter notebooks and how to get started with machine learning. Jupyter notebooks are a fantastic way to explore data and to run experiments on it. The tutorial will use the Titanic dataset from Kaggle.

        Speaker: Deepak Mahtani
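
        A hedged sketch of the kind of workflow the tutorial covers. The Titanic data itself must be downloaded from Kaggle, so this toy stand-in uses made-up features and a made-up survival rule; only the train/predict/score pattern is the point:

```python
# The Titanic data itself lives on Kaggle, so this sketch trains on a tiny
# made-up stand-in with the same flavour: features in, survived-or-not out.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # hypothetical [age, fare, class] features
y = (X[:, 1] > 0).astype(int)          # toy rule: higher "fare" -> survived

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(accuracy > 0.8)                  # the toy rule is easy to learn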
      • 3
        Git & GitHub

        I will quickly introduce the concept of version control, and distributed version control systems (DVCS), of which the most famous is git. We'll learn how to use git from the command line to create repositories which keep track of our projects (be they code, documents or otherwise). I'll end by describing how to store your git repositories on a remote server, the most famous of which is GitHub. If we have time, we'll get into more advanced topics such as branching.

        Speaker: Duncan Forgan
      • 4
        Planning your data science skills path

        By the end of your PhD you will need to have built the skills required to compete in industry, academia, or elsewhere. In this session we will plot a path to deciding which skills you'll need and how best to develop them.

        Speaker: Rita Tojeiro
    • 12:30
      Lunch
    • Tuesday Afternoon Training Session
      • 5
        Data Science & Machine Learning in e-Commerce by ASOS

        - Big data in retail and e-commerce (a sort of why, what and how)
        - Common machine learning methods in retail and e-commerce (recommender systems, customer lifetime value prediction, automatic product understanding)
        - Practical aspects of deploying and using ML in retail and e-commerce (using large distributed computing systems such as Spark for example)
        - ‘Soft skills’ for data scientists: stakeholder management, understanding business value, expectation management.

        Speaker: Duncan Little (ASOS)
      • 6
        Data Science in the cloud by DeltaDNA

        Cloud computing has moved quickly from simply providing easy access to hardware to supplying easy-to-use services that eliminate the need for any sysadmin knowledge. ML and data science have been among the key drivers of this, with all the major cloud providers (AWS, GCP and Azure) offering suites of tools for building everything from data pipelines to ML model fitting to APIs that categorise data in real time, and everything in between. In this session I will go over what is available in the cloud for data scientists and talk through some typical cloud tech stacks you will encounter in the world of big data. Finally, I will discuss the pros and cons of different cloud technologies and how they apply to different real-world applications.

        Speaker: Isaac Roseboom (DeltaDNA)
      • 7
        Git & GitHub

        I will quickly introduce the concept of version control, and distributed version control systems (DVCS), of which the most famous is git. We'll learn how to use git from the command line to create repositories which keep track of our projects (be they code, documents or otherwise). I'll end by describing how to store your git repositories on a remote server, the most famous of which is GitHub. If we have time, we'll get into more advanced topics such as branching.

        Speaker: Duncan Forgan
      • 8
        Parallel Programming

        Getting the most out of modern computer architecture means using parallel programming - making the computer run different pieces of calculations at the same time. In this tutorial we will discuss the various approaches towards this, such as thread- and process-level parallelism, and focus in detail on the MPI approach to the latter.

        Speaker: David Henty
      • 9
        Planning your data science skills path

        By the end of your PhD you will need to have built the skills required to compete in industry, academia, or elsewhere. In this session we will plot a path to deciding which skills you'll need and how best to develop them.

        Speaker: Rita Tojeiro
  • Wednesday 21 November
    • Wednesday Morning Training Session
      • 10
        Code profiling & Optimisation

        The faster our code runs, the more data we can process. In this tutorial, you will learn how to measure and improve CPU and memory performance on Linux.

        Speaker: Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))
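
        A hedged sketch of the measure-then-improve loop, using Python's standard-library profiler (the function names and the closed-form optimisation are illustrative, not the session's own material):

```python
# Measure first, optimise second: profile a deliberately slow function with
# the stdlib profiler, then time it against a faster rewrite.
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    """Sum of squares below n via an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum(n):
    """Same result via the closed-form formula (n-1)n(2n-1)/6."""
    return (n - 1) * n * (2 * n - 1) // 6

# Step 1: profile to see where the time actually goes
profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)

# Step 2: time both versions to confirm the optimisation helps
t_slow = timeit.timeit(lambda: slow_sum(200_000), number=5)
t_fast = timeit.timeit(lambda: fast_sum(200_000), number=5)
print(t_fast < t_slow)  # the closed form avoids the loop entirely
```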
      • 11
        Data Science & Machine Learning in e-Commerce by ASOS

        - Big data in retail and e-commerce (a sort of why, what and how)
        - Common machine learning methods in retail and e-commerce (recommender systems, customer lifetime value prediction, automatic product understanding)
        - Practical aspects of deploying and using ML in retail and e-commerce (using large distributed computing systems such as Spark for example)
        - ‘Soft skills’ for data scientists: stakeholder management, understanding business value, expectation management.

        Speaker: Duncan Little (ASOS)
      • 12
        GPU Programming Plenary

        Plenary

        Edinburgh

        Graphics Processing Units (GPUs) are commonly available computing devices designed to enhance computer game experiences. The underlying hardware can, however, be exploited to perform general calculations, which has given rise to General-Purpose GPU (GPGPU) computing. In this tutorial, I will discuss 1) when it might be advantageous to develop code to run on a GPU, 2) the nuances of GPU hardware that affect the algorithm ported to the GPU, compared specifically with other forms of parallel programming, and 3) examples of GPU programming with CUDA, NVIDIA's extension to C/C++, highlighting the ease of use and indicating pitfalls. Knowledge of C/C++ is advantageous, but not essential.

        Speaker: Eric Tittley
      • 13
        Neural Networks in TensorFlow

        This workshop will briefly introduce supervised deep-learning with neural networks, before walking through an example of constructing and training such networks using keras, a high-level Python interface to TensorFlow. We will train both fully-connected and convolutional neural nets to distinguish hand-written digits from the standard MNIST dataset, and validate the results. I will also discuss the importance of understanding and preparing your training data, and illustrate the benefits of data augmentation.

        Speaker: Steven Bamford (Unknown)
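
        The workshop itself uses keras; as a hedged illustration of what a small fully-connected classifier actually computes on an MNIST-sized input, here is the forward pass written out in plain NumPy (layer sizes and random weights are illustrative only):

```python
# What the fully-connected layers in such a network compute, written out in
# plain NumPy (layer sizes are illustrative; the session itself uses keras).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 784))                    # one flattened 28x28 "digit"

W1, b1 = 0.01 * rng.normal(size=(784, 128)), np.zeros(128)   # hidden layer
W2, b2 = 0.01 * rng.normal(size=(128, 10)), np.zeros(10)     # 10-class output

h = np.maximum(0.0, x @ W1 + b1)                 # ReLU activation
logits = h @ W2 + b2
probs = np.exp(logits - logits.max())            # numerically stable softmax
probs /= probs.sum()

print(probs.shape)                               # (1, 10)
print(round(float(probs.sum()), 6))              # 1.0
```

        Training in keras then amounts to adjusting W1, b1, W2 and b2 to minimise a loss over the labelled MNIST examples.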
      • 14
        Parallel Programming

        Getting the most out of modern computer architecture means using parallel programming - making the computer run different pieces of calculations at the same time. In this tutorial we will discuss the various approaches towards this, such as thread- and process-level parallelism, and focus in detail on the MPI approach to the latter.

        Speaker: David Henty
    • Wednesday Morning 2nd Training Session
      • 15
        Classical unsupervised learning with scikit-learn

        Unsupervised learning is the subfield of machine learning in which the computer is trained on unlabelled data and must pick out important features of the data on its own. In classical machine learning this boils down to two main approaches: dimensionality reduction and clustering. We will look at both concepts and apply them practically to a dataset to test how good the computer's intuition is.

        Speaker: John Armstrong
      • 16
        Code profiling & Optimisation

        The faster our code runs, the more data we can process. In this tutorial, you will learn how to measure and improve CPU and memory performance on Linux.

        Speaker: Stewart Martin-Haugh (Science and Technology Facilities Council STFC (GB))
      • 17
        GPU Programming

        Graphics Processing Units (GPUs) are commonly available computing devices designed to enhance computer game experiences. The underlying hardware can, however, be exploited to perform general calculations, which has given rise to General-Purpose GPU (GPGPU) computing. In this tutorial, I will discuss 1) when it might be advantageous to develop code to run on a GPU, 2) the nuances of GPU hardware that affect the algorithm ported to the GPU, compared specifically with other forms of parallel programming, and 3) examples of GPU programming with CUDA, NVIDIA's extension to C/C++, highlighting the ease of use and indicating pitfalls. Knowledge of C/C++ is advantageous, but not essential.

        Speaker: Eric Tittley
      • 18
        Neural Networks in TensorFlow

        This workshop will briefly introduce supervised deep-learning with neural networks, before walking through an example of constructing and training such networks using keras, a high-level Python interface to TensorFlow. We will train both fully-connected and convolutional neural nets to distinguish hand-written digits from the standard MNIST dataset, and validate the results. I will also discuss the importance of understanding and preparing your training data, and illustrate the benefits of data augmentation.

        Speaker: Steven Bamford (Unknown)