Use this agenda to register for the parallel training sessions at STFC's National CDT Training Event. Please choose your preferred options by reading the details on the timetable:
https://indico.cern.ch/event/770601/timetable/#20181120.detailed
and contributions list:
https://indico.cern.ch/event/770601/contributions/
and then select your options via the registration link:
https://indico.cern.ch/event/770601/registrations/
Please note that due to space restrictions the maximum number of people per event is 20, so it is first come, first served.
Unsupervised learning is a subfield of machine learning in which the computer is trained on unlabelled data and must pick out important features in the data on its own. In classical machine learning this boils down to two main approaches: dimensionality reduction and clustering. We will look at these two concepts and apply them practically to a dataset to test how good the computer's intuition is.
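The two approaches above can be sketched in a few lines of scikit-learn. This is an illustrative example on synthetic data, not the dataset used in the session:

```python
# A minimal sketch of the two unsupervised approaches: dimensionality
# reduction (PCA) and clustering (k-means), on synthetic blob data.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# 300 points in 10 dimensions, drawn around 3 hidden centres
X, true_labels = make_blobs(n_samples=300, n_features=10,
                            centers=3, random_state=0)

# Dimensionality reduction: project onto the 2 directions of largest variance
X2 = PCA(n_components=2).fit_transform(X)

# Clustering: let k-means recover the 3 groups without ever seeing true_labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
print(X2.shape)             # (300, 2)
print(set(kmeans.labels_))  # three cluster ids: {0, 1, 2}
```

Comparing `kmeans.labels_` against the (normally unknown) true labels is one way to test how good the computer's intuition was.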
Machine learning is one of the biggest buzzwords in data science, and it has many applications in both academia and industry. In this tutorial, Pivigo's community manager and data scientist will take you through the basics of using Jupyter notebooks and how to get started with machine learning. Jupyter notebooks are a fantastic way to explore data and to conduct experiments on it. He will be using the Titanic dataset from Kaggle.
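The notebook workflow looks roughly like the sketch below. The session uses the real Kaggle Titanic data; here a tiny made-up DataFrame with similar columns stands in for it, just to show the shape of the exercise:

```python
# Hedged sketch of the notebook workflow: load tabular data, pick features,
# fit a simple classifier. The rows here are invented stand-ins for the
# Kaggle Titanic data used in the actual tutorial.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 1, 3],
    "Age":      [38, 22, 26, 35, 54, 2],
    "Fare":     [71.3, 7.25, 7.9, 26.0, 51.9, 21.1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

X = df[["Pclass", "Age", "Fare"]]  # input features
y = df["Survived"]                 # label to predict

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.score(X, y))  # training accuracy on the toy data
```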
I will quickly introduce the concept of version control, and distributed version control systems (DVCS), of which the most famous is git. We'll learn how to use git from the command line to create repositories which keep track of our projects (be they code, documents or otherwise). I'll end by describing how to store your git repositories on a remote server, the most famous of which is GitHub. If we have time, we'll get into more advanced topics such as branching.
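The command-line workflow described above looks roughly like this (file names and the remote URL are placeholder examples):

```shell
# Sketch of a first git session: create a repository, record a snapshot,
# and inspect the history. Names and paths here are examples only.
mkdir -p my-project
cd my-project
git init                           # create an empty repository
git config user.name  "Your Name"  # one-off identity setup
git config user.email "you@example.com"
echo "print('hello')" > analysis.py
git add analysis.py                # stage the file for the next snapshot
git commit -m "First commit"       # record the snapshot
git log --oneline                  # inspect the history
# To back the repository up on a remote such as GitHub (URL is a placeholder):
# git remote add origin git@github.com:user/my-project.git
# git push -u origin main
```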
By the end of your PhD you need to build the skills you'll need to compete in industry, academia, or elsewhere. In this session we will plot the path to deciding which skills you'll need, and how to develop them best.
- Big data in retail and e-commerce (a sort of why, what and how)
- Common machine learning methods in retail and e-commerce (recommender systems, customer lifetime value prediction, automatic product understanding)
- Practical aspects of deploying and using ML in retail and e-commerce (using large distributed computing systems such as Spark for example)
- ‘Soft skills’ for data scientists: stakeholder management, understanding business value, expectation management.
Cloud computing has moved quickly from simply providing easy access to hardware to supplying easy-to-use services that eliminate the need for any sysadmin knowledge. ML and data science have been among the key drivers of this, with all the major cloud providers (AWS, GCP and Azure) offering a suite of tools for building everything from data pipelines to ML model fitting to APIs that categorise data in real time, and everything in between. In this session I will go over what is available in the cloud for data scientists and talk through some typical cloud tech stacks you will encounter in the world of big data. Finally, I will discuss the pros and cons of different cloud technologies and how they apply to different real-world applications.
Getting the most out of modern computer architecture means using parallel programming - making the computer run different pieces of calculations at the same time. In this tutorial we will discuss the various approaches towards this, such as thread- and process-level parallelism, and focus in detail on the MPI approach to the latter.
The faster our code runs, the more data we can process. In this tutorial, you will learn how to measure and improve CPU and memory performance in Linux.
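The first step is always measurement. As a small Python-level analogue of the Linux profiling tools the tutorial covers, `timeit` lets you compare two implementations before deciding what to optimise:

```python
# Measure before optimising: time two ways of building the same list.
import timeit

def slow_squares(n):
    out = []
    for i in range(n):
        out.append(i * i)   # repeated attribute lookup and method call
    return out

def fast_squares(n):
    return [i * i for i in range(n)]  # comprehension avoids that overhead

t_slow = timeit.timeit(lambda: slow_squares(10_000), number=200)
t_fast = timeit.timeit(lambda: fast_squares(10_000), number=200)
print(f"loop: {t_slow:.3f}s  comprehension: {t_fast:.3f}s")
```

System-level tools such as `perf` and `valgrind` apply the same measure-then-improve loop to compiled code and memory behaviour.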
Graphics Processing Units (GPUs) are commonly available computing devices designed to enhance computer game experiences. The underlying hardware can, however, be exploited to perform general calculations, which has given rise to General-Purpose GPU (GPGPU) computing. In this tutorial, I will discuss 1) when it might be advantageous to develop code to run on a GPU, 2) the nuances of GPU hardware that affect the algorithm ported to the GPU, compared specifically with other forms of parallel programming, and 3) examples of GPU programming with CUDA, NVIDIA's extension to C/C++, highlighting ease of use and indicating pitfalls. Knowledge of C/C++ is advantageous, but not essential.
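To give a flavour of the CUDA style covered in the tutorial, here is a minimal, illustrative vector-addition sketch (it requires an NVIDIA GPU and the CUDA toolkit to run):

```cuda
// Illustrative CUDA sketch: add two vectors element-wise on the GPU.
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // cover all n elements
    add<<<blocks, threads>>>(a, b, c, n);      // launch the kernel
    cudaDeviceSynchronize();                   // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);               // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The kernel launch syntax `<<<blocks, threads>>>` is exactly the kind of CUDA-specific nuance, absent from CPU-side parallel programming, that the tutorial highlights.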
This workshop will briefly introduce supervised deep learning with neural networks, before walking through an example of constructing and training such networks using Keras, a high-level Python interface to TensorFlow. We will train both fully-connected and convolutional neural nets to distinguish hand-written digits from the standard MNIST dataset, and validate the results. I will also discuss the importance of understanding and preparing your training data, and illustrate the benefits of data augmentation.
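The two kinds of network described above can be defined in a few lines of Keras. This sketch (which assumes TensorFlow is installed) only defines and compiles the models; the workshop covers training and validation on the actual MNIST data:

```python
# Hedged sketch: defining a fully-connected and a convolutional MNIST
# classifier in Keras. Training (model.fit) is omitted here.
from tensorflow import keras
from tensorflow.keras import layers

# Fully-connected network: flatten the 28x28 image, then two dense layers
dense_net = keras.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per digit class
])

# Convolutional network: learn local image features before classifying
conv_net = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

dense_net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
conv_net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Training would then be e.g.: dense_net.fit(x_train, y_train, epochs=5)
print(dense_net.output_shape, conv_net.output_shape)  # both (None, 10)
```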