Use this agenda to register for the parallel training sessions at STFC's National CDT Training Event. Please choose your preferred options by reading the details on the timetable:
https://indico.cern.ch/event/770601/timetable/#20181120.detailed
and contributions list:
https://indico.cern.ch/event/770601/contributions/
and then select your options via the registration link:
https://indico.cern.ch/event/770601/registrations/
Please note that due to space restrictions the maximum number of people per event is 20, so it is first come, first served.
Unsupervised learning is a subfield of machine learning in which the computer is trained on unlabelled data and must pick out important features in the data on its own. In classical machine learning this boils down to two main approaches: dimensionality reduction and clustering. We will look at these two concepts and apply them practically to a dataset to test how good the computer's intuition is.
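The two approaches above can be sketched in a few lines of scikit-learn. This is an illustrative example on synthetic data, not the dataset used in the session:

```python
# A minimal sketch of the two unsupervised approaches: dimensionality
# reduction (PCA) and clustering (k-means), on synthetic blob data.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# 300 points in 10 dimensions, drawn around 3 hidden centres
X, true_labels = make_blobs(n_samples=300, n_features=10,
                            centers=3, random_state=0)

# Dimensionality reduction: project onto the 2 directions of largest variance
X2 = PCA(n_components=2).fit_transform(X)

# Clustering: let k-means recover the 3 groups without ever seeing true_labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
print(X2.shape)             # (300, 2)
print(set(kmeans.labels_))  # three cluster ids: {0, 1, 2}
```

Comparing `kmeans.labels_` against the (normally unknown) true labels is one way to test how good the computer's intuition was.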
Machine learning is one of the biggest buzzwords in data science, and it has many applications in both academia and industry. In this tutorial, Pivigo's community manager and data scientist will take you through the basics of using Jupyter notebooks and how to get started with machine learning. Jupyter notebooks are a fantastic way to explore data and to conduct experiments on it. He will be using the Titanic dataset from Kaggle.
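The notebook workflow looks roughly like the sketch below. The session uses the real Kaggle Titanic data; here a tiny made-up DataFrame with similar columns stands in for it, just to show the shape of the exercise:

```python
# Hedged sketch of the notebook workflow: load tabular data, pick features,
# fit a simple classifier. The rows here are invented stand-ins for the
# Kaggle Titanic data used in the actual tutorial.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "Pclass":   [1, 3, 3, 2, 1, 3],
    "Age":      [38, 22, 26, 35, 54, 2],
    "Fare":     [71.3, 7.25, 7.9, 26.0, 51.9, 21.1],
    "Survived": [1, 0, 1, 1, 0, 0],
})

X = df[["Pclass", "Age", "Fare"]]  # input features
y = df["Survived"]                 # label to predict

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.score(X, y))  # training accuracy on the toy data
```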
I will quickly introduce the concept of version control, and distributed version control systems (DVCS), of which the most famous is git. We'll learn how to use git from the command line to create repositories which keep track of our projects (be they code, documents or otherwise). I'll end by describing how to store your git repositories on a remote server, the most famous of which is GitHub. If we have time, we'll get into more advanced topics such as branching.
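The command-line workflow described above looks roughly like this (file names and the remote URL are placeholder examples):

```shell
# Sketch of a first git session: create a repository, record a snapshot,
# and inspect the history. Names and paths here are examples only.
mkdir -p my-project
cd my-project
git init                           # create an empty repository
git config user.name  "Your Name"  # one-off identity setup
git config user.email "you@example.com"
echo "print('hello')" > analysis.py
git add analysis.py                # stage the file for the next snapshot
git commit -m "First commit"       # record the snapshot
git log --oneline                  # inspect the history
# To back the repository up on a remote such as GitHub (URL is a placeholder):
# git remote add origin git@github.com:user/my-project.git
# git push -u origin main
```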
By the end of your PhD you need to build the skills you'll need to compete in industry, academia, or elsewhere. In this session we will plot the path to deciding which skills you'll need, and how to develop them best.
- Big data in retail and e-commerce (a sort of why, what and how)
- Common machine learning methods in retail and e-commerce (recommender systems, customer lifetime value prediction, automatic product understanding)
- Practical aspects of deploying and using ML in retail and e-commerce (using large distributed computing systems such as Spark for example)
- ‘Soft skills’ for data scientists: stakeholder management, understanding business value, expectation management.
Cloud computing has moved quickly from simply providing easy access to hardware to supplying easy-to-use services that eliminate the need for any sysadmin knowledge. ML and data science have been among the key drivers of this, with all the major cloud providers (AWS, GCP and Azure) offering a suite of tools for building everything from data pipelines to ML model fitting to APIs that categorise data in real time, and everything in between. In this session I will go over what is available in the cloud for data scientists and talk through some typical cloud tech stacks you will encounter in the world of big data. Finally, I will discuss the pros and cons of different cloud technologies and how they apply to different real-world applications.
Getting the most out of modern computer architecture means using parallel programming - making the computer run different pieces of calculations at the same time. In this tutorial we will discuss the various approaches towards this, such as thread- and process-level parallelism, and focus in detail on the MPI approach to the latter.
The faster our code runs, the more data we can process. In this tutorial, you will learn how to measure and improve CPU and memory performance in Linux.
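The first step is always measurement. As a small Python-level analogue of the Linux profiling tools the tutorial covers, `timeit` lets you compare two implementations before deciding what to optimise:

```python
# Measure before optimising: time two ways of building the same list.
import timeit

def slow_squares(n):
    out = []
    for i in range(n):
        out.append(i * i)   # repeated attribute lookup and method call
    return out

def fast_squares(n):
    return [i * i for i in range(n)]  # comprehension avoids that overhead

t_slow = timeit.timeit(lambda: slow_squares(10_000), number=200)
t_fast = timeit.timeit(lambda: fast_squares(10_000), number=200)
print(f"loop: {t_slow:.3f}s  comprehension: {t_fast:.3f}s")
```

System-level tools such as `perf` and `valgrind` apply the same measure-then-improve loop to compiled code and memory behaviour.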
Graphics Processing Units (GPUs) are commonly available computing devices designed to enhance computer game experiences. The underlying hardware can, however, be exploited to perform general calculations, which has given rise to General-Purpose GPU (GPGPU) computing. In this tutorial, I will discuss 1) when it might be advantageous to develop code to run on a GPU, 2) the nuances of GPU hardware that affect the algorithm ported to the GPU, compared specifically with other forms of parallel programming, and 3) examples of GPU programming with CUDA, NVIDIA's extension to C/C++, highlighting ease of use and indicating pitfalls. Knowledge of C/C++ is advantageous, but not essential.
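To give a flavour of the CUDA style covered in the tutorial, here is a minimal, illustrative vector-addition sketch (it requires an NVIDIA GPU and the CUDA toolkit to run):

```cuda
// Illustrative CUDA sketch: add two vectors element-wise on the GPU.
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // cover all n elements
    add<<<blocks, threads>>>(a, b, c, n);      // launch the kernel
    cudaDeviceSynchronize();                   // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);               // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The kernel launch syntax `<<<blocks, threads>>>` is exactly the kind of CUDA-specific nuance, absent from CPU-side parallel programming, that the tutorial highlights.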
This workshop will briefly introduce supervised deep learning with neural networks, before walking through an example of constructing and training such networks using Keras, a high-level Python interface to TensorFlow. We will train both fully-connected and convolutional neural nets to distinguish hand-written digits from the standard MNIST dataset, and validate the results. I will also discuss the importance of understanding and preparing your training data, and illustrate the benefits of data augmentation.
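The two kinds of network described above can be defined in a few lines of Keras. This sketch (which assumes TensorFlow is installed) only defines and compiles the models; the workshop covers training and validation on the actual MNIST data:

```python
# Hedged sketch: defining a fully-connected and a convolutional MNIST
# classifier in Keras. Training (model.fit) is omitted here.
from tensorflow import keras
from tensorflow.keras import layers

# Fully-connected network: flatten the 28x28 image, then two dense layers
dense_net = keras.Sequential([
    layers.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per digit class
])

# Convolutional network: learn local image features before classifying
conv_net = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

dense_net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
conv_net.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# Training would then be e.g.: dense_net.fit(x_train, y_train, epochs=5)
print(dense_net.output_shape, conv_net.output_shape)  # both (None, 10)
```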