Are you a Python guru or would you like to learn?
The Developers@CERN Forum is an event by developers, for developers, aimed at promoting knowledge and experience sharing.
This edition will take place in the IT Amphitheatre on the afternoons of 30 and 31 May. It will consist of a series of short presentations and workshops. The topic of this edition is Python at CERN: language, frameworks and tools.
Have you got an idea for a presentation or workshop? Then tell us about it (deadline: 9 May).
Registration will open in early May. Please subscribe to the email@example.com mailing list for further information.
This event is made by developers, for developers. We are counting on your presence, but also on your contributions!
To learn more about the initiative, read the CERN Bulletin article.
You can get in touch with us at firstname.lastname@example.org.
Internationalization and localization are increasingly important in an interconnected world. Despite this, developers tend to treat them as secondary issues, often addressing them properly only when it is already too late. The fact that most programming languages' standard libraries ignore the matter doesn't help either.
In this talk we will present some useful Python libraries and tools that can help you internationalize and localize your code with minimal effort. We will also describe some common pitfalls and problems.
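As a minimal sketch of two common pitfalls (string concatenation and naive plurals), the example below uses only the standard-library `gettext` module. The domain `myapp`, the `locale` directory and the French language code are hypothetical; with `fallback=True`, `gettext` returns the untranslated strings when no compiled catalog is found, so the code runs even before any translation exists.

```python
import gettext

# Hypothetical domain and catalog directory; fallback=True returns a
# NullTranslations object (identity translation) if no .mo file is found.
translations = gettext.translation(
    "myapp", localedir="locale", languages=["fr"], fallback=True
)
_ = translations.gettext
ngettext = translations.ngettext


def greeting(name: str) -> str:
    # Translate a whole sentence with a named placeholder instead of
    # concatenating fragments, so translators can reorder the words.
    return _("Welcome to the forum, {name}!").format(name=name)


def files_deleted(n: int) -> str:
    # ngettext picks the correct plural form for the target language;
    # "n == 1 ? singular : plural" is only correct for some languages.
    return ngettext("{n} file was deleted", "{n} files were deleted", n).format(n=n)


print(greeting("Ada"))
print(files_deleted(1))
print(files_deleted(3))
```

Libraries such as Babel build on the same message-catalog idea and add locale-aware formatting of dates, numbers and currencies.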
Developing in Python is fast. Computation, however, can often be another story, or at least that is how it may seem. When working with arrays and numerical datasets, one can get around many of Python's computational limitations by using NumPy, Python's standard matrix computation library. Many Python users only use NumPy to store and generate arrays, failing to take advantage of one of Python's most powerful computational tools. By leveraging NumPy's ufuncs, aggregations, broadcasting and slicing/masking/indexing functionality, one can cut back on slow Python loops and speed up programs by as much as 100x. This talk aims to teach attendees how to use these tools through toy examples.
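A toy sketch of the four techniques named above (ufuncs, aggregation, broadcasting and masking), each replacing an explicit Python loop. The data are random and the function names are illustrative only; actual speed-ups depend on array size and hardware.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.random(1_000_000)


def loop_sum_of_squares(values):
    # Pure-Python loop: one interpreter step per element.
    total = 0.0
    for v in values:
        total += v * v
    return total


def ufunc_sum_of_squares(values):
    # np.square is a ufunc applied element-wise in compiled code;
    # np.sum is an aggregation over the whole array.
    return np.sum(np.square(values))


# Broadcasting: subtract each column's mean from a (4, 3) matrix.
# The (3,) vector of means is stretched across the 4 rows automatically.
m = np.arange(12, dtype=float).reshape(4, 3)
centered = m - m.mean(axis=0)

# Boolean masking: select or replace elements without a Python loop.
large = x[x > 0.9]                # keep only values above 0.9
clipped = np.where(x > 0.5, 0.5, x)  # cap values at 0.5
```

Timing `loop_sum_of_squares(x)` against `ufunc_sum_of_squares(x)` with the standard `timeit` module is a quick way to see the difference on your own machine.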
In this practical talk, we'll discuss tips and tricks for developing Python applications in Emacs. Topics include efficient navigation and jumping around the code, code completion, code skeletons, working with virtual environments, interactive development with a REPL, test-driven development, and integration with source control management tools. We'll demonstrate these techniques on small example projects.
SQLAlchemy is the most popular ORM and SQL abstraction layer for Python, and it is used by several large projects at CERN, such as Indico and Invenio. In my talk I'm going to give a short introduction to using it.
With ROOT it's possible to use any C++ library from Python without writing any bindings or dictionaries: loading the library and injecting the relevant headers into the ROOT C++ interpreter is enough to enable interactive usage from within Python. Just-in-time (JIT) compilation of C++ code and immediate use of C++ entities from within Python are also supported.
Thanks to the ROOT type system, C++ interpreter and JIT compiler, complete Python/C++ interoperability is achieved. In this contribution we explain how this mechanism is general enough to make any library written in C or C++ usable from within Python, and how concepts such as template metaprogramming are mapped to Python. We review the basics of the JIT compilation capabilities provided by the Clang-based ROOT interpreter, Cling, and the way in which some of the information in the Abstract Syntax Tree (AST) built by Clang is stored by the ROOT type system. The way ROOT manages the automatic loading of libraries and the parsing of necessary headers is also described.
Through concrete examples, we illustrate from the programming-model point of view how simple it is to invoke C++ entities from within the Python world.
Live demos will be given whenever possible to enhance the audience's experience.
The COOL software is used by the ATLAS and LHCb experiments to handle the time variation and versioning of their conditions data, using a variety of relational database technologies. While the COOL core libraries are written in C++ and are integrated into the experiments' C++ frameworks, a package offering Python bindings of the COOL C++ APIs, PyCool, is also provided and has been an essential component of the ATLAS conditions data management toolkit for over 10 years. Almost since the beginning, the implementation of PyCool has been based on ROOT to generate Python bindings for C++, initially using Reflex and PyROOT in ROOT5 and more recently using Clang and cppyy in ROOT6. This presentation will describe the PyCool experience of using ROOT to generate Python bindings for C++ throughout the many evolutions of the underlying technology.
For a long time C++ was virtually the only language of HEP data analysis. This has certainly changed in the past few years: Python has become a cornerstone of physicists' everyday work, thanks to a large extent to ROOT.
In this contribution we discuss the technical aspects enabling this innovation.
ROOT is a modular scientific software framework which provides all the functionality needed for big data processing, statistical analysis, visualisation and storage. One of the reasons for its success is its Python interface, PyROOT.
The programming model offered by PyROOT, and the way it complements the ROOT C++ one in everyday analysis, are characterised. Concrete examples of its usage in the software stacks of LHC experiments are given.
The integration of PyROOT with Jupyter notebooks is described, along with glimpses of its potential for interactive data mining of experiment data and of non-scientific data such as logs or machine instrumentation output.
Elements of R&D activities are also outlined, such as the integration of ROOT with Apache Spark via PyROOT and PySpark.
The CERN Service for Web based ANalysis, SWAN, is introduced, and its potential for delivering ROOT, Python and other analysis ecosystems is described.
Live demos will be given whenever possible to enhance the audience's experience.
As mentioned at the 2nd developers meeting, I would like to open the debate with a special presentation on another language, Lua, and a tremendous technology, LuaJIT. Lua is much less well known at CERN, but it is very simple, much smaller than Python, and its JIT is extremely performant. It is a dynamic scripting language that is easy to learn and easy to embed in applications. I will show how we use it in HPC for accelerator beam physics as a replacement for C, C++, Fortran and Python, with some benchmarks against Python, PyPy4 and C/C++.
Object tagging, e.g. jet flavor tagging, can be seen as a classification problem from a machine learning point of view. Deep neural networks with multidimensional output provide one way of approaching this problem.
Besides the part that implements the resulting deep neural network in the ATLAS C++ software framework, a Python framework has been developed to connect HEP data to standard Python-based data science libraries for machine learning. It uses HDF5, JSON and Pickle as intermediate data storage formats, pandas and numpy for data handling and calculations, Keras for neural network construction, training and testing, and matplotlib for plotting. It can be seen as an example of taking advantage of software developed outside HEP without relying on the HEP standard, ROOT.
Data analysis is integral to what we do at CERN. Data visualization is at the foundation of this workflow and is also an important part of the Python stack. Python's plotting ecosystem offers numerous open-source solutions, which can provide ease of use, detailed configuration, interactivity and web readiness. This talk will cover three of the most robust and well-supported packages: matplotlib, Bokeh and Plotly. It aims to give an overview of these packages and to suggest where each of them might fit in an analysis workflow.
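A minimal matplotlib sketch of a typical analysis plot, rendered off-screen to an in-memory PNG. The data are synthetic (a random normal sample), and the axis labels are placeholders; Bokeh and Plotly would instead produce interactive HTML output from similar inputs.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no display required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)  # synthetic data

fig, ax = plt.subplots(figsize=(6, 4))
# ax.hist returns the bin counts, the bin edges and the drawn artists.
counts, edges, _ = ax.hist(
    sample, bins=50, range=(-4, 4), histtype="step", label="synthetic sample"
)
ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("Entries / bin")
ax.legend()

buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the figure to memory instead of a file
plt.close(fig)
```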
Keras is a modular, powerful and intuitive open-source Deep Learning library built on Theano and TensorFlow. Thanks to its minimalist, user-friendly interface, it has become one of the most popular packages to build, train and test neural networks, for both beginners and experts.
In this tutorial, we will start with an introduction to the basics of neural networks and work through fully functioning examples, with an eye towards deployment strategies within the context of CERN.
Fundamental steps and parameters in Deep Learning will be presented from both a conceptual and a practical standpoint, by looking at the way Keras implements them and exposes them to users.
Please visit the Keras website before the workshop for installation instructions, and feel free to reach out with any issues.
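A minimal sketch of the build/compile/train/predict cycle, assuming TensorFlow 2.x with its bundled Keras (`from tensorflow import keras`). The data and labels are random toy values, and the layer sizes are arbitrary choices rather than recommendations.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(seed=0)
x = rng.normal(size=(1000, 4)).astype("float32")
y = (x[:, 0] + x[:, 1] > 0).astype("int32")  # toy binary labels

# Build: a small fully connected network with a softmax output over 2 classes.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Compile: choose optimizer, loss and metrics.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train: hold out 20% of the data for validation.
history = model.fit(x, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)

# Test/predict: class probabilities for a few samples.
probs = model.predict(x[:5], verbose=0)
print(probs)
```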
The talk will present the current implementation of the software tool developed by Silab (Bonn) and Oxford University to analyze test-beam data taken with a Mimosa telescope. Data collected from the telescope are merged with hits recorded on pixel detectors equipped with FE-I4 chips, the official read-out chip of the ATLAS Pixel Detector. The software tool used to collect the data, pyBAR, is also written in Python. The test-beam analysis tool parses the data sets, reconstructs the tracks, aligns the telescope planes and makes it possible to investigate the detectors' spatial properties with high resolution. It has just been used to study the properties of brand-new devices that are possible candidates to replace the current pixel detector in ATLAS.
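To make the track reconstruction step concrete, here is a much-simplified sketch of a straight-line least-squares track fit through telescope hits, followed by the residual on a device under test (DUT). The plane positions, resolution and DUT hit are hypothetical numbers, and this is not the API of the Silab/Oxford tool; the mean residual over many tracks is the quantity a simple alignment would minimise.

```python
import numpy as np


def fit_track(z, x):
    """Least-squares straight-line fit x(z) = slope * z + offset."""
    design = np.vstack([z, np.ones_like(z)]).T
    (slope, offset), *_ = np.linalg.lstsq(design, x, rcond=None)
    return slope, offset


# Hypothetical geometry: six telescope planes at these z positions (mm).
z_planes = np.array([0.0, 150.0, 300.0, 450.0, 600.0, 750.0])

# Simulated hits from a known track plus Gaussian smearing (mm).
true_slope, true_offset = 1e-3, 0.2
rng = np.random.default_rng(seed=2)
x_hits = true_slope * z_planes + true_offset + rng.normal(scale=0.004, size=z_planes.size)

slope, offset = fit_track(z_planes, x_hits)

# Residual of a (hypothetical) DUT hit with respect to the extrapolated track.
z_dut, x_dut_measured = 375.0, 0.578
residual = x_dut_measured - (slope * z_dut + offset)
print(f"slope={slope:.6f} offset={offset:.4f} residual={residual:.4f} mm")
```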
Experience has taught us that bugs are impossible to avoid when programming, especially in continuous delivery processes, where new versions refactor existing modules or add new ones to the project.
However, there are tools that help us ensure code quality by enabling developers to catch bugs while still in the development stage.
In this talk, I will present the Test-Driven Development (TDD) and Behaviour-Driven Development (BDD) methodologies, with a focus on web development. I will also give an overview of testing tools such as Selenium and Behave, which help us produce working software with fewer bugs, quickly and consistently.
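A minimal TDD-style sketch using the standard-library `unittest` module: the tests state the expected behaviour of a small web helper, and the function is written to make them pass. The `slugify` function is a hypothetical example, not part of Selenium or Behave.

```python
import re
import unittest


def slugify(title: str) -> str:
    """Turn a page title into a URL slug (hypothetical helper)."""
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


class SlugifyTest(unittest.TestCase):
    # In TDD these tests are written first and fail until slugify exists.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Python at CERN"), "python-at-cern")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


if __name__ == "__main__":
    unittest.main()
```

BDD tools such as Behave express the same expectations as plain-language scenarios, and Selenium drives a real browser to check them end to end.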
Software applications developed in C/C++ and other compiled languages may be split into several executables and shared libraries, so that different components or applications can reuse the same code base. However, reusing shared libraries from applications written in different languages can be quite challenging, especially if we must keep the original code unchanged.
A specific case within the RP group was the need to access the core routines of an existing software package directly from Python, enabling interactive manipulation of the data and building programs on top of it, without re-implementing or even modifying the existing functions.
For that purpose, an IPC specification was investigated and a software tool was developed in Python that creates Python bindings directly by inspecting the .h header file. By providing suitably named accessor functions on the library side, the generated wrapper offers an object-oriented interface to the DLL, handling type conversion and storing objects as necessary.
The current solution differs from existing binding generators in that it doesn't require changes to the source routines (only the creation of accessor functions), and the generated wrapper is very clean, object-oriented Python code that is easily customizable.
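The core idea, reading C prototypes from a header and turning them into typed calls into a shared library, can be sketched with the standard `re` and `ctypes` modules. This is a much-simplified illustration, not the RP tool: it handles only simple prototypes, uses the C math library (libm, found on typical Linux systems) as a stand-in for the real library, and the header text and type map are assumptions.

```python
import ctypes
import ctypes.util
import re

# Hypothetical header content; in practice it would be read from the .h file.
HEADER = """
double cos(double x);
double pow(double base, double exponent);
double sqrt(double x);
"""

# Minimal C-to-ctypes type map (assumption: only these types appear).
CTYPES = {"double": ctypes.c_double, "int": ctypes.c_int, "void": None}

# Matches "<return type> <name>(<params>);" on a single line.
PROTOTYPE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;", re.MULTILINE)


def parse_prototypes(header):
    """Yield (name, restype, argtypes) for each simple prototype."""
    for ret, name, params in PROTOTYPE.findall(header):
        argtypes = []
        for param in filter(None, (p.strip() for p in params.split(","))):
            if param == "void":
                continue
            argtypes.append(CTYPES[param.split()[0]])
        yield name, CTYPES[ret], argtypes


class Wrapper:
    """Expose each parsed prototype as a typed method of a Python object."""

    def __init__(self, lib_path, header):
        self._lib = ctypes.CDLL(lib_path)
        for name, restype, argtypes in parse_prototypes(header):
            func = getattr(self._lib, name)
            func.restype = restype    # convert the C return value
            func.argtypes = argtypes  # convert Python arguments to C types
            setattr(self, name, func)


libm = Wrapper(ctypes.util.find_library("m"), HEADER)
print(libm.cos(0.0), libm.pow(2.0, 10.0), libm.sqrt(16.0))
```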
Did you know that CPython preallocates the integers from -5 to 256? Reusing them 1000 times, instead of allocating memory for a bigger integer, can save you a couple of milliseconds of execution time. If you want to learn more about this kind of optimization, then… well, this presentation is probably not for you :) Instead of going into such small details, I will talk about more "sane" ideas for writing faster code. After a very brief overview of how to optimize Python code (rule 1: don't do it; rule 2: don't do it yet; rule 3: OK, but what if I really want to do it?), I will show simple and fast ways of measuring execution time and, finally, discuss examples of how some code structures could be improved. You will see:
I will NOT go into the details of "serious" optimization, such as using a different Python implementation or rewriting critical code in C.
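A minimal sketch of measuring execution time with the standard `timeit` module, applied to two code structures that are often worth revisiting: membership tests on a list versus a set, and an explicit append loop versus a list comprehension. The functions are illustrative, and the printed timings will vary by machine.

```python
import timeit

data = list(range(10_000))
data_set = set(data)


def lookup_in_list():
    return 9_999 in data      # linear scan: O(n)


def lookup_in_set():
    return 9_999 in data_set  # hash lookup: O(1) on average


def squares_loop(n):
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    return [i * i for i in range(n)]


for func in (lookup_in_list, lookup_in_set):
    # repeat() runs several independent rounds; the minimum is the least noisy.
    best = min(timeit.repeat(func, number=1_000, repeat=5))
    print(f"{func.__name__}: {best:.4f} s per 1000 calls")
```

In IPython or Jupyter, the `%timeit` magic gives the same kind of measurement with less typing.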