
Mini-Workshop: Differentiable Programming for High-Performance, Data-Intensive Computations


Derivatives play a critical role in science. Techniques that differentiate code automatically and efficiently can dramatically reduce runtimes for applications ranging from machine learning to Monte Carlo simulation. However, implementing efficient automatic differentiation (AD) in a high-performance programming language is not easy. Challenges include language rules that are too numerous, too complex, and too quickly evolving for a custom parser to handle; existing implementations rely on custom parsers or other language facilities.

This mini-workshop aims to discuss new approaches to flexible, scalable and efficient techniques for AD and their application to data-intensive science domains.

Organized by Marco Foco (NVIDIA), William Moses (MIT), Vassil Vassilev (Princeton), David Lange (Princeton)

    • 5:00 PM → 5:15 PM
      Introduction 15m
      Speakers: David Lange (Princeton University (US)), Vasil Georgiev Vasilev (Princeton University (US))
    • 5:20 PM → 5:40 PM
      Post-Optimization Automatic Differentiation by Synthesizing LLVM 20m

      Applying differentiable programming techniques and machine learning algorithms to foreign programs requires developers either to rewrite their code in a machine learning framework or to otherwise provide derivatives of the foreign code. This talk presents Enzyme, a high-performance automatic differentiation (AD) compiler plugin for the LLVM compiler framework capable of synthesizing gradients of statically analyzable programs expressed in the LLVM intermediate representation (IR). Enzyme can synthesize gradients for programs written in any language whose compiler targets LLVM IR, including C, C++, Fortran, Julia, Rust, Swift, MLIR, etc., thereby providing native AD capabilities in these languages. Unlike traditional source-to-source and operator-overloading tools, Enzyme performs AD on optimized IR. On a machine-learning-focused benchmark suite including Microsoft's ADBench, AD on optimized IR achieves a geometric mean speedup of 4.5x over AD on IR before optimization, allowing Enzyme to achieve state-of-the-art performance. Packaging Enzyme for PyTorch and TensorFlow provides convenient access to gradients of foreign code with state-of-the-art performance, enabling foreign code to be directly incorporated into existing machine learning workflows.
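      As background for the talk, the reverse-mode AD that Enzyme synthesizes at the IR level can be sketched with a small runtime tape. All names below (Tape, Var, leaf, gradient) are illustrative, not Enzyme's API -- Enzyme operates on optimized LLVM IR inside the compiler rather than recording a tape at runtime:

      ```cpp
      #include <cassert>
      #include <cmath>
      #include <vector>

      // Minimal reverse-mode AD tape (conceptual sketch only).
      struct Tape {
          struct Node { int a, b; double da, db; };  // parent indices, local partials
          std::vector<Node> nodes;
          int push(int a, double da, int b, double db) {
              nodes.push_back({a, b, da, db});
              return static_cast<int>(nodes.size()) - 1;
          }
      };

      struct Var { Tape* t; int id; double v; };

      Var leaf(Tape& t, double v) { return {&t, t.push(-1, 0, -1, 0), v}; }
      Var operator+(Var x, Var y) { return {x.t, x.t->push(x.id, 1, y.id, 1), x.v + y.v}; }
      Var operator*(Var x, Var y) { return {x.t, x.t->push(x.id, y.v, y.id, x.v), x.v * y.v}; }
      Var sin(Var x) { return {x.t, x.t->push(x.id, std::cos(x.v), -1, 0), std::sin(x.v)}; }

      // Backward sweep: seed the output adjoint with 1, propagate to parents.
      std::vector<double> gradient(const Tape& t, int out) {
          std::vector<double> adj(t.nodes.size(), 0.0);
          adj[out] = 1.0;
          for (int i = out; i >= 0; --i) {
              const Tape::Node& n = t.nodes[i];
              if (n.a >= 0) adj[n.a] += n.da * adj[i];
              if (n.b >= 0) adj[n.b] += n.db * adj[i];
          }
          return adj;
      }

      int main() {
          Tape tape;
          Var x = leaf(tape, 2.0), y = leaf(tape, 3.0);
          Var f = x * y + sin(x);                    // f(x, y) = x*y + sin(x)
          std::vector<double> g = gradient(tape, f.id);
          assert(std::abs(g[x.id] - (3.0 + std::cos(2.0))) < 1e-12);  // df/dx = y + cos(x)
          assert(std::abs(g[y.id] - 2.0) < 1e-12);                    // df/dy = x
          return 0;
      }
      ```

      Doing this transformation after optimization, as Enzyme does, avoids differentiating computations the optimizer would have removed anyway.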

      Speaker: Mr William Moses (MIT)
    • 5:50 PM → 6:10 PM
      Domain-Specific Automatic Differentiation for GLSL with LLVM 20m

      When writing shaders, a need for differentiation often arises (in particular when dealing with bump maps or normal maps). However, almost all such cases concern a single specific global variable: the screen-space coordinates of the point currently being shaded. With this restriction, there is no need for full-blown automatic differentiation.

      We have implemented an LLVM module transformation that inserts the code required to compute derivatives where they are needed, supporting only the aforementioned case; this restriction has led to a concise implementation.

      This talk will present an overview of the domain restrictions and the transformation implementation.
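      The restricted, screen-space-only differentiation described above can be sketched as forward-mode bookkeeping: every shaded quantity carries its value together with its two partial derivatives with respect to the screen coordinates, much like GLSL's dFdx/dFdy builtins expose. The Screen type and the toy expression below are hypothetical, not code from the implementation:

      ```cpp
      #include <cassert>

      // Each quantity carries (value, d/dx_screen, d/dy_screen).
      // Illustrative only; the actual work is an LLVM module transformation.
      struct Screen {
          double v, ddx, ddy;
      };
      Screen operator+(Screen a, Screen b) { return {a.v + b.v, a.ddx + b.ddx, a.ddy + b.ddy}; }
      Screen operator*(Screen a, Screen b) {  // product rule per component
          return {a.v * b.v, a.ddx * b.v + a.v * b.ddx, a.ddy * b.v + a.v * b.ddy};
      }

      int main() {
          // The only independent variables: the screen-space coordinates.
          Screen sx{4.0, 1.0, 0.0}, sy{5.0, 0.0, 1.0};
          Screen h = sx * sx + sx * sy;   // a toy "height" expression
          assert(h.v == 36.0);            // 16 + 20
          assert(h.ddx == 13.0);          // 2*sx + sy = 8 + 5
          assert(h.ddy == 4.0);           // sx
          return 0;
      }
      ```

      Because only these two seed variables ever occur, the transformation never needs the general dependency analysis of full AD.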

      Speakers: Mr Angel Angelov (ChaosGroup), Mr Ivan Komitov (ChaosGroup)
    • 6:20 PM → 6:40 PM
      Clad -- Automatic Differentiation for C++ Using Clang 20m

      Implementing efficient automatic differentiation (AD) in a high-performance programming language is not easy. Challenges include language rules that are too numerous, too complex, and too quickly evolving for a custom parser to handle. Existing implementations rely on custom parsers or other language facilities. For example, AD in C++ can be based on operator overloading, which requires the introduction of a new floating point type and is therefore ill-suited to legacy code, or to code that was not written with a particular tool in mind.

      We have taken another approach: a compiler extension built on the Clang parser, a component of the compiler toolchain, that can algorithmically differentiate complex language constructs. The extension, Clad, has full access to Clang's compiler internals. This allows it to influence the generation of LLVM code and to provide several useful tools to the user, including retargeting to accelerator devices. Clad can generate derivatives, gradients, Hessians, and Jacobians for C++ code, and support for CUDA is under active development.

      The talk will showcase the AD advancements in Clad; use of Clad within the C++ interpreter cling to generate derivatives on the fly; the prototype usage of cling's CUDA incremental compilation engine to execute them; and results from science use cases.
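      To illustrate the source-transformation approach Clad takes (as opposed to operator overloading, which needs a new floating point type), the sketch below pairs a plain C++ function with a hand-written derivative of the kind such a tool emits. The function f and the name f_darg0 are chosen for illustration only; Clad's actual generated code and its clad::differentiate interface differ in detail:

      ```cpp
      #include <cassert>
      #include <cmath>

      // Original user code, ordinary doubles throughout -- no new types needed.
      double f(double x, double y) { return x * x * y + std::sin(x); }

      // df/dx as a source transformation could emit it (hand-written here):
      double f_darg0(double x, double y) { return 2.0 * x * y + std::cos(x); }

      int main() {
          // df/dx at (2, 3) is 2*2*3 + cos(2) = 12 + cos(2).
          assert(std::abs(f_darg0(2.0, 3.0) - (12.0 + std::cos(2.0))) < 1e-12);
          return 0;
      }
      ```

      Because the derivative is itself ordinary C++ source, it benefits from the full optimizer and can be applied to legacy code unchanged.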

      Speaker: Vasil Georgiev Vasilev (Princeton University (US))
    • 6:50 PM → 7:10 PM
      Use of auto-differentiation within the ACTS toolkit 20m

      ACTS is a common track reconstruction toolkit that aims to preserve the track reconstruction software from the LHC era while at the same time serving as an R&D testbed for further algorithm and technology research. At the moment, auto-differentiation is used in ACTS to validate several algorithms involving the computation of complicated Jacobians. The auto-differentiated Jacobian transport during numerical integration has already been merged into ACTS. As a next step, validation of alignment derivatives with auto-differentiation techniques is planned. The implementation is based on the C++17 library autodiff and focuses on providing a generic and easy-to-use interface rather than on achieving optimal performance.
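      The validation idea mentioned above can be sketched as checking an analytic Jacobian of a toy 2D transform against central finite differences. The transform and tolerance are hypothetical; ACTS compares against derivatives obtained with the autodiff library rather than finite differences, which stand in here to keep the example self-contained:

      ```cpp
      #include <array>
      #include <cassert>
      #include <cmath>

      using Vec2 = std::array<double, 2>;

      // A toy 2D -> 2D map standing in for a real transport function.
      Vec2 transform(const Vec2& p) {
          return {p[0] * p[1], std::sin(p[0])};
      }

      // Hand-coded analytic Jacobian of `transform` (the code under test).
      std::array<Vec2, 2> jacobian(const Vec2& p) {
          return {{{p[1], p[0]}, {std::cos(p[0]), 0.0}}};
      }

      int main() {
          Vec2 p{1.5, -0.5};
          std::array<Vec2, 2> J = jacobian(p);
          const double h = 1e-6;
          // Central finite differences column by column.
          for (int col = 0; col < 2; ++col) {
              Vec2 lo = p, hi = p;
              lo[col] -= h; hi[col] += h;
              Vec2 fl = transform(lo), fh = transform(hi);
              for (int row = 0; row < 2; ++row) {
                  double fd = (fh[row] - fl[row]) / (2 * h);
                  assert(std::abs(J[row][col] - fd) < 1e-6);
              }
          }
          return 0;
      }
      ```

      Replacing the finite-difference reference with an auto-differentiated one removes the step-size tuning and truncation error that make finite differences fragile for complicated Jacobians.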

      Speaker: Benjamin Huth (Universität Regensburg)
    • 7:30 PM → 7:50 PM
      Wrap-up / next steps 20m
      Speakers: Mr Marco Foco (NVIDIA), Mr William Moses (MIT), Vasil Georgiev Vasilev (Princeton University (US))