Speaker
Description
In this talk I will present my ongoing research in differentiable programming, specifically its application to scenarios with discontinuities induced by discrete structural constructs like conditional statements. If the parameters with respect to which we wish to differentiate appear in the predicate, then obtaining derivatives with respect to those parameters by propagating tangents or adjoints is hopeless. From the mathematical perspective this can be explained by showing that the conditional is equivalent to the heaviside function in the parameters of interest, from AD perspective this means that the computation carried out to decide which branch is to be taken never enters the computational graph rendering AD algorithms ignorant to the decision making.
This is problematic since often times the underlying computed object is differentiable (or weakly differentiable), and the inability to capture its derivatives is a pure artifact of the implementation. If we want to achieve a truly end-to-end differentiable systems this issue needs to be dealt with. A simple technique of smooth relaxation mentioned by [1, 2, 3] can be used to address this problem. However, we show that blindly smoothing does not result in meaningful gradients. Using the bisection method for scalar root finding as a case study we derive its relaxed formulation and perform an analysis of the resultant algorithm which helps reveal the criterion that needs to be fulfilled to achieve consistency with implicit differentiation and bring meaning to the calculated derivative.
We speculate that for algorithms that are harder to study the consistency criterion can be learned or calibrated.
Keywords:
- Differentiable programming and differentiable systems
- Implicit differentiation and implicit function theorem
- Algorithmic differentiation
- Analysis of algorithms
References
- [1] M. Blondel and V. Roulet - The Elements of Differentiable Programming
- [2] S. Christodoulou and U. Naumann - Differentiable Programming: Efficient Smoothing of Control-Flow-Induced Discontinuities
- [3] A. Graves - Neural Turing Machines