Automatic Differentiation of Compiled Programs

  • Algorithmic differentiation (AD) is a set of techniques to "differentiate computer programs": Given a primal program that evaluates some mathematical function, AD allows to evaluate the derivative function. Unlike numerical difference quotients, algorithmic derivatives are accurate up to machine precision and, in the "reverse mode", can be formed with respect to a large number of input variables in one stroke. When combined with gradient-based optimization algorithms, AD is therefore a powerful tool to optimize engineering designs or learn weights of neural networks, besides many other applications. Up to date, most implementations of AD access the primal program via its source code, which they require to be available and written in a limited set of programming languages, typically not supporting the full language standards without manual user intervention. In this dissertation, we present the novel AD tool Derivgrind that interacts with the machine code of the compiled primal program. Implemented in the Valgrind framework for dynamic binary instrumentation, Derivgrind augments portions of machine code with AD logic just in time before they potentially execute on the processor. Specifically, Derivgrind's forward-mode AD logic keeps track of a floating-point "dot value" for every floating-point number appearing during the execution of the primal program. These dot values store the derivatives with respect to a single input variable, and are computed alongside, using elementary differentiation rules. Derivgrind also implements reverse-mode AD, by inserting AD logic that tags all floating-point numbers with identifiers, and uses them to record the real-arithmetic evaluation graph on a datastructure called the "tape". Besides our extensive suite of regression tests and a simple numerical solver for Burgers' partial differential equation, we have tested Derivgrind on three larger software projects: the Python interpreter CPython, the spreadsheet software LibreOffice Calc, and the medical imaging application GATE based on the Monte-Carlo particle physics simulator Geant4. We are not aware of any successful previous attemps to compute algorithmic derivatives of these programs. With Derivgrind, only a few lines of code needed to be changed to accomplish that. As a price for its versatility, Derivgrind slows down the primal program by a larger factor than many source-code-based tools. In addition, the issue of "bit-tricks" is more pronounced on the machine code level: If the primal program performs real-arithmetic operations in too obscure ways, Derivgrind cannot recognize their arithmetic meaning and may compute wrong derivatives. We have included a detailed discussion of various bit-trick mechanisms; in practical terms, nearly all of them are academic or originate from highly optimized math libraries. As long as differentiating those are avoided, Derivgrind is applicable to an unprecedentedly wide range of cross-language or partially closed-source software with little manual efforts.
Metadaten
Author:Max AehleORCiD
URN:urn:nbn:de:hbz:386-kluedo-84342
DOI:https://doi.org/10.26204/KLUEDO/8434
Advisor:Nicolas GaugerORCiD
Document Type:Doctoral Thesis
Cumulative document:No
Language of publication:English
Date of Publication (online):2024/10/25
Year of first Publication:2024
Publishing Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:2024/10/08
Date of the Publication (Server):2024/10/28
Page Number:225
Faculties / Organisational entities:Kaiserslautern - Fachbereich Informatik
DDC-Cassification:0 Allgemeines, Informatik, Informationswissenschaft / 004 Informatik
Licence (German):Creative Commons 4.0 - Namensnennung, nicht kommerziell, keine Bearbeitung (CC BY-NC-ND 4.0)