# Differentiation of the Cholesky decomposition

@article{Murray2016DifferentiationOT, title={Differentiation of the Cholesky decomposition}, author={Iain Murray}, journal={ArXiv}, year={2016}, volume={abs/1602.07527} }

We review strategies for differentiating matrix-based computations, and derive symbolic and algorithmic update rules for differentiating expressions containing the Cholesky decomposition. We recommend new 'blocked' algorithms, based on differentiating the Cholesky algorithm DPOTRF in the LAPACK library, which uses 'Level 3' matrix-matrix operations from BLAS, and so is cache-friendly and easy to parallelize. For large matrices, the resulting algorithms are the fastest way to compute Cholesky…
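As a concrete illustration of the update rules discussed above, the following is a minimal, unblocked NumPy sketch of the reverse-mode (adjoint) rule for the Cholesky decomposition: given Sigma = L L^T and the adjoint Lbar = df/dL, it returns the symmetric adjoint Sigma_bar = 0.5 * L^{-T} (Phi(L^T Lbar) + Phi(L^T Lbar)^T) L^{-1}, where Phi takes the lower triangle and halves the diagonal. The helper names `Phi` and `chol_rev` are our own; the paper's recommended routines are blocked, Level-3 BLAS variants of this same update, which this sketch does not attempt to reproduce.

```python
import numpy as np

def Phi(A):
    """Take the lower triangle of A and halve its diagonal."""
    return np.tril(A) - 0.5 * np.diag(np.diag(A))

def chol_rev(L, Lbar):
    """Reverse-mode update for the Cholesky decomposition.

    Given lower-triangular L with Sigma = L @ L.T and the adjoint
    Lbar = df/dL, return the symmetric adjoint Sigma_bar such that
    df = trace(Sigma_bar @ dSigma) for symmetric perturbations dSigma.
    """
    P = Phi(L.T @ Lbar)
    M = P + P.T
    B = np.linalg.solve(L.T, M)               # L^{-T} M
    return 0.5 * np.linalg.solve(L.T, B.T).T  # 0.5 * L^{-T} M L^{-1}

# Finite-difference check on f(Sigma) = sum(cholesky(Sigma) * Lbar),
# a test function whose adjoint with respect to L is exactly Lbar.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)             # well-conditioned SPD matrix
L = np.linalg.cholesky(Sigma)
Lbar = np.tril(rng.standard_normal((4, 4)))
Sbar = chol_rev(L, Lbar)

f = lambda S: float(np.sum(np.linalg.cholesky(S) * Lbar))
E = np.zeros((4, 4)); E[1, 2] = E[2, 1] = 1.0  # symmetric perturbation direction
eps = 1e-6
fd = (f(Sigma + eps * E) - f(Sigma - eps * E)) / (2 * eps)
err = abs(fd - float(np.sum(Sbar * E)))        # should be ~1e-9
```

The central-difference check perturbs a symmetric off-diagonal pair, so it probes exactly the symmetric directional derivative that the returned `Sbar` represents.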


#### 32 Citations

Auto-Differentiating Linear Algebra

- Computer Science, Mathematics · ArXiv
- 2017

This work details how a number of matrix decompositions (Cholesky, LQ, symmetric eigen) can be implemented as differentiable operators in MXNet, running on CPU and GPU in single and double precision.

Stochastic Gradient Descent for Spectral Embedding with Implicit Orthogonality Constraint

- Computer Science, Mathematics · 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019

The contribution consists of reformulating spectral embedding so that it can be solved via stochastic optimization, replacing the orthogonality constraint with an orthogonalization matrix injected directly into the criterion.

GPU-based Parallel Computation Support for Stan

- Computer Science, Mathematics · ArXiv
- 2019

This paper details an extensible OpenCL framework that allows Stan to utilize heterogeneous compute devices. It includes GPU-optimized routines for the Cholesky decomposition, its derivative, other…

What is the gradient of a scalar function of a symmetric matrix?

- Mathematics, Computer Science · ArXiv
- 2019

It is demonstrated that the usual relation between the gradient of a real-valued function of a symmetric matrix and the definition of a Fréchet derivative is incorrect, and it is proved that $G_s = \mathrm{sym}(G)$.
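A minimal numerical illustration of the symmetrized-gradient rule G_s = sym(G) quoted above, using a hypothetical test function of our own choosing (f(S) = trace(A S), not an example from the cited paper): reading off a single entry of the naive gradient G understates the derivative along a symmetric perturbation, while the symmetrized G_s reproduces it.

```python
import numpy as np

# Hypothetical example: f(S) = trace(A @ S) with A deliberately nonsymmetric.
# Treating the entries of S as independent gives the "naive" gradient G = A.T,
# but a symmetric perturbation changes S[i, j] and S[j, i] together, so the
# derivative along that direction is 2 * G_s[i, j] with G_s = sym(G).
A = np.arange(9.0).reshape(3, 3)   # A[0, 2] = 2, A[2, 0] = 6
S = np.eye(3)                      # any symmetric evaluation point (f is linear)
f = lambda S: float(np.trace(A @ S))

G = A.T                            # unconstrained gradient: dF/dS_ij = A_ji
G_s = 0.5 * (G + G.T)              # symmetrized gradient sym(G)

E = np.zeros((3, 3)); E[0, 2] = E[2, 0] = 1.0  # symmetric direction
eps = 1e-6
fd = (f(S + eps * E) - f(S - eps * E)) / (2 * eps)  # = A[0,2] + A[2,0] = 8.0

sym_gap = abs(fd - 2.0 * G_s[0, 2])  # ~0: symmetrized entry matches
naive_gap = abs(fd - G[0, 2])        # = 2.0: naive entry G[0,2] = 6 does not
```

The same distinction is why the symmetric Cholesky adjoint above is returned in symmetrized form.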

Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era

- Computer Science, MathematicsAISTATS
- 2019

The aim of the paper is to make modern inference methods available for Gaussian models with banded precision matrices by equipping an automatic differentiation framework, such as TensorFlow or PyTorch, with linear algebra operators dedicated to banded matrices.

Total stochastic gradient algorithms and applications in reinforcement learning

- Computer Science, Mathematics · NeurIPS
- 2018

This work shows how the total derivative rule leads to an intuitive visual framework for creating gradient estimators on graphical models, and derives new gradient estimators based on density estimation, as well as a likelihood-ratio gradient.

GaussianProcesses.jl: A Nonparametric Bayes package for the Julia Language

- Computer Science, Mathematics
- 2018

A tutorial for GaussianProcesses.jl, a fast, flexible, and user-friendly Gaussian process package developed for the Julia language that makes efficient use of existing Julia packages to provide users with a range of optimization and plotting tools.

On the Computation of Complex-valued Gradients with Application to Statistically Optimum Beamforming

- Computer Science · ArXiv
- 2017

One of the key results of this report is the computation of gradients by algorithmic differentiation for statistically optimum beamforming operations via the complex-valued chain rule, together with the derivative of the eigenvalue problem with complex-valued eigenvectors.

Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems

- Computer Science, Medicine · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021

An eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network; the approach has better convergence properties and yields state-of-the-art results.

Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses

- Computer Science · ECCV
- 2018

This paper introduces an eigendecomposition-free approach to training a deep network whose loss depends on the eigenvector corresponding to a zero eigenvalue of a matrix predicted by the network.

#### References

Showing 1-10 of 31 references

Structured higher-order algorithmic differentiation in the forward and reverse mode with application in optimum experimental design

- Computer Science
- 2012

It is demonstrated that the novel algorithms can be used to compute the gradient of the optimum experimental design (OED) objective function in the reverse mode of AD, and it is discussed how the necessary higher-order derivatives can be computed using Taylor polynomial arithmetic.

Adjoint Algorithmic Differentiation and the Derivative of the Cholesky Decomposition

- Mathematics
- 2015

Adjoint Algorithmic Differentiation is an efficient method to calculate the sensitivities of financial instruments to the input parameters. It does require an implementation of the derivative of…

Differentiation of the Cholesky Algorithm

- Mathematics
- 1995

One way to estimate variance components is by restricted maximum likelihood. The log-likelihood function is fully defined by the Cholesky factor of a matrix that is usually large and sparse…

The Elimination Matrix: Some Lemmas and Applications

- Computer Science, Mathematics · SIAM J. Matrix Anal. Appl.
- 1980

Two transformation matrices are introduced, L and D, which contain zero and unit elements only, and are demonstrated in three areas of mathematical statistics and matrix algebra: maximum likelihood estimation of the multivariate normal distribution, the evaluation of Jacobians of transformations with symmetric or lower triangular matrix arguments, and the solution of matrix equations.

Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms

- Computer Science · TOMS
- 2013

Four routines called DPOTF3i, i = a,b,c,d, are presented. DPOTF3i are a novel type of level-3 BLAS for use by BPF (Blocked Packed Format) Cholesky factorization and LAPACK routine DPOTRF. Performance…

An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation

- Mathematics
- 2008

This paper collects together a number of matrix derivative results which are very useful in forward and reverse mode algorithmic differentiation (AD). It highlights in particular the remarkable…

Automatic differentiation in machine learning: a survey

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2017

By precisely defining the main differentiation techniques and their interrelationships, this work aims to bring clarity to the usage of the terms “autodiff”, “automatic differentiation”, and “symbolic differentiation” as these are encountered more and more in machine learning settings.

Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs

- Computer Science · TOMS
- 1988

This paper describes a model implementation and test software for the Level 2 Basic Linear Algebra Subprograms (Level 2 BLAS). Level 2 BLAS are targeted at matrix-vector operations with the aim of…

LAPACK Users' Guide

- Computer Science
- 1987

The third edition of the LAPACK Users' Guide provides a guide to troubleshooting and installation of routines, as well as examples of how to convert from LINPACK or EISPACK to LAPACK.

Guide to NumPy

- Computer Science
- 2015

This is the second edition of Travis Oliphant's A Guide to NumPy, designed to be a reference that can be used by practitioners who are familiar with Python but want to learn more about NumPy and related tools.