Implicit differentiation of root solvers

Implicit function theorem

Given a variable $x$ defined implicitly through the root-finding problem $F(x(\theta),\theta)=0$, one can apply the implicit function theorem (IFT) to obtain the gradients of $x$ with respect to $\theta$:

\[\frac{d F}{d \theta} = \frac{\partial F}{\partial x} \frac{d x}{d \theta} + \frac{\partial F}{\partial \theta} = 0 \implies \frac{d x}{d \theta} = - \left(\frac{\partial F}{\partial x}\right)^{-1} \frac{\partial F}{\partial \theta}\]
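
As a concrete (purely illustrative) example, take the scalar residual $F(x,\theta) = x^2 - \theta$, whose root is $x(\theta) = \sqrt{\theta}$. The formula above gives

\[\frac{d x}{d \theta} = -\left(2x\right)^{-1}\left(-1\right) = \frac{1}{2x} = \frac{1}{2\sqrt{\theta}}\]

which agrees with differentiating $x(\theta) = \sqrt{\theta}$ directly.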

To avoid unrolling the solver iterations, which would compute derivatives at every iteration and populate the AD stack unnecessarily, Clapeyron currently implements the IFT using ForwardDiff.jl through manual manipulation of Duals. Only first-order AD is currently supported: higher-order derivatives through nested Duals, or Duals with different tags, will error. The current AD implementation is shown below, followed by a minimal sketch of the same technique:

Clapeyron.__gradients_for_root_finders — Function
__gradients_for_root_finders(x::AbstractVector{T},tups::Tuple,tups_primal::Tuple,f::Function) where T<:Real

Computes the gradients of x with respect to the relevant parameters in tups under the condition that x is implicitly defined through the root finding problem f(x,tups) = 0. The function uses the implicit function theorem to compute the gradients efficiently through the reconstruction of Duals.

Note

Currently only supports first-order AD. Attempting to differentiate nested Duals or Duals with different tags will throw an error.

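To make the technique concrete, here is a minimal, self-contained sketch (not Clapeyron's internal routine): a primal-only solver is wrapped with a method for ForwardDiff.Dual that solves at the primal value and then rebuilds the Dual from the IFT expression. The residual F, the solver solve_root, the initial guess and the iteration count are all illustrative choices.

```julia
using ForwardDiff

# Residual whose root defines x(θ): F(x, θ) = 0 (illustrative choice)
F(x, θ) = x^2 - θ

# Black-box primal solver that only ever sees plain floats (Newton iterations),
# so no Dual operations are unrolled through the loop.
function solve_root(θ::Float64; x0 = 1.0, iters = 20)
    x = x0
    for _ in 1:iters
        x -= F(x, θ) / ForwardDiff.derivative(xi -> F(xi, θ), x)
    end
    return x
end

# IFT wrapper: strip the Dual, solve at the primal value, then rebuild a Dual
# whose partials follow dx/dθ = -(∂F/∂x)^(-1) ∂F/∂θ evaluated at the root.
function solve_root(θ::ForwardDiff.Dual{T}) where {T}
    θv = ForwardDiff.value(θ)
    x  = solve_root(θv)
    Fx = ForwardDiff.derivative(xi -> F(xi, θv), x)   # ∂F/∂x at the root
    Fθ = ForwardDiff.derivative(t -> F(x, t), θv)     # ∂F/∂θ at the root
    dxdθ = -Fθ / Fx
    return ForwardDiff.Dual{T}(x, dxdθ * ForwardDiff.partials(θ))
end

# dx/dθ at θ = 4.0 is 1/(2√θ) = 0.25
ForwardDiff.derivative(solve_root, 4.0)
```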

Clapeyron previously propagated first-order derivatives through a final Newton step, an approach that is likewise limited to first-order AD. Although it does produce numeric values for nested Duals, those higher-order derivatives are incorrect because the step neglects the indirect effects of the lower-order gradients. For a more complete explanation, see this discussion.
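
One way to see why the single Newton step recovers only the first-order result (a sketch, using the notation above): starting from a converged primal root $x^{*}$ with $F(x^{*},\theta)=0$, the extra step computes

\[x^{+}(\theta) = x^{*} - \left(\frac{\partial F}{\partial x}\right)^{-1} F(x^{*},\theta)\]

Differentiating once with $x^{*}$ held fixed, the terms multiplying $F(x^{*},\theta)$ vanish at the root, leaving

\[\frac{d x^{+}}{d \theta} = - \left(\frac{\partial F}{\partial x}\right)^{-1} \frac{\partial F}{\partial \theta}\]

which matches the IFT expression. Higher derivatives of $x^{+}$, however, also require the dependence of $x^{*}$ (and hence of $\partial F/\partial x$ and $\partial F/\partial \theta$) on $\theta$, which this construction discards, so nested Duals pushed through the step yield incorrect values.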