Approximate Inference#
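In Variational Inference (VI), the posterior $\mathcal{P}(\xi|d)$ is approximated by a simpler, parametrized distribution, often a Gaussian $\mathcal{Q}(\xi)=\mathcal{G}(\xi-m,D)$. The parameters of $\mathcal{Q}$, the mean $m$ and the covariance $D$, are obtained by minimizing an appropriate information distance measure between $\mathcal{Q}$ and $\mathcal{P}$. As a compromise between being optimal and being computationally affordable, the variational Kullback-Leibler (KL) divergence is used:
$$\mathrm{KL}(m,D|d) = \mathcal{D}_\mathrm{KL}(\mathcal{Q}||\mathcal{P}) = \int \mathcal{D}\xi \, \mathcal{Q}(\xi) \log\left(\frac{\mathcal{Q}(\xi)}{\mathcal{P}(\xi|d)}\right)$$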
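As an illustration of how such a KL divergence can be estimated from samples, the following minimal sketch compares a Monte Carlo estimate with the closed-form value for two one-dimensional Gaussians. It does not use NIFTy; the target distribution and all function names are assumptions chosen for this example.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log-density of a one-dimensional Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def kl_monte_carlo(m, D, log_p, n_samples=20000, seed=0):
    """Estimate KL(Q||P) = <log Q - log P>_Q with Q = N(m, D) from samples of Q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xi = rng.gauss(m, math.sqrt(D))
        total += log_gauss(xi, m, D) - log_p(xi)
    return total / n_samples


def kl_gauss_exact(m_q, D_q, m_p, D_p):
    """Closed-form KL(Q||P) for two one-dimensional Gaussians."""
    return 0.5 * (math.log(D_p / D_q) + (D_q + (m_q - m_p) ** 2) / D_p - 1.0)


# Hypothetical target: a Gaussian "posterior" N(1, 2), so the exact value is known.
m_p, D_p = 1.0, 2.0
kl_est = kl_monte_carlo(0.0, 1.0, lambda xi: log_gauss(xi, m_p, D_p))
kl_ref = kl_gauss_exact(0.0, 1.0, m_p, D_p)
```

For a realistic posterior the normalization of $\mathcal{P}$ is unknown, so only the KL up to a constant is accessible; the Gaussian target here is chosen solely so that the estimate can be checked against an exact value.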
NIFTy features two main alternatives for variational inference: Metric Gaussian Variational Inference (MGVI) and geometric Variational Inference (geoVI). A visual comparison of the MGVI and geoVI algorithms can be found in variational_inference_visualized.py.
Metric Gaussian Variational Inference (MGVI)#
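Minimizing the KL divergence with respect to all entries of the covariance $D$ is infeasible for fields. Therefore, Metric Gaussian Variational Inference (MGVI, [1]) approximates the posterior precision matrix $D^{-1}$ at the location of the current mean $m$ by the Bayesian Fisher information metric,
$$M \approx \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \, \left(\frac{\partial \mathcal{H}(d,\xi)}{\partial \xi}\right)^\dagger \right\rangle_{(d,\xi)}.$$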
In practice the average is performed over $\mathcal{P}(d,\xi) \approx \mathcal{P}(d|\xi)\,\delta(\xi-m)$ by evaluating the expression at the current mean $m$. This results in the Fisher information metric of the likelihood evaluated at the mean plus the prior information metric. Therefore, only the mean of the approximate distribution has to be inferred. The only term within the KL-divergence that explicitly depends on it is the Hamiltonian of the true problem averaged over the approximation:
$$\mathrm{KL}(m|d) \;\widehat{=}\; \left\langle \mathcal{H}(d,\xi) \right\rangle_{\mathcal{Q}(\xi)},$$
where $\widehat{=}$ expresses equality up to irrelevant (here not $m$-dependent) terms.
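Thus, only the gradient of the KL with respect to $m$ is needed, which can be expressed as
$$\frac{\partial \mathrm{KL}(m|d)}{\partial m} = \left\langle \frac{\partial \mathcal{H}(d,\xi)}{\partial \xi} \right\rangle_{\mathcal{G}(\xi-m,D)}.$$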
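The gradient expression follows from a change of variables. The short derivation below is a sketch under the assumption that the covariance $D$, and hence the residuals $\eta = \xi - m$, are held fixed while $m$ is varied:
$$\left\langle \mathcal{H}(d,\xi) \right\rangle_{\mathcal{G}(\xi-m,D)} = \left\langle \mathcal{H}(d,m+\eta) \right\rangle_{\mathcal{G}(\eta,D)} \quad\Rightarrow\quad \frac{\partial}{\partial m} \left\langle \mathcal{H}(d,m+\eta) \right\rangle_{\mathcal{G}(\eta,D)} = \left\langle \left.\frac{\partial \mathcal{H}(d,\xi)}{\partial \xi}\right|_{\xi=m+\eta} \right\rangle_{\mathcal{G}(\eta,D)}.$$
After the substitution the averaging distribution no longer depends on $m$, so the derivative can be moved inside the average.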
We stochastically estimate the KL-divergence and gradients with a set of samples drawn from the approximate posterior distribution.
The particular structure of the covariance allows us to draw independent samples by solving a linear system of equations involving the metric.
This KL-divergence for MGVI is implemented by SampledKLEnergy() within NIFTy8.
Note that MGVI typically provides only a lower bound on the variance.
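The MGVI ingredients described in this section (metric at the current mean, samples drawn with the inverse metric as covariance, and a sampled gradient of the averaged Hamiltonian) can be illustrated with a one-dimensional toy problem. This is a minimal sketch that does not use NIFTy: the measurement model `d = exp(xi) + n`, the constants, all function names, and the simple metric-preconditioned update step are assumptions made for this example and are not the minimizer NIFTy uses.

```python
import math
import random

# Hypothetical toy problem (not from NIFTy): d = exp(xi) + n,
# with noise variance NOISE_VAR and a standard-normal prior on xi.
D_OBS = 2.0
NOISE_VAR = 0.25


def hamiltonian_gradient(xi):
    """dH/dxi for H = (d - exp(xi))^2 / (2 * NOISE_VAR) + xi^2 / 2."""
    f = math.exp(xi)
    return -(D_OBS - f) * f / NOISE_VAR + xi


def metric(m):
    """Fisher metric of the likelihood at m plus the prior metric (here 1)."""
    jac = math.exp(m)  # Jacobian of the response exp(xi) at xi = m
    return jac * jac / NOISE_VAR + 1.0


def draw_residuals(m, n_pairs, rng):
    """Draw mirrored residuals eta with variance 1 / metric(m).

    A sample z with variance M is drawn first; solving M * eta = z then
    yields a sample with variance 1 / M. In one dimension this is a division.
    """
    M = metric(m)
    residuals = []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, math.sqrt(M))
        eta = z / M
        residuals.extend([eta, -eta])
    return residuals


def kl_gradient(m, residuals):
    """Sample average of dH/dxi evaluated at xi = m + eta."""
    return sum(hamiltonian_gradient(m + eta) for eta in residuals) / len(residuals)


def mgvi_toy(m=0.0, n_outer=10, n_inner=5, n_pairs=50, seed=1):
    """Alternate between resampling at the current mean and updating the mean."""
    rng = random.Random(seed)
    for _ in range(n_outer):
        residuals = draw_residuals(m, n_pairs, rng)  # kept fixed in the inner loop
        for _ in range(n_inner):
            m -= kl_gradient(m, residuals) / metric(m)  # metric-preconditioned step
    return m


m_final = mgvi_toy()
```

The residuals are redrawn only in the outer loop, which mirrors the idea that the covariance is tied to the current mean through the metric and is not optimized separately.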
Geometric Variational Inference (geoVI)#
For non-linear posterior distributions $\mathcal{P}(\xi|d)$, an approximation with a Gaussian $\mathcal{Q}(\xi)$ in the coordinates $\xi$ is sub-optimal, as higher-order interactions are ignored. A better approximation can be achieved by constructing a coordinate system $y = g(\xi)$ in which the posterior is close to a Gaussian, and performing VI with a Gaussian $\mathcal{Q}(y)$ in these coordinates. This approach is called Geometric Variational Inference (geoVI). It is discussed in detail in [2].
One useful coordinate system is obtained if the metric $M$ of the posterior can be expressed as the pullback of the Euclidean metric by $g$:
$$M = \left(\frac{\partial g}{\partial \xi}\right)^T \frac{\partial g}{\partial \xi}.$$
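In general, such a transformation exists only locally, i.e. in a neighbourhood of some expansion point $\bar{\xi}$, and is denoted as $g_{\bar{\xi}}(\xi)$. Using $g_{\bar{\xi}}$, the geoVI scheme approximates the posterior with a zero-mean, unit-covariance Gaussian $\mathcal{Q}(y) = \mathcal{G}(y, 1)$. It can be expressed in $\xi$ coordinates via the pushforward by the inverse transformation $\xi = g_{\bar{\xi}}^{-1}(y)$:
$$\mathcal{Q}_{\bar{\xi}}(\xi) = \left(g_{\bar{\xi}}^{-1} * \mathcal{Q}\right)(\xi) = \int \delta\left(\xi - g_{\bar{\xi}}^{-1}(y)\right) \, \mathcal{G}(y, 1) \, \mathcal{D}y,$$
where $\delta$ denotes the Dirac delta distribution.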
geoVI determines the optimal expansion point $\bar{\xi}$ such that $\mathcal{Q}_{\bar{\xi}}$ matches the posterior as well as possible.
Analogous to the MGVI algorithm, $\bar{\xi}$ is obtained by minimizing the KL-divergence between $\mathcal{P}$ and $\mathcal{Q}_{\bar{\xi}}$ with respect to $\bar{\xi}$.
Furthermore, the KL is represented as a stochastic estimate using a set of samples drawn from $\mathcal{Q}_{\bar{\xi}}$, which is implemented in NIFTy8 via SampledKLEnergy() with minimizer_sampling != None.
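The pullback and pushforward constructions described in this section can be illustrated in one dimension, where a suitable transformation exists globally. The sketch below does not use NIFTy and is not the transformation geoVI constructs: the function `g`, the expansion point, and the bisection-based inverse are all hypothetical choices for this example. It checks numerically that the metric equals the squared derivative of `g`, and draws samples of the approximation by mapping standard-normal samples through the inverse transformation.

```python
import math
import random


def g(xi):
    """Hypothetical monotone coordinate transformation (assumption for illustration)."""
    return xi + xi ** 3 / 3.0


def metric(xi):
    """Metric induced by g: pullback of the Euclidean metric, (dg/dxi)^2."""
    return (1.0 + xi * xi) ** 2


def pullback_residual(xi, h=1e-5):
    """Difference between the finite-difference (dg/dxi)^2 and metric(xi)."""
    dg = (g(xi + h) - g(xi - h)) / (2.0 * h)
    return dg * dg - metric(xi)


def g_local(xi, xi_bar):
    """Transformation shifted so that the expansion point xi_bar maps to y = 0."""
    return g(xi) - g(xi_bar)


def g_local_inverse(y, xi_bar, lo=-50.0, hi=50.0, tol=1e-12):
    """Invert g_local by bisection; valid because g is strictly increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_local(mid, xi_bar) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_pushforward(xi_bar, n_samples, seed=0):
    """Draw y ~ N(0, 1) and map each sample to xi = g_local^{-1}(y)."""
    rng = random.Random(seed)
    ys = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return ys, [g_local_inverse(y, xi_bar) for y in ys]


xi_bar = 0.5
ys, xis = sample_pushforward(xi_bar, 1000)
```

Because `g` is non-linear, the resulting samples `xis` are not Gaussian distributed in the original coordinates even though `ys` are, which is the property the scheme relies on.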
Publications#
If you use MGVI or geoVI, the authors of the respective papers would greatly appreciate a citation.