nifty8.re.evidence_lower_bound module

estimate_evidence_lower_bound(likelihood, samples, n_eigenvalues, min_lh_eval=0.001, batch_size=10, tol=0.0, verbose=True)[source]

Provides an estimate for the Evidence Lower Bound (ELBO).

Statistical inference deals with the problem of hypothesis testing: given some data and models that can describe it, which model should be preferred? In general it is hard to find a good metric to discriminate between different models. In Bayesian Inference, the Bayes factor can serve this purpose. To compute the Bayes factor it is necessary to calculate the evidence p(d|\text{model}) of each specific model at hand. The ratio between the evidence of a model A and that of a model B then quantifies how much better model A describes the data than model B.
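
For concreteness, the Bayes factor between two models A and B is the ratio of their evidences,

K_{AB} = \frac{p(d \,|\, \text{model A})}{p(d \,|\, \text{model B})},

so that K_{AB} > 1 favors model A over model B.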

The evidence of an inference problem can in principle be calculated. However, this is only practically feasible in low-dimensional settings. What can often be computed instead is

\log(p(d)) - D_\text{KL} \left[ Q(\theta(\xi)|d) \,||\, p(\theta(\xi)|d) \right] = -\langle H(\theta(\xi), d)\rangle + \frac{1}{2} \left( N + \text{Tr}\,\log\Lambda \right),

where D_\text{KL} \left[ Q \,||\, p \right] is the Kullback-Leibler (KL) divergence between the approximating posterior distribution Q and the actual posterior p. Since the Kullback-Leibler divergence is non-negative, D_\text{KL} \left[ \cdot \,||\, \cdot \right] \geq 0, it is convenient to consider the lower bound

\log(p(d)) \geq -\langle H(\theta(\xi), d)\rangle + \frac{1}{2} \left( N + \text{Tr}\,\log\Lambda \right),

which takes the name of Evidence Lower Bound (ELBO).

If the KL divergence is well minimized (which should always be the case when a Variational Inference approach is followed), the ELBO can be used as a proxy for the actual evidence, and Bayes factors for model comparison can be computed from it.
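
In that regime the log Bayes factor between two models A and B reduces to the difference of their ELBO estimates,

\log K_{AB} = \log p(d \,|\, \text{model A}) - \log p(d \,|\, \text{model B}) \approx \text{ELBO}_A - \text{ELBO}_B,

provided the residual KL terms and any constants omitted from the ELBO (see the warning below) are comparable for both models.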

Parameters:
  • likelihood (nifty8.re.likelihood.Likelihood) – Log-likelihood of the model.

  • samples (nifty8.re.evi.Samples) – Collection of samples from the posterior distribution.

  • n_eigenvalues (int) – Maximum number of eigenvalues to be considered for the estimation of the log-determinant of the metric. Note that if n_eigenvalues equals the total number of relevant degrees of freedom of the problem, all relevant eigenvalues are always computed irrespective of other stopping criteria.

  • min_lh_eval (float) – Smallest eigenvalue of the likelihood to be considered. If the estimated eigenvalues become smaller than 1 + min_lh_eval, the eigenvalue estimation terminates and uses the smallest eigenvalue as a proxy for all remaining eigenvalues in the trace-log estimation. Default is 1e-3.

  • batch_size (int) – Number of batches into which the eigenvalue estimation is subdivided. The early-stopping criterion based on min_lh_eval is only checked after each batch completes. Default is 10.

  • tol (Optional[float]) – Tolerance on the eigenvalue calculation. Zero indicates machine precision. Default is 0.

  • verbose (Optional[bool]) – Print list of eigenvalues and summary of evidence calculation. Default is True.

Returns:

  • elbo_samples (np.array) – Array of ELBO values, one per posterior sample. The samples are returned to allow for more accurate ELBO statistics.

  • stats (dict) – Dictionary with a summary of the statistics of the estimated ELBO. The keys of this dictionary are:

    • elbo_mean: mean value of the ELBO estimate, calculated over posterior samples

    • elbo_up: upper bound on the ELBO estimate (given by one posterior-sample standard deviation)

    • elbo_lw: lower bound on the ELBO estimate (one standard deviation plus a maximal error on the metric trace-log)

    • lower_error: maximal error on the metric trace-log term, given by the number of relevant metric eigenvalues (different from 1) that were neglected in the trace-log estimation, multiplied by the log of the smallest calculated eigenvalue.
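
As an illustrative usage sketch (assuming likelihood and samples come from a completed nifty8.re inference run, e.g. via jft.optimize_kl, and that the function is re-exported at the nifty8.re package level; otherwise import it from nifty8.re.evidence_lower_bound):

import nifty8.re as jft

# `likelihood` and `samples` are placeholders for the nifty8.re.likelihood.Likelihood
# and the nifty8.re.evi.Samples collection produced by an existing variational
# inference run (e.g. by jft.optimize_kl).
elbo_samples, stats = jft.estimate_evidence_lower_bound(
    likelihood,
    samples,
    n_eigenvalues=100,   # cap on the number of computed metric eigenvalues
    min_lh_eval=1e-3,    # stop once eigenvalues drop towards 1 + 1e-3
    batch_size=10,
    verbose=True,
)

print(f"ELBO: {stats['elbo_mean']:.2f}  "
      f"[{stats['elbo_lw']:.2f}, {stats['elbo_up']:.2f}]")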

Warning

To perform Variational Inference there is no need to take into account quantities that do not explicitly depend on the inferred parameters. Explicitly calculating these terms can be expensive, therefore they are neglected in NIFTy. Since in most cases they are also not required for model comparison, the provided estimate may not include terms which are constant in these parameters. Only when comparing models whose likelihoods contain different (possibly data-dependent) constants, or when the ELBO is needed to approximate the true evidence, do these contributions have to be considered. For example, for a Gaussian-distributed signal and a linear problem (Wiener filter problem) the only missing term is -\frac{1}{2} \log \det |2 \pi N|, where N is the noise covariance matrix.
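
As a hypothetical numerical illustration of the last point: for independent Gaussian noise with known per-data-point variances (the array noise_var below is an assumed stand-in), the omitted constant reduces to a simple sum of logs.

import numpy as np

# Hypothetical example: the constant -1/2 log det(2 pi N) that is omitted from
# the ELBO, evaluated for a diagonal noise covariance N with variances `noise_var`.
noise_var = np.full(10_000, 0.1**2)   # assumed per-data-point noise variances
log_norm_const = -0.5 * np.sum(np.log(2.0 * np.pi * noise_var))

# Add it back only if it differs between the models being compared, or if the
# ELBO is used to approximate the absolute evidence:
# elbo_total = stats["elbo_mean"] + log_norm_const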

See also

For further details we refer to:

  • Analytic geoVI parametrization: P. Frank et al., Geometric Variational Inference <https://arxiv.org/pdf/2105.10470.pdf> (Sec. 5.1)

  • Conceptualization: A. Kostić et al. (manuscript in preparation).