# Laplace Approximation
`deepuq` exposes Laplace methods through `LaplaceWrapper` with six Hessian structures: `diag`, `fisher_diag`, `lowrank_diag`, `block_diag`, `kron`, and `full`.
## 1) Motivation
Modern neural networks are often accurate in-domain but can be confidently wrong away from training support. Laplace approximation adds a Bayesian posterior layer on top of a trained MAP network, so predictions include both central tendency and confidence.
The practical idea is simple: optimize once for a MAP point, then approximate the local posterior geometry around that point.
## 2) What Uncertainty Is Quantified

The method primarily quantifies epistemic uncertainty: uncertainty in the weights induced by the approximate posterior.

With parameter samples \(\{\theta^{(s)}\}_{s=1}^S \sim q(\theta\mid\mathcal D)\) and predictive mean \(\bar f(x)=\frac{1}{S}\sum_{s=1}^S f(x;\theta^{(s)})\), the epistemic variance is

\[
\operatorname{Var}_{\text{epi}}[f(x)] = \frac{1}{S}\sum_{s=1}^{S} \bigl(f(x;\theta^{(s)}) - \bar f(x)\bigr)^2.
\]

For regression, the predictive variance returned by deepuq is

\[
\operatorname{Var}[y\mid x] = \operatorname{Var}_{\text{epi}}[f(x)] + \hat\sigma^2_{\varepsilon},
\]

where \(\hat\sigma^2_{\varepsilon}\) is an empirical residual-noise estimate.

For classification, predictive probabilities are Monte Carlo averaged:

\[
p(y=c\mid x) = \frac{1}{S}\sum_{s=1}^{S} \operatorname{softmax}\bigl(f(x;\theta^{(s)})\bigr)_c.
\]
## 3) Mathematical Setup / Notation

Dataset and parameters:

\[
\mathcal D = \{(x_n, y_n)\}_{n=1}^{N}, \qquad \theta \in \mathbb R^{P}.
\]

MAP estimator:

\[
\theta^* = \arg\min_{\theta}\; \Bigl(-\sum_{n=1}^{N} \log p(y_n \mid x_n, \theta) - \log p(\theta)\Bigr).
\]

With isotropic Gaussian prior:

\[
p(\theta) = \mathcal N\bigl(0, \lambda^{-1} I\bigr).
\]

Canonical Laplace posterior:

\[
q(\theta \mid \mathcal D) = \mathcal N\bigl(\theta^*,\; (H(\theta^*) + \epsilon I)^{-1}\bigr),
\]

where \(H(\theta^*)\) is a local curvature surrogate and \(\epsilon>0\) is damping.
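The construction can be made concrete on a toy model. Below is a minimal numpy sketch (illustrative only, not the deepuq implementation) that fits a linear-Gaussian MAP point, uses an empirical-Fisher-style matrix as the curvature surrogate \(H\), and samples from the damped Gaussian posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a linear model f(x; theta) = x @ theta (MAP fit via least squares).
N, P = 200, 3
X = rng.normal(size=(N, P))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=N)
theta_map = np.linalg.lstsq(X, y, rcond=None)[0]

# Per-example loss gradients at the MAP point: g_n = r_n * x_n.
residuals = X @ theta_map - y            # shape (N,)
G = residuals[:, None] * X               # shape (N, P)

# Curvature surrogate H and damped posterior covariance (H + eps I)^{-1}.
H = G.T @ G / N
eps = 1e-3
Sigma = np.linalg.inv(H + eps * np.eye(P))

# Draw posterior weight samples theta^(s) ~ N(theta_map, Sigma).
S = 1000
samples = rng.multivariate_normal(theta_map, Sigma, size=S)
```

Damping with \(\epsilon I\) before inversion mirrors the canonical posterior above and keeps the covariance well defined even when the curvature matrix is singular.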
## 4) Core Method Equations
### 4.1 Diagonal (diag)

Using empirical batch gradients \(g_b=\nabla_{\theta}\ell_b(\theta^*)\) over \(B\) batches:

\[
H_{\mathrm{diag}} = \operatorname{diag}\Bigl(\frac{1}{B}\sum_{b=1}^{B} g_b \odot g_b\Bigr).
\]
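As a sketch of the accumulator (illustrative numpy, not deepuq internals), the diagonal structure only needs elementwise squares of the batch gradients, so both curvature and posterior variances stay \(\mathcal O(P)\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend we collected B batch gradients g_b of the loss at theta*.
B, P = 8, 5
batch_grads = rng.normal(size=(B, P))    # row b is g_b

# H_diag = diag((1/B) sum_b g_b ⊙ g_b): an O(P) curvature summary.
h_diag = np.mean(batch_grads ** 2, axis=0)           # shape (P,)

# Damped diagonal posterior variances: 1 / (h + eps), no matrix inversion.
eps = 1e-3
posterior_var = 1.0 / (h_diag + eps)
```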
### 4.2 Empirical Fisher Diagonal (fisher_diag)

\[
F = \frac{1}{N}\sum_{n=1}^{N} g_n g_n^{\top},
\]

with

\[
g_n = \nabla_{\theta}\log p(y_n \mid x_n, \theta)\big|_{\theta=\theta^*},
\]

and only the diagonal retained: \(H_{\mathrm{fisher}} = \operatorname{diag}(F)\).
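A hedged numpy sketch of the empirical Fisher diagonal for a linear-Gaussian likelihood (all names illustrative): the per-example log-likelihood gradients are squared elementwise and averaged, which equals the diagonal of the full outer-product matrix without ever forming it.

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-example log-likelihood gradients for a linear-Gaussian model at theta*.
N, P = 100, 4
X = rng.normal(size=(N, P))
theta_star = rng.normal(size=P)
y = X @ theta_star + 0.3 * rng.normal(size=N)
sigma2 = 0.3 ** 2

# g_n = d/dtheta log N(y_n | x_n^T theta, sigma2) = (y_n - x_n^T theta) x_n / sigma2
G = ((y - X @ theta_star) / sigma2)[:, None] * X     # shape (N, P)

# diag(F) with F = (1/N) sum_n g_n g_n^T, computed in O(N P) memory.
fisher_diag = np.mean(G ** 2, axis=0)                # shape (P,)
```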
### 4.3 Low-Rank + Diagonal (lowrank_diag)

Curvature decomposition:

\[
H \approx V_r S_r^{2} V_r^{\top} + D.
\]

Posterior precision:

\[
\Lambda = V_r S_r^{2} V_r^{\top} + D + \epsilon I.
\]

If \(\widetilde G=G/\sqrt N\) with SVD \(\widetilde G=USV^{\top}\) (where \(G\in\mathbb R^{N\times P}\) stacks per-example gradients), then

\[
F = \widetilde G^{\top}\widetilde G = V S^{2} V^{\top} \approx V_r S_r^{2} V_r^{\top},
\]

and a diagonal residual form is

\[
D = \operatorname{diag}(F) - \operatorname{diag}\bigl(V_r S_r^{2} V_r^{\top}\bigr),
\]

which keeps the diagonal of \(\Lambda - \epsilon I\) exactly equal to the diagonal of \(F\).
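The SVD route can be checked numerically. A sketch under the stated definitions (illustrative numpy, not deepuq code): take the top-\(r\) right singular vectors as the low-rank factor and let the diagonal residual absorb the remaining curvature mass.

```python
import numpy as np

rng = np.random.default_rng(3)

N, P, r = 50, 10, 3
G = rng.normal(size=(N, P))              # stacked per-example gradients
G_tilde = G / np.sqrt(N)

# Truncated SVD: keep the top-r right singular vectors / singular values.
U, s, Vt = np.linalg.svd(G_tilde, full_matrices=False)
V_r, s_r = Vt[:r].T, s[:r]               # V_r: (P, r)

# Rank-r curvature factor and diagonal residual matching diag(F) exactly.
F = G_tilde.T @ G_tilde                  # empirical Fisher, (P, P)
low_rank = (V_r * s_r**2) @ V_r.T        # V_r diag(s_r^2) V_r^T
D = np.diag(F) - np.diag(low_rank)       # nonnegative residual

# Posterior precision Lambda = V_r S_r^2 V_r^T + diag(D) + eps I.
eps = 1e-3
Lam = low_rank + np.diag(D) + eps * np.eye(P)
```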
### 4.4 Block Diagonal (block_diag)

Partition parameters into \(K\) blocks:

\[
\theta = (\theta_1, \dots, \theta_K), \qquad \theta_k \in \mathbb R^{m_k}.
\]

Block curvature and precision:

\[
H_k = \frac{1}{N}\sum_{n=1}^{N} g_n^{(k)} \bigl(g_n^{(k)}\bigr)^{\top}, \qquad \Lambda_k = H_k + \epsilon I_{m_k},
\]

with posterior covariance \(\Sigma = \operatorname{blockdiag}\bigl(\Lambda_1^{-1}, \dots, \Lambda_K^{-1}\bigr)\).
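A small numpy sketch of blockwise assembly (block boundaries here are arbitrary; deepuq's actual partitioning may differ). Each block is inverted independently, so cost is driven by the largest block rather than by \(P\):

```python
import numpy as np

rng = np.random.default_rng(4)

N, P = 60, 6
G = rng.normal(size=(N, P))              # per-example gradients
blocks = [slice(0, 2), slice(2, 6)]      # K=2 blocks, sizes m_k = 2 and 4
eps = 1e-3

# Per-block curvature H_k and damped precision Lambda_k; invert blockwise.
block_covs = []
for blk in blocks:
    Gk = G[:, blk]                       # gradients restricted to block k
    Hk = Gk.T @ Gk / N
    Lk = Hk + eps * np.eye(Gk.shape[1])
    block_covs.append(np.linalg.inv(Lk))

# Assemble Sigma = blockdiag(Lambda_1^{-1}, ..., Lambda_K^{-1}).
Sigma = np.zeros((P, P))
for blk, cov in zip(blocks, block_covs):
    Sigma[blk, blk] = cov
```

Cross-block covariances are exactly zero by construction, which is the approximation this structure trades for memory \(\mathcal O(\sum_k m_k^2)\).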
### 4.5 Kronecker-Factored (kron)

For linear layer \(\ell\):

\[
H_\ell \approx A_\ell \otimes B_\ell,
\]

with activation and output-gradient factors:

\[
A_\ell = \mathbb E\bigl[a_\ell a_\ell^{\top}\bigr], \qquad B_\ell = \mathbb E\bigl[g_\ell g_\ell^{\top}\bigr],
\]

where \(a_\ell\) is the layer input activation and \(g_\ell\) the gradient with respect to the layer pre-activation output. A standard eigenbasis view is

\[
A_\ell = Q_A \Lambda_A Q_A^{\top}, \qquad B_\ell = Q_B \Lambda_B Q_B^{\top},
\]

so the layer precision spectrum is approximated by

\[
\lambda_{ij} = \lambda_i^{A}\,\lambda_j^{B} + \epsilon.
\]
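The eigenbasis identity can be verified on random factors: the spectrum of \(A_\ell \otimes B_\ell\) consists of all pairwise eigenvalue products, so the Kronecker product never needs to be materialized. A numpy sketch (illustrative shapes and names):

```python
import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo estimates of the two factors for one linear layer:
# input activations a (dim d_in), pre-activation output gradients g (dim d_out).
N, d_in, d_out = 500, 4, 3
a = rng.normal(size=(N, d_in))
g = rng.normal(size=(N, d_out))

A = a.T @ a / N                          # activation factor A_l, (d_in, d_in)
B = g.T @ g / N                          # output-gradient factor B_l, (d_out, d_out)

# Eigenvalues of A ⊗ B are all pairwise products lam_A[i] * lam_B[j],
# so the damped precision spectrum costs two small eigendecompositions.
lam_A = np.linalg.eigvalsh(A)
lam_B = np.linalg.eigvalsh(B)
eps = 1e-3
spectrum = (lam_A[:, None] * lam_B[None, :]).ravel() + eps
```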
### 4.6 Full (full)

With stacked gradients \(G\in\mathbb R^{B\times P}\):

\[
H = \frac{1}{B} G^{\top} G, \qquad \Lambda = H + \epsilon I.
\]
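A numpy sketch of the full structure (illustrative; note that damping keeps \(\Lambda\) invertible even when \(B < P\), where \(G^{\top}G\) is rank-deficient):

```python
import numpy as np

rng = np.random.default_rng(6)

B, P = 32, 8
G = rng.normal(size=(B, P))              # stacked gradients, (B, P)

# Full curvature and damped precision: O(P^2) memory, O(P^3) inversion.
H = G.T @ G / B
eps = 1e-3
Lam = H + eps * np.eye(P)
Sigma = np.linalg.inv(Lam)
```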
## 5) Inference / Prediction Equations

Given samples \(\theta^{(s)}\sim q(\theta\mid\mathcal D)\), Monte Carlo prediction uses:

\[
\bar f(x) = \frac{1}{S}\sum_{s=1}^{S} f\bigl(x;\theta^{(s)}\bigr).
\]

Regression total predictive variance:

\[
\operatorname{Var}[y\mid x] = \frac{1}{S}\sum_{s=1}^{S} \bigl(f(x;\theta^{(s)}) - \bar f(x)\bigr)^2 + \hat\sigma^2_{\varepsilon}.
\]

Classification predictive probability:

\[
p(y=c\mid x) = \frac{1}{S}\sum_{s=1}^{S} \operatorname{softmax}\bigl(f(x;\theta^{(s)})\bigr)_c.
\]
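A self-contained numpy sketch of both predictive rules, with toy posterior samples (all shapes and scales illustrative, not deepuq output):

```python
import numpy as np

rng = np.random.default_rng(7)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

S, P, C = 200, 4, 3
theta_map = rng.normal(size=P)
x = rng.normal(size=P)

# Regression: f(x; theta) = x @ theta over S toy posterior samples.
thetas = theta_map + 0.1 * rng.normal(size=(S, P))
f = thetas @ x                           # shape (S,)
mean = f.mean()
sigma2_eps = 0.05                        # empirical residual-noise estimate
total_var = f.var() + sigma2_eps         # epistemic + aleatoric

# Classification: average softmax(f(x; theta^(s))) over samples.
W = rng.normal(size=(C, P)) + 0.1 * rng.normal(size=(S, C, P))
probs = softmax(np.einsum('scp,p->sc', W, x)).mean(axis=0)
```

Averaging probabilities (rather than logits) is what the Monte Carlo formula above prescribes, and it preserves normalization of the predictive distribution.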
## 6) Practical Implications
Curvature expressivity increases from diag to full, and cost rises accordingly.
- `diag`/`fisher_diag`: memory \(\mathcal O(P)\), cheapest, weakest coupling.
- `lowrank_diag`: memory \(\mathcal O(Pr)\), captures dominant directions.
- `block_diag`: memory \(\mathcal O(\sum_k m_k^2)\), captures within-block coupling.
- `kron`: layerwise factorized coupling with favorable scaling for linear layers.
- `full`: memory \(\mathcal O(P^2)\), highest fidelity and highest cost.
Numerical and safety controls in deepuq include:
- damping \(\epsilon\) before inversion/factorization,
- parameter-count guard for expensive full-structure settings,
- structure checks for Kronecker-factorized assumptions.
## UQResult Field Mapping

`LaplaceWrapper.predict_uq(...)` returns:
| Field | Regression | Classification |
|---|---|---|
| `mean` | Predictive mean | Mean class probabilities |
| `epistemic_var` | Posterior-sampling variance (noise removed when available) | `None` |
| `aleatoric_var` | Empirical residual-noise term (if estimated) | `None` |
| `total_var` | Predictive variance | `None` |
| `probs` | `None` | Mean class probabilities |
| `probs_var` | `None` | Optional probability variance (backend-dependent) |
| `metadata` | Method/structure/likelihood details | Method/structure/likelihood details |
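For orientation, the mapping can be mirrored by a plain container. This dataclass is a hypothetical sketch; the real UQResult in deepuq may differ in names and types:

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class UQResultSketch:
    """Hypothetical container mirroring the documented field mapping."""
    mean: np.ndarray
    epistemic_var: Optional[np.ndarray] = None
    aleatoric_var: Optional[np.ndarray] = None
    total_var: Optional[np.ndarray] = None
    probs: Optional[np.ndarray] = None
    probs_var: Optional[np.ndarray] = None
    metadata: dict = field(default_factory=dict)


# Regression-style result: variance fields populated, probs left as None.
reg = UQResultSketch(
    mean=np.array([0.2]),
    epistemic_var=np.array([0.01]),
    aleatoric_var=np.array([0.05]),
    total_var=np.array([0.06]),
    metadata={"method": "laplace", "structure": "diag", "likelihood": "gaussian"},
)
```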
## 7) References
- MacKay, D. J. C. (1992). A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 4(3), 448-472. DOI: 10.1162/neco.1992.4.3.448
- Tierney, L., & Kadane, J. B. (1986). Accurate Approximations for Posterior Moments and Marginal Densities. Journal of the American Statistical Association, 81(393), 82-86. DOI: 10.1080/01621459.1986.10478240
- Martens, J., & Grosse, R. (2015). Optimizing Neural Networks with Kronecker-factored Approximate Curvature. ICML (PMLR 37).
- Botev, A., Ritter, H., & Barber, D. (2017). Practical Gauss-Newton Optimisation for Deep Learning. ICML (PMLR 70).
- Ritter, H., Botev, A., & Barber, D. (2018). A Scalable Laplace Approximation for Neural Networks. ICLR.
- Daxberger, E., Kristiadi, A., Immer, A., Eschenhagen, R., Bauer, M., & Hennig, P. (2021). Laplace Redux: Effortless Bayesian Deep Learning. NeurIPS.
- Kunstner, F., Hennig, P., & Balles, L. (2019). Limitations of the Empirical Fisher Approximation for Natural Gradient Descent. NeurIPS.