Variational Inference¶
deepuq implements a practical family of mean-field variational inference (VI) methods built around Bayes by Backprop. The package now covers plain regression/classification, heteroscedastic regression, multi-output regression, and scalable last-layer VI on deterministic feature extractors.
1) Motivation¶
Exact Bayesian inference for neural-network weights is generally intractable. Variational inference replaces the true posterior with a tractable family and optimizes that approximation with stochastic gradients.
This gives a practical path to uncertainty-aware deep learning while keeping standard PyTorch training loops.
2) What uncertainty is quantified¶
VI in deepuq primarily quantifies epistemic uncertainty through a posterior approximation over network weights.
Posterior predictive distribution:

\[
p(y^{*} \mid x^{*}, \mathcal D) = \int p(y^{*} \mid x^{*}, w)\, p(w \mid \mathcal D)\, dw \;\approx\; \int p(y^{*} \mid x^{*}, w)\, q_{\phi}(w)\, dw
\]

Monte Carlo approximation:

\[
p(y^{*} \mid x^{*}, \mathcal D) \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p\big(y^{*} \mid x^{*}, w^{(s)}\big), \qquad w^{(s)} \sim q_{\phi}(w)
\]

For heteroscedastic regression variants, the predictive distribution also includes a learned data-noise term, so the returned uncertainty decomposes into:

\[
\widehat{\operatorname{Var}}(x^{*}) \;=\; \underbrace{\operatorname{Var}_s\!\big[\mu(x^{*}, w^{(s)})\big]}_{\text{epistemic}} \;+\; \underbrace{\frac{1}{S}\sum_{s=1}^{S} \sigma^{2}\big(x^{*}, w^{(s)}\big)}_{\text{aleatoric}}
\]
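A minimal NumPy sketch of this epistemic/aleatoric split, assuming `S` sampled predictive means and predicted observation variances are already in hand (the array names are illustrative, not part of the deepuq API):

```python
import numpy as np

rng = np.random.default_rng(0)

S = 1000                                  # Monte Carlo samples from q_phi(w)
mu_s = rng.normal(2.0, 0.5, size=S)       # sampled predictive means mu(x*, w^(s))
sigma2_s = np.full(S, 0.25)               # predicted observation variances sigma^2(x*, w^(s))

epistemic_var = mu_s.var()                # spread of sampled means across posterior draws
aleatoric_var = sigma2_s.mean()           # average predicted data noise
total_var = epistemic_var + aleatoric_var
```

With a constant predicted noise of 0.25, the aleatoric term is exactly 0.25 and the epistemic term reflects only the spread of the sampled means.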
3) Mathematical setup / notation¶
Let

\[
\mathcal D = \{(x_i, y_i)\}_{i=1}^{N}, \qquad y_i \sim p(y \mid x_i, w),
\]

with prior \(p(w)\) and variational family \(q_{\phi}(w)\).

Mean-field Gaussian parameterization:

\[
q_{\phi}(w) = \prod_{j} \mathcal N\!\big(w_j;\; \mu_j,\; \sigma_j^{2}\big), \qquad \phi = \{\mu_j, \rho_j\}.
\]

A common unconstrained scale parameterization is:

\[
\sigma_j = \log\!\big(1 + \exp(\rho_j)\big) \quad \text{(softplus)},
\]

which keeps \(\sigma_j > 0\) while \(\rho_j\) remains unconstrained.

Reparameterization trick:

\[
w_j = \mu_j + \sigma_j\, \varepsilon_j, \qquad \varepsilon_j \sim \mathcal N(0, 1),
\]

so gradients with respect to \(\mu_j\) and \(\rho_j\) flow through the sampled weights.
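A NumPy sketch of the softplus scale and a reparameterized weight sample (variable names are illustrative, not deepuq internals):

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(rho):
    # sigma = log(1 + exp(rho)) maps an unconstrained rho to a positive scale
    return np.log1p(np.exp(rho))

mu = np.array([0.0, 1.0, -2.0])      # variational means mu_j
rho = np.array([-3.0, 0.0, 2.0])     # unconstrained scale parameters rho_j
sigma = softplus(rho)                # always positive

eps = rng.standard_normal(mu.shape)  # epsilon ~ N(0, 1)
w = mu + sigma * eps                 # reparameterized sample; differentiable in mu and rho
```

Because the randomness enters only through `eps`, a framework with autodiff (e.g. PyTorch) can backpropagate through `w` into `mu` and `rho`.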
4) Core method equations¶
Canonical ELBO (maximization form):

\[
\mathcal L(\phi) = \mathbb E_{q_{\phi}(w)}\big[\log p(\mathcal D \mid w)\big] - \mathrm{KL}\big(q_{\phi}(w)\,\|\,p(w)\big)
\]

Equivalent minimization form used in training:

\[
\mathcal F(\phi) = -\,\mathbb E_{q_{\phi}(w)}\big[\log p(\mathcal D \mid w)\big] + \beta\, \mathrm{KL}\big(q_{\phi}(w)\,\|\,p(w)\big), \qquad \beta = 1 \text{ recovers } -\mathcal L(\phi).
\]

Mini-batch objective with \(N_b\) optimizer steps per epoch:

\[
\mathcal F_i(\phi) = -\,\mathbb E_{q_{\phi}(w)}\big[\log p(\mathcal D_i \mid w)\big] + \frac{\beta}{N_b}\, \mathrm{KL}\big(q_{\phi}(w)\,\|\,p(w)\big),
\]

so the KL term is counted once per epoch when summed over the \(N_b\) mini-batches.

Relationship to posterior KL:

\[
\log p(\mathcal D) = \mathcal L(\phi) + \mathrm{KL}\big(q_{\phi}(w)\,\|\,p(w \mid \mathcal D)\big),
\]

so maximizing the ELBO is equivalent to minimizing the KL divergence from \(q_{\phi}(w)\) to the true posterior.
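For a mean-field Gaussian posterior and a standard-normal prior, the KL term has a closed form; the sketch below computes it and applies the per-epoch \(1/N_b\) weighting. This is a generic illustration of the math, not deepuq's training loop:

```python
import numpy as np

def kl_gaussian_std_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over independent mean-field factors
    return np.sum(np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2.0 - 0.5)

mu = np.array([0.0, 0.5])     # variational means
sigma = np.array([1.0, 1.0])  # variational scales

N_b = 10                      # optimizer steps (mini-batches) per epoch
beta = 1.0
kl = kl_gaussian_std_normal(mu, sigma)
kl_per_batch = beta * kl / N_b   # each mini-batch pays 1/N_b of the full KL

# the per-batch loss would then be: nll_batch + kl_per_batch
```

Summed over the \(N_b\) mini-batches of one epoch, the KL is counted exactly once, matching the full-data objective.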
5) Implemented VI variants in Deep-UQ¶
Bayes by Backprop¶
The baseline mean-field Bayesian MLP predicts either a scalar/vector regression output or classification logits.
Regression predictive moments:

\[
\hat\mu(x) = \frac{1}{S}\sum_{s=1}^{S} \mu\big(x, w^{(s)}\big), \qquad \widehat{\operatorname{Var}}(x) = \frac{1}{S}\sum_{s=1}^{S} \mu\big(x, w^{(s)}\big)^{2} - \hat\mu(x)^{2}.
\]

Classification predictive probabilities:

\[
\hat p(y = c \mid x) = \frac{1}{S}\sum_{s=1}^{S} \operatorname{softmax}\!\big(f(x, w^{(s)})\big)_{c}.
\]
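Both sets of predictive moments reduce to simple averages over Monte Carlo samples. A NumPy sketch with illustrative sampled outputs (not deepuq API calls):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 500

# --- regression: sampled predictive means mu(x, w^(s)) for one input ---
mu_samples = rng.normal(1.0, 0.2, size=S)
pred_mean = mu_samples.mean()
pred_var = (mu_samples**2).mean() - pred_mean**2   # E[mu^2] - (E[mu])^2

# --- classification: sampled logits f(x, w^(s)) over C classes ---
C = 3
logits = rng.normal(size=(S, C))
z = logits - logits.max(axis=1, keepdims=True)     # numerically stable softmax
probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
pred_probs = probs.mean(axis=0)                    # MC-averaged class probabilities
```

Averaging probabilities (rather than logits) keeps the result a valid distribution: `pred_probs` still sums to one.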
Heteroscedastic Bayes by Backprop¶
The heteroscedastic regressor predicts both a mean and an observation variance. For one output dimension:

\[
p(y \mid x, w) = \mathcal N\!\big(y;\; \mu(x, w),\; \sigma^{2}(x, w)\big).
\]

The returned uncertainty decomposes into the Monte Carlo variance across sampled means plus the Monte Carlo average of the predicted observation variance:

\[
\widehat{\operatorname{Var}}(x) \;=\; \underbrace{\operatorname{Var}_s\!\big[\mu(x, w^{(s)})\big]}_{\text{epistemic}} \;+\; \underbrace{\frac{1}{S}\sum_{s=1}^{S} \sigma^{2}\big(x, w^{(s)}\big)}_{\text{aleatoric}}.
\]
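The likelihood term for this variant is the Gaussian negative log-likelihood with a predicted, input-dependent variance. A self-contained sketch of that term (function and array names are illustrative):

```python
import numpy as np

def gaussian_nll(y, mu, sigma2):
    # negative log-likelihood of y under N(mu, sigma2), elementwise
    return 0.5 * (np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)

y = np.array([1.0, 2.0])           # targets
mu = np.array([1.0, 1.5])          # predicted means mu(x, w)
sigma2 = np.array([0.5, 0.1])      # predicted observation variances sigma^2(x, w)

nll = gaussian_nll(y, mu, sigma2).mean()
```

Because the model can lower its loss by inflating `sigma2` where its mean is poor, the network learns input-dependent noise instead of a single global variance.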
Multi-output Bayes by Backprop¶
The multi-output regressor predicts a vector-valued response \(\mu(x)\in\mathbb R^m\). The ELBO stays the same, but the likelihood is summed or averaged across output dimensions.
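The "summed or averaged" choice only rescales the likelihood term. A tiny sketch for a vector target (homoscedastic noise assumed purely for illustration):

```python
import numpy as np

# per-dimension Gaussian log-likelihoods for a vector target y in R^m
y = np.array([0.5, -1.0, 2.0])      # m = 3 outputs
mu = np.array([0.4, -0.8, 2.2])     # predicted mean vector mu(x)
sigma2 = 0.1                        # shared noise variance for this sketch

ll_per_dim = -0.5 * (np.log(2.0 * np.pi * sigma2) + (y - mu) ** 2 / sigma2)
ll_sum = ll_per_dim.sum()           # likelihood summed across output dimensions
ll_mean = ll_per_dim.mean()         # or averaged, differing only by the factor m
```

Summing matches the joint likelihood of an independent-output model; averaging changes only the effective weight of the likelihood relative to the KL term.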
Heteroscedastic multi-output Bayes by Backprop¶
This combines the previous two ideas: a vector mean and a vector of predicted noise variances. It is the most complete regression VI variant in the package, covering both multi-output predictions and explicit aleatoric noise.
Last-layer variational inference¶
For larger backbones, Deep-UQ supports VI only in the final linear head. Let a deterministic feature extractor produce \(h = \phi(x)\) and a Bayesian head predict:

\[
f(x) = W h + b, \qquad (W, b) \sim q(W, b),
\]

with a mean-field Gaussian \(q(W, b)\) over the head parameters only.
This keeps the feature extractor deterministic and scales VI to CNN, operator, and other feature-based architectures while retaining uncertainty in the final mapping.
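A NumPy sketch of the idea: a fixed feature map stands in for the deterministic backbone, and only the linear head is sampled (all names and the toy feature map are illustrative, not deepuq components):

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x):
    # stand-in for a deterministic backbone phi(x); any fixed feature map works
    return np.stack([x, np.tanh(x), np.ones_like(x)], axis=-1)

d, m = 3, 1                             # feature dim, output dim
mu_W = rng.normal(size=(d, m)) * 0.1    # variational means of head weights
sigma_W = np.full((d, m), 0.05)         # variational scales (fixed for this sketch)

x = np.linspace(-1.0, 1.0, 4)
h = features(x)                         # (4, d) deterministic features, computed once

S = 200
preds = np.stack([h @ (mu_W + sigma_W * rng.standard_normal((d, m)))
                  for _ in range(S)])   # S sampled heads applied to the same features
pred_mean = preds.mean(axis=0)          # (4, m) predictive mean
pred_var = preds.var(axis=0)            # epistemic variance from the Bayesian head only
```

Because `h` is computed once per input, the Monte Carlo cost is \(S\) cheap linear maps rather than \(S\) full forward passes through the backbone.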
6) Practical implications¶
- Fixed \(\beta\) is useful when comparing ELBO trends across epochs.
- Larger Monte Carlo sample counts reduce estimator variance but increase compute.
- Mean-field VI is scalable but cannot represent full posterior correlations.
- Heteroscedastic regression variants are the correct choice when the data-noise level itself changes with the input.
- Last-layer VI is the most practical VI route for spatial backbones and operator models.
- Monitoring NLL and KL separately helps diagnose underfitting versus excessive regularization.
UQResult field mapping¶
predict_vi_uq(...) returns:
| Variant | Populated fields |
|---|---|
| Plain regression | mean, epistemic_var, total_var, metadata |
| Heteroscedastic regression | mean, epistemic_var, aleatoric_var, total_var, metadata |
| Multi-output regression | same fields as regression, with an extra output dimension |
| Classification | mean, probs, probs_var, epistemic_var, metadata |
| Last-layer VI | follows the configured head task (regression or classification) |
Related tutorials¶
- Bayes by Backprop Tutorial
- Heteroscedastic Bayes by Backprop + ADR1D
- Multi-Output Bayes by Backprop + Elastic Bar
- Heteroscedastic Multi-Output Bayes by Backprop + Transport2D
- Last-Layer VI + Heat2D Classification
- VI API
7) References¶
- Graves, A. (2011). Practical Variational Inference for Neural Networks. NeurIPS. Proceedings
- Blundell, C., Cornebise, J., Kavukcuoglu, K., & Wierstra, D. (2015). Weight Uncertainty in Neural Networks. ICML (PMLR 37). Proceedings
- Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. ICLR. OpenReview
- Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An Introduction to Variational Methods for Graphical Models. Machine Learning, 37, 183-233. DOI: 10.1023/A:1007665907178
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877. DOI: 10.1080/01621459.2017.1285773