# MCMC (SGLD)
deepuq provides SGLD-based posterior sampling via `SGLDOptimizer` and prediction utilities such as `collect_posterior_samples`.
## 1) Motivation
Deterministic training yields a point estimate of parameters. For uncertainty-aware prediction, we want samples from a posterior distribution over parameters. Stochastic Gradient Langevin Dynamics (SGLD) approximates this by combining stochastic gradients with Langevin noise.
## 2) What Uncertainty Is Quantified
SGLD quantifies epistemic uncertainty by sampling multiple plausible parameter settings from an approximate posterior trajectory.
Predictive distribution:

$$
p(y^{*} \mid x^{*}, \mathcal{D}) = \int p(y^{*} \mid x^{*}, \theta)\, p(\theta \mid \mathcal{D})\, d\theta \;\approx\; \frac{1}{S} \sum_{s=1}^{S} p\!\left(y^{*} \mid x^{*}, \theta^{(s)}\right).
$$
## 3) Mathematical Setup / Notation
Given data \(\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}\), define the posterior energy:

$$
U(\theta) = -\log p(\theta) - \sum_{i=1}^{N} \log p(y_i \mid x_i, \theta).
$$

Then:

$$
p(\theta \mid \mathcal{D}) \propto \exp(-U(\theta)).
$$

Continuous-time Langevin diffusion:

$$
d\theta_t = -\nabla U(\theta_t)\, dt + \sqrt{2}\, dW_t,
$$

where \(W_t\) is standard Brownian motion; its stationary distribution is \(p(\theta \mid \mathcal{D})\).
## 4) Core Method Equations
SGLD Euler–Maruyama discretization with stochastic gradient \(\widehat\nabla U(\theta_t)\):

$$
\theta_{t+1} = \theta_t - \frac{\eta_t}{2}\, \widehat\nabla U(\theta_t) + \xi_t, \qquad \xi_t \sim \mathcal{N}(0, \eta_t I),
$$

where the minibatch estimator is \(\widehat\nabla U(\theta) = -\nabla \log p(\theta) - \frac{N}{|B|} \sum_{i \in B} \nabla \log p(y_i \mid x_i, \theta)\) for a uniformly sampled minibatch \(B \subset \{1, \dots, N\}\).
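The discretized update is easy to see on a toy target. The sketch below runs from-scratch SGLD on \(U(\theta) = \theta^2/2\), whose posterior is standard normal; it is an illustration of the update equation, not deepuq's `SGLDOptimizer`, and in a real model `grad_U` would be a minibatch estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(theta):
    # Exact gradient of the toy energy U(theta) = theta**2 / 2.
    return theta

def sgld_step(theta, eta, rng):
    # theta_{t+1} = theta_t - (eta/2) * grad_U(theta_t) + xi_t,  xi_t ~ N(0, eta)
    return theta - 0.5 * eta * grad_U(theta) + rng.normal(scale=np.sqrt(eta))

theta, samples = 5.0, []
for t in range(6000):
    theta = sgld_step(theta, eta=0.1, rng=rng)
    if t >= 1000:  # discard burn-in
        samples.append(theta)
samples = np.asarray(samples)
```

With a fixed step size the chain carries a small discretization bias; a decaying schedule \(\eta_t\) removes it asymptotically.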
Posterior sampling uses:
- burn-in period before retaining samples,
- optional thinning to reduce autocorrelation,
- multiple retained states \(\{\theta^{(s)}\}_{s=1}^{S}\) for prediction.
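The retention logic above (burn-in, thinning, \(S\) retained states) can be sketched generically. In deepuq this role is played by `collect_posterior_samples`; the stand-alone helper below, on a toy one-parameter target, is only an illustration of the bookkeeping.

```python
import numpy as np

def sgld_step(theta, eta, rng):
    # Toy target U(theta) = theta**2 / 2 with exact gradient.
    return theta - 0.5 * eta * theta + rng.normal(scale=np.sqrt(eta))

def collect_samples(theta0, n_samples, burn_in, thin, eta, rng):
    """Run SGLD, discard `burn_in` steps, then keep every `thin`-th state."""
    theta, retained, step = theta0, [], 0
    while len(retained) < n_samples:
        theta = sgld_step(theta, eta, rng)
        step += 1
        if step > burn_in and (step - burn_in) % thin == 0:
            retained.append(theta)
    return np.asarray(retained)

rng = np.random.default_rng(1)
samples = collect_samples(theta0=3.0, n_samples=200,
                          burn_in=500, thin=10, eta=0.1, rng=rng)
```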
A useful efficiency diagnostic is the effective sample size:

$$
\mathrm{ESS} = \frac{S}{1 + 2 \sum_{k=1}^{\infty} \rho_k},
$$

where \(\rho_k\) is the lag-\(k\) autocorrelation of the chain.
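The ESS can be estimated from a finite chain by truncating the autocorrelation sum. The helper below (illustrative, not a deepuq API) truncates at the first non-positive \(\rho_k\), a common heuristic:

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = n / (1 + 2 * sum_k rho_k), truncated at the first rho_k <= 0."""
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n  # empirical autocovariance
    rho = acov / acov[0]
    tail = 0.0
    for k in range(1, n):
        if rho[k] <= 0:
            break
        tail += rho[k]
    return n / (1.0 + 2.0 * tail)

rng = np.random.default_rng(0)
iid = rng.normal(size=2000)   # independent draws: ESS close to the chain length
ar = np.zeros(2000)           # AR(1) with phi = 0.9: strongly autocorrelated
for t in range(1, 2000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()
```

An i.i.d. chain gives an ESS near its length, while the AR(1) chain's ESS is an order of magnitude smaller.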
## 5) Inference / Prediction Equations
Regression predictive mean and variance:

$$
\hat\mu(x^{*}) = \frac{1}{S} \sum_{s=1}^{S} f_{\theta^{(s)}}(x^{*}), \qquad
\widehat{\mathrm{Var}}(x^{*}) = \frac{1}{S} \sum_{s=1}^{S} \left( f_{\theta^{(s)}}(x^{*}) - \hat\mu(x^{*}) \right)^{2}.
$$

Classification predictive probabilities:

$$
\hat p(y = c \mid x^{*}) = \frac{1}{S} \sum_{s=1}^{S} \mathrm{softmax}\!\left( f_{\theta^{(s)}}(x^{*}) \right)_{c}.
$$
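These Monte Carlo averages are straightforward to compute from stacked per-sample outputs. A self-contained numpy sketch (the array shapes are my assumption, not deepuq's internal layout):

```python
import numpy as np

def regression_predictive(preds):
    """preds: (S, N) array of per-sample outputs f_{theta^(s)}(x*)."""
    return preds.mean(axis=0), preds.var(axis=0)  # MC mean, epistemic variance

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def classification_predictive(logits):
    """logits: (S, N, C) array; softmax per sample, then average over samples."""
    probs = softmax(logits, axis=-1)
    return probs.mean(axis=0), probs.var(axis=0)  # mean probs, probability variance
```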
## 6) Practical Implications
- Step-size schedule \(\eta_t\) controls the bias-variance tradeoff of samples.
- Too-short burn-in yields biased uncertainty estimates.
- Strong autocorrelation between retained samples lowers the effective sample size, so more iterations are needed per effectively independent sample.
- Compared with VI/Laplace, SGLD can represent richer posterior geometry but usually costs more wall-clock time.
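For the step-size schedule in the first bullet, a common choice is polynomial decay \(\eta_t = a\,(b + t)^{-\gamma}\) with \(\gamma \in (0.5, 1]\), as in Welling & Teh (2011); the constants below are illustrative, not deepuq defaults.

```python
def step_size(t, a=1e-2, b=1.0, gamma=0.55):
    # eta_t = a * (b + t) ** (-gamma); gamma in (0.5, 1] satisfies the
    # Robbins-Monro conditions: sum eta_t diverges, sum eta_t**2 converges.
    return a * (b + t) ** (-gamma)
```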
## `UQResult` Field Mapping
`predict_with_samples_uq(...)` returns:

| Field | Regression | Classification (`apply_softmax=True`) |
|---|---|---|
| `mean` | Predictive mean | Mean class probabilities |
| `epistemic_var` | Variance across posterior samples | Probability variance across samples |
| `aleatoric_var` | `None` | `None` |
| `total_var` | Same as `epistemic_var` | Same as `epistemic_var` |
| `probs` | `None` | Mean class probabilities |
| `probs_var` | `None` | Probability variance across samples |
| `metadata` | Method/sample/task info | Method/sample/task info |
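For the regression column, the mapping can be mimicked with a plain dict; the real `UQResult` type and the `predict_with_samples_uq` signature may differ, so this is only a sketch of the field semantics.

```python
import numpy as np

def regression_uq_result(sample_preds):
    """sample_preds: (S, N) array of per-sample predictions (hypothetical shape)."""
    epistemic = sample_preds.var(axis=0)
    return {
        "mean": sample_preds.mean(axis=0),
        "epistemic_var": epistemic,
        "aleatoric_var": None,       # SGLD alone does not model observation noise
        "total_var": epistemic,      # same as epistemic_var, per the table above
        "probs": None,
        "probs_var": None,
        "metadata": {"method": "sgld",
                     "n_samples": sample_preds.shape[0],
                     "task": "regression"},
    }
```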
## 7) References
- Welling, M., & Teh, Y. W. (2011). Bayesian Learning via Stochastic Gradient Langevin Dynamics. ICML.
- Teh, Y. W., Thiery, A. H., & Vollmer, S. J. (2016). Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics. Journal of Machine Learning Research, 17(7), 1–33.
- Vollmer, S. J., Zygalakis, K. C., & Teh, Y. W. (2016). Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics. Journal of Machine Learning Research, 17(159), 1–48.
- Ma, Y.-A., Chen, T., & Fox, E. B. (2015). A Complete Recipe for Stochastic Gradient MCMC. NeurIPS.