Gaussian Processes¶
deepuq exposes Gaussian Process methods through the following estimators:
- GaussianProcessRegressor
- SparseGaussianProcessRegressor
- GaussianProcessClassifier
- OneVsRestGaussianProcessClassifier
- HeteroscedasticGaussianProcessRegressor
- MultiTaskGaussianProcessRegressor
- SpectralMixtureGaussianProcessRegressor
- DeepKernelGaussianProcessRegressor
1) Motivation¶
Gaussian Processes place a Bayesian prior directly over functions, which makes them a strong baseline when the goal is not only prediction accuracy but also calibrated uncertainty. Instead of fitting one function and treating it as certain, the model infers a posterior distribution over plausible latent functions after observing the dataset.
This is why GP methods remain important in uncertainty quantification:
- uncertainty grows naturally away from observed data,
- observation noise can be modeled explicitly,
- posterior covariance reveals coupling across inputs and tasks,
- kernel design gives direct control over smoothness, periodicity, and shared structure.
In deepuq, the GP family covers exact regression, sparse variational regression, GP classification, heteroscedastic regression, multi-task ICM, spectral mixture kernels, and deep kernel learning.
2) What Uncertainty Is Quantified¶
For regression, Gaussian Processes quantify posterior uncertainty in the latent function and, when the observation model includes noise, aleatoric uncertainty in the measurements.
With the standard regression model \(y = f(x) + \varepsilon\), \(\varepsilon \sim \mathcal{N}(0, \sigma_n^2)\), the predictive variance decomposes as
\[
\mathrm{Var}[y_* \mid \mathcal{D}] = \underbrace{\mathrm{Var}[f_* \mid \mathcal{D}]}_{\text{epistemic}} + \underbrace{\sigma_n^2}_{\text{aleatoric}}.
\]
For classification, the main quantity is the predictive class probability. In the binary setting,
\[
p(y_* = 1 \mid x_*, \mathcal{D}) = \int \sigma(f_*)\, p(f_* \mid x_*, \mathcal{D})\, \mathrm{d}f_*.
\]
So the regression models quantify both latent uncertainty and noise structure, while the classification models quantify uncertainty in the class-probability surface, especially near boundaries and away from training support.
3) Mathematical Setup / Notation¶
Throughout this page, let \(X = \{x_i\}_{i=1}^{N}\) denote the training inputs, \(y = (y_1, \ldots, y_N)^\top\) the targets, and \(x_*\) a test input.
Kernel matrices follow the standard GP notation: \(K = k(X, X)\), \(k_* = k(X, x_*)\), and \(k_{**} = k(x_*, x_*)\).
Additional symbols used below:
- \(f\) is the latent function,
- \(u = f(Z)\) are inducing variables at inducing inputs \(Z\),
- \(\sigma_n^2\) is homoscedastic observation-noise variance,
- \(\sigma_n^2(x)\) is input-dependent noise variance,
- \(B\) is the task covariance matrix in the intrinsic coregionalization model,
- \(\phi_{\psi}\) is a trainable feature extractor in deep kernel learning.
4) Core Models and Equations¶
4.1 Exact GP Regression¶
The exact GP prior is
\[
f \sim \mathcal{GP}\big(0, k(x, x')\big), \qquad y_i = f(x_i) + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma_n^2).
\]
Conditioning on observed data gives the predictive mean
\[
\mu_* = k_*^\top (K + \sigma_n^2 I)^{-1} y
\]
and predictive covariance
\[
\Sigma_* = k_{**} - k_*^\top (K + \sigma_n^2 I)^{-1} k_*.
\]
The log marginal likelihood used for kernel learning is
\[
\log p(y \mid X) = -\tfrac{1}{2}\, y^\top (K + \sigma_n^2 I)^{-1} y - \tfrac{1}{2} \log \lvert K + \sigma_n^2 I \rvert - \tfrac{N}{2} \log 2\pi.
\]
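The closed-form predictions above can be checked in a few lines of NumPy. This is a self-contained sketch with an RBF kernel and illustrative hyperparameters (`lengthscale`, `noise`), not deepuq's implementation:

```python
import numpy as np

def rbf(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between row stacks A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, X_star, noise=1e-6):
    """Exact GP predictive mean and latent variance (zero-mean prior, RBF kernel)."""
    K = rbf(X, X) + noise * np.eye(len(X))        # K + sigma_n^2 I
    L = np.linalg.cholesky(K)                     # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf(X, X_star)
    mean = K_s.T @ alpha                          # k_*^T (K + sigma_n^2 I)^{-1} y
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(X_star, X_star)) - (v ** 2).sum(0)  # k_** - k_*^T (...)^{-1} k_*
    return mean, var

X = np.linspace(0.0, 1.0, 5)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
mu, var = gp_posterior(X, y, X)
```

With near-zero noise, the posterior mean nearly interpolates the training targets and the latent variance collapses toward zero at the observed inputs.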
4.2 Sparse Variational GP Regression¶
Sparse variational GP regression introduces inducing variables \(u = f(Z)\) at \(M \ll N\) inducing inputs \(Z\).
The projected covariance is
\[
Q = K_{XZ} K_{ZZ}^{-1} K_{ZX}.
\]
A common collapsed evidence lower bound is
\[
\mathcal{L} = \log \mathcal{N}\big(y \mid 0,\; Q + \sigma_n^2 I\big) - \frac{1}{2\sigma_n^2} \operatorname{tr}(K - Q).
\]
This is the standard sparse-GP approximation used to preserve posterior uncertainty structure while reducing the cost of exact \(N \times N\) kernel algebra.
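The collapsed bound is straightforward to evaluate numerically. The sketch below (illustrative names and hyperparameters, not deepuq's code) computes its two terms: a Gaussian log-likelihood under the Nyström covariance \(Q + \sigma_n^2 I\), and the trace penalty for capacity lost to sparsification:

```python
import numpy as np

def rbf(A, B, ell=0.3):
    """Squared-exponential kernel matrix between row stacks A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)

def collapsed_elbo(X, y, Z, noise=0.05):
    """Titsias-style collapsed bound:
    log N(y | 0, Q + sigma_n^2 I) - tr(K - Q) / (2 sigma_n^2)."""
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))       # jitter for numerical stability
    Kxz = rbf(X, Z)
    Q = Kxz @ np.linalg.solve(Kzz, Kxz.T)         # Nystrom covariance K_XZ K_ZZ^{-1} K_ZX
    S = Q + noise * np.eye(len(X))
    _, logdet = np.linalg.slogdet(S)
    quad = y @ np.linalg.solve(S, y)
    log_lik = -0.5 * (logdet + quad + len(X) * np.log(2.0 * np.pi))
    trace_gap = np.trace(rbf(X, X) - Q) / (2.0 * noise)
    return log_lik - trace_gap

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 1))
y = np.sin(3.0 * X[:, 0])
bound_sparse = collapsed_elbo(X, y, X[::4])       # 5 inducing points out of 20
```

When \(Z = X\), \(Q = K\), the trace term vanishes, and the bound recovers the exact log marginal likelihood; with fewer inducing points it is a strict lower bound.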
4.3 GP Classification (Binary + OvR Multiclass)¶
For binary classification with latent score \(f_i\), the Bernoulli likelihood is
\[
p(y_i \mid f_i) = \sigma(f_i)^{y_i} \big(1 - \sigma(f_i)\big)^{1 - y_i},
\]
where \(\sigma(\cdot)\) is the logistic sigmoid.
deepuq uses a Laplace approximation in latent-function space. A standard logistic-Gaussian predictive approximation is
\[
p(y_* = 1 \mid x_*) \approx \sigma\!\left(\frac{\mu_*}{\sqrt{1 + \pi \sigma_*^2 / 8}}\right),
\]
where \(\mu_*\) and \(\sigma_*^2\) are the approximate posterior mean and variance of \(f_*\).
For multiclass classification, one binary GP is fit per class and the resulting scores are normalized as
\[
p(y = c \mid x_*) = \frac{p_c(x_*)}{\sum_{c'} p_{c'}(x_*)}.
\]
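Both operations are cheap elementwise computations. A minimal sketch (function names are illustrative, not deepuq's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_predictive_prob(mu, var):
    """Logistic-Gaussian approximation:
    int sigmoid(f) N(f | mu, var) df ~= sigmoid(mu / sqrt(1 + pi * var / 8))."""
    return sigmoid(np.asarray(mu) / np.sqrt(1.0 + np.pi * np.asarray(var) / 8.0))

def ovr_normalize(binary_probs):
    """Renormalize one-vs-rest class probabilities onto the simplex."""
    p = np.asarray(binary_probs, dtype=float)
    return p / p.sum(axis=-1, keepdims=True)
```

Note how larger latent variance shrinks the approximate probability toward 0.5, which is exactly the calibration effect the integral is meant to capture.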
4.4 Heteroscedastic GP Regression¶
When the measurement noise depends on the input, the observation model becomes
\[
y = f(x) + \varepsilon(x), \qquad \varepsilon(x) \sim \mathcal{N}\big(0, \sigma_n^2(x)\big).
\]
In practice, deepuq alternates between:
- a GP for the latent mean,
- a GP for the log noise process.
The transformed residual target used for the noise model is
\[
z_i = \log\big((y_i - \mu(x_i))^2\big),
\]
so the noise GP operates in log space and the recovered variance \(\sigma_n^2(x) = \exp(z(x))\) is automatically positive.
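The target computation itself is a one-liner. In this sketch the `eps` guard is an illustrative numerical safeguard, not necessarily what deepuq uses internally:

```python
import numpy as np

def log_noise_targets(y, mean_pred, eps=1e-12):
    """Targets for the log-noise GP: z_i = log((y_i - mu(x_i))^2 + eps).
    eps avoids log(0) at exactly interpolated training points."""
    r2 = (np.asarray(y, dtype=float) - np.asarray(mean_pred, dtype=float)) ** 2
    return np.log(r2 + eps)
```

Exponentiating the noise GP's predictions of \(z\) then yields a strictly positive, input-dependent variance estimate.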
4.5 Multi-task ICM GP Regression¶
For task indices \(t\) and \(t'\), the intrinsic coregionalization kernel is
\[
k\big((x, t), (x', t')\big) = B_{t t'} \, k_x(x, x').
\]
The corresponding covariance matrix is
\[
K = B \otimes K_x.
\]
To guarantee positive semidefiniteness, the task covariance is parameterized as
\[
B = L L^\top + \operatorname{diag}(v), \qquad v \geq 0.
\]
This allows information sharing across correlated outputs while keeping task-specific uncertainty.
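The parameterization and the Kronecker structure can be sketched directly; here positivity of \(v\) is enforced through an exponential, one common choice among several:

```python
import numpy as np

def icm_task_covariance(L, log_diag):
    """PSD task covariance B = L L^T + diag(exp(log_diag));
    the exponential keeps the diagonal term strictly positive."""
    L = np.asarray(L, dtype=float)
    return L @ L.T + np.diag(np.exp(np.asarray(log_diag, dtype=float)))

def icm_covariance(B, Kx):
    """Full multi-task covariance B (x) Kx via the Kronecker product."""
    return np.kron(B, Kx)
```

Off-diagonal entries of \(B\) control how strongly tasks share information, while the diagonal terms preserve task-specific variance.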
4.6 Spectral Mixture GP Regression¶
For lag \(\tau = x - x'\), the spectral mixture kernel takes the form
\[
k(\tau) = \sum_{q=1}^{Q} w_q \prod_{d=1}^{D} \exp\big(-2\pi^2 \tau_d^2 v_{q,d}\big) \cos\big(2\pi \tau_d \mu_{q,d}\big).
\]
This kernel is useful when the latent signal contains several frequency bands or requires extrapolation of oscillatory structure.
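A one-dimensional specialization of the mixture form is easy to evaluate; the weights, spectral means, and spectral variances below are illustrative values, not fitted hyperparameters:

```python
import numpy as np

def sm_kernel(tau, weights, means, variances):
    """1-D spectral mixture kernel:
    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = np.asarray(tau, dtype=float)[..., None]   # broadcast over the Q components
    w = np.asarray(weights, dtype=float)
    env = np.exp(-2.0 * np.pi**2 * tau**2 * np.asarray(variances, dtype=float))
    return (w * env * np.cos(2.0 * np.pi * tau * np.asarray(means, dtype=float))).sum(-1)
```

Each component contributes an oscillation at frequency \(\mu_q\) damped by a Gaussian envelope whose width is set by \(v_q\), which is what lets the kernel extrapolate periodic structure.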
4.7 Deep Kernel GP Regression¶
Deep kernel learning composes a trainable feature map with a base kernel:
\[
k_{\mathrm{deep}}(x, x') = k_{\mathrm{base}}\big(\phi_{\psi}(x), \phi_{\psi}(x')\big).
\]
This lets the model learn a representation in which the GP prior is better matched to the observed data.
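The composition is mechanical once a feature extractor exists. The toy two-layer tanh network below stands in for a trained \(\phi_{\psi}\); it is a sketch, not deepuq's architecture:

```python
import numpy as np

def phi(X, W1, W2):
    """Toy two-layer tanh feature extractor standing in for phi_psi."""
    return np.tanh(np.tanh(X @ W1) @ W2)

def deep_kernel(A, B, W1, W2, ell=1.0):
    """RBF base kernel evaluated in the learned feature space."""
    FA, FB = phi(A, W1, W2), phi(B, W1, W2)
    d2 = ((FA[:, None, :] - FB[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell**2)
```

Because the base kernel is valid for any inputs, the composite kernel remains a valid positive semidefinite kernel for any choice of network weights.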
5) Inference / Prediction Equations¶
For regression, the posterior predictive moments are
\[
\mathbb{E}[y_* \mid x_*, \mathcal{D}] = \mu(x_*), \qquad \mathrm{Var}[y_* \mid x_*, \mathcal{D}] = \sigma_{\mathrm{epi}}^2(x_*) + \sigma_{\mathrm{alea}}^2(x_*),
\]
where \(\sigma_{\mathrm{epi}}^2(x_*)\) is the latent posterior variance.
For exact homoscedastic regression, \(\sigma_{\mathrm{alea}}^2(x_*) = \sigma_n^2\). For heteroscedastic regression, \(\sigma_{\mathrm{alea}}^2(x_*) = \sigma_n^2(x_*)\).
For classification, the predictive probability is obtained by integrating over the latent posterior:
\[
p(y_* = 1 \mid x_*, \mathcal{D}) = \int \sigma(f_*)\, q(f_* \mid x_*)\, \mathrm{d}f_*.
\]
In one-vs-rest classification, classwise probabilities are estimated independently and then renormalized into a multiclass simplex.
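For regression, combining the two variance sources is a plain sum that broadcasts over both noise models; the helper name below is illustrative, not a deepuq function:

```python
import numpy as np

def combine_predictive(mu_latent, var_latent, noise_var):
    """Total predictive variance = epistemic (latent posterior) + aleatoric (noise).
    noise_var is a scalar sigma_n^2 (homoscedastic) or a per-point
    array sigma_n^2(x) (heteroscedastic)."""
    total = np.asarray(var_latent, dtype=float) + np.asarray(noise_var, dtype=float)
    return np.asarray(mu_latent, dtype=float), total
```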
6) Practical Implications¶
- Exact GP regression is the strongest calibrated baseline when the dataset is small or medium sized.
- Sparse variational GP regression trades exactness for scalability while keeping a principled probabilistic objective.
- GP classification is useful when uncertainty near failure boundaries matters.
- Heteroscedastic GP regression separates model uncertainty from input-dependent noise.
- Multi-task ICM helps when multiple outputs are correlated and should share information.
- Spectral mixture kernels are appropriate for multi-frequency or oscillatory structure.
- Deep kernel learning is useful when a fixed kernel in raw input space is too restrictive.
UQResult Field Mapping¶
predict_uq(...) outputs map as follows:
| Model Type | mean | epistemic_var | aleatoric_var | total_var | probs | probs_var |
|---|---|---|---|---|---|---|
| Regression GP family | Posterior mean | Latent posterior variance | Noise term (constant or input-dependent) | Sum of epistemic and aleatoric variance | None | None |
| Classification GP family | None | None | None | None | Class probabilities | Probability spread proxy |
7) References¶
- Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
- Titsias, M. (2009). Variational Learning of Inducing Variables in Sparse Gaussian Processes. AISTATS (PMLR 5).
- Hensman, J., Fusi, N., & Lawrence, N. D. (2013). Gaussian Processes for Big Data. UAI.
- Williams, C. K. I., & Barber, D. (1998). Bayesian Classification with Gaussian Processes. IEEE TPAMI, 20(12), 1342-1351. DOI: 10.1109/34.735807
- Le, Q. V., Smola, A. J., & Canu, S. (2005). Heteroscedastic Gaussian Process Regression. ICML.
- Alvarez, M. A., Rosasco, L., & Lawrence, N. D. (2012). Kernels for Vector-Valued Functions: A Review. Foundations and Trends in Machine Learning, 4(3), 195-266. DOI: 10.1561/2200000036
- Wilson, A. G., & Adams, R. P. (2013). Gaussian Process Kernels for Pattern Discovery and Extrapolation. ICML (PMLR).
- Wilson, A. G., Hu, Z., Salakhutdinov, R., & Xing, E. P. (2016). Deep Kernel Learning. AISTATS (PMLR).