
MCMC API

This page documents the SGLD-based MCMC helpers provided by deepuq.methods.mcmc. These helpers expose a lower-level workflow than the wrapper-style APIs, so the notes here focus on sample collection and predictive aggregation.

Public objects

  • SGLDOptimizer
  • collect_posterior_samples
  • predict_with_samples
  • predict_with_samples_uq

Parameter and variable conventions

Name           Meaning
lr             SGLD step size
weight_decay   L2 penalty added to the stochastic gradient
n_steps        total SGLD updates
burn_in        fraction of early updates discarded before collecting samples
loss_fn        loss used to compute stochastic gradients
samples        list of state-dict snapshots collected after burn-in
apply_softmax  convert logits to probabilities before aggregating
device         device used for optimization or evaluation

Workflow expectations

  1. instantiate a deterministic model
  2. call collect_posterior_samples(...) with a training loader and loss
  3. reuse the returned samples with predict_with_samples(...) or predict_with_samples_uq(...)

Input and output shapes

  • collect_posterior_samples(...) expects minibatches (x, y) from data_loader.
  • predict_with_samples(...) returns tensors with the same trailing shape as one model forward pass.
  • classification helpers typically use outputs shaped [batch, n_classes].
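The shape rule can be sketched with plain tensors. This assumes, without confirmation from the deepuq source, that the helpers stack one forward pass per snapshot along a new leading sample dimension and then reduce over it; the sizes and random logits are illustrative only.

```python
import torch

S, batch, n_classes = 5, 8, 3  # illustrative sizes; S is the number of snapshots

# One forward pass per snapshot gives logits shaped [batch, n_classes].
per_sample = [torch.randn(batch, n_classes) for _ in range(S)]

# Stacking adds a leading sample dimension: [S, batch, n_classes].
stacked = torch.stack(per_sample, dim=0)

# Reducing over the sample dimension restores the shape of one forward pass.
mean = stacked.mean(dim=0)
var = stacked.var(dim=0)

print(stacked.shape, mean.shape, var.shape)
# torch.Size([5, 8, 3]) torch.Size([8, 3]) torch.Size([8, 3])
```

The reduction over dim=0 is what removes the sample dimension, which is why the returned tensors match one forward pass.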

UQResult mapping

predict_with_samples_uq(...) populates:

  • regression: mean, epistemic_var, total_var
  • classification (apply_softmax=True): mean, probs, probs_var, and epistemic_var

Common preconditions and failure modes

  • the architecture used for prediction must match the architecture used to collect samples
  • burn_in should be in [0, 1) to keep a meaningful number of posterior samples
  • loss_fn must match the task; when it is omitted, it defaults to cross-entropy, so regression models need an explicit loss such as torch.nn.MSELoss()
  • apply_softmax=True should only be used when the model emits logits
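The burn_in precondition can be made concrete with a small check. kept_snapshot_count is a hypothetical helper, not part of deepuq, and it assumes one snapshot is kept per update after the first int(burn_in * n_steps) updates; the library may round or thin differently.

```python
def kept_snapshot_count(n_steps: int, burn_in: float) -> int:
    """Hypothetical: post-burn-in updates, assuming one snapshot per update."""
    if not 0.0 <= burn_in < 1.0:
        raise ValueError(f"burn_in must be in [0, 1), got {burn_in}")
    return n_steps - int(burn_in * n_steps)


print(kept_snapshot_count(1000, 0.2))  # 800 under this assumption (documented defaults)
print(kept_snapshot_count(500, 0.2))   # 400 under this assumption

try:
    kept_snapshot_count(1000, 1.0)  # burn_in=1.0 would discard every update
except ValueError as err:
    print(err)  # burn_in must be in [0, 1), got 1.0
```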

Minimal example

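import torch

from deepuq.methods.mcmc import collect_posterior_samples, predict_with_samples_uq

# model, train_loader and x_test are assumed to be defined already
samples = collect_posterior_samples(
    model,
    train_loader,
    n_steps=500,
    lr=1e-4,
    loss_fn=torch.nn.CrossEntropyLoss(),
    device="cuda",
)
# evaluate on the same device used for sampling (the default is "cpu")
uq = predict_with_samples_uq(model, samples, x_test, apply_softmax=True, device="cuda")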

deepuq.methods.mcmc

MCMC utilities based on Stochastic Gradient Langevin Dynamics (SGLD) and Stochastic Gradient Hamiltonian Monte Carlo (SGHMC).

CyclicalSGMCMC

Cyclical Stochastic Gradient MCMC for posterior sampling.

Anneals the learning rate with a cosine schedule within each cycle and collects samples at the end of each cycle, where the learning rate is lowest.

Parameters:

Name                Type    Default   Description
model               Module  required  Neural network to sample from.
base_optimizer_cls          required  Optimizer class (e.g. SGHMCOptimizer or SGLDOptimizer).
cycle_length        int     50        Number of training steps per cycle.
n_cycles            int     4         Number of full cycles to run.
samples_per_cycle   int     3         Number of posterior samples to collect at the end of each cycle.
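The cyclical behaviour described above can be sketched in plain Python. The cosine formula is one common form of cyclical step-size schedule; the helper names cyclical_lr and is_sampling_step and the base_lr value are hypothetical, and the assumption that the samples_per_cycle snapshots come from the last steps of each cycle is not confirmed by this page.

```python
import math


def cyclical_lr(step: int, cycle_length: int, base_lr: float) -> float:
    """Assumed form: cosine-annealed step size that restarts every cycle_length steps."""
    pos = step % cycle_length  # position inside the current cycle
    return 0.5 * base_lr * (math.cos(math.pi * pos / cycle_length) + 1.0)


def is_sampling_step(step: int, cycle_length: int, samples_per_cycle: int) -> bool:
    """Assumption: snapshots are taken on the last samples_per_cycle steps of a cycle."""
    return step % cycle_length >= cycle_length - samples_per_cycle


cycle_length, n_cycles, samples_per_cycle = 50, 4, 3  # documented defaults
base_lr = 1e-3  # hypothetical base step size

sample_steps = [
    s for s in range(cycle_length * n_cycles)
    if is_sampling_step(s, cycle_length, samples_per_cycle)
]
print(len(sample_steps))                      # 12 = n_cycles * samples_per_cycle
print(cyclical_lr(0, cycle_length, base_lr))  # 0.001: full base_lr at the start of a cycle
print(cyclical_lr(49, cycle_length, base_lr) < 0.01 * base_lr)  # True: near zero at the end
```

Collecting only in the low-step-size tail of each cycle is what keeps the snapshots close to local posterior modes, while the restarts let the chain move between them.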

run

run(train_loader, loss_fn) -> list[dict[str, torch.Tensor]]

Execute cyclical SGMCMC and return collected posterior samples.

Parameters:

Name          Default   Description
train_loader  required  Iterable of (inputs, targets) mini-batches.
loss_fn       required  Loss function for computing gradients.

Returns:

Type                     Description
list[dict[str, Tensor]]  Collected state-dict snapshots.

SGHMCOptimizer

Bases: Optimizer

Stochastic Gradient Hamiltonian Monte Carlo optimizer.

Maintains a velocity buffer per parameter and applies the SGHMC update:

v = (1 - momentum_decay) * v - lr * grad + N(0, 2 * momentum_decay * lr) * noise_scale
theta = theta + v

Parameters:

Name                  Default   Description
params                required  Iterable of parameters to optimize.
lr                    0.0001    Step size.
momentum_decay        0.01      Friction coefficient for the velocity.
noise_scale           1.0       Scaling factor for the injected noise.
num_training_samples  1000      Number of training samples (used for gradient scaling context).

step

step()

Apply one SGHMC parameter update in-place.
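A minimal sketch of one SGHMC update on a single tensor, following the update rule stated for SGHMCOptimizer. sghmc_update is a hypothetical helper, the Gaussian term is assumed to have variance 2 * momentum_decay * lr, and any gradient scaling tied to num_training_samples is left out.

```python
import math

import torch


def sghmc_update(theta, v, grad, lr=1e-4, momentum_decay=0.01, noise_scale=1.0):
    """Hypothetical helper: one in-place SGHMC step on a single tensor."""
    noise_std = math.sqrt(2.0 * momentum_decay * lr)  # std of N(0, 2 * momentum_decay * lr)
    noise = torch.randn_like(theta) * noise_std * noise_scale
    # v = (1 - momentum_decay) * v - lr * grad + noise
    v.mul_(1.0 - momentum_decay).add_(grad, alpha=-lr).add_(noise)
    theta.add_(v)  # theta = theta + v
    return theta, v


torch.manual_seed(0)
theta = torch.zeros(3)
v = torch.zeros(3)
grad = torch.ones(3)

# With noise_scale=0 the step is deterministic: v = -lr * grad and theta = v.
sghmc_update(theta, v, grad, lr=0.1, noise_scale=0.0)
print(theta)  # tensor([-0.1000, -0.1000, -0.1000])
```

Setting noise_scale to 0 reduces the update to SGD with momentum, which is a convenient sanity check before turning the noise back on.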

SGLDOptimizer

Bases: Optimizer

Stochastic Gradient Langevin Dynamics optimizer.

This optimizer performs an SGD-like update with additive Gaussian noise calibrated by the step size, following Welling & Teh (2011).

Parameters:

Name          Default   Description
params        required  Iterable of parameters to optimize.
lr            0.001     SGLD step size.
weight_decay  0.0       Optional L2 penalty added to the stochastic gradient.

step

step()

Apply one SGLD parameter update in-place.

Returns:

Type  Description
None  The update is applied directly to the optimizer parameters.
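A minimal sketch of an SGLD optimizer built on torch.optim.Optimizer, for intuition only. ToySGLD is hypothetical, and its rule theta <- theta - lr * (grad + weight_decay * theta) + N(0, 2 * lr) is one common SGLD convention (Welling & Teh's update with lr equal to half their step size); SGLDOptimizer may scale its noise differently.

```python
import math

import torch


class ToySGLD(torch.optim.Optimizer):
    """Hypothetical SGLD sketch: theta <- theta - lr * (grad + wd * theta) + N(0, 2 * lr)."""

    def __init__(self, params, lr=1e-3, weight_decay=0.0):
        if lr <= 0.0:
            raise ValueError(f"lr must be positive, got {lr}")
        super().__init__(params, dict(lr=lr, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, wd = group["lr"], group["weight_decay"]
            noise_std = math.sqrt(2.0 * lr)
            for p in group["params"]:
                if p.grad is None:
                    continue
                d_p = p.grad
                if wd != 0.0:
                    d_p = d_p.add(p, alpha=wd)  # L2 penalty added to the stochastic gradient
                p.add_(d_p, alpha=-lr)                   # gradient step
                p.add_(torch.randn_like(p) * noise_std)  # injected Langevin noise
        return None


torch.manual_seed(0)
w = torch.nn.Parameter(torch.tensor([2.0]))
opt = ToySGLD([w], lr=0.1)
draws = []
for _ in range(10_000):
    opt.zero_grad()
    loss = 0.5 * (w ** 2).sum()  # negative log-density of N(0, 1), up to a constant
    loss.backward()
    opt.step()
    draws.append(w.item())

kept = draws[1000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((d - mean) ** 2 for d in kept) / len(kept)
print(f"mean={mean:.2f} var={var:.2f}")  # should be close to 0 and 1
```

Treating 0.5 * w**2 as the loss makes the target a standard normal, so the kept draws should have mean near 0 and variance near 1; with lr=0.1 the discretization pushes the variance slightly above 1.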

collect_posterior_samples

collect_posterior_samples(
    model: Module,
    data_loader,
    n_steps=1000,
    lr=0.0001,
    weight_decay=0.0001,
    burn_in=0.2,
    loss_fn=None,
    device="cpu",
)

Run SGLD and collect posterior parameter snapshots.

Parameters:

Name          Type    Default   Description
model         Module  required  Neural network to sample.
data_loader           required  Iterable of (inputs, targets) mini-batches.
n_steps               1000      Total SGLD updates.
lr                    0.0001    SGLD step size.
weight_decay          0.0001    L2 penalty added to the stochastic gradient.
burn_in               0.2       Fraction of updates to skip before collecting snapshots.
loss_fn               None      Loss used to compute stochastic gradients. Defaults to cross-entropy.
device                'cpu'     Device on which optimization runs.

Returns:

Type                     Description
list[dict[str, Tensor]]  State-dict snapshots collected after burn-in. The list can be passed as the samples argument of predict_with_samples or predict_with_samples_uq.
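The collection loop can be sketched as follows. collect_snapshots is a hypothetical stand-in rather than the deepuq implementation: it reuses the assumed SGLD rule theta <- theta - lr * (grad + weight_decay * theta) + N(0, 2 * lr), draws one mini-batch per update, skips the first int(burn_in * n_steps) updates, and stores a cloned state_dict() after each remaining update.

```python
import math

import torch
from torch import nn


def _forever(loader):
    """Re-iterate the loader indefinitely so n_steps can exceed one epoch."""
    while True:
        yield from loader


def collect_snapshots(model, data_loader, n_steps=1000, lr=1e-4, weight_decay=1e-4,
                      burn_in=0.2, loss_fn=None, device="cpu"):
    """Hypothetical SGLD sampling loop returning cloned state-dict snapshots."""
    loss_fn = loss_fn if loss_fn is not None else nn.CrossEntropyLoss()
    model.to(device).train()
    n_burn = int(burn_in * n_steps)
    batches = _forever(data_loader)
    samples = []
    for step in range(n_steps):
        x, y = next(batches)
        x, y = x.to(device), y.to(device)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is None:
                    continue
                grad = p.grad + weight_decay * p  # L2 penalty on the stochastic gradient
                p.add_(-lr * grad + math.sqrt(2.0 * lr) * torch.randn_like(p))
        if step >= n_burn:
            # Clone: state_dict() tensors share storage with the live parameters.
            samples.append({k: v.detach().clone() for k, v in model.state_dict().items()})
    return samples


torch.manual_seed(0)
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()
loader = [(x[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]  # stands in for a DataLoader
model = nn.Linear(4, 2)
samples = collect_snapshots(model, loader, n_steps=50, lr=1e-3)
print(len(samples))  # 40: the first int(0.2 * 50) = 10 updates are burn-in
```

The clone matters: without it every stored entry would alias the live parameters and all snapshots would end up equal to the final weights.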

predict_with_samples

predict_with_samples(
    model: Module,
    samples,
    x,
    apply_softmax=True,
    device="cpu",
)

Predictive mean and variance from stored parameter samples.

Parameters:

Name           Type    Default   Description
model          Module  required  Model architecture compatible with the saved state dicts.
samples                required  Posterior parameter snapshots, typically from collect_posterior_samples.
x                      required  Evaluation inputs.
apply_softmax          True      If True, convert logits into probabilities before aggregation.
device                 'cpu'     Device used for model evaluation.

Returns:

Name         Description
(mean, var)  Predictive mean and variance over the posterior sample dimension, each with the same trailing shape as one model forward pass.
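A sketch of the aggregation step, assuming the helper loads each snapshot into the same model, runs one forward pass per snapshot, and takes the mean and population variance over the stacked outputs. predict_from_snapshots is hypothetical, and whether deepuq divides by S or S - 1 is not stated on this page.

```python
import torch
from torch import nn


@torch.no_grad()
def predict_from_snapshots(model, samples, x, apply_softmax=True, device="cpu"):
    """Hypothetical sketch: per-snapshot forward passes reduced over the sample dimension."""
    model.to(device).eval()
    x = x.to(device)
    outputs = []
    for state in samples:
        model.load_state_dict(state)  # architecture must match the snapshots
        out = model(x)
        if apply_softmax:
            out = torch.softmax(out, dim=-1)  # logits -> class probabilities
        outputs.append(out)
    stacked = torch.stack(outputs)  # [n_samples, *output_shape]
    return stacked.mean(dim=0), stacked.var(dim=0, unbiased=False)


torch.manual_seed(0)
model = nn.Linear(4, 3)
# Perturbed copies of the weights stand in for posterior snapshots.
samples = [
    {k: v + 0.1 * torch.randn_like(v) for k, v in model.state_dict().items()}
    for _ in range(3)
]
mean, var = predict_from_snapshots(model, samples, torch.randn(5, 4))
print(mean.shape, var.shape)  # torch.Size([5, 3]) torch.Size([5, 3])
print(torch.allclose(mean.sum(dim=-1), torch.ones(5)))  # True: averaged probabilities sum to 1
```

Because load_state_dict overwrites the weights in place, the model holds the last snapshot after the call, and a mismatched architecture fails at the load step rather than silently.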

predict_with_samples_uq

predict_with_samples_uq(
    model: Module,
    samples,
    x,
    apply_softmax=True,
    device="cpu",
) -> UQResult

Return posterior-sample predictive moments in UQResult form.

epistemic_var stores the variance across posterior samples. No separate aleatoric component is estimated.
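A worked example of variance across posterior samples for one input and two snapshots. The divisor used by deepuq is an assumption here, so both the population (divide by S) and unbiased (divide by S - 1) versions are shown.

```python
import torch

# Class probabilities from two posterior snapshots for one input: [S=2, n_classes=2].
probs = torch.tensor([[0.9, 0.1],
                      [0.7, 0.3]])

mean = probs.mean(dim=0)                    # [0.8, 0.2]
var_pop = probs.var(dim=0, unbiased=False)  # divide by S:     [0.01, 0.01]
var_unbiased = probs.var(dim=0)             # divide by S - 1: [0.02, 0.02]
print(mean, var_pop, var_unbiased)
```

Either way, snapshots that agree exactly give zero epistemic variance, and the value grows as the snapshots disagree.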