A visual introduction to Gaussian Belief Propagation

A Framework for Distributed Inference with Emerging Hardware.


June 21, 2021

NB: A static version of this interactive article has been published on arXiv and can be cited as:

  title = {A visual introduction to Gaussian Belief Propagation},
  author = {Ortiz, Joseph and Evans, Talfan and Davison, Andrew J.},
  journal={arXiv preprint arXiv:2107.02308},
  year = {2021},

See our short video introduction to the article here.

BeliefTrue marginal
Messages sent: 63
Gaussian Belief Propagation performs probabilistic inference iteratively and is convergent even when messages are passed randomly through the graph. Here GBP is applied to a geometric grid alignment problem. See an interactive version of this figure later in the article.

In this article, we present a visual introduction to Gaussian Belief Propagation (GBP), a probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP updates rely only on local information and will converge independently of the message schedule. Our key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.


Bayesian Probabilistic Inference

Bayesian probability theory is the fundamental framework for dealing with uncertain data, and is at the core of practical systems in machine learning and robotics . A probabilistic model relates unknown variables of interest to observable, known or assumed quantities and most generally takes the form of a graph whose connections encode those relationships. Inference is the process of forming the posterior distribution to determine properties of the unknown variables, given the observations, such as their most probable values or their full marginal distributions.

There are various possible algorithms for probabilistic inference, many of which take advantage of specific problem structure for fast performance. Efficient inference on models represented by large, dynamic and highly inter-connected graphs however remains computationally challenging and is already a limiting factor in real embodied systems. Inference on such graphs will only become more important as general purpose perception systems strive towards more heterogeneous and dynamic representations containing abstractions (such as objects or concepts) with continually changing relationships In AI, there has been a recent trend towards sparse representations and graphs which are well-suited for representing sparse high dimensional data. Part of this trend has been driven by the fact that graphs are a natural representation in certain domains, and evidence of this is the rise of graph neural networks and graph-based probabilistic inference frameworks . Another trend is driven by the idea that massive sparsity is required to scale AI compute and evidence of this is the scaling of neural networks with gated computation and the development of novel AI processors specifically for high performance on sparse representations . We believe that inference on large scale and dynamic probabilistic graphs will be an important part of this trend towards sparse computing to scale AI compute. .

The Bitter Lesson

As we search for the best algorithms for probabilistic inference, we should recall the "Bitter Lesson" of machine learning (ML) research: "general methods that leverage computation are ultimately the most effective, and by a large margin" . A reminder of this is the success of CNN-driven deep learning, which is well suited to GPUs, the most powerful widely-available processors in recent years.

Looking to the future, we anticipate a long-term trend towards a "hardware jungle" of parallel, heterogeneous, distributed and asynchronous computing systems which communicate in a peer-to-peer manner. This will be at several levels: across networks of multiple smart devices operating in the same environment; across the many sensors, actuators and processors within individual embodied devices; and even within single processor chips themselves For single processor chips, a trend towards multicore designs with distributed on-core memory and ad-hoc communication is rapidly emerging . This design has the key aim of increasing performance while reducing power usage by minimizing the "bits x millimetres" of data movement during computation ..

To scale arbitrarily and leverage all available computation in this "hardware jungle", our key hypothesis is that we need inference methods which can operate with distributed local processing / storage and message passing communication, without the need for a global view or coordination of the whole model The opposite proposition would be a global centralized inference method, for example via a monolithic cloud-based processor. This centralized approach would not leverage most of the available compute resources and even if feasible may not be desirable, for communication bandwidth or privacy reasons..

Gaussian Belief Propagation

Fortunately, a well-known inference algorithm exists which has these properties: Loopy Belief Propagation . Belief Progagation (BP) was invented in the 1980s , and is an algorithm for calculating the marginals of a joint distribution via local message passing between nodes in a factor graph. Factor graphs are a type of bipartite graphical model that connects variables via factors which represent independent relationships .

BP guarantees exact marginal computation with one sequence of forward-backward message passing in tree-structured graphs , but empirically produces good results when applied to "loopy" graphs with cycles . Unlike Graph neural networks which learn edge and node updates that are applied over a fixed number of message passing steps, loopy BP applies probabilistic message passing updates with iterative convergent behaviour. Although Loopy BP has been successful in a few applications such as error-correcting codes , it has not as of yet been applied to broader machine learning problems.

One issue that has precluded the use of general Loopy BP is that it lacks convergence guarantees. However, a second and perhaps more relevant issue given modern hardware trends, is that its computational properties have not fitted the dominant processing paradigms of recent decades, CPUs and GPUs. Consequently, other factor graph inference algorithms have been preferred which take advantage of global problem structure to operate much more rapidly and robustly than loopy BP on a CPU.

We believe that now is the right time to re-evaluate the properties of loopy BP given the rise of graph-like computing hardware and sparse models. Indeed there has been a recent resurgence in interest in the algorithm ; in particular, one notable work demonstrated that BP on novel graph processor hardware can achieve a 24x speed improvement over standard methods for bundle adjustment. We focus on the special case of Gaussian Belief Propagation (GBP) , in which all factors are Gaussian functions of their dependent variables, and we also infer Gaussian-distributed marginals. GBP has both more extensive mathematical guarantees and stronger empirical performance than general loopy BP. It is also simple to implement, with all messages between nodes taking the form of usually low-dimensional Gaussians, which can be represented by small vectors and matrices.

In this article, we give an introduction to Gaussian Belief Propagation as a strong general purpose algorithmic and representational framework for large scale distributed inference. Throughout, we present a range of examples across 1D and 2D geometric estimation and image processing in the form of interactive simulations which allow the reader to experiment with and understand the properties of GBP. We hope that these simulations emphasize the key properties of GBP which we summarize below.

  1. Local - can operate with distributed processing and storage and message passing.
  2. Probabilistic - estimates uncertainties.
  3. Iterative and convergent - is not run over a fixed number of steps.
  4. Asynchronous - convergence can be reached via asynchronous updates.

In the remainder of the article, we first introduce the GBP algorithm and show that it is intimately linked to solving a linear system of equations Ax = b. We then explore 4 key practical details for applying GBP to real problems: extending GBP to handle non-linear relationships and non-Gaussian data distributions, using local message schedules and lastly how hierarchical structure can help convergence.

Technical Introduction

Probabilistic Inference

Inference is the problem of estimating statistical properties of unknown variables X from known or observed quantities