
# A visual introduction to Gaussian Belief Propagation

A Framework for Distributed Inference with Emerging Hardware.

### Affiliations

Imperial College London, *now at DeepMind

### Published

June 21, 2021

NB: A static version of this interactive article has been published on arXiv and can be cited as:

```bibtex
@article{Ortiz2021visualGBP,
  title   = {A visual introduction to Gaussian Belief Propagation},
  author  = {Ortiz, Joseph and Evans, Talfan and Davison, Andrew J.},
  journal = {arXiv preprint arXiv:2107.02308},
  year    = {2021},
}
```

See our short video introduction to the article here.

In this article, we present a visual introduction to Gaussian Belief Propagation (GBP), a probabilistic inference algorithm that operates by passing messages between the nodes of arbitrarily structured factor graphs. A special case of loopy belief propagation, GBP updates rely only on local information and will converge independently of the message schedule. Our key argument is that, given recent trends in computing hardware, GBP has the right computational properties to act as a scalable distributed probabilistic inference framework for future machine learning systems.

## Introduction

#### Bayesian Probabilistic Inference

Bayesian probability theory is the fundamental framework for dealing with uncertain data, and is at the core of practical systems in machine learning and robotics [1, 2]. A probabilistic model relates unknown variables of interest to observable, known or assumed quantities and most generally takes the form of a graph whose connections encode those relationships. Inference is the process of forming the posterior distribution to determine properties of the unknown variables, given the observations, such as their most probable values or their full marginal distributions.
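As a tiny worked example (our own, not from the article): for a scalar unknown with a Gaussian prior and a single Gaussian observation, the posterior can be formed in closed form. Working in information form (precision Λ = 1/σ², information η = Λμ), multiplying prior and likelihood reduces to adding their parameters:

```python
# Posterior for a scalar x with a Gaussian prior and one Gaussian measurement,
# computed in information form. Products of Gaussians add their information-form
# parameters, so the Bayes update is a pair of additions.

def to_info(mu, sigma2):
    lam = 1.0 / sigma2            # precision Lambda
    return lam * mu, lam          # (eta, Lambda)

def from_info(eta, lam):
    return eta / lam, 1.0 / lam   # (mu, sigma^2)

# Prior: x ~ N(0, 4); measurement: z = x + noise, observed z = 2.0, noise variance 1.
eta_prior, lam_prior = to_info(0.0, 4.0)
eta_meas,  lam_meas  = to_info(2.0, 1.0)

# Posterior = prior * likelihood -> information parameters add.
eta_post = eta_prior + eta_meas
lam_post = lam_prior + lam_meas

mu_post, var_post = from_info(eta_post, lam_post)
print(mu_post, var_post)  # posterior mean 1.6, variance 0.8
```

This information-form addition is the same local operation that reappears when GBP assembles beliefs at variable nodes later in the article.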

There are various possible algorithms for probabilistic inference, many of which take advantage of specific problem structure for fast performance. However, efficient inference on models represented by large, dynamic and highly inter-connected graphs remains computationally challenging, and is already a limiting factor in real embodied systems. Inference on such graphs will only become more important as general purpose perception systems strive towards more heterogeneous and dynamic representations containing abstractions (such as objects or concepts) with continually changing relationships [3, 2].

In AI, there has been a recent trend towards sparse representations and graphs, which are well-suited to representing sparse high-dimensional data. Part of this trend has been driven by the fact that graphs are a natural representation in certain domains; evidence of this is the rise of graph neural networks and graph-based probabilistic inference frameworks [4, 2]. Another trend is driven by the idea that massive sparsity is required to scale AI compute; evidence of this is the scaling of neural networks with gated computation [5, 6] and the development of novel AI processors designed specifically for high performance on sparse representations [7]. We believe that inference on large-scale, dynamic probabilistic graphs will be an important part of this trend towards sparse computing.

#### The Bitter Lesson

As we search for the best algorithms for probabilistic inference, we should recall the "Bitter Lesson" of machine learning (ML) research: "general methods that leverage computation are ultimately the most effective, and by a large margin" [8]. A reminder of this is the success of CNN-driven deep learning, which is well suited to GPUs, the most powerful widely-available processors in recent years.

Looking to the future, we anticipate a long-term trend towards a "hardware jungle" [9] of parallel, heterogeneous, distributed and asynchronous computing systems which communicate in a peer-to-peer manner. This will be at several levels: across networks of multiple smart devices operating in the same environment; across the many sensors, actuators and processors within individual embodied devices; and even within single processor chips themselves.

For single processor chips, a trend towards multicore designs with distributed on-core memory and ad-hoc communication is rapidly emerging [7, 10]. This design has the key aim of increasing performance while reducing power usage by minimizing the "bits x millimetres" of data movement during computation [11].

To scale arbitrarily and leverage all available computation in this "hardware jungle", our key hypothesis is that we need inference methods which can operate with distributed local processing / storage and message passing communication, without the need for a global view or coordination of the whole model. The opposite proposition would be a global centralized inference method, for example via a monolithic cloud-based processor; this centralized approach would not leverage most of the available compute resources, and even if feasible it may not be desirable for communication bandwidth or privacy reasons.

#### Gaussian Belief Propagation

Fortunately, a well-known inference algorithm exists which has these properties: Loopy Belief Propagation [12, 13, 14]. Belief Propagation (BP) was invented in the 1980s [12], and is an algorithm for calculating the marginals of a joint distribution via local message passing between nodes in a factor graph. Factor graphs are a type of bipartite graphical model that connects variables via factors which represent independent relationships [4, 15].
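To make the bipartite structure concrete, here is a minimal sketch (names and structure are our own, not a standard API): a factor graph can be represented as nothing more than a bipartite adjacency structure in which variables connect only to factors and vice versa.

```python
# A minimal bipartite factor graph for a 3-variable chain:
#   x0 -- f01 -- x1 -- f12 -- x2, plus a unary prior factor on x0.
# Variables connect only to factors, and factors only to variables.

variables = ["x0", "x1", "x2"]
factors = {
    "prior_x0": ["x0"],   # unary factor (e.g. a prior or direct measurement)
    "f01": ["x0", "x1"],  # pairwise factor encoding a relationship
    "f12": ["x1", "x2"],
}

# Adjacency from the variable side, derived from the factor side.
neighbours = {v: [f for f, vs in factors.items() if v in vs] for v in variables}

print(neighbours["x1"])  # ['f01', 'f12']
```

In BP, messages flow along exactly these variable-factor edges, so this adjacency is all the global structure any node ever needs to know.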

BP guarantees exact marginal computation with one sequence of forward-backward message passing in tree-structured graphs [16], but empirically produces good results when applied to "loopy" graphs with cycles [14]. Unlike graph neural networks [17, 18, 19], which learn edge and node updates that are applied over a fixed number of message passing steps, loopy BP applies probabilistic message passing updates with iterative convergent behaviour. Although loopy BP has been successful in a few applications such as error-correcting codes [20], it has not as yet been applied to broader machine learning problems.

One issue that has precluded the use of general Loopy BP is that it lacks convergence guarantees. However, a second and perhaps more relevant issue given modern hardware trends, is that its computational properties have not fitted the dominant processing paradigms of recent decades, CPUs and GPUs. Consequently, other factor graph inference algorithms have been preferred which take advantage of global problem structure to operate much more rapidly and robustly than loopy BP on a CPU.

We believe that now is the right time to re-evaluate the properties of loopy BP given the rise of graph-like computing hardware and sparse models. Indeed, there has been a recent resurgence of interest in the algorithm [21, 22, 23, 24, 25]; in particular, one notable work [24] demonstrated that BP on novel graph processor hardware can achieve a 24x speed improvement over standard methods for bundle adjustment. We focus on the special case of Gaussian Belief Propagation (GBP) [26], in which all factors are Gaussian functions of their dependent variables and the inferred marginals are also Gaussian. GBP has both more extensive mathematical guarantees [27] and stronger empirical performance [28] than general loopy BP. It is also simple to implement, with all messages between nodes taking the form of usually low-dimensional Gaussians, which can be represented by small vectors and matrices.

In this article, we give an introduction to Gaussian Belief Propagation as a strong general purpose algorithmic and representational framework for large scale distributed inference. Throughout, we present a range of examples across 1D and 2D geometric estimation and image processing in the form of interactive simulations which allow the reader to experiment with and understand the properties of GBP. We hope that these simulations emphasize the key properties of GBP which we summarize below.

1. Local - can operate with distributed processing, storage and message passing.
2. Probabilistic - estimates uncertainties.
3. Iterative and convergent - is not run over a fixed number of steps.
4. Asynchronous - convergence can be reached via asynchronous updates.

In the remainder of the article, we first introduce the GBP algorithm and show that it is intimately linked to solving a linear system of equations $Ax = b$. We then explore four key practical details for applying GBP to real problems: extending GBP to handle non-linear relationships and non-Gaussian data distributions, using local message schedules, and lastly how hierarchical structure can help convergence.
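As a minimal illustration of this link (a scalar, unoptimized sketch with our own variable names, not the article's implementation): for a Gaussian joint p(x) ∝ exp(-½xᵀAx + bᵀx), the marginal means are exactly the solution of Ax = b, and iterating GBP's local message updates on a small tridiagonal (chain-structured) system recovers them.

```python
# Minimal scalar Gaussian BP for a joint p(x) ∝ exp(-0.5 xᵀAx + bᵀx): the
# marginal means solve A x = b. Each directed edge carries a 1D Gaussian
# message in information form (eta, Lambda).

A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 1.0, 1.0]
n = len(b)

edges = [(i, j) for i in range(n) for j in range(n) if i != j and A[i][j] != 0.0]
msg = {e: (0.0, 0.0) for e in edges}  # message i->j as (eta, Lambda)

for _ in range(10):  # synchronous sweeps; exact on a tree after a few passes
    new = {}
    for i, j in edges:
        # Gather all information at variable i except the message from j.
        eta_i = b[i] + sum(m[0] for (k, t), m in msg.items() if t == i and k != j)
        lam_i = A[i][i] + sum(m[1] for (k, t), m in msg.items() if t == i and k != j)
        # Marginalize x_i out of the pairwise potential exp(-A_ij x_i x_j).
        new[(i, j)] = (-A[i][j] * eta_i / lam_i, -A[i][j] ** 2 / lam_i)
    msg = new

# Belief at each variable: local information plus all incoming messages.
means = []
for i in range(n):
    eta_i = b[i] + sum(m[0] for (k, t), m in msg.items() if t == i)
    lam_i = A[i][i] + sum(m[1] for (k, t), m in msg.items() if t == i)
    means.append(eta_i / lam_i)

print(means)  # ≈ [1.5, 2.0, 1.5], the solution of A x = b
```

Because this toy graph is a tree, convergence is exact; on loopy graphs the same local updates are iterated to (approximate) convergence, as the article discusses in the sections that follow.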

## Technical Introduction

### Probabilistic Inference

Inference is the problem of estimating statistical properties of unknown variables $X$ from known or observed quantities