Seven Statistical Mechanics / Bayesian Equations That You Need to Know
Essential Statistical Mechanics for Deep Learning
If you’re self-studying machine learning and feel that statistical mechanics is suddenly showing up more than it used to, you’re not alone. Within the past couple of years, statistical mechanics (statistical thermodynamics) has become a more integral topic in machine learning, along with the Kullback-Leibler divergence measure and several inference methods, including the expectation maximization (EM) algorithm and variational Bayes.
Statistical mechanics has always played a strong role in machine learning
Imagine, though, that you’re in a strange, Jurassic Park-like landscape, where the world continually changes. Statistical mechanics was important in this early landscape, as it underlay the first neural network innovations: the Hopfield neural network and the Boltzmann machine.
Over time, though, these older neural networks were surrounded by mists and volcanic gas, and became lost to view. The importance of statistical mechanics waned, as we focused on more near-term and straightforward goals – building deep structures (using the tried-and-true backpropagation rule), building Convolutional Neural Networks, and the like.
But as in any rapidly evolving landscape, things continue to change. While expectation maximization (EM) and variational Bayes methods have been around for a while, they’re like a volcano that keeps growing and erupting, and they loom larger in our landscape than ever before. Our world now contains not only the older and comparatively simple methods of statistical mechanics, such as the Ising model; the whole notion of inference now dominates our thinking. Thus, variational Bayes is the “Mount Everest” peak that many machine learning specialists are seeking to climb.
Inference has become so important that it is driving much of practical AI these days, motivating special-purpose GPUs that can make inference much faster. As Jensen Huang said during his keynote address at the May 2017 NVIDIA GTC, announcing his latest product release: “Volta is groundbreaking work. It’s incredibly good at training and incredibly good at inferencing. Volta and TensorRT are ideal for inferencing.” When inference drives major corporate product releases, we know that it’s important.
That’s why the mathematics underlying inference is suddenly becoming much more prominent in our machine learning landscape.
Key Equations in Statistical Mechanics, Bayesian Probability, and Inference
Given the plethora of concepts, papers, tutorials, and other information, it might help us to back up and create a mental model or map of the inference-based landscape in machine learning. The following figure illustrates the key equations.
Why Learning Statistical Mechanics Is Like Traversing a Mountain Range
The seven equations identified in the previous figure are not equally difficult. The ones in the foreground are our older equations: the basics of statistical mechanics, corresponding to Eqns. (1)-(4) in the figure. The fundamentals of Bayesian probability (Eqn. (5)) have also been with us for a couple of hundred years.
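Since the figure itself isn’t reproduced in this text, here is a rough sketch of the standard textbook forms these equations usually take – my own reconstruction, so the figure’s exact notation and numbering may differ:

```latex
% Eqns. (1)-(4), sketched: the statistical mechanics basics.
% E_i = energy of microstate i, \beta = 1/(k_B T), Z = the partition function.
\begin{align*}
  p_i &= \frac{e^{-\beta E_i}}{Z}               && \text{Boltzmann distribution} \\
  Z   &= \sum_i e^{-\beta E_i}                  && \text{partition function} \\
  F   &= -\frac{1}{\beta}\,\ln Z                && \text{free energy} \\
  S   &= -k_B \sum_i p_i \ln p_i                && \text{entropy}
\end{align*}
% Eqn. (5), sketched: Bayes' rule for a hypothesis h and data d.
\begin{align*}
  P(h \mid d) &= \frac{P(d \mid h)\,P(h)}{P(d)}
\end{align*}
```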
If we were new to the field, we could master the early rudiments of these – given a good text or tutorial – in about a weekend apiece for statistical mechanics and Bayesian probability.
It would take us more time to understand the Kullback-Leibler divergence (Eqn. (6)). And it could take us several months, or even a year or two, to fully understand the expectation maximization (EM) method and its evolution into variational Bayes (Eqn. (7)).
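For concreteness, here are the usual textbook forms of these two – again my own sketch, not a copy of the figure:

```latex
% Eqn. (6), sketched: the Kullback-Leibler divergence of q from p.
\begin{align*}
  D_{KL}(q \,\|\, p) &= \sum_x q(x) \ln \frac{q(x)}{p(x)}
\end{align*}
% Toward Eqn. (7): the log evidence splits into an evidence lower bound
% (ELBO) plus a K-L term, so maximizing the ELBO over q is the same as
% driving the approximating distribution q toward the true posterior.
\begin{align*}
  \ln p(d) &= \underbrace{\mathbb{E}_{q(h)}\!\left[\ln \frac{p(d,h)}{q(h)}\right]}_{\text{ELBO}}
             + D_{KL}\big(q(h) \,\|\, p(h \mid d)\big)
\end{align*}
```

And, for those who like to see a number come out, a minimal numerical illustration of the K-L divergence – the function name and the toy distributions here are just my own choices:

```python
import numpy as np

def kl_divergence(q, p):
    """Discrete Kullback-Leibler divergence D_KL(q || p), in nats."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                     # by convention, 0 * log(0) contributes 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

q = np.array([0.5, 0.3, 0.2])        # an approximating distribution
p = np.array([0.4, 0.4, 0.2])        # the "true" distribution
print(kl_divergence(q, p))           # ~0.025 nats; 0 only when q == p exactly
```

Note that the K-L divergence is not symmetric: swapping q and p generally gives a different number, which matters when variational methods choose to minimize the divergence of q from the posterior rather than the other way around.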
Thus, we might pace ourselves – and set our expectations of what we can accomplish – realistically.
Where to Start; What to Read
The References collected below contain some historical papers, and also the best that I could find in the way of current tutorials.
If I were starting from here, right now, not knowing any of this, the following Table would probably get me started in a useful way.
Reading List
You can find links to all items in the Table, together with some extra (usually key historical) resources, in the References list at the end.
Here’s to your success, as you become your own Master of the Universe!
All my best – AJM
References for Variational Bayes
- Tzikas, D.G., Likas, A.C., and Galatsanos, N.P. (2008). Life after the EM algorithm: the variational approximation for Bayesian inference. IEEE Signal Processing Magazine, 25(6), 131 (November, 2008), doi:10.1109/MSP.2008.929620. pdf. (AJM’s Note: A particularly nice tutorial, and a good place in which to start for an overview.)
Some additional references:
Information Theory – Seminal Papers
- Kullback, S., and Leibler, R.A. (1951). On Information and Sufficiency. Ann. Math. Statist., 22(1), 79-86. On Information and Sufficiency – full PDF file.
- Shannon, C.E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, XXVII(3) (July, 1948), 379-423.
Information Theory – Tutorials
- Burnham, K.P., and Anderson, D.R. (2001). Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research, 28, 111-119. pdf (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than giving a deeply mathematical review; good for understanding the basics.)
- Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information? pdf (Very nice, clear intro – v. good for understanding basics)
- White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
- Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)
Previous Related Posts
- How to Read Karl Friston (in the Original Greek)
- Approximate Bayesian Inference
- The Single Most Important Equation for Brain-Computer Information Interfaces (Kullback-Leibler)