Seven Statistical Mechanics / Bayesian Equations That You Need to Know
Essential Statistical Mechanics for Deep Learning
If you’re self-studying machine learning and feel that statistical mechanics is suddenly showing up more than it used to, you’re not alone. Within the past couple of years, statistical mechanics (statistical thermodynamics) has become a more integral topic in machine learning, along with the Kullback-Leibler divergence measure and several inference methods, including the expectation maximization (EM) algorithm and variational Bayes.
Statistical mechanics has always played a strong role in machine learning
Imagine, though, that you’re in a strange, Jurassic Park-like landscape, where the world continually changes. Statistical mechanics was important in this early landscape, as it underlay the first neural network innovations: the Hopfield neural network and the Boltzmann machine.
Over time, though, these older neural networks were surrounded by mists and volcanic gas, and became lost to view. The importance of statistical mechanics waned, as we focused on more near-term and straightforward goals – building deep structures (using the tried-and-true backpropagation rule), building Convolutional Neural Networks, and the like.
But as in any rapidly evolving landscape, things continue to change. While expectation maximization (EM) and variational Bayes methods have been around for a while, they’re like a volcano that keeps growing and erupting, and they loom larger in our landscape than ever before. Our world now contains not only the older and comparatively simple methods of statistical mechanics, such as the Ising model; the whole notion of inference now dominates our thinking. Thus, variational Bayes is the “Mount Everest” peak that many machine learning specialists are seeking to climb.
Inference has become so important that it is driving much of practical AI these days, motivating special-purpose GPUs that can make inference much faster. As Jensen Huang said during his keynote address at the May 2017 NVIDIA GTC, announcing his latest product release: “Volta is groundbreaking work. It’s incredibly good at training and incredibly good at inferencing. Volta and TensorRT are ideal for inferencing.” When inference drives major corporate product releases, we know that it’s important.
That’s why the mathematics underlying inference is suddenly becoming much more prominent in our machine learning landscape.
Key Equations in Statistical Mechanics, Bayesian Probability, and Inference
Given the plethora of concepts, papers, tutorials, and other information, it might help us to back up and create a mental model or map of the inference-based landscape in machine learning. The following figure illustrates the key equations.
Why Learning Statistical Mechanics Is Like Traversing a Mountain Range
The seven equations identified in the previous figure are not equally difficult. The ones in the foreground are our older equations: the basics of statistical mechanics, corresponding to Eqns. (1)-(4) in the figure. The fundamentals of Bayesian probability (Eqn. (5)) have also been with us for a couple of hundred years.
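Since the figure itself isn’t reproduced in this text, here is a rough sketch of the standard textbook forms these equations usually take – my own reconstruction, so the figure’s exact notation and numbering may differ:

```latex
% Eqns. (1)-(4), sketched: the statistical mechanics basics.
% E_i = energy of microstate i, \beta = 1/(k_B T), Z = the partition function.
\begin{align*}
  p_i &= \frac{e^{-\beta E_i}}{Z}               && \text{Boltzmann distribution} \\
  Z   &= \sum_i e^{-\beta E_i}                  && \text{partition function} \\
  F   &= -\frac{1}{\beta}\,\ln Z                && \text{free energy} \\
  S   &= -k_B \sum_i p_i \ln p_i                && \text{entropy}
\end{align*}
% Eqn. (5), sketched: Bayes' rule for a hypothesis h and data d.
\begin{align*}
  P(h \mid d) &= \frac{P(d \mid h)\,P(h)}{P(d)}
\end{align*}
```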
If we were new to the field, we could master the early rudiments of these – given a good text or tutorial – in about a weekend apiece for statistical mechanics and Bayesian probability.
It would take us more time to understand the Kullback-Leibler divergence (Eqn. (6)). And it could take us several months, or even a year or two, to fully understand the expectation maximization (EM) method and its evolution into variational Bayes (Eqn. (7)).
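For concreteness, here are the usual textbook forms of these two – again my own sketch, not a copy of the figure:

```latex
% Eqn. (6), sketched: the Kullback-Leibler divergence of q from p.
\begin{align*}
  D_{KL}(q \,\|\, p) &= \sum_x q(x) \ln \frac{q(x)}{p(x)}
\end{align*}
% Toward Eqn. (7): the log evidence splits into an evidence lower bound
% (ELBO) plus a K-L term, so maximizing the ELBO over q is the same as
% driving the approximating distribution q toward the true posterior.
\begin{align*}
  \ln p(d) &= \underbrace{\mathbb{E}_{q(h)}\!\left[\ln \frac{p(d,h)}{q(h)}\right]}_{\text{ELBO}}
             + D_{KL}\big(q(h) \,\|\, p(h \mid d)\big)
\end{align*}
```

And, for those who like to see a number come out, a minimal numerical illustration of the K-L divergence – the function name and the toy distributions here are just my own choices:

```python
import numpy as np

def kl_divergence(q, p):
    """Discrete Kullback-Leibler divergence D_KL(q || p), in nats."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0                     # by convention, 0 * log(0) contributes 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

q = np.array([0.5, 0.3, 0.2])        # an approximating distribution
p = np.array([0.4, 0.4, 0.2])        # the "true" distribution
print(kl_divergence(q, p))           # ~0.025 nats; 0 only when q == p exactly
```

Note that the K-L divergence is not symmetric: swapping q and p generally gives a different number, which matters when variational methods choose to minimize the divergence of q from the posterior rather than the other way around.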
Thus, we might pace ourselves – and set our expectations of what we can accomplish – realistically.
Where to Start; What to Read
The References collected below contain some historical papers, and also the best that I could find in the way of current tutorials.
If I were starting from here, right now, not knowing any of this, the following Table would probably get me started in a useful way.
Reading List
You can find links to all items in the Table, together with some extra (usually key historical) resources, in the References list at the end.
Here’s to your success, as you become your own Master of the Universe!
All my best – AJM
References for Variational Bayes
- Tzikas, D.G., Likas, A.C., and Galatsanos, N.P. (2008). Life after the EM algorithm: the variational approximation for Bayesian inference. IEEE Signal Processing Magazine, 25(6), 131 (November, 2008), doi:10.1109/MSP.2008.929620. pdf. (AJM’s Note: A particularly nice tutorial, and a good place in which to start for an overview.)
Some additional references:
Information Theory – Seminal Papers
- Kullback, S., and Leibler, R.A. (1951). On Information and Sufficiency. Ann. Math. Statist., 22(1), 79-86. On Information and Sufficiency – full PDF file.
- Shannon, C.E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, XXVII(3) (July, 1948), 379-423.
Information Theory – Tutorials
- Burnham, K.P., and Anderson, D.R. (2001). Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research, 28, 111-119. pdf (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than giving a deeply mathematical review; good for understanding the basics.)
- Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information? pdf (Very nice, clear intro – v. good for understanding basics)
- White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
- Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)
Previous Related Posts
- How to Read Karl Friston (in the Original Greek)
- Approximate Bayesian Inference
- The Single Most Important Equation for Brain-Computer Information Interfaces (Kullback-Leibler)