How to Read Karl Friston (in the Original Greek)
Karl Friston, whom we all admire, has written some lovely papers that are both enticing and obscure.
Cutting to the chase, what we really want to understand is this equation:
In a Research Digest article, Peter Freed writes:
… And today, Karl Friston is not explaining [the free energy principle] in a way that makes it usable to your average psychiatrist/psychotherapist on the street – which is frustrating. I am not alone in my confusion, and if you read the articles cited … [A.J.’s note: select ones are given in the References, below], neither are you. At Columbia’s Psychiatry Department, I recently led a journal club for 15 PET and fMRI researchers, PhD’s and MD’s all, with well over $10 million in NIH grants between us, and we tried to understand Friston’s 2010 Nature Reviews, Neuroscience paper – for an hour and a half. There was a lot of mathematical knowledge in the room: three statisticians, two physicists, a physical chemist …
Dr. Freed and his colleagues are, as he notes, so not alone. Friston’s style is both abstract and esoteric. In particular, he makes the kind of graceful leap from one mathematical formalism to another that is a trifle beyond most earth-bound mortals.
Thus, when I first started to study variational Bayes, and began with Friston’s various papers, I was more than lost. More than confused. More than wandering alone, in the dark woods. The blog that I wrote in 2016 on this topic (Approximate Bayesian Inference) reflected a groping for some wisdom, but not yet any real understanding.
Yet, I had a strong motive to continue.
The rationale? One of Friston’s key points is that the brain has ongoing free energy minimization processes. Should this be so, we would have a powerful method for modeling neural dynamics; one that would complement existing methods.
Friston is not the first to advocate this notion; luminaries such as Freeman (now sadly passed on) and others have made the same general argument.
What Friston offered, though, was three crucial points:
- The brain minimizes free energy (although he does not specify exactly what the free energy function is),
- A variational Bayes process can help us model the free energy within the brain, and
- We can separate out the so-called “latent” or “hidden” units that are in the actual external (brain) system from those in the model.
It’s this last point that is not terribly obvious in reading Friston’s papers for the first, second, or even tenth time. However, it is an essential and core insight. It also is one of those (small, detailed) things that makes Friston’s work impenetrable to all but the most determined efforts.
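For readers who like to see numbers, here is a minimal sketch of the standard textbook identity that sits underneath variational Bayes: the free energy of an approximating distribution q equals the KL divergence from q to the true posterior, plus the "surprise" (the negative log evidence). This is not Friston's formalism or Beal's, and it does not attempt the separation of latent variables between external system and model. The three-state latent variable, the prior, and the likelihood values are all made-up toy numbers, and the function names are mine.

```python
import math


def free_energy(q, prior, likelihood):
    """Variational free energy F(q) = sum_z q(z) * [log q(z) - log p(x, z)].

    q, prior, and likelihood are lists indexed by the latent state z;
    likelihood[z] is p(x | z) for the single observation x we condition on.
    """
    total = 0.0
    for qz, pz, lik in zip(q, prior, likelihood):
        if qz > 0.0:  # terms with q(z) = 0 contribute nothing
            total += qz * (math.log(qz) - math.log(pz * lik))
    return total


def posterior(prior, likelihood):
    """Exact posterior p(z | x) and evidence p(x), by direct enumeration."""
    joint = [pz * lik for pz, lik in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions over the same states."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0.0)


# Toy model (assumed values, for illustration only).
prior = [0.5, 0.3, 0.2]        # p(z)
likelihood = [0.1, 0.6, 0.3]   # p(x | z) for the observed x

post, evidence = posterior(prior, likelihood)
surprise = -math.log(evidence)  # -log p(x)

q_guess = [1 / 3, 1 / 3, 1 / 3]  # a deliberately poor approximation
f_guess = free_energy(q_guess, prior, likelihood)
f_post = free_energy(post, prior, likelihood)
gap = kl_divergence(q_guess, post)

print("surprise -log p(x):     ", surprise)
print("F at the uniform guess: ", f_guess)
print("F at the true posterior:", f_post)
print("KL(q_guess || posterior):", gap)
```

In this toy setting the free energy at the uniform guess exceeds the surprise by exactly the KL term, and it drops to the surprise when q is the true posterior. That is the sense in which minimizing free energy over q is approximate Bayesian inference; everything that makes the Friston and Beal versions hard lives in the notation and in which variables belong to which system, which this sketch leaves out.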
Thus, seven weeks ago, as soon as I’d turned in grades and my Spring “quarter break” had officially started, I went into an altered state of consciousness; becoming “one with the equation.” There were actually two versions of the equation; one was Friston’s, and the other was Matthew Beal’s (from his 2003 dissertation; see link in the References below). David Blei and colleagues also have produced a paper that was very helpful. (There’s a link to that one also in the References.)
After nine days of total immersion in “equation-oneness,” I emerged – albeit a bit groggy and loopy – with a first level of understanding. I took a few days to cook, clean house, and do some code for my own free energy equation. (Hey, a girl has to have fun, right?) And then I went back into that altered state once again. Five days this time, running into the start of the new quarter. (Believe me, I was just a little disjointed when I had to return to consensual reality long enough to talk with students during that first week of the Summer Quarter.) But, I had a bit more understanding … that key distinction between the Beal and Friston formalisms emerged during this time; the one that rested on the separation of the “latent variables” between external system and model. (That was one subtle little distinction.)
A few more days of consensual reality, and a week of doing a (normal) write-up of an industry-style article based on the recent NVIDIA GTC. Then I wrapped up my set of notes on the Beal-to-Friston translation. (By now, I was thinking of it as a “Rosetta Stone.”)
What had begun as a Technical Note, more a crib sheet for myself than anything else, had now swelled to over fifty pages. It contained a lot of archaic and archeological material; remnants from derivation-approaches that had led me over some mathematical cliff, and from which I’d pulled back just in time.
I spent a few more days cleaning it up, and sent out a couple of review drafts … one to Friston himself, and another to a trusted colleague in the Cluster Variation Method arena. Went to bed, and … couldn’t sleep. There was just something not quite right; something that I didn’t quite understand … and when I went back to the Note, that little glitch became oh-so-painfully clear.
Another day or so of revising the draft. This time, I knew where my mistake was, and by both cleaning it up AND producing a useful figure (which you’ll see below), I was able to get much cleaner and calmer about the whole thing; simply a state of more peacefulness and rest.
Author’s Update Note, July 31, 2019: Here’s the updated version, now at 62 pages, published to arXiv:
Derivation of Variational Bayes
Here’s a crucial figure from that Note; it gives a diagrammatic illustration of the equation presented at the beginning of this post.
At 44 pages, it’s a bit bloated. But this was for my education, and I’m not going to publish it in a formal sense. (I may, after a bit, store a revised copy on arXiv, but that still remains to be seen.)
So … if you’re reading Friston’s works, or if you’re simply looking for a straightforward derivation of the core equations in variational Bayes, please have a look.
I certainly welcome your comments and feedback. Either use the Comments form below, or send me an email (to alianna (at) aliannajmaren (dot) com).
As I tell my students at the beginning of each quarter, they get Bonus Points for “finding Dr. A.J.’s goofuses.” There are always goofuses. No matter how hard I try, there’s always a few. Thus, I’m sure there’s more than a few in the Note, and trust that you’ll help me find them.
The point of this whole exercise – my sharing my process with you, and also the resulting Technical Note itself – is simply this: it ain’t easy.
I’m pretty good at this sort of abstract, mathematical thing.
Oh, there are people who are better. I’m sure there are those who could have glanced through Beal’s dissertation, looked through Friston’s various papers, and written out the equivalence of the two equations, on the back of an envelope, in half an hour or less.
The majority of us mere mortals, though, take much more time.
If it’s taken me as long as I’ve described to get to a first-level understanding of how these equations are derived and how they relate to each other, then there are many, MANY out there who simply don’t have the better part of a two-week break to devote to working out the intricacies.
If you’re in that latter group, take heart. You’re not alone.
And my journey, over the past month or so, has convinced me: there HAS to be an easier way. It’s that translation between notation and reference frames that is the “gotcha,” not the actual equations themselves. Most of us can work our way through the derivations. It’s (first) understanding what they really mean, and (second) understanding the relation between one expression or formalism and another, that is so grueling.
This is one of the things that’s impelled me to put my other book-in-progress drafts to the side, and focus on this. For the next two or three years, for the rest of my lifetime, for as long as it takes … we just collectively need a straightforward primer that presents the essence of statistical thermodynamics (or statistical mechanics, if you’re a physicist instead of a physical chemist), Bayesian logic, information theory, and the topics that rest on these, such as the variational methods.
What we’re doing here is too important. This is the next step forward, in brain modeling, and in artificial intelligence. So, I’m going to help out by teaching what I can.
Once again; your thoughts, feedback, and wisdom will be most welcome. Thank you.
All my best – AJM
References for Variational Bayes
- Beal, M. (2003). Variational Algorithms for Approximate Bayesian Inference, Ph.D. Thesis, Gatsby Computational Neuroscience Unit, University College London. pdf.
- Blei, D.M., Kucukelbir, A., McAuliffe, J.D. (2016). Variational inference: a review for statisticians. pdf.
- Feynman, R.P. (1972, 1998). Statistical Mechanics: A Set of Lectures. Reading, MA: Addison-Wesley; Amazon book listing.
- Friston, K., Levin, M., Sengupta, B., Pezzulo, G. (2015). Knowing one’s place: a free-energy approach to pattern regulation. J. R. Soc. Interface, 12, 20141383. doi:10.1098/rsif.2014.1383. pdf.
- Friston, K. (2013). Life as we know it. J. R. Soc. Interface, 10. pdf.
- Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11 (2), 127-138. online access.
12 thoughts on “How to Read Karl Friston (in the Original Greek)”
Personally I found this (https://arxiv.org/abs/1705.09156) very helpful. Doing a slightly different job I think (trying to summarise various aspects of Friston’s work vs. translating VB from Beal to Friston) but possibly the two of them could complement each other?
Hi, Manuel – a really good find, and thank you! I’m going to read it in some depth. (At 77 pp, of which the first 55+ are the main body, it might take a little while. But I WILL get back to you on this.)
This article dominantly seems to trace lines of thought introduced by Friston and (separately) Hinton; Friston’s notation seems to be used, and I think every paper that he’s ever written is cited here. Lots of Hinton citations also.
Thanks again – this is a very recent (May, 2017) publication; good find and much appreciated! – AJM
Have sent an Email to Dr. Friston regarding his least energy approaches, as my work has shown much of what he has found, tho his work is far, far more formal than mine.
My work uses verbal descriptions which are more amenable to common understanding, however, the outcomes seem to be very similar to what Dr. Friston has written and his return letter was encouraging.
Mine used least energy, comparison processes in brain, which are least energy, and are also a kind of logic distinct and likely generate most logics, as processes; the many comparison methods/techniques, such as trial and error; and complex system thinking and methods.
It explicates and shows how description verbal and mathematical measuring/methods are related by comparison processes in brain which generate both; finds the origins of creativity in same; and many many other approaches which add to our understanding of that modular complex system, as Gazzaniga has called it.
Altogether, it’s as I wrote to Karl, easier for us practitioners to understand, and seems to result in much the same outcomes as what he’s so well written about.
To wit: his lovely Aeon.com essay on “Consciousness,” which from my standpoint & model made quite a bit of good sense and was enlightening.
jochesh00.wordpress.com, AKA La Chanson san fin.