The Statistical Mechanics Underpinnings of Machine Learning
Machine Learning Is Different Now:
Actually, machine learning is a continuation of what it always has been, which is deeply rooted in statistical physics (statistical mechanics). It’s just that the insights have now accumulated into a very substantive body of work, with more theoretical rigor behind it than most of us know.
A Lesson from Mom:
It takes a lot of time to learn a new discipline. This is something that I learned from my mom. Mom graduated from college a bit too young, so her daddy footed the bill for her to get a master’s degree in genetics, which (at that time) was a subfield of biology.
Just as she was getting her thesis wrapped up, Watson and Crick discovered the structure of DNA. Overnight, the field of genetics changed. It was no longer biology; it was physical chemistry. So, my mom had her degree – but no way forward with her career. (Is this starting to sound scarily familiar to you? Thought so.)
In order to continue, she had to learn physical chemistry. To learn physical chemistry, she first had to learn calculus (which she had carefully avoided during her academic sojourn). To learn calculus, she first had to learn analytical geometry.
So, there she was, taking “analeetical geoMEtry,” as taught by the brilliant Russian mathematician Ervand George Kogbetliantz, who had made his way to the U.S. by fleeing just ahead of the war; from Russia, to Germany, to France, to the United States. He therefore talked with a Russian-German-French accent. Brilliant, but no-go from my mother’s perspective. (However, she did tell me how his invention of 3-D chess was featured in Life Magazine.)
So Mom had her degree, but had trained for a field that no longer existed – biologically-based genetics. The new field, physical chemistry-based genetics, required a two-year runway just to get started.
My take-away from my mother’s experience: I never wanted to be more than two years away from learning something new that was important to me. I didn’t have the notion of a long runway back then, but I was internalizing the basic idea.
Even Einstein Needed a Long Runway
Albert Einstein was a brilliant and creative genius, but mathematics was not his natural forte.
With the help of his friend Marcel Grossmann, a mathematician, Einstein spent several exhausting years learning the mathematics of curved spaces, or what mathematicians call “differential geometry”. As Einstein commented, “compared with understanding gravity, special relativity was mere child’s play.”
Read more at: Physics.org on Einstein.
So if even Einstein needed a long runway to get going on his deepest and most powerful creative ideas, what does that tell us?
We need to give ourselves the benefit of a long runway also. And sometimes, it takes us a while to just get ourselves to the runway.
Message to all of us: We need to pace ourselves. We need to set realistic expectations for how much we can do within any given timeframe. This is not a one-quarter, or one-semester, task.
Expectation Maximization: A Long Runway Equation
Two weeks ago, I talked with you about seven key machine learning equations. The expectation maximization equation wasn’t on the list.
The reason?
It’s more like a process, or an algorithm, than a specific equation. In fact it’s a two-step process, and has become increasingly important in machine learning. I’m not going to describe it here, except to say that it fits in between Eqns. 6 & 7 of my blogpost two weeks ago.
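To make the "two-step process" a little more concrete, here is a minimal sketch of EM on a toy problem: fitting a mixture of two one-dimensional Gaussians. This is only an illustration, not the treatment promised for next week, and it does not use the notation of Eqns. 6 & 7. The function names (`normal_pdf`, `em_two_gaussians`), the starting guesses, and the synthetic data are all assumptions made up for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n

    # Assumed starting point: means at the data extremes, shared variance, equal weights.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: for each point, the probability that each component generated it.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) + 1e-300  # tiny floor to avoid dividing by zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate each component from its responsibility-weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # keep the variance from collapsing to zero

    return weights, means, variances


# Synthetic data: two well-separated clusters (made up for this sketch).
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(300)]
data += [random.gauss(5.0, 1.0) for _ in range(300)]

weights, means, variances = em_two_gaussians(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

The loop alternates between the two steps: the "expectation" step guesses which component each point belongs to, and the "maximization" step updates the components given those guesses. On data this cleanly separated, the estimated means should settle near the two cluster centers; messier data would need better starting guesses and a convergence check.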
So … courtesy of a question asked by one of my students in last night’s online Synch session for one of my Northwestern University classes … next week’s blogpost will provide a crib sheet to the seven basic equations, and I’ll put in a little something on the expectation maximization (EM) process as well. Not a full-fledged tutorial; more standing back and getting a view-from-the-distance of the mountain range that we’ll be traversing.
At least, though, we’ll be describing the mountains in more detail. Specifically, we’ll look at Eqns. 6 & 7, and the EM process (as bridging between them), and we’ll identify how getting on top of these equations and the EM process is a long runway task.
And by the way, it was egg on my face to realize that I’d plunked down seven equations that were making sense to me, but were totally unfamiliar to my students … and if you’re reading this post, you’re in that community. So … an overview of the map, or topography, is forthcoming.
The full exposition?
That’s worth a book.
St. Crispin’s Day: October 25th
St. Crispin’s Day is coming; it’s October 25th.
I’m setting something up. It’s not going to be a complete long runway task, because it’s the first pass at building the runway. However, I hope to cover crucial essentials.
Spoiler alert: What are you going to be doing during Christmas vacation?
I’ll be announcing something on St. Crispin’s Day.
It won’t be for everyone. In fact, it will be for very, VERY few.
Or, as Shakespeare put it,
We few, we happy few, we band of (sisters/) brothers;
For (s/)he to-day that sheds his blood with me
Shall be my (sister/) brother …
St. Crispin’s Day speech, in Henry V, by Wm. Shakespeare
Live free or die, my friend –
AJ Maren
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
15 thoughts on “The Statistical Mechanics Underpinnings of Machine Learning”
Prof Maren, this was a timely post for me. I’m going through some anxiety of picking up the vast knowledge in this space needed to be successful. One challenge I face is this – I feel in the workplace today, one is expected to have a short runway – always – to be competitive, to deliver value to the internal & external clients. I would love to have a long runway, but job security is a question on my mind. I tend to think it’s easier to have a long runway if one is in academia. What are your thoughts on this?
Hi, Rahul – and thanks for your comment!
Yes, I agree – many of us are picking up what we can, from multiple sources. And you’re right, corporate employers expect their leading-edge people to know this stuff, already! They see that there are lots of experts out there; lots of articles and blogs and programs, and so they expect you to be proficient, with “zero runway.”
On one point – I don’t think that people in academia have that much of a long runway. Or rather, the only “long runway” people are the graduate students; those who take on “voluntary poverty” for a few years in order to learn the materials in depth.
Most people wanting to learn AI and machine learning in depth, though, don’t have the option of taking a few years off. This group of people (you, my other students at NU, and almost everyone reading this post) has to learn on the side – while working a grueling corporate job, taking care of spouse and family, and paying the bills.
There is a solution. Instead of just making it a reply here, I’m going to devote this Thursday’s blog to the answer. (Means I’m adjusting my editorial calendar to meet this question – and that means that your question really is IMPORTANT, to a LOT of people! Which is why I’m so glad that you’ve asked it.)
Look for it this Thursday, August 24th, “Finding a Short Runway for AI.”
Until then – live free or die! – best – AJM
Look