Backpropagation: Not Dead, Not Yet
Backpropagation: Why It Still Matters
Thirty years ago, at the dawn of the neural networks era, backpropagation was all the rage. In the minds of most people, it was infinitely preferable to the simulated annealing algorithm that Hinton et al. had proposed for their Boltzmann machine.
Now, it seems as though the see-saw of algorithm popularity has shifted; we’re focused on energy-based methods. We might be asking: is backpropagation old hat?
Good question!
Even more than that, someone coming into neural networks and machine learning might (very realistically) be asking: do I really need to learn the backprop derivation? REALLY? (All those tedious partial derivatives and chain-rule expansions … sigh.)
These days, it is way too easy to skim around the edges of algorithms, given the lovely and delicious offerings of TensorFlow, Keras, and the like. Why dive into the depths of an algorithm if nicely canned versions are available just for the asking?
When we’re used to R scripts, and when we’re comfortable knowing in general how an algorithm such as ARIMA works (without having to write the code that finds the exact coefficients), we just use the package and trust that the encoded algorithm knows what it’s doing. Similarly, it seems natural to just use a backpropagation package; current frameworks make it awfully easy to do exactly that.
Dangerous temptation!
Where Backpropagation Fits In Now
Backpropagation still serves us, in three essential ways:
- It’s the go-to reference standard for gradient descent; other gradient descent algorithms are typically described by comparing and contrasting them with backpropagation,
- It’s still the workhorse in many deep learning architectures, and
- It’s the point-of-departure for new algorithms and innovations.
An example of the last point, and the motivation for today’s post, is the equilibrium propagation algorithm proposed recently by Scellier and Bengio. (This paper is now at the top of my ever-growing “to-read” stack.)
Notice, if you will, that in order to even start reading this paper, you’d need to know three things in some depth:
- The backpropagation algorithm, and preferably its full derivation,
- The continuous Hopfield model (never mind the simple binary Hopfield model; that would be too easy), along with Contrastive Hebbian Learning and Contrastive Divergence (a method for training Restricted Boltzmann machines), and
- The notion of reaching equilibrium – which is code for understanding free energy minimization, which means that you need the full statistical mechanics lead-in of microstates, the partition function, and probabilities – all of which lead you to free energy and the minimization thereof (the key relations are sketched just below this list).
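For orientation, here is a minimal sketch of those statistical mechanics relations, written in generic textbook notation rather than in the notation of the Scellier and Bengio paper or of my chapter: the Boltzmann probability of a microstate, the partition function that normalizes it, and the free energy whose minimum defines equilibrium.

    \begin{align*}
      % Boltzmann probability of microstate i with energy E_i, at temperature T:
      p_i &= \frac{e^{-E_i / k_B T}}{Z} \\
      % Partition function: the normalizing sum over all microstates j:
      Z   &= \sum_j e^{-E_j / k_B T} \\
      % (Helmholtz) free energy; "reaching equilibrium" means settling into its minimum:
      F   &= -k_B T \ln Z \;=\; \langle E \rangle - T S
    \end{align*}

Minimizing F is what “reaching equilibrium” cashes out to in the energy-based picture; everything else in the statistical mechanics lead-in exists to make those three lines meaningful.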
So, I didn’t finish the Scellier and Bengio article this last weekend. I was too busy revising and refining my backpropagation chapter, which I’ve now decided to include in my forthcoming book, Statistical Mechanics, Neural Networks, and Machine Learning.
It’s about more than not letting a perfectly good chapter go to waste; it’s about realizing that if someone is going to read a get-up-to-speed book, we still need to cover backpropagation. We need the statistical mechanics (and the Bayesian probabilities) in order to get into energy-based models, but we need the classic backpropagation gradient descent method as well.
Deriving the Backpropagation Algorithm
All this said, you’re probably convinced that you want to at least look at the backpropagation derivation, and possibly work through it. Ideally, you’d like to cross-correlate the final derivation equations (for both the hidden-to-output connection weights, v, and the input-to-hidden weights, w) with the corresponding lines in a complete Python backpropagation implementation; a standard-form preview of both appears a few lines below.
Yes, YOU CAN.
Just give me a few more days.
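While you wait, here is roughly what those final weight-adjustment equations look like in the simplest case: one hidden layer, sigmoid activations, squared-error loss, and learning rate η. This is the standard textbook form, not a preview of the chapter’s own derivation; only the names v (hidden-to-output) and w (input-to-hidden) follow the usage in this post.

    \begin{align*}
      % Output-layer error signal for output node k (target t_k, actual output o_k):
      \delta^{o}_{k} &= (t_k - o_k)\, o_k (1 - o_k) \\
      % Step 1: adjust the hidden-to-output weights (hidden activation h_j, learning rate \eta):
      \Delta v_{jk}  &= \eta\, \delta^{o}_{k}\, h_j \\
      % Hidden-layer error signal: the output deltas, propagated backward through v:
      \delta^{h}_{j} &= h_j (1 - h_j) \sum_k \delta^{o}_{k}\, v_{jk} \\
      % Step 2: adjust the input-to-hidden weights (input component x_i):
      \Delta w_{ij}  &= \eta\, \delta^{h}_{j}\, x_i
    \end{align*}

The structural point that the full derivation makes rigorous is right there in the third line: the hidden-layer delta is nothing more than the output-layer deltas, passed backward through the v weights.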
I did a twelve-hour intense-burn chapter revision this last Sunday, and the backpropagation chapter is nearly there. I spent several hours yesterday putting together a slidedeck, and have one more slidedeck to build – the one dealing with the second of the two steps in (simple) backpropagation weight adjustment. (The first step is adjusting the hidden-to-output connection weights, v; the second is adjusting the input-to-hidden weights, w.) I have working Python code; it just needs clean-up.
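Until that cleaned-up code is posted, here is a minimal, hypothetical NumPy sketch of those same two steps, written under the assumptions above (single hidden layer, sigmoid activations, squared-error loss) rather than taken from the chapter’s code, so that you can line each statement up against the equations.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, t, w, v, eta=0.5):
        """One round of weight adjustment for a single-hidden-layer network.

        x: input vector (n_in,), t: target vector (n_out,)
        w: input-to-hidden weights, shape (n_in, n_hid)
        v: hidden-to-output weights, shape (n_hid, n_out)
        """
        # Forward pass
        h = sigmoid(x @ w)                        # hidden activations
        o = sigmoid(h @ v)                        # network outputs

        # Error signals; note that the hidden delta uses the *pre-update* v
        delta_o = (t - o) * o * (1.0 - o)         # output-layer delta
        delta_h = h * (1.0 - h) * (v @ delta_o)   # hidden-layer delta

        # Step 1: adjust the hidden-to-output weights, v
        v = v + eta * np.outer(h, delta_o)
        # Step 2: adjust the input-to-hidden weights, w
        w = w + eta * np.outer(x, delta_h)
        return w, v, o

Call it repeatedly over your training pairs, feeding the returned w and v back in, and watch the outputs o settle toward the targets.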
I’ve been producing this material for my Northwestern University Special Topics in Deep Learning class. Now, I’m going to host this same material on the Statistical Mechanics, Neural Networks, and Machine Learning LINKED Table of Contents Page. This page will only be available to members of my private Machine Learning group. Kind of obviously, the LINKED Table of Contents Page will contain links to good things.
Why Opt In?
This is why I’m asking you to Opt-In. (After all, there are plenty of web-freebies that are immediate-access, no opt-ins required.)
Three good reasons:
- It’s not just the stuff that’s there right now. This is an ongoing, growing book-in-progress. You’ll want me to tell you when I’ve got a new chapter up, right? Or a new slidedeck, or (even better) new code. So I need your email address to tell you about this. (You can opt out any time, however.)
- This is a conversation, not just a one-shot deal. By opting in, you give me a chance to ask for your feedback. More of this? Less of that? More code? More pictures? A clearer walk-through? It will be your chance to tell me.
- The interesting conversations will be happening here. A chance to go into subtleties and nuances that most people reading this post would just rather ignore. But if you’re serious about machine learning, these topics will delight your mind, absorb your attention, and help you feel that you’re really getting on top of this evolving area.
When this material is ready – even provisionally early-draft ready – and even just the backpropagation chapter – I’m going to send out the look-see invitation to members of my Machine Learning group.
To receive the invitation, Opt-In to join my private Machine Learning group:
- IMMEDIATE GRATIFICATION: Access the Précis and Microstates Bonus Slidedeck – be able to read and understand the Seven Key Equations in Machine Learning,
- ONGOING GRATIFICATION (NEW): Eight-day follow-up microstates tutorial sequence, including two more bonus microstates slidedecks – easiest and most pain-free introduction to microstates, the partition function, and probabilities for energy-based machine learning, and
- DELAYED GRATIFICATION (FORTHCOMING): Invitation to see advance draft chapters from Statistical Mechanics, Neural Networks, and Machine Learning, which will ALSO include slidedecks and other teaching materials – including the full-length backpropagation chapter.
Opt-in right HERE:
Super-Important P.S.: Once you Opt-In, you’ll get the usual “confirmation” email. After that, starting immediately, you’ll be getting a series of FOLLOW-UP EMAILS. Check your “Promotions” tab if you’re using Gmail, or your “Spam” folder if you’re using something else. DRAG AND DROP those emails into your inbox.
You’re going to get something good, eight days in a row. Get all of them. After a few days, I start the for-real tutorial sequence on microstates. A few days into that, there’s a slidedeck link. Two days after that, another slidedeck. You’ll want that stuff. Free goodies.
Live free or die, my friend –
AJ Maren
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
References
- Scellier, B., & Bengio, Y. (2017). Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation. Front. Comput. Neurosci., 4 May 2017. doi:10.3389/fncom.2017.00024. Online; accessed Sept. 28, 2017.
Some Good Backpropagation Tutorials and Discussions
- Yes, You Should Understand Backpropagation, by Andrej Karpathy
- Andrej Karpathy’s lecture on Backpropagation, which is very good and worth the hour and ten minutes of your time.
Previous Related Posts
- Deep Learning – the First Layer – a walkthrough of the first part of the backpropagation derivation, written in January 2017, when I was teaching the new Deep Learning class for the first time. Surprisingly useful!
- Getting Started in Deep Learning – Another surprisingly useful intro/tutorial post – deals with the credit assignment problem.