New! The YouTube Vid Series: Backpropagation and More
If you are branding yourself as an AI/neural networks/deep learning person, how well do you really know the backpropagation derivation?
That is, could you work through that derivation, on your own, without having to find a tutorial on it?
If not – you’re in good company. MOST people haven’t worked through that derivation – and for good reason. MOST people don’t remember their chain-rule methods from first-semester calculus.
(It probably doesn’t help if I say that the backprop derivation is the most basic, the most fundamental of the things that we really should know if we’re doing neural networks … and everything after that is a whole lot more complicated … but we’ll get to that later.)
Why It’s Just Plain Harder Now
Back in the old days, when we taught live classes, and we stood in front of a whiteboard (or for those really old days, a chalkboard), and actually derived equations …
There’s just something about that combined audio-visual stimulus – especially watching the derivation sequence flow (and not just looking at static slides) – that works so much better for our learning.
Well, those days are gone for good.
Now, everything is online. Yes, there are YouTubes aplenty. But not all of them have the pace, the detail, or the smooth flow that we really need.
The Backstory
I used to be able to teach the backpropagation derivation in two, maybe three, hard-hitting lectures. No nonsense, no preamble; just dive into the equations, and I (and, metaphorically, we) would derive our way through an absolute truckload of partial derivatives.
At the end, if I did it right (with student corrections helping immensely; I always lost track of plus and minus signs), we’d have the full derivation of the two key equations for the backpropagation learning method (both sketched just after this list):
- Hidden-to-output node connection weight changes, and
- Input-to-hidden connection weight changes.
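In generic textbook form (the generalized delta rule), those two results look roughly like the sketch below. The symbols are my own shorthand, not necessarily the exact notation used in the vids: η is the learning rate, f is the transfer function, x, h, and o are the input, hidden, and output activations, and t is the target output.

```latex
% Generic sketch of the two backpropagation weight-change equations
% (generalized delta rule), for a single hidden layer; biases omitted.
\begin{align*}
  % Hidden-to-output weight changes: output error term times hidden activation
  \Delta w_{jk} &= \eta \, \delta_k \, h_j,
  \qquad \delta_k = (t_k - o_k)\, f'(\mathrm{net}_k), \\
  % Input-to-hidden weight changes: back-propagated error term times input
  \Delta v_{ij} &= \eta \, \delta_j \, x_i,
  \qquad \delta_j = \Bigl(\textstyle\sum_k \delta_k \, w_{jk}\Bigr)\, f'(\mathrm{net}_j).
\end{align*}
```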
With those two equations in hand, the next exercise for students was typically to write code that translated the equations into a useful program. This was a neural network classification program, using the backpropagation learning method.
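For readers who want to see what that exercise produces, here is a minimal sketch of a single training step in Python/NumPy, using sigmoid units and the two update rules sketched above. The function names, array shapes, and the one-sample-at-a-time update are illustrative choices of mine, not the course’s actual program, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, V, W, eta=0.5):
    """One backpropagation update for a single-hidden-layer network.

    x: input vector; t: target vector;
    V: input-to-hidden weights; W: hidden-to-output weights;
    eta: learning rate. Returns the updated (V, W).
    """
    # Forward pass
    h = sigmoid(V @ x)                    # hidden-node activations
    o = sigmoid(W @ h)                    # output-node activations

    # Output-layer error term: (t - o) * f'(net), with f'(net) = o * (1 - o)
    delta_o = (t - o) * o * (1.0 - o)

    # Hidden-layer error term: back-propagate delta_o through W
    delta_h = (W.T @ delta_o) * h * (1.0 - h)

    # The two weight-change equations
    W = W + eta * np.outer(delta_o, h)    # hidden-to-output weight changes
    V = V + eta * np.outer(delta_h, x)    # input-to-hidden weight changes
    return V, W

# Tiny usage example: two inputs, three hidden nodes, one output node
rng = np.random.default_rng(0)
V = rng.normal(scale=0.5, size=(3, 2))
W = rng.normal(scale=0.5, size=(1, 3))
V, W = backprop_step(np.array([0.0, 1.0]), np.array([1.0]), V, W)
```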
However, once I started teaching online (with Northwestern University’s – or NU’s – Master of Science in Data Science program, the MSDS), I never seemed to be able to do the backprop derivation during an online Synch session.
Part of the problem was losing that fluidity of writing down the equations, in their proper order.
But even if I’d managed to set up with a camera and whiteboard, it still wouldn’t have been as smooth. Teaching a method such as this – with lots of detailed mathematical steps – works best when students can give immediate feedback, and when I can pause and answer a question RIGHT AWAY. And that just doesn’t communicate quite so well to online classes. Lots of nuances are lost, and students are more hesitant to break in when the professor is in full flow.
As a result, over the past two years (since developing the initial curriculum for NU’s MSDS AI & Deep Learning course), I’ve never felt that I had a really effective delivery for the backpropagation method.
Enter YouTube videos.
This year, I started making good on my long-term promise to myself – that I would start creating YouTube vids.
The New Story: YouTube Vid Creations, Playlists, and Series
Bluntly, it takes a lot more work to create a good YouTube vid than it does to write a good blogpost – more work, even, than writing a post with inserted pictures and captions, and more still than one with pictures, captions, and a very nice, solid reference list at the end.
There were also a lot of simple, straightforward technical things that I had to learn: things about lighting, sound, and stage set-up (remove anything white or distracting; get as much distance as possible between yourself and the camera, and between yourself and the far wall …). Just a lot of technical … details.
Creating a decent YouTube vid required mastering a lot of new technical skills!
These included shoot-specific details, such as learning to turn on the mike before recording a half-hour’s worth of voice-overs. (Yes, this means paying attention to the little red light on the mike. Beginner’s mistakes. Bummer!)
So it’s been a year of learning, and gradual improvements, and this is why you haven’t heard from me in a good long while.
But slowly, and with some embarrassing mistakes, I’ve been creating YouTube vids that meet student needs – so much so that we now have some actual YouTube playlists, each consisting of a topic-specific series.
The current series is on backpropagation.
Why?
(Because, Lord knows, there are already WAY too many backprop tutorials out there … and do we really need another?)
Three good reasons:
- Notation is important. Notational consistency even more so. As we move into the more mathematically challenging topics, we need a consistent notational frame of reference – and that framework starts with backpropagation.
- Contrast-and-compare. With consistent notation, we’ll have a much easier job doing contrast-and-compare between backpropagation and the Boltzmann machine, and then building our understanding of deep learning.
- Warm-up exercises. The backpropagation method is the basis; the reference point. It’s the thing that most of us (with some calculus background) can understand easily. The next set of algorithms, beginning with the Boltzmann machine, get much harder. So here, we start building up our mathematical dexterity muscles once again.
So … the vid that I’ve just released is Number 4 in the series: The Transfer Function and Its Derivative.
If you’re new to neural networks, and trying to get a solid handle on the basics, check them out and see if they work for you!
YouTube Vid 4 in the Backpropagation Series: The Transfer Function and Its Derivative
Here’s the link to the most recent vid in the Backpropagation Series. It’s Vid 4: The Transfer Function and Its Derivative, focusing just on the simple sigmoid function. (But it covers the full derivative calculation, so the simplicity is a good thing.)
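For reference, the standard result for the logistic sigmoid (which the vid walks through step by step) is:

```latex
% Logistic sigmoid transfer function and its derivative
f(x) = \frac{1}{1 + e^{-x}},
\qquad
f'(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = f(x)\,\bigl(1 - f(x)\bigr).
```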
Look in the NEXT blogpost for some homework ideas. (For example, now that you’ll have seen the derivative worked out for the sigmoid (logistic) transfer function, you can try your hand at the hyperbolic tangent version.)
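One quick way to sanity-check a hand-derived derivative (a suggestion of mine, not part of the course homework) is a finite-difference comparison. The sketch below uses the sigmoid, whose derivative the vid already gives; for the tanh exercise, swap in np.tanh and your own derived expression.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-x))

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Worked check with the sigmoid; for the homework, replace f with np.tanh
# and f_prime_hand with your own hand-derived expression.
f = sigmoid
f_prime_hand = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x = {x:+.1f}  hand: {f_prime_hand(x):.6f}  numeric: {numeric_derivative(f, x):.6f}")
```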
Also on the editorial calendar: a blogpost with a nice list of all the good, classic references on backpropagation, including those that I’ve cited in the various YouTubes in the Backpropagation Series, as well as a few more that you’ll find useful.
All carefully formatted in Chicago style, of course!
Also, once we wrap up the Backpropagation Series, we’ll move on to some of the more physics-based AI/neural networks, starting with the Boltzmann machine. This will give us a good opportunity to do a contrast-and-compare for the two major methods (discriminative vs. generative) in neural networks / deep learning.
Leave comments. What do you need? What sorts of vids would help you next?
Thank you!
Live free or die, my friend –
AJ Maren
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
Related YouTube Playlist (The Backpropagation Series)
The YouTube vid referenced here is Number 4 (#4) in the YouTube series on Backpropagation. To view the whole series, please start with Backpropagation (Part 0): Rationale – Why We Should Learn Backpropagation.
Previous Related Posts
Note: In keeping with the Northwestern University MSDS program’s transition to Chicago style for manuscript and reference formatting, all of these blogposts are now formatted in Chicago style.
Small caveat: my annotations (what’s in the blogpost, and why to read it) are not part of the usual Chicago style. They are inserted for your benefit.
- Maren, Alianna J. “Selecting a Neural Network Transfer Function: Classic vs. Current.” www.aliannajmaren.com, Oct. 4, 2017. https://www.aliannajmaren.com/2017/10/04/selecting-a-neural-network-transfer-function-classic-vs-current/. (First post on the transfer function, lots of good links!)
- Maren, Alianna J. “Backpropagation: Not Dead, Not Yet.” www.aliannajmaren.com, Sept. 28, 2017. http://www.aliannajmaren.com/2017/09/28/backpropagation-not-dead-not-yet/. (Why backpropagation is important – not just for basic and deep networks, but also for understanding new learning algorithms.)
- Maren, Alianna J. “Deep Learning: The First Layer.” www.aliannajmaren.com, Jan. 5, 2017. http://www.aliannajmaren.com/2017/01/05/deep-learning-the-first-layer/. (An in-depth walk-through / talk-through of the sigmoid transfer function: how it works, and how it influences the weight changes in the backpropagation algorithm; similar arguments would apply to the tanh function.)
- Maren, Alianna J. “Getting Started in Deep Learning.” www.aliannajmaren.com, Dec. 26, 2016. http://www.aliannajmaren.com/2016/12/26/getting-started-in-deep-learning/. (A very nice introductory discussion of (semi-)deep architectures and the credit assignment problem; backpropagation-oriented.)