Deep Neural Networks Backward module
In this part, we will implement the backward function for the whole network, and we will also update the model's parameters using gradient descent.
L-Model Backward module:
In this part, we will implement the backward function for the whole network. Recall that when we implemented the L_model_forward function, we stored a cache at each iteration (containing A_prev, W, b, and Z). In the backpropagation module, we will use those variables to compute the gradients. Therefore, in the L_model_backward function, we will iterate through all the hidden layers backward, starting from layer L. In each step, we will use the cached values for layer l to backpropagate through layer l. The figure below shows the backward pass.
To backpropagate through this network, we know that the output of the final layer is:

AL = sigmoid(ZL)

Our code needs to compute dAL, the derivative of the cost with respect to AL. For the cross-entropy cost, it is given by this formula (derived using calculus, which you don't need to remember):

dAL = -(Y/AL - (1 - Y)/(1 - AL)), elementwise
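As a concrete sketch, this gradient can be computed in one line of NumPy (compute_dAL is a hypothetical helper name for illustration; AL and Y are assumed to be (1, m) arrays):

```python
import numpy as np

def compute_dAL(AL, Y):
    """Derivative of the cross-entropy cost with respect to the
    sigmoid output AL: dAL = -(Y/AL - (1-Y)/(1-AL)), elementwise."""
    return -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```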
We can then use this post-activation gradient dAL to keep going backward. As seen in the figure above, we can now feed dAL into the LINEAR->SIGMOID backward function we implemented (which will use the cached values stored by the L_model_forward function). After that, we will use a for loop to iterate through all the other layers with the LINEAR->RELU backward function, storing each dA, dW, and db in the grads dictionary. To do so, we'll use this naming convention:

grads["dW" + str(l)] = dW[l]

For example, for l = 3 this would store dW[3] in grads["dW3"].
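For reference, the per-layer backward helpers mentioned above can be sketched as follows. This is a minimal version assuming the cache layout from our forward pass: a linear cache (A_prev, W, b) paired with the pre-activation Z.

```python
import numpy as np

def linear_backward(dZ, cache):
    """Backward pass for the linear part of one layer.
    cache is (A_prev, W, b) as stored during the forward pass."""
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    """Backward pass for one LINEAR->ACTIVATION layer.
    cache is ((A_prev, W, b), Z)."""
    linear_cache, Z = cache
    if activation == "relu":
        dZ = dA * (Z > 0)          # ReLU'(Z) is 1 where Z > 0, else 0
    else:                          # "sigmoid"
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)      # sigmoid'(Z) = s * (1 - s)
    return linear_backward(dZ, linear_cache)
```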
Code for our L_model_backward function:
Arguments:
AL — probability vector, the output of the forward propagation
Y — true "label" vector (containing 0 if non-cat, 1 if cat)
caches — list of caches containing:
1. every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e. l = 0...L-2);
2. the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1]).

Returns:
grads — a dictionary with the gradients:
grads["dA" + str(l)] = ...
grads["dW" + str(l)] = ...
grads["db" + str(l)] = ...
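Putting the pieces together, a minimal self-contained sketch of L_model_backward could look like this. It assumes each entry of caches has the layout ((A_prev, W, b), Z), and inlines the per-layer backward math in a small helper (_activation_backward is a hypothetical name for illustration):

```python
import numpy as np

def _activation_backward(dA, cache, activation):
    """Backward pass through one LINEAR->ACTIVATION layer.
    cache is ((A_prev, W, b), Z) as stored by the forward pass."""
    (A_prev, W, b), Z = cache
    if activation == "relu":
        dZ = dA * (Z > 0)
    else:                          # "sigmoid"
        s = 1 / (1 + np.exp(-Z))
        dZ = dA * s * (1 - s)
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)
    # dL/dAL for the cross-entropy cost with a sigmoid output
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # Layer L: LINEAR -> SIGMOID backward
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = \
        _activation_backward(dAL, caches[L - 1], "sigmoid")
    # Layers L-1 ... 1: LINEAR -> RELU backward
    for l in reversed(range(L - 1)):
        dA_prev, dW, db = _activation_backward(
            grads["dA" + str(l + 1)], caches[l], "relu")
        grads["dA" + str(l)] = dA_prev
        grads["dW" + str(l + 1)] = dW
        grads["db" + str(l + 1)] = db
    return grads
```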
Update Parameters module:
In this section, we will update the parameters of the model using gradient descent:

W[l] = W[l] - α · dW[l]
b[l] = b[l] - α · db[l]

where α is the learning rate. After computing the updated parameters, we'll store them in the parameters dictionary.
Code for our update_parameters function:
Arguments:
parameters — python dictionary containing our parameters
grads — python dictionary containing our gradients, output of L_model_backward

Returns:
parameters — python dictionary containing our updated parameters:
parameters["W" + str(l)] = ...
parameters["b" + str(l)] = ...
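The update rule above translates almost directly into code. This is a minimal sketch assuming the key naming used throughout the tutorial ("W1", "b1", ..., "dW1", "db1", ...):

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    """One gradient-descent step: W[l] -= alpha * dW[l], b[l] -= alpha * db[l]."""
    L = len(parameters) // 2   # parameters holds one W and one b per layer
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```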
Congrats on implementing all the functions required for building a deep neural network. It was a long tutorial, but from now on, it will only get better. We'll put all of these together to build an L-layer (deep) neural network in the next part. In fact, we'll use these models to classify cat vs. dog images.
Originally published at https://pylessons.com/Deep-neural-networks-part4