Gradient Descent Algorithm

Mini-batch, Stochastic & Batch Gradient Descent

There are three variants of gradient descent commonly used to train neural networks with backpropagation: stochastic, batch, and mini-batch. Each is described below, and a short code sketch comparing the three update rules follows the list. The notation used throughout is:

    1. L_i: loss function (measures the error for example i)
    2. y_i: neural network output (the prediction)
    3. x_i: real output (the training label)

  1. Stochastic gradient descent: the weights are updated after every single training example. Each update is cheap, so training tends to progress faster than with batch gradient descent and the method is widely used; however, the individual updates are noisy, so more iterations are typically needed to reach the minimum.
    Loss function:

        \[ L_i=\frac{1}{2}(y_i-x_i)^2 \]

    The derivative of the loss function (by the chain rule):

        \[ \frac{\partial L_i}{\partial W}=(y_i-x_i)\frac{\partial y_i}{\partial W} \]

  2. Batch gradient descent: As we need to calculate the gradient on the whole dataset to perform just one update, batch gradient descent can be very slow and is intractable for datasets that don’t fit in memory.
    Loss function:

        \[ L=\frac{1}{2}((y_1-x_1)^2+(y_2-x_2)^2+...+(y_n-x_n)^2) \]

    Derivative of the loss function (by the chain rule):

        \[ \frac{\partial L}{\partial W}=(y_1-x_1)\frac{\partial y_1}{\partial W}+(y_2-x_2)\frac{\partial y_2}{\partial W}+...+(y_n-x_n)\frac{\partial y_n}{\partial W} \]

  3. Mini-batch gradient descent: the most widely used variant, offering a good trade-off between the speed of stochastic updates and the stability of batch updates. Rather than using the complete dataset, every iteration uses a subset of ‘m’ training examples, called a mini-batch, to compute the gradient of the cost function.
    Loss function:

        \[ L=\frac{1}{2}((y_1-x_1)^2+(y_2-x_2)^2+...+(y_m-x_m)^2) \]

    Derivative of the loss function (by the chain rule):

        \[ \frac{\partial L}{\partial W}=(y_1-x_1)\frac{\partial y_1}{\partial W}+(y_2-x_2)\frac{\partial y_2}{\partial W}+...+(y_m-x_m)\frac{\partial y_m}{\partial W} \]
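
The three variants differ only in how many training examples contribute to each weight update. Below is a minimal sketch in Python (not from the original post) that fits a single linear neuron with the squared-error loss above; the input features, the synthetic data, the learning rate, and the helper names are illustrative assumptions. Setting the batch size to 1, to a small m, or to the full dataset size gives the stochastic, mini-batch, and batch behaviour respectively.

    import numpy as np

    # Minimal sketch (not from the original post): one linear neuron y = w.a + b
    # trained with the squared-error loss L = 1/2 * sum_i (y_i - x_i)^2, where
    # y_i is the prediction and x_i is the training label, as defined above.
    # The input features a, the synthetic data and the learning rate are
    # illustrative assumptions.
    rng = np.random.default_rng(0)
    a = rng.normal(size=(200, 3))                 # 200 examples, 3 input features
    true_w = np.array([1.5, -2.0, 0.5])
    x = a @ true_w + 0.1 * rng.normal(size=200)   # training labels (x_i)

    def gradients(w, b, a_batch, x_batch):
        """Gradient of 1/2 * sum (y_i - x_i)^2 with respect to w and b."""
        y = a_batch @ w + b                       # predictions (y_i)
        err = y - x_batch                         # (y_i - x_i)
        return a_batch.T @ err, err.sum()         # chain rule: (y_i - x_i) * dy_i/dW

    def train(batch_size, lr=0.05, epochs=100):
        """batch_size=1 -> stochastic, batch_size=len(x) -> batch, otherwise mini-batch."""
        w, b = np.zeros(3), 0.0
        n = len(x)
        for _ in range(epochs):
            order = rng.permutation(n)            # reshuffle the data every epoch
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                grad_w, grad_b = gradients(w, b, a[idx], x[idx])
                w -= lr * grad_w / len(idx)       # average the gradient over the batch
                b -= lr * grad_b / len(idx)
        return w, b

    print("stochastic:", train(batch_size=1)[0])
    print("mini-batch:", train(batch_size=32)[0])
    print("batch:     ", train(batch_size=len(x))[0])

Averaging the gradient over the batch (the division by len(idx)) is a common choice so that one learning rate works for all three batch sizes; the formulas above use the plain sum, which is equivalent up to a rescaling of the learning rate.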

 
