Chapter 2: Parameter Learning

  • Gradient descent is an algorithm for minimizing the cost function; it is not limited to regression and is used in many other machine learning techniques, as the minimal sketch below illustrates.
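
  A minimal sketch of the idea in Python, using a made-up one-parameter cost J(θ) = (θ − 3)², whose derivative is 2(θ − 3); the function and variable names are illustrative, not from the notes:

  ```python
  # Sketch of gradient descent on a hypothetical one-parameter cost
  # J(theta) = (theta - 3)**2, whose derivative is 2 * (theta - 3).

  def gradient_descent(grad, theta, alpha, n_steps):
      """Repeatedly step against the gradient to minimize the cost."""
      for _ in range(n_steps):
          theta = theta - alpha * grad(theta)  # move opposite the slope
      return theta

  theta_min = gradient_descent(grad=lambda t: 2 * (t - 3),
                               theta=0.0, alpha=0.1, n_steps=100)
  print(theta_min)  # approaches 3, the minimizer of the made-up cost
  ```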

  • The derivative term in the gradient descent algorithm decides the next parameter values to try on the way towards the minimum. In effect, the sign of the derivative picks the direction of movement along the cost function curve: if the derivative (the slope at the current point) is positive, the next parameter value should be lower, moving towards the left. Conversely, if the derivative is negative, the next parameter value should be higher, moving towards the right. The update rule below makes this explicit.
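
  In symbols, this is the standard simultaneous update rule (written here for the two-parameter case):

  $$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{for } j = 0, 1$$

  The minus sign encodes the direction logic above: subtracting a positive derivative lowers θ_j, while subtracting a negative derivative raises it.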

  • In the gradient descent algorithm, if the learning rate α is too small, the process takes baby steps towards the minimum and the algorithm is slow. Conversely, if α is too large, gradient descent can overshoot the minimum; it may fail to converge or even diverge. The sketch after this bullet shows both failure modes.
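
  A quick sketch of both failure modes, reusing the hypothetical quadratic cost from the sketch above:

  ```python
  # Effect of the learning rate alpha on the made-up cost
  # J(theta) = (theta - 3)**2, whose derivative is 2 * (theta - 3).

  def run(alpha, steps=20):
      theta = 0.0
      for _ in range(steps):
          theta -= alpha * 2 * (theta - 3)
      return theta

  print(run(alpha=0.001))  # too small: barely moves towards 3 in 20 steps
  print(run(alpha=0.1))    # reasonable: ends close to 3
  print(run(alpha=1.1))    # too large: overshoots further each step, diverges
  ```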

  • The cost function for linear regression is bowl-shaped (convex), so it has a single global optimum and no local optima; gradient descent on it therefore always converges to the global minimum, given a suitable learning rate.
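
  The cost function in question is the usual squared-error cost for the hypothesis h_θ(x) = θ₀ + θ₁x:

  $$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

  This J is convex in (θ₀, θ₁), which is why there is exactly one global minimum and no local minima to get stuck in.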

  • Batch gradient descent is when each step of gradient descent uses all the training examples, as in the sketch below.
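
  A sketch in Python on made-up toy data (the data and names are illustrative); note that every update averages the gradient over all m examples:

  ```python
  import numpy as np

  # Batch gradient descent for linear regression: each step uses the
  # entire training set. Toy data drawn from the line y = 1 + 2x.
  X = np.array([1.0, 2.0, 3.0, 4.0])   # inputs x^(i)
  y = np.array([3.0, 5.0, 7.0, 9.0])   # targets y^(i)
  m = len(X)

  theta0, theta1 = 0.0, 0.0
  alpha = 0.05

  for _ in range(2000):
      errors = (theta0 + theta1 * X) - y   # h(x^(i)) - y^(i), all examples
      grad0 = errors.sum() / m             # dJ/dtheta0 over the full set
      grad1 = (errors * X).sum() / m       # dJ/dtheta1 over the full set
      # Simultaneous update of both parameters.
      theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

  print(theta0, theta1)  # approaches (1.0, 2.0)
  ```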