Backpropagation
In machine learning, backpropagation is a gradient estimation method commonly used to train neural networks: it computes the gradient of the cost with respect to each weight so that an optimizer (such as gradient descent) can update the weights.
The method relies on applying the chain rule of calculus to the nested functions that make up the network.
// the cost is a nested function: cost_fn wraps activation_fn, which wraps output
float output = input1 * weight1 + input2 * weight2 /* + ... */ + bias;
float cost = cost_fn(activation_fn(output));

// this means that in order to find the derivative, we can use the chain rule:
// h(x) = f(g(x)) => h'(x) = f'(g(x)) * g'(x), where f'(x) = d f(x) / d x
// the factors of the chain here are:
// => d cost / d activation = cost_fn'(activation_fn(output))
// => d activation / d output = activation_fn'(output)
// => d output / d weight1 = input1
// multiplying the factors gives the gradient for weight1:
// => d cost / d weight1 = cost_fn'(activation_fn(output)) * activation_fn'(output) * input1
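To make the chain concrete, here is a minimal runnable sketch in C, assuming a sigmoid activation and a squared-error cost; the names sigmoid, sigmoid_prime, target, and learning_rate (and all numeric values) are illustrative choices, not from the original. It runs the forward pass, multiplies the chain-rule factors to get each gradient, and performs one gradient-descent update:

#include <stdio.h>
#include <math.h>

// assumed activation: sigmoid, and its derivative
static double sigmoid(double x)       { return 1.0 / (1.0 + exp(-x)); }
static double sigmoid_prime(double x) { double s = sigmoid(x); return s * (1.0 - s); }

int main(void) {
    // one neuron with two inputs (illustrative values)
    double input1 = 0.5, input2 = -1.0;
    double weight1 = 0.8, weight2 = 0.3, bias = 0.1;
    double target = 1.0;         // desired output for the squared-error cost
    double learning_rate = 0.1;

    // forward pass: output -> activation -> cost
    double output     = input1 * weight1 + input2 * weight2 + bias;
    double activation = sigmoid(output);
    double cost       = 0.5 * (activation - target) * (activation - target);

    // backward pass: one chain-rule factor per nesting level
    double d_cost_d_activation   = activation - target;   // cost_fn'
    double d_activation_d_output = sigmoid_prime(output); // activation_fn'
    double d_cost_d_weight1      = d_cost_d_activation * d_activation_d_output * input1;
    double d_cost_d_weight2      = d_cost_d_activation * d_activation_d_output * input2;
    double d_cost_d_bias         = d_cost_d_activation * d_activation_d_output; // d output / d bias = 1

    // one gradient-descent step on the parameters
    weight1 -= learning_rate * d_cost_d_weight1;
    weight2 -= learning_rate * d_cost_d_weight2;
    bias    -= learning_rate * d_cost_d_bias;

    printf("cost = %f, d cost / d weight1 = %f\n", cost, d_cost_d_weight1);
    return 0;
}

Compile with the math library linked (e.g. cc backprop.c -lm). In a multi-layer network the same idea applies: the shared factors of the chain are computed once per layer and reused for every weight in that layer, which is what makes backpropagation efficient.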