A compromise is to randomly pick a subset of the input, called a minibatch, and use only those few samples to approximate the descent calculation. The random picking is what makes the method stochastic.
The advantage of this approach is that it reduces the computational demand while still providing a good approximation of the gradient of the loss function, which is typically unknown at the beginning. A strong gradient hint can be ignored early on to avoid falling into local minima.
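To make the idea concrete, here is a minimal sketch of minibatch stochastic gradient descent in plain Python. It assumes a one-variable linear model y ≈ w*x + b with a mean squared error loss. The function name `minibatch_sgd`, the toy data, and the default batch size, learning rate, and epoch count are all illustrative choices, not taken from any particular library.

```python
import random

def minibatch_sgd(xs, ys, batch_size=4, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with minibatch stochastic gradient descent (illustrative)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        # The "stochastic" part: visit the samples in a new random order each epoch.
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Estimate the gradient of the squared error from the minibatch only,
            # instead of summing over the whole data set.
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            # Step against the averaged minibatch gradient.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b

# Toy data generated from y = 3x + 1 (an assumption made for this example).
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = minibatch_sgd(xs, ys)
print(w, b)
```

Each update here touches only `batch_size` samples rather than all 20, which is where the computational saving comes from. On this noiseless toy data the estimates should settle close to the true slope and intercept.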