Adadelta functions as a stochastic gradient descent method. It has a float parameter known as the Adadelta decay factor (ρ).
For the Adadelta optimizer, you can adjust the three parameters below. Our advice is to leave them at their default values.
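To make the decay factor and learning rate concrete, here is a minimal sketch of the Adadelta update rule in plain Python. This is our own illustration of the standard algorithm, not the Platform's implementation; the names `adadelta_step`, `avg_sq_grad`, and `avg_sq_update` are ours, and `eps` is the usual small constant for numerical stability.

```python
def adadelta_step(grad, state, rho=0.95, lr=1.0, eps=1e-6):
    """One Adadelta update. rho is the Adadelta decay factor, lr the learning rate."""
    avg_sq_grad, avg_sq_update = state
    # The decay factor rho controls how quickly old squared gradients are forgotten
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # Scale the step by the ratio of recent update size to recent gradient size
    update = -((avg_sq_update + eps) ** 0.5 / (avg_sq_grad + eps) ** 0.5) * grad
    avg_sq_update = rho * avg_sq_update + (1 - rho) * update ** 2
    return lr * update, (avg_sq_grad, avg_sq_update)

# Usage: minimize f(x) = x^2 starting from x = 5; the gradient of x^2 is 2x
x, state = 5.0, (0.0, 0.0)
for _ in range(200):
    step, state = adadelta_step(2 * x, state)
    x += step
```

Note how the raw gradient is rescaled before the learning rate is applied, which is why Adadelta is comparatively insensitive to the learning rate setting.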
The learning rate controls the size of the update steps along the gradient. It sets how large a fraction of the gradient each update applies, where 1 = 100%, but normally you set a much smaller learning rate, e.g., 0.001.
In our rolling ball analogy, we calculate where the ball should roll next in discrete steps (not continuously). The length of these discrete steps is the learning rate.
Choosing a good learning rate is important when training a neural network. If the ball rolls carefully with a small learning rate, we can expect consistent but very slow progress. The risk, though, is that the ball gets stuck in a local minimum and never reaches the global minimum.
We could also take long, confident steps in an attempt to descend faster and escape local minima, but this may not pay off. At some point the steps become too large and we "overstep": we overshoot the minimum, which drives the loss up instead of down.
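The trade-off above can be seen in a toy example of our own: plain gradient descent on f(x) = x², whose gradient is 2x, with a small versus a too-large learning rate.

```python
def gradient_descent(lr, steps, x=5.0):
    """Run plain gradient descent on f(x) = x^2 starting from x."""
    for _ in range(steps):
        x -= lr * 2 * x  # step against the gradient of x^2, which is 2x
    return x

careful = gradient_descent(lr=0.01, steps=50)   # small steps: slow, steady progress
reckless = gradient_descent(lr=1.1, steps=10)   # steps too long: overshoots and diverges
```

With lr=0.01 the ball creeps toward the minimum at x = 0; with lr=1.1 each step overshoots so far that the ball ends up farther away than it started.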
Learning rate decay
The value here sets how quickly the learning rate decreases during training: the optimizer starts with larger steps for fast initial progress and gradually takes smaller, more careful steps as it approaches a minimum.
This setting applies to the RMSprop and Adadelta optimizers. For further background, see Momentum.
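One common decay schedule is inverse time decay, the form used by the legacy Keras optimizers' `decay` argument; whether the Platform uses exactly this schedule is an assumption on our part, and the function name is ours.

```python
def decayed_learning_rate(initial_lr, decay, step):
    """Inverse time decay: the learning rate shrinks as training progresses."""
    return initial_lr / (1 + decay * step)

lr_start = decayed_learning_rate(0.001, decay=0.01, step=0)     # full learning rate
lr_later = decayed_learning_rate(0.001, decay=0.01, step=1000)  # much smaller steps
```

With decay = 0 the learning rate stays constant; any positive decay makes the steps along the gradient progressively shorter.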