Learning rate schedule
The Learning rate schedule lets you vary the learning rate as training progresses.
The learning rate controls the size of the update steps along the gradient. This parameter sets how much of the gradient you update with.
A larger learning rate lets the model converge faster but may overshoot the optimal point.
A smaller learning rate follows the loss function more closely but may require more epochs to converge and may get stuck in local minima.
The Learning rate schedule allows you to start training with larger (or smaller) steps and then change the learning rate according to a schedule.
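To make the trade-off concrete, here is a minimal sketch in plain Python. It is not Platform code: the function `gradient_descent` and the toy loss f(w) = w² are assumptions chosen for illustration only.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize the toy loss f(w) = w**2 and return every value of w visited."""
    w = start
    path = [w]
    for _ in range(steps):
        gradient = 2 * w                    # derivative of w**2
        w = w - learning_rate * gradient    # step size is set by the learning rate
        path.append(w)
    return path


small = gradient_descent(learning_rate=0.01)   # moves toward 0, but slowly
medium = gradient_descent(learning_rate=0.4)   # reaches the minimum quickly
large = gradient_descent(learning_rate=1.1)    # jumps past 0 on every step and diverges

for name, path in [("small", small), ("medium", medium), ("large", large)]:
    print(f"{name:>6}: final w = {path[-1]:.6g}")
```

With the small rate, w is still far from the minimum after 20 steps. With the large rate, every update lands on the other side of the minimum and further away from it than before.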
Learning rate schedule can give better training performance
There is no go-to schedule for all models. Varying the learning rate during training has, in general, been shown to make training less sensitive to the learning rate value you pick. So using a learning rate schedule can give better training performance and make the model converge faster.
How to use learning rate scheduling
It’s almost always a good idea to use a schedule. For most models, try the exponential decay schedule first.
The best learning rate depends on the dataset and task that your model is learning. In many cases, it also depends on how close your model already is to the optimal solution.
The Learning rate schedule helps you with both aspects, since it sweeps over a range of learning rates that get smaller as the model is expected to approach its optimal solution.
Possible to use a higher max learning rate
As a rule of thumb, compared to training without a schedule, you can use a slightly higher maximum learning rate. Since the learning rate changes over time, the whole training is not so sensitive to the value picked.
For most models, it makes sense to try out the exponential decay schedule first.
The exponential schedule divides the learning rate by the same factor (%) every epoch. This means that the learning rate will decrease rapidly in the first few epochs, and spend more epochs with a lower value, but never reach exactly zero.
Decay per epoch (%): Percentage by which the Learning rate is decreased every epoch. Default: 5
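The exponential schedule can be sketched as follows. The exact formula the Platform uses is not stated here, so this sketch assumes that a decay of 5 % means the learning rate is multiplied by 0.95 after every epoch. The function name `exponential_schedule` and its parameters are hypothetical.

```python
def exponential_schedule(initial_lr, decay_percent=5.0, epochs=10):
    """Return one learning rate per epoch.

    Assumption: each epoch keeps (100 - decay_percent) % of the previous rate.
    """
    factor = 1.0 - decay_percent / 100.0
    return [initial_lr * factor ** epoch for epoch in range(epochs)]


rates = exponential_schedule(initial_lr=0.001, decay_percent=5.0, epochs=10)
for epoch, lr in enumerate(rates):
    print(f"epoch {epoch}: learning rate = {lr:.6f}")
```

Because every epoch removes a fixed percentage of the current value, the absolute drop is largest at the start and shrinks over time, and the rate stays above zero.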
The linear schedule decreases the learning rate by the same amount (decrement) every epoch. Depending on the Decrement per epoch, the learning rate can reach zero quite fast, so choose the decrement relative to the set Learning rate.
Decrement per epoch: Decrease of learning rate per epoch. Update depending on the set Learning rate. Default: 0.0001
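A minimal sketch of the linear schedule shows how quickly the rate can hit zero. The function name `linear_schedule` is hypothetical, and clamping the rate at zero is an assumption, since the behavior once the rate reaches zero is not described here.

```python
def linear_schedule(initial_lr, decrement=0.0001, epochs=12):
    """Return one learning rate per epoch, lowered by a fixed decrement each epoch.

    Assumption: the rate is clamped at zero rather than going negative.
    """
    return [max(0.0, initial_lr - decrement * epoch) for epoch in range(epochs)]


rates = linear_schedule(initial_lr=0.001, decrement=0.0001, epochs=12)
for epoch, lr in enumerate(rates):
    print(f"epoch {epoch}: learning rate = {lr:.6f}")
```

With a Learning rate of 0.001 and the default decrement of 0.0001, the rate reaches zero after 10 epochs, so later epochs would no longer update the model. A smaller decrement, or a larger Learning rate, stretches the schedule over more epochs.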
The triangle schedule consists of two parts:
The first part is a linear increase of the learning rate, from 0 up to the set Learning rate, over the number of epochs given by Warm-up epochs proportion.
The second part is a linear decay that decreases the learning rate by the same Decrement per epoch.
Triangle decay is recommended for text classification using BERT fine-tuning, but can also be applied to all other kinds of models.
Warm-up epochs proportion: Length in epochs of the learning rate increase from 0 to the set Learning rate. Default: 1
Decrement per epoch: Decrease of learning rate per epoch. Update depending on the set Learning rate. Default: 0.0001
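The two parts of the triangle schedule can be sketched as below. This is an illustration under stated assumptions, not the Platform's implementation: the function name `triangle_schedule` is hypothetical, the warm-up is treated as a whole number of epochs, the rate is updated once per epoch, and the rate is clamped at zero.

```python
def triangle_schedule(peak_lr, warmup_epochs=1, decrement=0.0001, epochs=10):
    """Return one learning rate per epoch: linear warm-up, then linear decay.

    Assumptions: warmup_epochs is a whole number of epochs, the rate changes
    once per epoch, and it never drops below zero.
    """
    rates = []
    for epoch in range(epochs):
        if epoch < warmup_epochs:
            # Part 1: ramp linearly from 0 toward the set learning rate.
            lr = peak_lr * epoch / warmup_epochs
        else:
            # Part 2: start at the set learning rate, then subtract a fixed decrement.
            lr = max(0.0, peak_lr - decrement * (epoch - warmup_epochs))
        rates.append(lr)
    return rates


rates = triangle_schedule(peak_lr=0.001, warmup_epochs=2, decrement=0.0001, epochs=6)
for epoch, lr in enumerate(rates):
    print(f"epoch {epoch}: learning rate = {lr:.6f}")
```

In this example the rate climbs for two epochs, peaks at the set Learning rate in epoch 2, and then falls by the decrement every epoch.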