A batch normalization layer normalizes its inputs using statistics computed over the current mini-batch. This has several positive effects:
The internal covariate shift between batches during training is reduced.
The learning rate can be increased, since there is a much smaller risk of the layer receiving very large input values.
Overfitting is reduced: the statistics of each batch differ slightly, which adds a small amount of noise to the training. This often means the dropout rate can be lowered.
During training, the batch normalization layer also keeps and updates population statistics, which are later used at inference time, when batch statistics might not be suitable.
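The population statistics are typically tracked as exponential moving averages of the per-batch mean and variance. A minimal NumPy sketch of this bookkeeping follows; the class name and the momentum value of 0.9 are illustrative assumptions, not from the text:

```python
import numpy as np

class BatchNormStats:
    """Tracks population statistics during training via an exponential
    moving average (the momentum value is an assumption)."""

    def __init__(self, num_features, momentum=0.9):
        self.momentum = momentum
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def update(self, batch):
        # Blend the current batch statistics into the running estimates.
        mu = batch.mean(axis=0)
        var = batch.var(axis=0)
        self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mu
        self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
```

At inference time, `running_mean` and `running_var` replace the batch statistics, so single examples can be normalized consistently.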
The operation normalizes each feature to zero mean and unit variance over the batch, then applies a learned scale (gamma) and shift (beta).
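A minimal NumPy sketch of the training-time normalization, where `gamma` and `beta` are the learned parameters and `eps` is a small constant for numerical stability:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Compute per-feature mean and variance over the mini-batch (axis 0).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Normalize to zero mean and (approximately) unit variance.
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Apply the learned scale and shift.
    return gamma * x_hat + beta
```

With `gamma = 1` and `beta = 0` the output of each feature has zero mean and unit variance over the batch; in a real layer both parameters are trained by backpropagation.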