2D Max pooling block

This layer reduces the size of the data, which in turn reduces the number of parameters and the amount of computation needed in later layers. It also helps control overfitting. You can think of 2D max pooling as similar to scaling down an image.

The 2D Max pooling block represents a max pooling operation. This layer outputs a smaller tensor than its input, so downstream layers need fewer parameters and less computation; it also helps control overfitting.

Figure 1. 2D Max pooling block

The 2D max pooling block moves a rectangle (window) over the incoming data and computes the maximum value inside each window. The size of the window is determined by the horizontal and vertical pooling factors, and the distance the window moves between positions is determined by the horizontal and vertical strides.
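The sliding-window behavior described above can be sketched in plain Python. This is a minimal illustration, not the block's actual implementation: the function name `max_pool_2d` and its argument names are hypothetical, it handles a single channel, and it assumes no padding (the window always stays fully inside the input).

```python
def max_pool_2d(image, pool_w, pool_h, stride_w, stride_h):
    """Max-pool a single-channel 2D grid (a list of rows) without padding.

    Hypothetical helper for illustration; names are not part of the platform.
    """
    height, width = len(image), len(image[0])
    output = []
    # Move the window down by stride_h and right by stride_w,
    # keeping it fully inside the input.
    for top in range(0, height - pool_h + 1, stride_h):
        row = []
        for left in range(0, width - pool_w + 1, stride_w):
            window = [image[y][x]
                      for y in range(top, top + pool_h)
                      for x in range(left, left + pool_w)]
            row.append(max(window))  # keep only the strongest value
        output.append(row)
    return output


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2d(image, pool_w=2, pool_h=2, stride_w=2, stride_h=2)
print(pooled)  # [[6, 5], [7, 9]]
```

With a 2x2 window and a stride of 2, each output value summarizes one non-overlapping 2x2 patch, so the 4x4 input shrinks to 2x2.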

Max pooling layers are typically inserted after one or more convolutional layers. They let deeper convolutional layers receive information from a larger portion of the original image: for example, with a 2x2 pooling window and a stride of 2, a 3x3 filter after pooling is influenced by a 6x6 portion of the tensor before pooling. If we see convolutional layers as detectors of a specific feature, max pooling keeps only the “strongest” value of that feature inside the pooling rectangle. Each channel (hence each feature) is treated separately.
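The 6x6 figure follows from simple arithmetic, sketched below for one dimension. The function `receptive_field` is a hypothetical helper, and the example assumes a 2x2 pooling window with a stride of 2, which the text above does not state explicitly.

```python
def receptive_field(filter_size, pool_size, pool_stride):
    """Width, in pre-pooling pixels, seen by one filter applied after pooling."""
    # The filter covers `filter_size` adjacent pooled values. Their pooling
    # windows start `pool_stride` pixels apart, and each spans `pool_size` pixels.
    return (filter_size - 1) * pool_stride + pool_size


# Assumed setup: 3x3 filter after 2x2 pooling with stride 2.
print(receptive_field(filter_size=3, pool_size=2, pool_stride=2))  # 6
```

The same calculation applies independently to the vertical direction, which gives the 6x6 region.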

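When the stride equals the pooling factor in each direction and the input dimensions divide evenly, the output size is (input width / horizontal pooling factor) x (input height / vertical pooling factor) x (input channels). For other settings, the output width and height also depend on the strides and the padding.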

Parameters

Horizontal pooling factor: The width of the rectangle within which the maximum is computed.

Vertical pooling factor: The height of the rectangle within which the maximum is computed.

Horizontal stride: Horizontal distance between the left edges of consecutive pooling windows.

Vertical stride: Vertical distance between the top edges of consecutive pooling windows.

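Padding: Same (the input is padded at the borders; with a stride of 1, the output height and width are the same as the input's) or valid (no padding is added, so the output height and width are generally smaller than the input's).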
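To see how the pooling factor, stride, and padding interact, the sketch below computes the output size along one dimension. It uses the formulas that TensorFlow and Keras apply for "same" and "valid" padding; that this block follows the same convention is an assumption, and `pooled_size` is a hypothetical helper name.

```python
import math


def pooled_size(input_size, pool_size, stride, padding):
    """Output length along one dimension (width or height).

    Assumes the TensorFlow/Keras padding convention; hypothetical helper.
    """
    if padding == "valid":
        # No padding: count the window positions that fit entirely inside the input.
        return (input_size - pool_size) // stride + 1
    if padding == "same":
        # Padded so that every stride step yields one output value.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'same' or 'valid'")


print(pooled_size(28, pool_size=2, stride=2, padding="valid"))  # 14
print(pooled_size(28, pool_size=2, stride=1, padding="same"))   # 28
print(pooled_size(28, pool_size=2, stride=1, padding="valid"))  # 27
print(pooled_size(7, pool_size=2, stride=2, padding="valid"))   # 3
print(pooled_size(7, pool_size=2, stride=2, padding="same"))    # 4
```

Under this convention, "same" padding preserves the input size only when the stride is 1; with larger strides the output still shrinks, but border values are not dropped.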