2D Depthwise convolution

The 2D Depthwise convolution block performs 2D convolutions, which can learn to identify spatial patterns in the data.

The input of this block must have 3 dimensions. If an input has only one channel (e.g. a grayscale picture), the size of the third dimension is 1.

The output may have a different size in the first two dimensions, depending on the padding and stride parameters. The size of the third dimension, corresponding to the number of channels, is preserved.
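The effect of padding and stride on the output size can be sketched as follows. This is a minimal illustration, not the block's actual implementation: it assumes the Same/Valid convention used by common frameworks such as TensorFlow, and it assumes the input dimensions are ordered as (height, width, channels). The function names are hypothetical.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one spatial axis (assumed Same/Valid convention)."""
    if padding == "same":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")


def depthwise_output_shape(input_shape, width=3, height=3,
                           horizontal_stride=1, vertical_stride=1,
                           padding="valid"):
    """Output shape of a depthwise convolution for a 3-dimensional input."""
    in_height, in_width, channels = input_shape  # assumed dimension order
    out_height = conv_output_size(in_height, height, vertical_stride, padding)
    out_width = conv_output_size(in_width, width, horizontal_stride, padding)
    # The number of channels is always preserved.
    return (out_height, out_width, channels)


print(depthwise_output_shape((32, 32, 3), padding="valid"))  # (30, 30, 3)
print(depthwise_output_shape((32, 32, 3), padding="same"))   # (32, 32, 3)
```

With a 3 x 3 kernel and a stride of 1, Valid padding trims one pixel from each border, while Same padding keeps the spatial size unchanged. In both cases the third dimension stays at 3.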

Comparison with 2D Convolution block

The 2D Depthwise convolution block is very similar to a 2D Convolution block having a single filter.

With both blocks, each channel in the input is convolved with a distinct kernel. The 2D Convolution block then sums the results of all these convolutions, producing a single channel.
In contrast, the 2D Depthwise convolution block outputs the result of each convolution as a separate channel. This allows the 2D Depthwise convolution block to produce more output channels with the same amount of computation.
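The difference between the two blocks can be illustrated with a small NumPy sketch. It is a simplified illustration rather than the blocks' actual implementation: it assumes Valid padding, a stride of 1, no bias and no activation, and the function names `depthwise_conv2d` and `single_filter_conv2d` are hypothetical.

```python
import numpy as np


def depthwise_conv2d(image, kernels):
    """Depthwise convolution with Valid padding and stride 1.

    image:   array of shape (height, width, channels)
    kernels: array of shape (kernel_height, kernel_width, channels),
             one kernel per input channel
    """
    in_h, in_w, channels = image.shape
    k_h, k_w, _ = kernels.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    out = np.zeros((out_h, out_w, channels))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k_h, j:j + k_w, :]
            # Sum over the spatial axes only, so each channel stays separate.
            # (As in most deep learning libraries, the kernel is not flipped.)
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out


def single_filter_conv2d(image, kernels):
    """Single-filter 2D convolution: the same per-channel results, summed."""
    per_channel = depthwise_conv2d(image, kernels)
    return per_channel.sum(axis=2, keepdims=True)


rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5, 3))
kernels = rng.normal(size=(3, 3, 3))

print(depthwise_conv2d(image, kernels).shape)      # (3, 3, 3)
print(single_filter_conv2d(image, kernels).shape)  # (3, 3, 1)
```

Both functions perform the same multiplications. The single-filter convolution only adds a final sum over the channel axis, which is why the depthwise variant yields three output channels here instead of one at essentially the same cost.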

The 2D Depthwise convolution block is thus more efficient than the 2D Convolution block in situations where the significant input patterns are isolated to single channels.
However, you cannot increase the number of filters, which is fixed at 1 per block.

Usage

The depthwise convolution is often used as a building block in recent fast and high-performing networks for image classification, like EfficientNet and MobileNet.

Parameters

Width: The width of the weight matrix of a single filter. Default: 3

Height: The height of the weight matrix of a single filter. Default: 3

Horizontal stride: The number of pixels to move while performing the convolution along the horizontal axis. Default: 1

Vertical stride: The number of pixels to move while performing the convolution along the vertical axis. Default: 1

Padding: Same pads the input so that the output has the same height and width as the original input. Default: Valid, which means "no padding".

Activation: The function that will be applied to each element of the output. Default: ReLU

Use bias: Whether to add a constant (the bias) to the output before applying the activation function.

Trainable: Whether the training algorithm is allowed to change the values of the weights during training. In some cases you will want to keep parts of the network static and train only the other parts.