The dense block represents a fully connected layer of artificial units.

Each of the units (as many as specified by the Units attribute) has one weight per input feature, and its output is a function of its inputs.
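The computation above can be sketched in NumPy: each unit's output is an activation applied to the dot product of the input features with that unit's weights, plus a bias. The sizes and the choice of tanh here are hypothetical, just for illustration.

```python
import numpy as np

def dense_forward(x, weights, bias, activation=np.tanh):
    # Each of the `units` columns of `weights` holds one weight per
    # input feature; the output applies the activation to the dot
    # product plus the bias.
    return activation(x @ weights + bias)

# A block with 4 input features and 3 units (hypothetical sizes).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
b = np.zeros(3)
y = dense_forward(rng.normal(size=(1, 4)), w, b)
print(y.shape)  # (1, 3): one output per unit
```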

Figure 1. Two Dense blocks

Usage of Dense blocks

Last block before the Target block

In a classification problem, you should add a Dense block as the last block before the Target block. Set as many units as the number of classes the model can predict.
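In terms of shapes, setting the units to the number of classes means the final Dense block maps the previous block's features to one score (logit) per class. A minimal sketch, with hypothetical sizes:

```python
import numpy as np

hidden_features = 64   # output size of the previous block (hypothetical)
num_classes = 10       # classes the model can predict (hypothetical)

# One unit per class means a weight matrix of shape
# (hidden_features, num_classes) and one bias per class.
w = np.zeros((hidden_features, num_classes))
b = np.zeros(num_classes)

x = np.ones((1, hidden_features))
logits = x @ w + b
print(logits.shape)  # (1, 10): one score per class
```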

Flatten the input to a Dense block

If the input has more than one feature dimension, e.g., an image has three (height, width, and channels), use a Flatten block to flatten the data into a single vector of features before feeding it to the units.
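Flattening simply collapses all feature dimensions into one. A NumPy sketch with hypothetical image sizes:

```python
import numpy as np

# A batch of 2 RGB images, 28x28 pixels (hypothetical sizes).
images = np.zeros((2, 28, 28, 3))

# Flattening collapses the feature dimensions into one vector per
# sample, so every pixel/channel becomes a separate input feature.
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (2, 2352), since 28 * 28 * 3 = 2352
```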

Knowledge stored in the Dense block

The more Dense blocks you add, the more complex the functions the network can approximate.

Dense blocks are the only unit-carrying blocks in multilayer perceptrons, the simplest form of deep neural networks.

Dense blocks with many channels grow quickly in memory usage

A Dense block, where every unit is connected to every input feature, grows quickly in memory usage with the input size. This makes Dense blocks unsuitable for data like images, where the number of input features is height x width x number of channels, which for a standard-definition image is at least 1 million.
Convolutional blocks, e.g., the 2D Convolution block, take better advantage of the structure within image data and need less memory to work.
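The difference in memory usage comes down to parameter counts. A rough back-of-the-envelope comparison, using hypothetical layer sizes (a 640x480 RGB image, 128 units, and a 3x3 convolution kernel):

```python
# Input features for a 640x480 RGB image.
inputs = 640 * 480 * 3            # 921,600 features
units = 128

# Dense: one weight per input feature per unit, plus one bias per unit.
dense_params = inputs * units + units

# 2D convolution: weights depend only on kernel size and channel
# counts, not on the image resolution.
kernel, in_ch, out_ch = 3, 3, 128
conv_params = kernel * kernel * in_ch * out_ch + out_ch

print(dense_params)  # 117964928
print(conv_params)   # 3584
```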

Activation in the last Dense block


In a regression problem, the activation function of the last Dense block should reflect the range of your target values:

  • If your target ranges from -∞ to +∞, use Linear

  • If your target ranges from 0 to +∞, use ReLU

  • If your target ranges from 0 to 1, use Sigmoid

  • If your target ranges from -1 to 1, use TanH
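The four activations above and their output ranges can be written directly in NumPy:

```python
import numpy as np

# The four activations and their output ranges (NumPy sketches).
linear = lambda z: z                      # (-inf, +inf)
relu = lambda z: np.maximum(z, 0.0)       # [0, +inf)
sigmoid = lambda z: 1 / (1 + np.exp(-z))  # (0, 1)
tanh = np.tanh                            # (-1, 1)

z = np.array([-3.0, 0.0, 3.0])
print(relu(z))              # [0. 0. 3.]
print(sigmoid(z).round(3))  # roughly [0.047 0.5 0.953]
```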


In a classification problem, the activation function depends on the type of classification task:

  • If you are trying to predict which one of multiple labels is correct, use Softmax

  • If you are trying to predict multiple, independent labels, use Sigmoid
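The difference between the two is whether the class probabilities compete. With softmax they sum to 1, so exactly one label wins; with sigmoid each label gets an independent probability, so several can be high at once. A NumPy sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

logits = np.array([1.0, 2.0, 0.5])

# Softmax: probabilities compete and sum to 1 (pick one label).
print(round(softmax(logits).sum(), 6))  # 1.0

# Sigmoid: each label gets an independent probability in (0, 1);
# several can be high at once.
print(sigmoid(logits))
```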


Units: the number of units in the block.

Initializer: the procedure used to set the initial values of the weights and the bias before starting training.
Default: Glorot uniform initialization

Activation: the function used to transform the output of the dot product inside the block.
Default: ReLU

Trainable: whether we want the training algorithm to change the value of the weights during training. In some cases, one will want to keep parts of the network static, e.g., when using the encoder part of an autoencoder as preprocessing for another model.
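For reference, the default Glorot (Xavier) uniform initializer draws each weight from a uniform distribution whose limit depends on the number of inputs and outputs of the block. A minimal sketch of the formula, with hypothetical sizes:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, seed=0):
    # Glorot uniform: sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)).
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Initial weights for a block with 4 input features and 3 units.
w = glorot_uniform(4, 3)
print(w.shape)  # (4, 3)
```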
