The Swish activation function is intended as a drop-in replacement for the ubiquitous ReLU function.

The Swish function shares the strengths of ReLU and adds two properties: it is smooth, and it is non-monotonic (it dips slightly below zero for negative inputs before returning toward zero). It has been benchmarked on a variety of tasks, where it was reported to perform as well as or better than ReLU.

It is a non-linear function that is easily expressed as the product of the identity function with the sigmoid function:


\[f(x) = x \cdot \frac{1}{1+e^{-x}}\]
Figure 1. Swish graph
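
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names `sigmoid`, `swish`, and `relu` are chosen here for the example, and the branch on the sign of `x` is an added precaution against overflow in `math.exp`, not something the definition requires.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish: the identity function multiplied by the sigmoid."""
    return x * sigmoid(x)


def relu(x: float) -> float:
    """ReLU, included only for side-by-side comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Print both activations on a few sample inputs.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  relu={relu(x):+.4f}")
```

Comparing the negative inputs shows the non-monotonic behavior: `swish(-1.0)` is lower than both `swish(-5.0)` and `swish(0.0)`, whereas ReLU is exactly zero over that whole range. For large positive inputs the two functions are nearly identical.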