Categorical encoding

Categorical encoding is used when you don’t want to impose a specific ordering on your data.

Categorical encoding can be used on both input and target features.

How does categorical encoding work

Categorical encoding is the same as one-hot encoding. It takes the categorical features in a dataset and converts them into new features. Each new feature is a binary vector in which exactly one entry is 1 and the rest are 0 (hence one-hot).
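As a minimal sketch of the idea (plain Python with NumPy, not the platform's own implementation; the category names are made up for illustration):

```python
import numpy as np

def one_hot_encode(values, categories):
    """Map each value to a binary vector with exactly one entry set to 1."""
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1
    return encoded

categories = ["red", "green", "blue"]
print(one_hot_encode(["green", "red"], categories))
# [[0 1 0]
#  [1 0 0]]
```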

Why use categorical encoding

In deep learning, you need to convert your data to a numeric format to be able to train your models. Categorical encoding is used when you don’t want to impose a specific ordering on the categorical data.

Example: You have a dataset with five categories of clothes: "T-shirt", "Trouser", "Bag", "Hat", and "Ankle boot". If you select categorical encoding for this dataset, you will not impose a specific ordering on the categories. If you instead code them as integers 1 to 5, the model will treat "Ankle boot" (5) as more similar to "Hat" (4) than to "T-shirt" (1).
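To make the ordering problem concrete, here is a hedged sketch (plain NumPy, not platform code) that compares the pairwise distances implied by integer codes with those implied by one-hot vectors:

```python
import numpy as np

labels = ["T-shirt", "Trouser", "Bag", "Hat", "Ankle boot"]

# Integer encoding imposes an artificial ordering: some categories end up "closer" than others.
integer_codes = {label: i + 1 for i, label in enumerate(labels)}
print(abs(integer_codes["Ankle boot"] - integer_codes["Hat"]))      # 1
print(abs(integer_codes["Ankle boot"] - integer_codes["T-shirt"]))  # 4

# One-hot encoding keeps every pair of categories equally far apart.
one_hot = {label: np.eye(len(labels), dtype=int)[i] for i, label in enumerate(labels)}
print(round(float(np.linalg.norm(one_hot["Ankle boot"] - one_hot["Hat"])), 3))      # 1.414
print(round(float(np.linalg.norm(one_hot["Ankle boot"] - one_hot["T-shirt"])), 3))  # 1.414
```

With one-hot vectors, no category is artificially closer to any other, which is exactly what you want when the categories have no natural order.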

The drawback of categorical encoding is that it can generate a very large number of new features for input features with many unique values. In these cases, it may be better to use an embedding layer to decrease the number of dimensions.
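As a rough illustration of that alternative, the Keras sketch below (generic TensorFlow code, not a Peltarion-specific API; the feature cardinality and embedding size are assumed values) maps a categorical input with 10,000 unique values to a 16-dimensional dense vector instead of a 10,000-wide one-hot vector:

```python
import tensorflow as tf

num_categories = 10_000  # assumed number of unique values in the categorical feature
embedding_dim = 16       # far fewer dimensions than a 10,000-wide one-hot vector

model = tf.keras.Sequential([
    # Each category enters as a single integer index and comes out as a dense 16-dim vector.
    tf.keras.layers.Embedding(input_dim=num_categories, output_dim=embedding_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

example_batch = tf.constant([[42], [9813]])  # two rows, one categorical index each
print(model(example_batch).shape)            # (2, 1)
```

The embedding weights are learned during training, so categories that behave similarly can end up with similar vectors.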