Encoding  Datatype  Applicable normalization 

Numeric 
Integer 

Image 
Image 

String 
 
Standardization converts a set of raw input data to have a zero mean and unit standard deviation. Values above the feature’s mean value will get positive scores, and those below the mean will get a negative score.
When you perform a standardization, you assume that the input data are normally distributed. After the normalization, the data will be Gaussian with mean zero and standard deviation one. It is possible to have values that are very far away from zero (e.g., 5, 10, 20), but if the distribution is unit Gaussian, those values will have very small probabilities.
Standard score is also called zscore, see Wikipedia.
Why use standardization
You standardize a dataset to make it easier and faster to train a model. Standardization normalization usually helps with getting the input data into a value range that works well with the default activation functions, weight initializations, and other Platform parameters.
Standardization puts the input values on a more equal footing so that there is less risk of one input feature drowning out the others.
Formula for standardization
The standard score of a raw input data value x is calculated:
μ is the mean of the values of the feature in question.
σ is the standard deviation of the values of the feature in question.
Minmax normalization transforms input data to lie in a range between 0 and 1. After normalization, the lowest value of the input feature will be converted to 0, and the highest will get the value 1.
Why use minmax normalization
You normalize a dataset to make it easier and faster to train a model.
Minmax normalization puts values for different input features on a more equal footing. This will in many cases decrease the training time.
Minmax scaling helps with getting the input data into a value range that works well with the default activation functions, weight initializations, and other platform parameters.
If you have large outliers, you should be careful about applying minmax normalization, as this could put the nonoutlier values in an overly narrow range.
Formula for minmax normalization
The normalized score of a raw input data value x is calculated:
where:
min(x) is the lowest value for input feature x.
max(x) is the highest input value for input feature x.
Selecting None means that your data will not be modified in any way.
Why use No preprocessing
Use None when you have numeric data that is on an appropriate scale.
If you have categorical or text data, you cannot use None. You will always have to preprocess categorical or text data as numeric data. If you have numeric data, you will often want to transform them anyway, for example with standardization, to facilitate model training.
Example: You are trying to forecast the price of a stock and the input data features consist of daily relative changes in the stock’s price. Then you can train a model with no preprocessed data.
Categorical is used when you don’t want to impose a specific ordering on your data.
Categorical can be used both on input and target features.
How does categorical encoding work
Categorical is the same thing as onehot encoding, it takes the categorical features in a dataset and converts them into new features. These features are binary vectors where only one entry is 1 while the rest is 0 (hence onehot).
Why use categorical encoding
In deep learning, you need to convert your data to a numeric format to be able to train your models. Categorical encoding is used when you don’t want to impose a specific ordering on the categorical data.
Example: You have a dataset with five categories of clothes, "Tshirt", "Trouser", "Bag, "Hat", and "Ankle boot". If you select categorical this dataset you will not impose a specific ordering on the categories. If you code them as integers 1 to 5, you will treat "Ankle boot (5)" more similar to "Hat (4)" than "Tshirt (1)".
The drawback of categorical encoding is that it can generate a very large number of new features for input features that have a large number of possible values (have many unique values.) In these cases, it may be better to use an embedding layer to decrease the number of dimensions.
We use cookies to offer you a better experience, analyze site traffic and serve targeted advertisements. By using this site, you agree to this use. To find out more, read our Cookie Policy.