Normalization allows you to rescale the imported data before using it to train a model.
When a feature's encoding supports it, you can select which type of normalization to apply.
Standardization rescales a set of raw input data so that it has zero mean and unit standard deviation. Values above the feature’s mean get positive scores, and values below the mean get negative scores.
Standardization works best when the input data are approximately normally distributed. Standardization only shifts and rescales the values; it does not change the shape of the distribution. If the input data are Gaussian, the standardized data will be Gaussian with mean zero and standard deviation one. It is still possible to have values that are very far away from zero (e.g., -5, -10, 20), but if the distribution is unit Gaussian, those values will have very small probabilities.
The standard score is also called the z-score; see Wikipedia.
Why use standardization
You standardize a dataset to make it easier and faster to train a model. Standardization usually helps with getting the input data into a value range that works well with the default activation functions, weight initializations, and other platform parameters.
Standardization puts the input values on a more equal footing so that there is less risk of one input feature drowning out the others.
Formula for standardization
The standard score z of a raw input data value x is calculated as:
$$z = \frac{x - \mu}{\sigma}$$
where:
μ is the mean of the values of the feature in question.
σ is the standard deviation of the values of the feature in question.
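As an illustration of the standard score, here is a minimal Python sketch using only the standard library. The function name `standardize` and the sample values are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption; the text does not say whether the platform uses the population or the sample estimate.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return the standard score (x - mean) / std for every value of one feature."""
    mu = fmean(values)
    # Assumption: population standard deviation; a sample estimate would use stdev().
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("feature is constant; standard deviation is zero")
    return [(x - mu) / sigma for x in values]


# Hypothetical raw values for a single feature (mean 5, standard deviation 2).
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(raw)
print(z_scores)
```

Values below the mean of 5 come out negative and values above it come out positive, and the resulting scores have mean zero and standard deviation one.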
Min-max normalization transforms input data to lie in a range between 0 and 1. After normalization, the lowest value of the input feature will be converted to 0, and the highest will get the value 1.
Why use min-max normalization
You normalize a dataset to make it easier and faster to train a model.
Min-max normalization puts values for different input features on a more equal footing. This will in many cases decrease the training time.
Min-max scaling helps with getting the input data into a value range that works well with the default activation functions, weight initializations, and other platform parameters.
If you have large outliers, you should be careful about applying min-max normalization, as this could put the non-outlier values in an overly narrow range.
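The effect of an outlier can be seen in a small Python sketch. It assumes the usual min-max rescaling, (x - min) / (max - min); the helper name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale the values of one feature linearly so they lie between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("feature is constant; max equals min")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical feature with one large outlier (1000).
with_outlier = [1.0, 2.0, 3.0, 4.0, 1000.0]
scaled = min_max_normalize(with_outlier)

# Width of the range occupied by the four non-outlier values after scaling.
non_outlier_span = max(scaled[:4]) - min(scaled[:4])
print(scaled)
print(non_outlier_span)
```

The outlier maps to 1, while the four remaining values are compressed into less than one percent of the 0 to 1 range, which leaves the model very little resolution between them.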
Formula for min-max normalization
The normalized score x' of a raw input data value x is calculated as:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
where:
min(x) is the lowest value for input feature x.
max(x) is the highest value for input feature x.
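A minimal Python sketch of the min-max calculation follows. The function name `min_max_normalize` and the sample values are hypothetical, and the check for a constant feature is an added safeguard, not something the text describes.

```python
def min_max_normalize(values):
    """Return (x - min) / (max - min) for every value of one feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Safeguard (assumption): a constant feature would cause division by zero.
        raise ValueError("feature is constant; max equals min")
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical raw values for a single feature.
raw = [10.0, 20.0, 25.0, 50.0]
normalized = min_max_normalize(raw)
print(normalized)  # [0.0, 0.25, 0.375, 1.0]
```

The lowest raw value maps to 0, the highest maps to 1, and the values in between keep their relative spacing.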
Selecting None means that your data will not be modified in any way.
Why use None
Use None when you have numeric data that is on an appropriate scale.
If you have categorical or text data, you cannot use None, because such data must always be converted to numeric data first. Even if you have numeric data, you will often want to transform it anyway, for example with standardization, to facilitate model training.
Example: You are trying to forecast the price of a stock, and the input features consist of daily relative changes in the stock’s price. Relative changes are typically small numbers on a comparable scale, so you can train a model on this data without preprocessing.
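To make the example concrete, the following Python sketch shows one way such a feature could be derived from raw closing prices. The function name `daily_relative_changes` and the prices are hypothetical; the text does not say how the relative changes were produced.

```python
def daily_relative_changes(prices):
    """Return (today - yesterday) / yesterday for each consecutive pair of prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical daily closing prices.
prices = [100.0, 102.0, 99.96, 101.0]
changes = daily_relative_changes(prices)
print(changes)
```

Each change is a small number close to zero regardless of the absolute price level, which is why a feature like this can often be used with None.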