Evaluation view - with and without standardization / Example workflow

Let’s compare the performance of the same model on different versions of a dataset, with and without standardization. This workflow uses the Predict California house prices dataset. To see how the dataset versions were prepared, read the example workflow on our create different versions of a dataset page. For the model designs used for image and tabular data, see our pages model design for image data and model design for tabular data.

Step 1: Document experiment results

Record the loss value of each experiment in a table. The table below reports the average loss over the three experiments run on each dataset version. Note that this example workflow focuses on comparing results across dataset settings, not on optimizing the model.

| Experiment names | Dataset version | Average loss (MSE) of the 3 experiments |
|---|---|---|
| NoStdImage/TargetStd, NoStdImage/TargetStd 2, NoStdImage/TargetStd 3 | NoStdImage/TargetStd | 0.958 |
| StdImage/TargetStd, StdImage/TargetStd 2, StdImage/TargetStd 3 | StdImage/TargetStd | 0.936 |
| StdTabular/TargetStd, StdTabular/TargetStd 2, StdTabular/TargetStd 3 | StdTabular/TargetStd | 0.320 |
| NoStdTabular/TargetStd, NoStdTabular/TargetStd 2, NoStdTabular/TargetStd 3 | NoStdTabular/TargetStd | 0.430 |
| NoStdTabular/NoTargetStd, NoStdTabular/NoTargetStd 2, NoStdTabular/NoTargetStd 3 | NoStdTabular/NoTargetStd | 26552888100 |

Step 2: Compare results

Compare image experiments

Compare the experiments on the dataset versions NoStdImage/TargetStd and StdImage/TargetStd. The experiments with image standardization outperform the ones without it: their average loss is lower by a margin of 0.022 (0.936 versus 0.958).

Compare tabular experiments

Compare the experiments on the dataset versions StdTabular/TargetStd and NoStdTabular/TargetStd. The experiments with tabular data standardization outperform the ones without it: their average loss is lower by a margin of 0.110 (0.320 versus 0.430).
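The margins can be checked with a few lines of Python. This is a minimal sketch that only uses the average losses from the table in Step 1. The helper name margin and the relative reduction it reports are illustrative additions, not something the platform computes for you.

```python
# Average MSE values copied from the table in Step 1.
avg_mse = {
    "NoStdImage/TargetStd": 0.958,
    "StdImage/TargetStd": 0.936,
    "StdTabular/TargetStd": 0.320,
    "NoStdTabular/TargetStd": 0.430,
}


def margin(baseline, candidate):
    """Return the (absolute, relative) loss reduction from baseline to candidate."""
    absolute = avg_mse[baseline] - avg_mse[candidate]
    return absolute, absolute / avg_mse[baseline]


image_abs, image_rel = margin("NoStdImage/TargetStd", "StdImage/TargetStd")
tabular_abs, tabular_rel = margin("NoStdTabular/TargetStd", "StdTabular/TargetStd")

print(f"image:   {image_abs:.3f} lower ({image_rel:.1%})")      # image:   0.022 lower (2.3%)
print(f"tabular: {tabular_abs:.3f} lower ({tabular_rel:.1%})")  # tabular: 0.110 lower (25.6%)
```

All four of these dataset versions use a standardized target, so their losses are on the same scale and the margins can be compared directly.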

The average loss of the experiments with a non-standardized target (NoStdTabular/NoTargetStd) is 26552888100, whereas the average loss of the experiments with a standardized target (NoStdTabular/TargetStd) is 0.430. The losses differ this much because mean squared error depends on the scale of the target. Target_medianhouseValue ranges from 14999 to 500001, so without target standardization the deviations between predictions and ground truth can be very large. For example, if the model predicts 14999 when the ground truth is 500000, the squared error is 485001² = 235225970001, roughly 2.35 × 10^11. On a standardized target, an equally bad prediction could be predicting -1 when the ground truth is 1. In that case, the squared error is only 4. The two losses are measured on different scales, so they can't be compared directly.
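The effect of the target scale on mean squared error can be sketched in plain Python. The first part evaluates the two single-prediction examples directly and takes the square root of the reported loss to express it in the target's own units. The second part uses a handful of made-up prices and predictions (hypothetical values, not taken from the dataset) to show that standardizing the target divides the MSE by the squared standard deviation of the target.

```python
import math
import statistics


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Single-prediction examples: raw target versus standardized target.
raw_sq_error = (500000 - 14999) ** 2   # 235225970001
std_sq_error = (1 - (-1)) ** 2         # 4

# The reported raw-target loss, expressed as a root mean squared error
# in the same units as Target_medianhouseValue.
rmse_reported = math.sqrt(26552888100)

# Hypothetical prices and predictions, for illustration only.
prices = [14999.0, 120000.0, 250000.0, 380000.0, 500001.0]
preds = [60000.0, 150000.0, 200000.0, 300000.0, 420000.0]

mean = statistics.fmean(prices)
std = statistics.pstdev(prices)


def standardize(values):
    """Standardize with the mean and standard deviation of the true prices."""
    return [(v - mean) / std for v in values]


mse_raw = mse(prices, preds)
mse_std = mse(standardize(prices), standardize(preds))

# The two losses differ by a factor of std ** 2.
print(f"raw MSE:          {mse_raw:.1f}")
print(f"standardized MSE: {mse_std:.4f}")
print(f"rescaled back:    {mse_std * std ** 2:.1f}")
print(f"reported RMSE:    {rmse_reported:.0f}")
```

Under these assumptions, the raw loss and the standardized loss describe the same predictions. Only the unit of measurement changes, which is why a loss in the tens of billions and a loss below 1 are not directly comparable.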

Conclusion:

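Standardizing the input features improves performance for both image and tabular data. Standardizing the target changes the scale of the loss, so losses computed on standardized and non-standardized targets are not directly comparable. Next, deploy your model.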