# Scatter plot

The scatter plot appears when an experiment solves a regression problem, i.e., the prediction of numeric values.

The horizontal axis represents the actual value of an example, as defined by its target feature.

The vertical axis represents the value predicted by the model.

Individual examples are marked with dots. To keep the plot readable, no more than 250 dots are plotted at a time.

The color levels indicate the density of examples in different areas of the plot. Unlike the dots, the density levels account for all examples in the inspected subset.
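The difference between the capped dots and the density levels can be sketched as follows. This is only an illustration: the random sampling, the grid size, and the function names `sample_dots` and `density_grid` are assumptions, not a description of how the plot is actually computed.

```python
import random

MAX_DOTS = 250  # cap on plotted dots, as stated above


def sample_dots(pairs, max_dots=MAX_DOTS, seed=0):
    """Return at most max_dots (actual, predicted) pairs to draw as dots.

    Random sampling is an assumption made for this sketch.
    """
    pairs = list(pairs)
    if len(pairs) <= max_dots:
        return pairs
    return random.Random(seed).sample(pairs, max_dots)


def density_grid(pairs, bins=10):
    """Count ALL examples per cell of a bins x bins grid (hypothetical grid size)."""
    values = [v for pair in pairs for v in pair]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero cell width
    grid = [[0] * bins for _ in range(bins)]
    for actual, predicted in pairs:
        col = min(int((actual - lo) / width), bins - 1)     # horizontal axis: actual
        row = min(int((predicted - lo) / width), bins - 1)  # vertical axis: predicted
        grid[row][col] += 1
    return grid


# Synthetic subset of 1000 examples with noisy predictions.
rng = random.Random(42)
pairs = [(a, a + rng.gauss(0, 0.5)) for a in (rng.uniform(0, 10) for _ in range(1000))]

dots = sample_dots(pairs)   # only a capped sample is drawn as dots
grid = density_grid(pairs)  # density counts every example in the subset
```

In this sketch the dots are a sample, while the density grid sums to the full size of the subset.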

If a model is good, most of the examples should be concentrated near the Line of identity, which corresponds to identical actual and predicted values.
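As a rough numeric counterpart to reading the plot, the following sketch measures how many examples fall close to the Line of identity. The function name `fraction_near_identity` and the tolerance value are hypothetical choices for illustration; the right tolerance depends on your problem.

```python
def fraction_near_identity(actual, predicted, tolerance):
    """Share of examples whose prediction is within `tolerance` of the actual value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    close = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance)
    return close / len(actual)


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.0, 6.0]  # the last prediction is far from the line

share = fraction_near_identity(actual, predicted, tolerance=0.5)
print(share)  # 0.75
```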

## Interact with the scatter plot

You can use the select tool to draw a rectangular window on the scatter plot. This filters the inspection table to show only the examples contained in this window.
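Conceptually, the window selection amounts to keeping the examples whose actual and predicted values both fall inside the rectangle. The sketch below assumes examples are stored as `(actual, predicted)` pairs; the function name `select_window` and the inclusive bounds are assumptions, not the tool's implementation.

```python
def select_window(examples, x_min, x_max, y_min, y_max):
    """Return the indices of examples inside the rectangle.

    x bounds apply to the actual value (horizontal axis),
    y bounds apply to the predicted value (vertical axis).
    """
    return [
        index
        for index, (actual, predicted) in enumerate(examples)
        if x_min <= actual <= x_max and y_min <= predicted <= y_max
    ]


examples = [(1.0, 1.0), (2.0, 5.0), (3.0, 3.0)]
selected = select_window(examples, x_min=0.0, x_max=2.5, y_min=0.0, y_max=2.0)
print(selected)  # [0]
```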

X & Y Axes: The linear scale is the default for both axes.

Log scale applies a logarithmic scale to both the Actual and Predicted axes.
This is useful when your tolerance on errors is proportional to the actual values.
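To see why a log scale suits proportional tolerances, note that on logarithmic axes the vertical distance of a point from the Line of identity depends only on the ratio between predicted and actual values. A minimal sketch, assuming strictly positive values and a base-10 logarithm (the function name `log_distance` is hypothetical):

```python
import math


def log_distance(actual, predicted):
    """Vertical distance from the Line of identity on log-log axes.

    Both values must be strictly positive for the logarithm to be defined.
    """
    return math.log10(predicted) - math.log10(actual)


# Both predictions overshoot by 10%, at very different magnitudes.
small = log_distance(actual=10.0, predicted=11.0)
large = log_distance(actual=1000.0, predicted=1100.0)

# On a linear scale the errors are 1 and 100; on a log scale they look alike.
print(math.isclose(small, large))  # True
```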

## How to improve regression results

If many examples and density levels spread **away from the Line of identity**, the model has difficulty making accurate predictions.

In that case, try using a larger model with more parameters.

If the plotted examples and density levels look limited to a narrow **horizontal band**, you may have used the wrong activation function on the last block before the target of your model, or you may need to normalize your dataset.

For instance, the ReLU and sigmoid activation functions cannot predict negative values; TanH cannot predict values outside the range `-1` to `1`. Sometimes, you may have to use the linear activation function just before the target block, and use non-linear activation functions only in the previous layers.
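The output ranges mentioned above can be checked with a few lines of plain Python. This sketch uses the standard mathematical definitions of the activation functions; it is independent of any particular modeling tool.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    return x


activations = {"relu": relu, "sigmoid": sigmoid, "tanh": math.tanh, "linear": linear}
inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]

# Apply each activation to the same inputs and report the range it can reach here.
outputs = {name: [fn(x) for x in inputs] for name, fn in activations.items()}
for name, values in outputs.items():
    print(f"{name:8s} min={min(values):.3f} max={max(values):.3f}")
```

Only the linear activation reproduces the full input range. If your target contains negative or large values, a bounded activation on the last block caps the predictions, which tends to show up as a horizontal band in the scatter plot.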

## Multi-dimensional target

For a multi-dimensional target, each value in the scatter plot corresponds to an element of the output tensor. This means that the total number of values in the scatter plot may be higher than the number of examples in the dataset.
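The counting can be illustrated with a small sketch, assuming each example's target and prediction are stored as flat lists of equal length (the function name `flatten_pairs` is hypothetical):

```python
def flatten_pairs(actual, predicted):
    """Turn per-example vectors into one (actual, predicted) pair per element."""
    pairs = []
    for actual_vector, predicted_vector in zip(actual, predicted):
        if len(actual_vector) != len(predicted_vector):
            raise ValueError("target and prediction must have the same shape")
        pairs.extend(zip(actual_vector, predicted_vector))
    return pairs


# Three examples, each with a two-element target.
actual = [[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]]
predicted = [[0.1, 1.1], [0.3, 1.9], [0.2, 3.2]]

pairs = flatten_pairs(actual, predicted)
print(len(actual), len(pairs))  # 3 6
```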