# Predict real estate prices

Solve a regression problem using two types of data: tabular data and images.

Getting a good estimate of the price of a house is hard even for the most seasoned real estate agents. With the advent of deep learning, it is now possible to get a much more sophisticated valuation as we can now use several data types.

- Target audience: Intermediate users

You will learn to:
- Use multiple data types, both tabular data and images.
- Run multiple experiments and compare them.
- Solve a regression problem, i.e., a problem where you predict a quantity, such as a price.

## Create a project

Start by creating a project on the Projects view by clicking on New project.

## Add dataset to the project

In the Datasets view, click the Import free datasets button.

Look for the Cali House - tutorial data dataset in the list. This dataset consists of map images of the blocks from OpenStreetMap and tabular demographic data collected from the 1990 California census.

If you agree with the license, click Accept and import.

This will import the dataset in your project, and you can now edit it.

### Cali House - tutorial data

Each sample in the Cali House dataset gives the following information about one block of houses: median house age, total number of rooms, total number of bedrooms, population, number of households, median income, and median house value.

Target - medianHouseValue
In this tutorial, we wish to make an AI model that learns to predict the price of a house, here called medianHouseValue, given the other available data (i.e., median house age, population, etc.).

Hence, medianHouseValue is our target feature, while the others are our input features.
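To make the input/target split concrete, here is a sketch using one sample represented as a plain Python dict (the values are illustrative, not taken from the platform):

```python
# One block's data as a dict (column names from the Cali House dataset)
sample = {
    "housingMedianAge": 41.0,
    "totalRooms": 880.0,
    "totalBedrooms": 129.0,
    "population": 322.0,
    "households": 126.0,
    "medianIncome": 8.3252,
    "medianHouseValue": 452_600.0,
}

target = sample.pop("medianHouseValue")  # what the model learns to predict
inputs = sample                          # everything else is an input feature

print(target)       # 452600.0
print(len(inputs))  # 6
```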

#### Change image normalization to Standardization

You normalize a dataset to make it easier and faster to train a model.

Locate the image_path feature and click the wrench icon.

Change the Normalization from None to Standardization.

##### What is standardization?

Standardization converts raw input data to have zero mean and unit standard deviation. Values above the feature’s mean get positive scores, and values below the mean get negative scores.
We normalize or scale input data because neural networks train better when the inputs fall roughly in the interval -1 to 1.
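A minimal sketch of what standardization does to a feature, assuming NumPy:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Standardize: subtract the mean, divide by the standard deviation
z = (x - x.mean()) / x.std()

print(round(z.mean(), 6))  # 0.0 -> zero mean
print(round(z.std(), 6))   # 1.0 -> unit standard deviation
```

Values above the original mean (here 5.0) come out positive, values below it negative.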

#### Create a tabular feature set

A feature set is two or more features that you want to treat in the same way during modeling.

This feature set consists of the tabular data on the houses, for example, number of bedrooms and median income.

1. Click New feature set, name the feature set 6_features, and select the information on the houses:

• housingMedianAge

• totalRooms

• totalBedrooms

• population

• households

• medianIncome

2. Click Create.

## Create 1st experiment - only tabular data

Now that we have the data let’s create the AI model. We’ll start by just trying to predict the prices from the tabular data.

### Experiment wizard

Click Save version and then Use in new experiment to open the Experiment wizard.

The goal of the Experiment wizard is to make it easy to create a ready-to-run experiment. The wizard takes advantage of the available info, provides a good starting point, and makes sure the user gets a good result with almost no effort.

• Give the experiment a good name, for example, Tabular data experiment.

• Dataset tab
Make sure that the Cali House dataset is selected.

• Inputs / target tab, make sure:

• The Input feature is 6_features (the feature set you created)

• The Target feature is medianHouseValue.

• Problem type tab
Select Tabular regression in the drop-down menu. In regression, a trained model predicts a continuous value, such as a price.

• Click Create. This will add a complete deep learning model to the Modeling canvas.

## Run experiment from modeling canvas

The Experiment wizard has pre-populated all experiment settings. You don’t need to change anything, but for your information:

• The Loss in the Target block is set to Mean Squared Error (MSE). MSE is often used for regression, where the target, conditioned on the input, is assumed to be normally distributed.

• The last Dense block has Units set to 1 because we want only one prediction.
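As a quick illustration of how MSE scores a regression model (the prices and predictions below are made up):

```python
import numpy as np

y_true = np.array([120_000.0, 250_000.0, 310_000.0])  # actual house values
y_pred = np.array([130_000.0, 240_000.0, 300_000.0])  # model predictions

# MSE: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 100000000.0 (each prediction is off by 10,000)
```

Because the errors are squared, large mistakes are penalized much more heavily than small ones.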

Time to train the model and see how well it performs.

Done! Click Run.

## Create 2nd experiment - tabular and image

While the first experiment runs, you can build and run a concurrent experiment to find out if you can improve the results.

1. Navigate to the Modeling view.

2. Click the Iterate button.

3. Open the Reuse part of model tab.
Select the best checkpoint and keep Target as Terminal port.

Note: If you’re on the Free plan, you can run one experiment at a time. All other plans can run concurrent experiments.

The already trained model will now show up in the Modeling view as a separate block. We call it a user block. The block gets its name from the previous experiment’s name, for example, Tabular data experiment.

Now build a model based on the user block according to this illustration:

1. Connect an Input block to the user block.
Set Feature to the tabular feature set you created, 6_features.

2. Add a new Input block.
Set Feature to the image feature, image_path.

3. Add an EfficientNet B0 block and connect it to the image input.
EfficientNet was designed by an optimization procedure that maximizes accuracy for a given computational cost.

4. Add a Concatenate block.

5. Connect the outputs of the user block and the EfficientNet block to the inputs of the Concatenate block.

6. Add a Dense block and connect it to the Concatenate block.
Set Units to 512.
Set Activation to ReLU.

7. Add another Dense block after the first one.
Set Units to 1.
Set Activation to Linear.

8. Add a Target block and connect it to the last Dense block.
Set Feature to medianHouseValue.
Set Loss function to Mean squared error.

9. Done!

Click Run, and move on to compare the two experiments.
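Under the hood, the combined model is simple array math. This NumPy sketch shows how the two branches are concatenated and reduced to a single prediction (the batch size and random weights are illustrative assumptions; EfficientNet B0's pooled output is 1280-dimensional):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two branches: 6 tabular features and an EfficientNet B0 image embedding
tabular = rng.normal(size=(8, 6))             # 8 samples, 6 tabular features
image_embedding = rng.normal(size=(8, 1280))  # B0's pooled output is 1280-dim

# Concatenate block: join the branches along the feature axis
merged = np.concatenate([tabular, image_embedding], axis=1)

# Dense block, Units=512, Activation=ReLU (random weights for illustration)
w1 = rng.normal(size=(merged.shape[1], 512)) * 0.01
hidden = np.maximum(merged @ w1, 0.0)

# Dense block, Units=1, Activation=Linear -> one predicted medianHouseValue
w2 = rng.normal(size=(512, 1)) * 0.01
prediction = hidden @ w2

print(merged.shape)      # (8, 1286)
print(prediction.shape)  # (8, 1)
```

In the real experiment, the weights are of course learned during training rather than drawn at random.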

## Analyze experiments

The Evaluation view shows in several ways how the training of the model has progressed and how your experiments are performing.

Same loss function

As long as you keep the same loss function, you can compare the results of the experiments and see which one is the best.

Did the second input help?

### Loss graph

The lower the loss, the better the model (unless the model has overfitted to the training data). The loss is calculated on both the training and validation sets and indicates how well the model performs on each. It is the sum of the errors made for each example in the training or validation set.

Figure 1. Loss graph

### Prediction scatter plot

Navigate to the Predictions inspection tab and take a look at the scatter plot.

In a perfect scatter plot, all points lie on the diagonal going from the bottom left to the top right, that is, every predicted value equals the actual value.

Figure 2. Prediction scatterplot

## Tutorial recap

Congratulations, you’ve completed the California house pricing tutorial. In this tutorial, you’ve learned how to:

• Solve a regression problem, first by using one input and then by extending the experiment using multiple datasets.

• Analyze the experiments to find out which one was the best.

Good job!

### Next tutorial - Sales forecasting with spreadsheet integration

We suggest Sales forecasting with spreadsheet integration as your next tutorial.

You will learn to:

• Build a deep learning model with no code.

• Predict sales numbers from spreadsheet data.

• Deploy your model for production on the Peltarion Platform.