Predict real estate prices
Solve a regression problem using two types of data: tabular data and images
Getting a good estimate of the price of a house is hard even for the most seasoned real estate agents. With the advent of deep learning, it is possible to get a much more sophisticated valuation, because a model can combine several data types.
- Target audience: Intermediate users
You will learn to
- Use multiple types of data, both tabular data and images.
- Run multiple experiments and compare them.
- Solve a regression problem, that is, a problem where you predict a quantity, e.g., a price.
Create a project
Start by creating a project on the Projects view by clicking on New project.
Add dataset to the project
In the Datasets view, click the Import free datasets button.
Look for the Cali House - tutorial data dataset in the list. This dataset consists of map images of the blocks from OpenStreetMap and tabular demographic data collected in the 1990 California census.
If you agree with the license, click Accept and import.
This will import the dataset into your project, and you can now edit it.
Cali House - tutorial data
Each sample in the Cali House dataset gives the following information about one block of houses:
Median house age, Total number of rooms, Total number of bedrooms, Population, Number of households, Median income, and Median house value.
Target - medianHouseValue
In this tutorial, we wish to make an AI model that learns to predict the price of a house, here called medianHouseValue, given the other available data (i.e., median house age, population, etc.).
Hence, medianHouseValue is our target feature, while the others are our input features.
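Outside the platform, the same split between input features and target could be sketched in a few lines of Python. This is only an illustration: apart from medianHouseValue, the column names below are assumed, and the sample values are invented rather than taken from the dataset.

```python
# Hypothetical column names: only medianHouseValue is named in this tutorial;
# the six input names are assumptions made for illustration.
TARGET = "medianHouseValue"
INPUT_FEATURES = [
    "medianHouseAge",
    "totalRooms",
    "totalBedrooms",
    "population",
    "households",
    "medianIncome",
]


def split_sample(sample):
    """Return (inputs, target) for one block of houses."""
    inputs = [sample[name] for name in INPUT_FEATURES]
    return inputs, sample[TARGET]


# Invented values for a single block, not real census data.
example = {
    "medianHouseAge": 25.0,
    "totalRooms": 1200.0,
    "totalBedrooms": 250.0,
    "population": 800.0,
    "households": 230.0,
    "medianIncome": 4.5,
    "medianHouseValue": 210000.0,
}

inputs, target = split_sample(example)
```

The model only ever sees `inputs` when predicting; `target` is what it is trained to reproduce.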
Change image normalization to Standardization
You normalize a dataset to make it easier and faster to train a model.
Locate the image_path feature and click the wrench icon.
Change the Normalization from None to Standardization.
What is standardization?
Standardization rescales a set of raw input data so that it has zero mean and unit standard deviation. Values above the feature’s mean get a positive score, and values below the mean get a negative score.
The reason we normalize or scale input data is simply that neural networks train better when the data falls roughly in the interval from -1 to 1.
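As a rough illustration of the calculation, the sketch below subtracts the mean and divides by the standard deviation. The platform performs this step for you; the function name and the sample values are invented for the example, and the population standard deviation is assumed.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # A constant feature has no spread; map every value to zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Invented raw values with mean 5 and standard deviation 2.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize(raw)
```

Values below the mean (such as 2.0) end up negative, and values above it (such as 9.0) end up positive. Most scaled values land close to the -1 to 1 range, though outliers can fall outside it.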
Create a tabular feature set
A feature set is two or more features that you want to treat in the same way during modeling.
This feature set consists of the tabular data on the houses, for example, number of bedrooms and median income.
Click Show advanced settings.
Click New feature set, name the feature set 6_features, and select the six tabular features that describe the houses, that is, every feature listed above except the target, medianHouseValue.
Create 1st experiment - only tabular data
Now that we have the data, let’s create the AI model. We’ll start by trying to predict the prices from the tabular data alone.
Click Save version and then Use in new experiment to open the Experiment wizard.
The goal of the Experiment wizard is to make it easy to create a ready-to-run experiment. The wizard takes advantage of the available info, provides a good starting point, and makes sure the user gets a good result with almost no effort.
Give the experiment a good name, for example, Tabular data experiment.
Make sure that the Cali House dataset is selected.
In the Inputs / target tab, make sure:
The Input feature is 6_features (the feature set you created)
The Target feature is medianHouseValue.
Problem type tab
Select Tabular regression in the drop-down menu. Regression is when a trained model predicts a numeric value, here a price.
Click Create. This will add a complete deep learning model to the Modeling canvas.
Run experiment from modeling canvas
The Experiment wizard has pre-populated all experiment settings, so you don’t need to change anything.
Time to train the model and see if we’ve come up with a good model.
Done! Click Run.
Create 2nd experiment - tabular and image
While the first experiment runs, you can build and run a second, concurrent experiment to find out if you can improve the result.
Navigate to the Modeling view.
Click the Iterate button.
Open the Reuse part of model tab.
Select the best checkpoint and keep Target as the Terminal port.
Note: If you’re on the Free plan, you can run one experiment at a time. All other plans can run concurrent experiments.
The already trained model will now show up in the Modeling view as a separate block. We call it a user block. The block gets its name from the previous experiment’s name, for example, Tabular data experiment.
Now build a model based on the user block by following these steps:
Open the Build tab in the Modeling view.
Connect an Input block to the user block.
Set Feature to the tabular feature set you created, 6_features.
Add a new Input block.
Set Feature to the image feature, image_path.
Add an EfficientNet block and connect the image Input block to it.
Expand the Transform heading and add a Concatenate block.
Connect the outputs of the user block and the EfficientNet block to the input of the Concatenate block.
Add a Dense block.
Set Units to 512.
Set Activation to ReLU.
Add a Dense block.
Set Units to 1.
Set Activation to Linear.
Add a Target block.
Set Feature to medianHouseValue.
Set Loss function to Mean squared error.
Open the Settings tab.
Set the Batch size to 32.
Click Run, and move on to compare the two experiments.
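To make the data flow of the model built in the steps above more concrete, here is a minimal NumPy sketch of a single forward pass through the Concatenate block and the two Dense blocks. It is not the platform's implementation: the weights are random, the outputs of the user block and the EfficientNet block are replaced by random stand-ins, and the sizes `tabular_dim` and `image_dim` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, weights, bias, activation=None):
    """Fully connected layer: x @ weights + bias, then an optional ReLU."""
    z = x @ weights + bias
    return np.maximum(z, 0.0) if activation == "relu" else z


batch_size = 32   # matches the Batch size set in the Settings tab
tabular_dim = 1   # assumed output size of the user block
image_dim = 1280  # assumed size of the image feature vector

# Random stand-ins for the outputs of the user block and the EfficientNet block.
tabular_out = rng.normal(size=(batch_size, tabular_dim))
image_out = rng.normal(size=(batch_size, image_dim))

# Concatenate block: join the two branches along the feature axis.
merged = np.concatenate([tabular_out, image_out], axis=1)

# Dense block with 512 units and ReLU activation.
w1 = rng.normal(scale=0.01, size=(merged.shape[1], 512))
hidden = dense(merged, w1, np.zeros(512), activation="relu")

# Dense block with 1 unit and linear activation: one predicted value per sample.
w2 = rng.normal(scale=0.01, size=(512, 1))
prediction = dense(hidden, w2, np.zeros(1))
```

The final layer is linear with a single unit because the target is one unbounded number per sample, which is the usual setup for regression.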
The Evaluation view shows in several ways how the training of the model has progressed and how your experiments are performing.
Same loss function
As long as you keep the same loss function, you can compare the results of the experiments and see which one is the best.
Did the second input help?
The lower the loss, the better the model (unless the model has overfitted to the training data). The loss is calculated on both the training set and the validation set, and it tells you how well the model is doing on each of them. It aggregates the errors made on every example in the set; with mean squared error, it is the average of the squared errors.
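As a small worked example of how mean squared error ranks two models, the sketch below computes the loss for two sets of predictions. The numbers are invented for illustration and are not results from the experiments in this tutorial.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Invented values, not results from the tutorial's experiments.
actual = [200000.0, 350000.0, 150000.0]
predictions_a = [210000.0, 340000.0, 160000.0]  # each off by 10,000
predictions_b = [250000.0, 300000.0, 100000.0]  # each off by 50,000

loss_a = mean_squared_error(actual, predictions_a)
loss_b = mean_squared_error(actual, predictions_b)
```

Here `loss_a` is lower than `loss_b`, so the first set of predictions is the better one. Because the errors are squared, predictions that are five times further off give a loss that is 25 times higher.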
Prediction scatter plot
Navigate to the Predictions inspection tab and take a look at the scatter plot.
In a perfect scatter plot, all points lie on the diagonal going from bottom left to top right, which means that every predicted value equals the actual value.
Congratulations, you’ve completed the California house pricing tutorial. In this tutorial, you’ve learned how to:
Solve a regression problem, first by using one input and then by extending the experiment using multiple datasets.
Analyze the experiments to find out which one was the best.
Next tutorial - Sales forecasting with spreadsheet integration
As your next tutorial, we suggest Sales forecasting with spreadsheet integration.
You will learn to:
Build a deep learning model with no code.
Predict sales numbers from spreadsheet data.
Deploy your model for production on the Peltarion Platform.
Integrate your model with Google Sheets or Microsoft Excel.