Predict real estate prices
Solve a regression problem using table data and images
If you deploy the final trained AI model from this tutorial in real life, someone could load the location, size of their house, etc., via an online portal and get a valuation. Nice!
Getting a good estimate of the price of a house is hard even for the most seasoned real estate agents. With the advent of deep learning, it is now possible to get a much more sophisticated valuation because we can combine several data types, such as images and table data.
- Target audience: Beginners
You will learn to
- Solve a regression problem, that is, a problem where you predict a quantity, e.g., a price.
- Use multiple datasets, both tabular data and images.
- Run multiple experiments and compare them.
Before following this tutorial, it is strongly recommended that you complete the Deploy an operational AI model tutorial if you have not done so already.
Create a project
Start by creating a project on the Projects view by clicking on New project.
Add dataset to the project
In the Datasets view, click the Import free datasets button.
Look for the Cali House - tutorial data dataset in the list. This dataset consists of map images of the blocks from OpenStreetMap and tabular demographic data collected from the 1990 California census.
If you agree with the license, click Accept and import.
This will import the dataset into your project, and you can now edit it.
Each sample in the dataset gives the following information about one block of houses:
Median house age, Total number of rooms, Total number of bedrooms, Population, Number of households, Median income, and Median house value.
In this tutorial, we wish to make an AI model that learns to predict the price of a house, here called medianHouseValue, given the other available data (i.e., median house age, population, etc.).
Hence, medianHouseValue is our target feature, while the others are our input features.
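The split between input features and the target feature can be sketched in a few lines of Python. This is only an illustration of the idea: apart from medianHouseValue, the column names and all the values below are hypothetical and may not match the names used in the dataset.

```python
# The target feature name comes from the tutorial.
TARGET = "medianHouseValue"

# One sample describing a block of houses.
# Column names (except the target) and all values are made up for illustration.
sample = {
    "medianHouseAge": 41.0,
    "totalRooms": 880.0,
    "totalBedrooms": 129.0,
    "population": 322.0,
    "households": 126.0,
    "medianIncome": 8.3,
    "medianHouseValue": 452600.0,
}


def split_sample(row, target=TARGET):
    """Return (input features, target value) for one sample."""
    inputs = {name: value for name, value in row.items() if name != target}
    return inputs, row[target]


inputs, target_value = split_sample(sample)
print(sorted(inputs))   # the six input features
print(target_value)     # the value the model should learn to predict
```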
Normalize image feature
You normalize a dataset to make it easier and faster to train a model.
Locate the image_path feature and click the wrench.
Change the Normalization from None to Standardization.
Standardization converts a set of raw input data to have a zero mean and unit standard deviation.
Values above the feature’s mean get positive scores, and values below the mean get negative scores.
The reason we normalize or scale input data is simply that neural networks train better when the data comes roughly in an interval of -1 to 1.
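As a minimal sketch of what standardization does, the snippet below subtracts the mean from each value and divides by the standard deviation. The pixel intensities are made-up numbers, and the function name standardize is chosen here for illustration; the platform's own implementation may differ in its details.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # All values are identical, so every score is zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical pixel intensities (illustrative numbers only).
pixels = [0.0, 64.0, 128.0, 192.0, 255.0]
scores = standardize(pixels)

print(scores)                      # values below the mean are negative
print(statistics.fmean(scores))    # approximately 0
print(statistics.pstdev(scores))   # approximately 1
```

After standardization, the values are centered on zero, which is what keeps them in the small range that neural networks handle well.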
Create a tabular feature set
A feature set is two or more features that you want to treat in the same way during modeling.
This feature set consists of the tabular data on the houses, for example, number of bedrooms and median income.
Click on New feature set, name the feature set 6_features, and select the six input features that describe the houses, that is, every tabular feature except medianHouseValue.
Create experiment for tabular data
Now that we have the data, let’s create the AI model. We’ll start by just trying to predict the prices from the tabular data.
On the Peltarion Platform, an experiment is the basic unit you’ll be working with. It’s the basic hypothesis that you want to try, i.e., “I think I might get good accuracy if I train this model, on this data, in this way.”
An experiment contains all the information needed to reproduce the experiment:
The AI model
The settings or parameters used to run the experiment.
The result is a trained AI model that can be evaluated and deployed.
Click Create an experiment to open the Experiment wizard.
Give the experiment a good name, for example, Tabular data experiment.
Make sure that the Cali House dataset is selected.
In the Inputs / target tab, make sure that:
The Input feature is 6_features (the feature set you created)
The Target feature is medianHouseValue.
Problem type tab
Select Tabular regression in the drop-down menu. Regression is when a trained model predicts a numeric value, here the median house value.
Click Create. This will add a complete deep learning model to the Modeling canvas.
Run experiment from modeling canvas
The Experiment wizard has pre-populated all the settings needed.
Time to train the model and see if we’ve come up with a good model.
Done! Click Run.
Create experiment with two inputs - tabular and image
Watch the experiment train in the Evaluation view. While the first experiment runs, you can build and run a concurrent experiment to find out if you can improve the results.
Navigate to the Modeling view.
Click the Iterate button.
Open the Reuse part of model tab.
Select the best checkpoint and keep Target as Terminal port.
Note: If you’re on the Free plan, you can run 1 experiment at a time. All other plans can run concurrent experiments.
Build a model with multiple inputs
The already trained model will now show up in the Modeling view as a separate block. We call it a user block. The block gets its name from the previous experiment’s name, for example, Tabular data experiment.
Now build a model based on the user block as follows:
Connect an Input block to the user block.
Set Feature to the feature set you created, 6_features.
Add a new Input block.
Set Feature to the image feature, image_path.
Add an EfficientNet B0 block and connect it to the image input.
Expand the Transform heading and add a Concatenate block.
Connect the output from the user block and the EfficientNet B0 block to the input of the Concatenate block.
Add a Dense block.
Set Units to 512.
Set Activation to ReLU.
Add a Dense block.
Set Units to 1.
Set Activation to Linear.
Add a Target block.
Set Feature to medianHouseValue.
Set Loss function to Mean squared error.
Click Run, and move on to compare the two experiments.
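To make the data flow through the blocks above more concrete, here is a rough sketch in plain Python of what happens after the two branches: their outputs are concatenated, passed through a Dense layer with ReLU, and then through a Dense layer with one unit and linear activation. Everything here is an assumption made for illustration: the branch outputs are stand-in vectors rather than real user block or EfficientNet B0 outputs, the weights are random rather than trained, and the hidden layer uses 4 units instead of 512 to keep the output readable.

```python
import random


def relu(z):
    """ReLU activation: negative values become zero."""
    return max(0.0, z)


def linear(z):
    """Linear activation: the value is passed through unchanged."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each unit."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def random_layer(n_in, n_out, rng):
    """Create untrained (random) weights and zero biases for a layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


rng = random.Random(0)

# Stand-ins for the outputs of the two branches (hypothetical values).
tabular_out = [0.3]             # user block output for the tabular features
image_out = [0.1, -0.4, 0.7]    # image features from the EfficientNet B0 branch

# Concatenate block: join the two outputs into one vector.
merged = tabular_out + image_out

# Dense block with ReLU (4 units here; the tutorial uses 512).
hidden_w, hidden_b = random_layer(len(merged), 4, rng)
hidden = dense(merged, hidden_w, hidden_b, relu)

# Dense block with 1 unit and linear activation: the predicted value.
out_w, out_b = random_layer(len(hidden), 1, rng)
prediction = dense(hidden, out_w, out_b, linear)

print(merged)
print(hidden)
print(prediction)
```

The final layer has a single unit with linear activation because the model predicts one unbounded number, the house value, rather than a class.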
The Evaluation view shows in several ways how the training of the model has progressed and how your experiments are performing.
As long as you keep the same loss function, you can compare the results of the experiments and see which one is the best.
Did the second input help?
The lower the loss, the better the model (unless the model has overfitted to the training data). The loss is calculated separately on the training set and the validation set, and it tells you how well the model is doing on each of them. It aggregates the errors made on every example in the set; with mean squared error, it is the average of the squared differences between the predicted and the actual values.
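The comparison can be sketched as follows: compute the mean squared error for each experiment on the same validation examples and pick the lower one. The prices and predictions below are invented purely to show the calculation and are not results from the tutorial; the experiment labels are also just illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical validation data, in thousands of dollars (made-up numbers).
actual = [200.0, 350.0, 500.0]
predictions = {
    "tabular only": [230.0, 320.0, 460.0],
    "tabular + image": [210.0, 340.0, 490.0],
}

# Same loss function for both experiments, so the values are comparable.
losses = {name: mean_squared_error(actual, pred) for name, pred in predictions.items()}
best = min(losses, key=losses.get)

for name, loss in losses.items():
    print(f"{name}: {loss:.2f}")
print("lowest validation loss:", best)
```

Because the errors are squared, a few large misses raise the loss much more than many small ones.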
Prediction scatter plot
Navigate to the Predictions inspection tab and take a look at the scatter plot.
In a perfect scatter plot, every point lies on the diagonal going from bottom left to top right, meaning that each predicted value equals the actual value.
Congratulations, you’ve completed the California house pricing tutorial. In this tutorial, you’ve learned how to:
Solve a regression problem, first by using one input and then by extending the experiment using multiple datasets.
Analyze the experiments to find out which one was the best.
Next tutorial - Sales forecasting with spreadsheet integration
We suggest that you do the Sales forecasting with spreadsheet integration tutorial next.
You will learn to:
Build a deep learning model with no code.
Predict sales numbers from spreadsheet data.
Deploy your model for production on the Peltarion Platform.
Integrate your model with Google Sheets or Microsoft Excel.