Find similar Google questions
Use text similarity to find questions that other people have asked and that are similar to your own.
Text similarity is a way to quantify the similarity between two pieces of text, for instance, two questions written in natural language.
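As a rough illustration of what it means to quantify similarity, the sketch below scores pairs of questions with cosine similarity over simple word-count vectors. This is a toy stand-in, not what the platform does: the model in this tutorial uses Universal Sentence Encoder embeddings, which capture meaning rather than exact word overlap. The function names and example questions are made up for this illustration.

```python
import math
from collections import Counter


def word_counts(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, between 0 and 1."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q1 = word_counts("who wrote the first computer program")
q2 = word_counts("who wrote the first novel")
q3 = word_counts("how tall is mount everest")

close_score = cosine_similarity(q1, q2)  # questions sharing several words
far_score = cosine_similarity(q1, q3)    # questions sharing no words
print(close_score, far_score)
```

A higher score means the two texts are more alike. With real sentence embeddings, two questions can score as similar even when they share no words at all.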
- Target audience: Beginners
- Tutorial type: Get started tutorial
- Estimated time: 15 minutes
- Problem type: Text similarity
You will learn to
- Build and deploy a model for text similarity.
Create a project
First, click New project to create a project. Give it a descriptive name so you can tell later what the project is about.
Add dataset to the platform
After creating the project, you will be taken to the Datasets view.
Click the Import free datasets button.
Look for the Google Natural Questions - tutorial data dataset in the list.
Click on it to get more information.
The Google Natural Questions dataset
This dataset consists of over 300,000 questions submitted by real people. We’ll learn how to search it for questions that are similar to any new text input.
If you agree with the license, click Accept and import. This will import the dataset into your project, and you will be taken to the dataset’s details, where you can edit features and subsets.
Build model with the Experiment wizard
The dataset is automatically set up, so you just have to click Use in new experiment to open up the Experiment wizard.
Make sure that the Google Natural Questions dataset is selected.
Inputs / target tab
Select question as Input and index as Target. (The target won’t be used to train a model but we need it to define a complete model graph.)
Problem type tab
Select Problem type Text similarity.
Finally, click Create and you will find the model in the Modeling view.
The Universal Sentence Encoder block is already pretrained for text similarity, so we don’t need to train it. We only need to run the experiment once to create the model, and then we can move on to the next step.
Click Run and wait until the experiment has finished the default 1 epoch.
In the Evaluation view, click Create deployment (you can skip evaluation this time).
Name the deployment and select Similarity search.
Make sure all settings are what you want them to be:
Experiment - The experiment you just ran.
Checkpoint - Epoch: 1, since we didn’t train the model.
Embedding block - Text embedding.
Output features. You will get these features back when you search with a new question. Select:
The platform will begin to index all the questions in the Google Natural Questions dataset. This might take a while.
Click Enable once indexing is finished.
Your deployment is now ready. You can call it via the deployment API, or launch your own deployment web app straight from the deployment page.
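If you prefer the API route, a request could be prepared along the lines of the sketch below. Everything specific in it is an assumption made for illustration: the URL and token are placeholders, and both the bearer-token header and the {"rows": [...]} payload shape should be checked against what your deployment page shows. The code only builds the request, it does not send it.

```python
import json
import urllib.request

# Placeholders: copy the real values from your deployment page.
DEPLOYMENT_URL = "https://example.com/deployment/your-deployment-id/forward"
DEPLOYMENT_TOKEN = "your-token"


def build_request(question, url=DEPLOYMENT_URL, token=DEPLOYMENT_TOKEN):
    """Build (but do not send) a JSON POST request for one question.

    Assumed payload shape: {"rows": [{"question": ...}]}, where "question"
    is the input feature selected in the Experiment wizard.
    """
    body = json.dumps({"rows": [{"question": question}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_request("who wrote the first computer program")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it once the placeholders are replaced:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before any call reaches the deployment.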
Test with deployment web app
We’ve made it super easy for you to test the deployment. Click on Open web app, and you will be directed to our Deployment web app.
You just need to type in some text and then click the Get your result button to try it.
The most similar question, that is, the one with the lowest distance, appears at the top. If there is an answer to that question, it is displayed as well.
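The "lowest distance first" ordering can be sketched as a small search over precomputed vectors. This is only a minimal illustration under stated assumptions: the three-dimensional vectors are made up by hand (real embeddings come from the model and have many more dimensions), cosine distance is an assumed choice of metric, and the names index and search are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance: 0 for vectors pointing the same way, larger when they differ."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical index: each question paired with a made-up 3-d embedding.
index = [
    ("who invented the telephone", [0.9, 0.1, 0.0]),
    ("when was the telephone invented", [0.8, 0.3, 0.1]),
    ("what is the capital of france", [0.0, 0.2, 0.9]),
]


def search(query_vector, index, top_k=2):
    """Return the top_k (distance, question) pairs, lowest distance first."""
    scored = [(cosine_distance(query_vector, vec), question) for question, vec in index]
    scored.sort(key=lambda pair: pair[0])
    return scored[:top_k]


# Made-up embedding for a new query about the telephone.
results = search([1.0, 0.1, 0.0], index)
for distance, question in results:
    print(f"{distance:.3f}  {question}")
```

Scanning every entry like this is fine for a handful of questions. For a dataset of this size, a dedicated index is what keeps the search fast, which is why the platform indexes the questions before the deployment is enabled.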
You’ve added a dataset to your project and built a model based on the pretrained Universal Sentence Encoder.
You’ve indexed all questions in the Natural Questions dataset with the model.
You’ve deployed your model and tested it to find similar questions that real people have asked.
Get started with text classification
In the Classify text in any language tutorial, we will show you how you can use the Peltarion Platform and its Multilingual BERT to create a model that is able to work with multiple languages simultaneously!
You will learn how to automatically classify text extracts depending on their topic. Mix the available languages for training the model, and test it in any language.
You can get a quick understanding of text similarity search in the blog post Search text by semantic similarity.