Find similar Google questions
Use text similarity to find out what similar questions others have asked.
- Autocomplete search queries
- Suggest answered similar questions
- Identify most common topics
- Target audience: Beginners
- Estimated time: 15 minutes
You will learn to
- Build and deploy a model for text similarity
- Use the output from a specific block in the model
Why text similarity search?
Text similarity is a way to quantify the similarity between two pieces of text, for instance, two questions written in natural language.
Similarity search is faster than comparing text strings directly, and it allows some flexibility in the results: questions that are worded differently can still be matched.
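To make "quantify the similarity" concrete, here is a minimal sketch of cosine similarity, one common way to score how close two embedding vectors are. This tutorial does not say which metric the platform uses, so treat the choice of cosine similarity as an assumption. The three-dimensional vectors are made up for illustration; real sentence embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings (assumption, not model output).
q1 = [0.9, 0.1, 0.0]  # e.g. "who wrote hamlet"
q2 = [0.8, 0.2, 0.1]  # e.g. "who is the author of hamlet"
q3 = [0.0, 0.2, 0.9]  # e.g. "how tall is mount everest"

score_related = cosine_similarity(q1, q2)
score_unrelated = cosine_similarity(q1, q3)
```

A score close to 1 means the two vectors point in nearly the same direction, so `score_related` comes out much higher than `score_unrelated`.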
Create a project
First, click New project to create a project, and give it a descriptive name so you can tell later what the project is for.
Add the Google Natural Questions dataset to the platform
After creating the project, you will be taken to the Datasets view.
Click Data library and look for the Google Natural Questions - tutorial data dataset in the list.
Click on it to get more information.
The Google Natural Questions dataset
This dataset consists of over 300,000 questions submitted by real people. We’ll learn how to search it for questions that are similar to any new text input.
If you agree with the license, click Accept and import. This will import the dataset into your project, and you will be taken to the dataset’s details, where you can edit features and subsets.
Save the dataset
The dataset is automatically set up, so you just have to click Save version.
Then click Use in new experiment to open up the Experiment wizard.
Build model with the Experiment wizard
Make sure the training and validation subsets are selected.
Inputs / target tab
Select question as Input and index as Target. (The target won’t be used to train a model but we need it to define a complete model graph.)
Select Problem type Text similarity.
Select the USE Embedding snippet.
Finally, click Create and you will find the model in the Modeling view.
The USE Embedding model is already pretrained for text similarity, so we don’t need to train it. We only need to run the experiment once so that there is a model to deploy, and then move on to the next step.
Click Run and wait until the experiment has finished the default 1 epoch.
Navigate to the Deployment view (you can skip evaluation this time) and click New deployment.
Name the deployment and select Similarity search.
Make sure all settings are what you want them to be:
- Experiment: Experiment 1, the experiment you just ran.
- Checkpoint: Epoch 0, since we didn’t train the model.
- Output feature: Sentence embedding.
The platform will begin to index all the questions in the Google Natural Questions dataset.
Click Enable once indexing is finished.
Your deployment is now ready. You can call it via the deployment API, or try it with our API tester.
How does text similarity search work?
With text similarity, you want to compare a new text with all the texts you have in your dataset to find the most similar ones.
First, all the text questions from the dataset are processed by the model during indexing.
The resulting embedding vectors are saved in an index.
To find similar questions, you send a new text to your deployment, where it is processed by the same model.
The embedding vector for the new question will be quickly compared to the index, and the most similar questions from the dataset will be returned by the platform.
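The steps above can be sketched in a few lines of Python. This is only an illustration of the embed, index, and compare flow, not the platform's implementation. The `embed` function is a toy word-count stand-in for the USE Embedding model, so unlike the real model it only matches on shared words. All function names and the three example questions are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for the USE model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(questions):
    """Indexing: embed every question once and keep the vectors."""
    return [(question, embed(question)) for question in questions]


def search(index, new_text, top_k=2):
    """Embed the new text, score it against the index, return the best matches."""
    query = embed(new_text)
    scored = [(cosine(query, vector), question) for question, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:top_k]]


# Hypothetical example questions, not taken from the dataset.
questions = [
    "who wrote the play hamlet",
    "how tall is mount everest",
    "when was the play hamlet written",
]
index = build_index(questions)
results = search(index, "who wrote hamlet", top_k=2)
```

The sketch embeds the dataset once in `build_index` and only embeds the new text at query time. It scores every stored vector in turn, which is fine for three questions but is not necessarily how an index over hundreds of thousands of questions would be searched.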
We’ve made it super easy for you to test the deployment. Click on Test deployment, and you will be directed to our Text similarity - API tester.
All info from your deployment is prepopulated (Token, URL, Text parameter).
You just need to type in some text and then click the Play button to try it.
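If you later want to call the deployment from your own code instead of the API tester, the same three pieces of information (Token, URL, Text parameter) go into an HTTP request. The sketch below only builds such a request and does not send it. The bearer-token header, the flat JSON payload, the placeholder URL and token, and the function name are all assumptions made for illustration, so check the deployment view for the exact format the platform expects.

```python
import json
import urllib.request


def build_similarity_request(url, token, text, text_parameter="question"):
    """Assemble (but do not send) a POST request for a similarity query.

    Assumptions: bearer-token authorization and a flat JSON body that maps
    the text parameter name to the query text.
    """
    payload = {text_parameter: text}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; copy the real URL and token from your deployment.
request = build_similarity_request(
    url="https://example.com/your-deployment",
    token="your-token",
    text="who wrote hamlet",
)
```

Passing the finished request to `urllib.request.urlopen` would send it. That step is left out here because the placeholder URL is not a real deployment.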
You’ve added a dataset to your project and built a model based on the pretrained snippet Universal Sentence Encoder (USE Embedding).
You’ve indexed all the questions in the Natural Questions dataset with the model.
You’ve deployed your model and tested it to find similar questions that real people have asked.
You can get a quick understanding of text similarity search in the blog post Search text by semantic similarity.