Find similar Google questions

Use text similarity to find out what similar questions others have asked.

  • Autocomplete search queries

  • Suggest answered similar questions

  • Identify most common topics

Target audience: Beginners
Estimated time: 15 minutes

You will learn to
  • Build and deploy a model for text similarity
  • Use the output from a specific block in the model

Why text similarity search?
Text similarity is a way to quantify the similarity between two pieces of text, for instance, two questions written in natural language.
Similarity search is faster than comparing text strings directly, and it is more flexible: it can match questions that mean the same thing even when they are worded differently.

Why use text similarity

Create a project

First, click New project to create a project. Give it a descriptive name so you can tell later what the project is about.

New project button

Add the Google Natural Questions dataset to the platform

After creating the project, you will be taken to the Datasets view.

Click the Import free datasets button.

Import free datasets button

Look for the Google Natural Questions - tutorial data dataset in the list.
Click on it to get more information.

The Google Natural Questions dataset

This dataset consists of over 300,000 questions submitted by real people. We’ll learn how to search it for questions that are similar to any new text input.

If you agree with the license, click Accept and import. This will import the dataset into your project, and you will be taken to the dataset’s details, where you can edit features and subsets.

Save the dataset

The dataset is automatically set up, so you just have to click Save version.

Then click Use in new experiment to open up the Experiment wizard.

Use in new experiment button

Build model with the Experiment wizard

  • Dataset tab
    Make sure the training and validation subsets are selected.

  • Inputs / target tab
    Select question as Input and index as Target. (The target won’t be used to train a model but we need it to define a complete model graph.)

  • Snippet tab
    Select Problem type Text similarity.
    Select the USE Embedding snippet.

  • Finally, click Create and you will find the model in the Modeling view.

The USE Embedding model is already pretrained for text similarity, so we don’t need to train it. We still need to run the experiment once so that the platform creates a model we can deploy in the next step.

Click Run and wait until the experiment has finished the default 1 epoch.

Run button

Deploy model

Navigate to the Deployment view (you can skip evaluation this time) and click New deployment.

  1. Name the deployment and select Similarity search.
    Similarity search deploy switch

  2. Make sure all settings are what you want them to be:

    • Experiment - Experiment 1, the experiment you just ran.

    • Checkpoint - Epoch 0, since we didn’t train the model.

    • Output features: question and answer. You will get these features back when you search with a new question.

    • Output block - Sentence embedding.

  3. Click Create.
    The platform will begin to index all the questions in the Google Natural Questions dataset. This might take a while.

  4. Click Enable once indexing is finished.

Enable button

Your experiment is now ready to be called via the deployment API.
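As a rough sketch of what a call to the deployment could look like from Python, the snippet below only builds an HTTP request and does not send it. The URL, the token value, the bearer-token header, and the `rows` payload shape are all assumptions made for illustration; the tutorial does not specify them, so check your deployment page for the actual URL, token, and request format. The function name `build_similarity_request` is hypothetical.

```python
import json
import urllib.request


def build_similarity_request(url, token, text, feature_name="question"):
    """Build (but do not send) a POST request carrying one text to search for.

    ASSUMPTION: the JSON body shape and the Authorization header are
    illustrative guesses, not taken from the tutorial.
    """
    payload = {"rows": [{feature_name: text}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace them with the URL and token of your deployment.
request = build_similarity_request(
    url="https://example.invalid/deployment/placeholder",
    token="YOUR_DEPLOYMENT_TOKEN",
    text="who wrote the declaration of independence",
)

# To actually send the request you would call:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.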


How does text similarity search work?

With text similarity, you want to compare a new text with all the texts in your dataset to find the most similar ones. To do that, the model turns each text into an embedding vector, and the similarity between two texts is measured as the distance between their vectors.

  1. First, all the text questions from the dataset are processed by the model during indexing.
    The resulting embedding vectors are saved in an index.

  2. To find similar questions, you send a new text to the deployment, where the same model processes it.
    The embedding vector for the new question is compared to the vectors in the index, and the platform returns the most similar questions from the dataset. The lower the distance, the more similar the question, so a distance close to 0 is good.

Text similarity search

Text similarity search is fast because each text is reduced to a small set of numerical values, so comparing two texts means comparing two vectors of numbers.
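The index-then-search idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_embed` is a hypothetical word-count stand-in for the pretrained USE Embedding model (it captures word overlap, not meaning), and cosine distance is assumed as the distance measure; the tutorial does not say which distance the platform uses.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Hypothetical stand-in for the model: count each vocabulary word."""
    counts = Counter(text.lower().replace("?", "").split())
    return [float(counts[word]) for word in vocabulary]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # no shared information with an all-zero vector
    return 1.0 - dot / (norm_a * norm_b)


def build_index(questions, vocabulary):
    """Step 1: embed every question once and store the vectors."""
    return [(q, toy_embed(q, vocabulary)) for q in questions]


def search(index, query, vocabulary, top_k=2):
    """Step 2: embed the query and return the closest questions first."""
    query_vec = toy_embed(query, vocabulary)
    scored = [(cosine_distance(query_vec, vec), q) for q, vec in index]
    scored.sort(key=lambda pair: pair[0])  # lowest distance on top
    return scored[:top_k]


questions = [
    "who wrote the declaration of independence",
    "when was the declaration of independence signed",
    "how tall is mount everest",
]
vocabulary = sorted({word for q in questions for word in q.split()})
index = build_index(questions, vocabulary)

results = search(index, "who signed the declaration of independence", vocabulary)
for distance, question in results:
    print(f"{distance:.3f}  {question}")
```

The expensive step (embedding every question) happens once at indexing time; each search only embeds the new text and compares vectors. A real deployment would replace `toy_embed` with the model's sentence embedding and typically use a faster nearest-neighbor index than this linear scan.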


Test deployment

We’ve made it super easy for you to test the deployment. Click on Test deployment, and you will be directed to our Text similarity - API tester.

All info from your deployment is prepopulated (Token, URL, Text parameter).

You just need to type in some text and then click the Play button to try it.

Result

The most similar question, the one with the lowest distance, will be at the top. If there is an answer to that question, it will be displayed as well.

Text similarity test app

Tutorial recap

  • You’ve added a dataset to your project and built a model based on the pretrained snippet Universal Sentence Encoder (USE Embedding).

  • You’ve indexed all questions in the Natural Questions dataset with the model.

  • You’ve deployed your model and tested it to find similar questions that real people have asked.


Next step

Classify text in any language

In the Classify text in any language tutorial, we will show you how to use the Peltarion Platform and its Multilingual BERT snippet to create a model that can work with multiple languages simultaneously.
You will learn how to automatically classify text extracts depending on their topic. Mix the available languages for training the model, and test it in any language.

Multilingual BERT tutorial
Figure 1. Let the computer tirelessly sort out piles of text, no matter the language!

Further reading

You can get a quick understanding of text similarity search in the blog post Search text by semantic similarity.
