Find similar Google questions

Use text similarity to find out what similar questions others have asked.

Text similarity is a way to quantify the similarity between two pieces of text, for instance, two questions written in natural language.
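
To make "quantifying similarity" concrete, here is a minimal sketch of one common approach: represent each text as a vector (an embedding) and compare the vectors with cosine similarity. The three-dimensional vectors and the example questions below are made up for illustration. They are not output from the platform or from the Universal Sentence Encoder, and the platform's exact similarity measure is not stated in this tutorial.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up toy "embeddings" (assumption); a real encoder outputs many more dimensions.
q1 = [0.9, 0.1, 0.3]  # e.g. "how tall is mount everest"
q2 = [0.8, 0.2, 0.4]  # e.g. "what is the height of everest"
q3 = [0.1, 0.9, 0.2]  # e.g. "who wrote hamlet"

# The two related questions score higher than the unrelated pair.
print(cosine_similarity(q1, q2) > cosine_similarity(q1, q3))  # prints True
```

A value close to 1 means the two vectors point in nearly the same direction, which is read as the two texts being similar.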

Target audience: Beginners
Tutorial type: Get started tutorial
Estimated time: 15 minutes
Problem type: Text similarity

You will learn to
  • Build and deploy a model for text similarity.


Create a project

First, click New project to create a project, and give it a descriptive name so you can tell later what kind of project it is.

New project button

Add dataset to the platform

After creating the project, you will be taken to the Datasets view.

Click the Import free datasets button.

Import free datasets button

Look for the Google Natural Questions - tutorial data dataset in the list.
Click on it to get more information.

The Google Natural Questions dataset

This dataset consists of over 300,000 questions submitted by real people. We’ll learn how to search it for questions that are similar to any new text input.

If you agree with the license, click Accept and import. This will import the dataset into your project, and you will be taken to the dataset’s details, where you can edit features and subsets.


Build model with the Experiment wizard

The dataset is set up automatically, so you only need to click Use in new experiment to open the Experiment wizard.

Use in new experiment button
  • Dataset tab
    Make sure that the Google Natural Questions dataset is selected.

  • Inputs / target tab
    Select question as Input and index as Target. (The target won’t be used to train a model but we need it to define a complete model graph.)

  • Problem type tab
    Select Problem type Text similarity.

  • Finally, click Create and you will find the model in the Modeling view.

Modeling canvas

The Universal Sentence Encoder block is already pretrained for text similarity, so we don’t need to train it. We only need to run the experiment once to create the model, and can then move on to the next step.

Click Run and wait until the experiment has finished the default 1 epoch.

Run button

Deploy model

In the Evaluation view, click Create deployment (you can skip evaluation this time).

Create deployment button
  1. Name the deployment and select Similarity search.
    Similarity search deploy switch

  2. Make sure all settings are what you want them to be:

    • Experiment - The experiment you just ran.

    • Checkpoint - Epoch: 1, since we didn’t train the model.

    • Embedding block - Text embedding.

    • Output features. You will get these features back when you search with a new question. Select:

      • question

      • answer

  3. Click Create.
    The platform will begin to index all the questions in the Google Natural Questions dataset. This might take a while.

  4. Click Enable once indexing is finished.

Enable button

Your deployment is now ready to be called via the deployment API, or through your own deployment web app, which you can open straight from the deployment page.


Test with deployment web app

We’ve made it super easy for you to test the deployment. Click on Open web app, and you will be directed to our Deployment web app.

Open web app button

You just need to type in some text and then click the Get your result button to try it.

Result

The question most similar to your text, that is, the one with the lowest distance, will be at the top. If that question has an answer, it will be displayed as well.

Text similarity test app
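
The "lowest distance first" ordering of the results can be sketched as a small nearest-neighbor search over pre-computed vectors. This is only an illustration under stated assumptions: the toy index, its vectors, the placeholder answers, the `search` helper, and the use of cosine distance are all hypothetical, and none of it is the platform's actual implementation or API.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(index, query_vector, top_k=3):
    """Return the top_k indexed rows closest to query_vector, nearest first."""
    scored = [(cosine_distance(query_vector, row["vector"]), row) for row in index]
    scored.sort(key=lambda pair: pair[0])  # lowest distance at the top
    return [
        {"question": row["question"], "answer": row["answer"], "distance": dist}
        for dist, row in scored[:top_k]
    ]


# Hypothetical index: each question is stored together with a made-up embedding.
index = [
    {"question": "how tall is mount everest", "answer": "answer A", "vector": [0.9, 0.1, 0.3]},
    {"question": "who wrote hamlet", "answer": "", "vector": [0.1, 0.9, 0.2]},
    {"question": "what is the height of everest", "answer": "answer B", "vector": [0.8, 0.2, 0.4]},
]

# Made-up embedding of the new text typed into the search box.
query_vector = [0.9, 0.1, 0.25]

results = search(index, query_vector, top_k=2)
for hit in results:
    # An empty answer mirrors a question that has no answer in the dataset.
    print(f'{hit["distance"]:.4f}  {hit["question"]}  {hit["answer"] or "(no answer)"}')
```

Embedding every question once at indexing time means that a search only has to embed the new text and compare it with the stored vectors.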

Tutorial recap

  • You’ve added a dataset to your project and built a model based on the pretrained Universal Sentence Encoder.

  • You’ve indexed all questions in the Natural Questions dataset with the model.

  • You’ve deployed your model and tested it to find similar questions that real people have asked.


Next step

Get started with text classification

In the Classify text in any language tutorial, we will show you how you can use the Peltarion Platform and its Multilingual BERT to create a model that is able to work with multiple languages simultaneously!
You will learn how to automatically classify text extracts depending on their topic. Mix the available languages for training the model, and test it in any language.


Further reading

You can get a quick understanding of text similarity search in the blog post Search text by semantic similarity.
