A business view on semantic similarity

March 22 / 3 min read
Liliana Lindberg, Solutions Architect

Natural Language Processing (NLP) is a field within AI that aims to understand the way humans communicate with each other and to build systems capable of replicating that behavior. The latest advances in NLP capture the semantics of a language in ways that were not possible before, opening a wide range of opportunities for companies to implement AI. In this blog post, I provide concrete examples of how text similarity (a task within NLP) can improve efficiency in your business. All you need is text data.

02/ What is semantic similarity?

Text similarity is a common NLP task that indicates how similar two pieces of text are in meaning. For example, the sentences “For young adults climate change tops the list of concerns” and “Global warming is an issue that matters for Millennials” are very similar despite sharing almost no wording. In contrast, the sentences “The company will ship the right box to the river” and “The company ship will depart from the right side of the river” are similar in neither meaning nor context, despite sharing many of the same words.

Examples of sentence pairs classified by semantic similarity using the latest NLP models.

Similarity quantification as accurate as in the example above, which goes beyond statistical word-matching approaches, was not available just a few years ago, and its absence held back adoption in the corporate world. For instance, the second sentence pair would have been misclassified as similar because the two sentences share many of the same words, despite having a different context and meaning.
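
To make the limitation of word-matching concrete, here is a minimal sketch of one such statistical baseline, a Jaccard word-overlap score, applied to the two sentence pairs above. The choice of Jaccard and the helper names `tokenize` and `jaccard` are assumptions made for illustration; this is not the method used by any particular product or by the Peltarion platform.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap score: shared words divided by all distinct words."""
    words_a, words_b = tokenize(a), tokenize(b)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


# Pair with the same meaning but almost no shared words.
climate_pair = (
    "For young adults climate change tops the list of concerns",
    "Global warming is an issue that matters for Millennials",
)
# Pair with many shared words but a different meaning.
ship_pair = (
    "The company will ship the right box to the river",
    "The company ship will depart from the right side of the river",
)

climate_score = jaccard(*climate_pair)
ship_score = jaccard(*ship_pair)

print(f"climate pair (same meaning):   {climate_score:.2f}")
print(f"ship pair (different meaning): {ship_score:.2f}")
```

The overlap score ranks the ship pair well above the climate pair, which is the opposite of how a human reader would judge them. A semantic model is expected to reverse that ranking.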

03/ Where can I apply text similarity?

Semantic similarity can be applied in multiple scenarios where topics need to be found in different documents or text is compared or highlighted for specific purposes. Here are some use cases:

  • In Human Resources or a job-market system, a description of a desired role can surface the job positions that best fit an applicant's interests. The results are found semantically, going beyond keyword search, which opens opportunities for job seekers and human resources professionals to find the best candidate-job matches. After all, the same role often goes by different titles from organization to organization.
  • Detection of plagiarism or copyright infringement at the paragraph level in industry and academia.
  • Paraphrase detection, where the goal is to find duplicate or highly similar texts in digital document repositories or systems for tasks such as removal, deduplication or standardization of corporate language.

Compare paragraphs in legal documents with semantic text similarity

  • In market research, text similarity can be used in a variety of contexts, one of them being the harmonization and selection of the best wording for questions used in survey creation. Questions can be expressed in such a way that the information gathered remains valuable and relevant over time. This is what we did for our customer Ipsos: a question library implementation where, given a specific question, a semantic similarity algorithm returns the most similar questions from a curated library. Read the customer story here.

Same question where the latter will remain useful in a few years

  • Reduce resolution time in a customer service, help desk or issue tracking scenario. As soon as a new inquiry is logged, semantic similarity can provide the service agent with the top three resolutions from the tickets most similar to it. The information can come from a knowledge database or from the solution descriptions of previous tickets.

Empower user-facing staff by providing tools to help them retrieve accurate information much faster

  • Smart search on user manuals, help documents or product catalogs, where topics are found regardless of typos, the language used or the writing style of the person formulating the query.

Use one model with the data you have and then operate it in 100+ languages!

  • In all the previous scenarios, text data can be in any language, and several languages can coexist within the same data repository or dataset. Multilingual models (models trained on multiple languages simultaneously, instead of training a single model for every language) make it possible to use one model with the data you have and then operate it in 100+ languages!
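
Several of the use cases above, the help desk scenario in particular, come down to the same operation: turn each text into a numeric vector (an embedding) and return the stored items whose vectors are closest to the query. The sketch below illustrates that lookup with cosine similarity. The ticket IDs, resolutions and three-dimensional vectors are made-up placeholders; in practice a sentence-embedding model would produce vectors with hundreds of dimensions, and the function names are hypothetical.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query: list[float], tickets: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k tickets most similar to the query, best match first."""
    scored = [(cosine_similarity(query, t["embedding"]), t) for t in tickets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Placeholder data: embeddings are invented for illustration only.
resolved_tickets = [
    {"id": "T-101", "resolution": "Reset the password from the account page", "embedding": [0.9, 0.1, 0.0]},
    {"id": "T-102", "resolution": "Reinstall the printer driver", "embedding": [0.0, 0.2, 0.9]},
    {"id": "T-103", "resolution": "Unlock the account after failed logins", "embedding": [0.8, 0.3, 0.1]},
    {"id": "T-104", "resolution": "Refund the duplicate invoice", "embedding": [0.1, 0.9, 0.1]},
    {"id": "T-105", "resolution": "Send a new login link by email", "embedding": [0.7, 0.2, 0.2]},
]

# Embedding of the newly logged inquiry (also a placeholder).
new_ticket_embedding = [0.85, 0.2, 0.05]

suggestions = top_k_similar(new_ticket_embedding, resolved_tickets, k=3)
for score, ticket in suggestions:
    print(f"{ticket['id']}  {score:.3f}  {ticket['resolution']}")
```

With these placeholder vectors, the three login-related tickets are returned ahead of the printer and invoice tickets. The same lookup applies to the job-matching, question-library and smart-search cases; only the texts being embedded change.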

04/ Why are NLP and text similarity suddenly so popular?

Traditional NLP models often treated words the same regardless of context or word order, making them linguistically more “naive”. The rise of transformer-based NLP models revolutionized the way systems can interpret and understand language, opening a wide range of opportunities for companies to implement AI.

Transfer learning also plays an important role, as it enables the knowledge acquired by these models, such as the underlying language structure, to be reused to solve other, related problems with far less effort and a much smaller amount of data.

On the Peltarion platform, we have NLP models like English BERT, Multilingual BERT, Universal Sentence Encoder (USE) and XLM-R (model: XLM-R ← SBERT-nli-stsb), the latter two of which are optimized for semantic similarity tasks.

If you are interested in a more technical description of text similarity, see the blog post by my colleague Romain Futrzynski, “Search text by Semantic similarity”.

05/ Get started!

Finally, it is time to roll up your sleeves and build some powerful AI solutions with NLP and text similarity on the no-code Peltarion platform. Here is a tutorial where you can create a text similarity model that finds similar Google questions that others have asked. For some additional inspiration, check out our tutorial catalog.

We hosted an introductory webinar for text similarity on April 8th, 2021. Watch the recording here: A closer look at Text similarity

If you want to explore in more detail how text can be used for your next AI project, or need some guidance, please get in touch with me at liliana@peltarion.com. We’d love to hear from you!

Liliana Lindberg, Solutions Architect

    Liliana works as a solutions architect at Peltarion, guiding customers to solve business challenges by using AI. She is passionate about emerging technologies, and before joining Peltarion she worked for a number of years at Google as a GCP customer engineer. Her academic background includes a BSc in Systems and Computing Engineering; an MSc in Geographical Information Systems from the University of Calgary, Canada; and a Master’s-level Business Leadership Specialization from Duke University.
