Create applications that can analyze text written in over 100 languages

November 10, 2020 / 5 min read

We recently launched Multilingual BERT on the Peltarion Platform. But what does it do and why should you care? We will try to answer these questions with a simple demo and give you some ideas on how you can apply the model to your own domain.

Before we begin: Multilingual BERT is an AI model within the field of Natural Language Processing or NLP. In short, it aims to make written human language understandable to a computer, so that it can carry out tasks that depend on the content of the text. 

If you're new to the field and would like to know more, we highly recommend reading our non-technical introductory articles Making sense of NLP - Part I and What is BERT?

02/ What does this demo do?

We trained a Multilingual BERT model to predict the genre of a book only by looking at 1-3 sentence long excerpts. The book genres supported in this demo are biography, children’s stories, drama, history, mystery, and poetry.

Try it! You can copy-paste your own book excerpt into the field below, or click the red button to try one of our pre-prepared examples.

(Note: the first time you try to classify an example, it might take up to 15 seconds before the demo returns an answer. After this initial "cold start", the demo will return answers right away.)

03/ Ok... so what's so special about this?

The curse of AI demos is that at first glance it's often hard to recognize that they are actually solving an extremely complex problem. After all, many demos seem to do things that are just "obvious".

With that in mind, let's dissect this demo a little to understand the hidden complexities that the Multilingual BERT model, which is driving it, is tackling.

  • It can classify books / text it has never seen before - This may seem obvious, but it is worth spelling out: the model was never told beforehand how to classify our pre-prepared book excerpts, and it still classifies them correctly. Find a book excerpt in one of the supported genres and try it yourself!
  • It can classify books / text in languages it has never seen before - The model feeding this demo was trained on book excerpts in English, Finnish, French, German, Italian, and Swedish. However, it works equally well on the examples in Spanish, Russian, and Mandarin. In fact, the same model will work, without any additional tweaking or configuration, in any of over 100 languages. Try it!
  • It can classify books / text into different categories by only looking at a few sentences - The model is able to understand the text in context and likely also able to pick up subtle differences in semantics and syntax between each genre and use this information to distinguish between them. All from only 1-3 (random) sentences of a book.

This is mind-boggling! Picture being asked to write a set of rules (or in other words, code) that would tell a computer how to perform this classification task, for any possible (random) text, in any of the 100 supported languages. It would be a huge undertaking in the best of cases, and more likely an unfeasible task.
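To make the thought experiment concrete, here is a minimal sketch of what such a hand-written rule set might look like. Everything in it is an illustrative assumption: the keyword lists, the `classify_with_rules` function, and the two example sentences are made up for this sketch and have nothing to do with how the demo or Multilingual BERT works.

```python
# Hypothetical hand-written rules: a few English keywords per genre.
# These keyword lists are illustrative assumptions, not taken from the demo.
GENRE_KEYWORDS = {
    "mystery": {"detective", "murder", "clue"},
    "poetry": {"thee", "thou", "o'er"},
    "history": {"empire", "century", "treaty"},
}


def classify_with_rules(excerpt: str) -> str:
    """Return the genre whose keywords appear most often, or 'unknown'."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,;:!?\"'()").lower() for w in excerpt.split()]
    scores = {
        genre: sum(word in keywords for word in words)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    best_genre = max(scores, key=scores.get)
    return best_genre if scores[best_genre] > 0 else "unknown"


# The same idea expressed in two languages (made-up sentences).
english = "The detective found a clue near the body."
spanish = "El inspector encontró una pista cerca del cuerpo."

print(classify_with_rules(english))  # mystery
print(classify_with_rules(spanish))  # unknown
```

The rules recognize the English sentence but return "unknown" for the Spanish one, because no rule was written for Spanish. Covering every genre, every phrasing, and every one of the 100 languages this way would mean writing and maintaining an enormous number of such rules by hand.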

This is the complexity that the Multilingual BERT model and this demo are tackling, and they do so in a way that is accessible to all of us. Try it yourself! Build your own model in just a couple of clicks by following our tutorial at the end of this article.

04/ This demo is about books. How is that relevant to me?

While the use case is indeed about books, what we're actually showcasing is Multilingual BERT's ability to perform the task of text classification in any language.

If you want to know more about text classification, we recommend you have a look at our article What is NLP and how can I make use of it in my business? Part 2.

What we will focus on in this section is highlighting some subtle insights that might not immediately be obvious.

  • Multilingual BERT works in Swedish (and in your language too) - Many AI applications used to be limited to text written in English, but not anymore! Multilingual BERT works in any of over 100 languages. This opens up tons of use cases for markets with small speaker bases, like ours here in Sweden, which have traditionally been underserved.
  • The model was built in a couple of minutes - AI is no longer confined to research. If you have the data, you can build a model on the Peltarion Platform with just a couple of clicks. No need to worry about AI theory, software bugs, or getting and managing the hardware to run and deploy the model on. Everything is already taken care of for you.
  • Reuse the model for many use cases - Unlike traditional software, the underlying model driving the demo can be used for any possible text classification use case you can think of. It isn't a rigid, bespoke solution. Simply train the model on some examples of how you would like it to classify your particular texts and it will learn how to sort them into the categories you specify.

05/ Curious to know more?

Why not jump directly into getting some hands-on experience and find out for yourself how you can create your own powerful text classification model with a few simple clicks, in just a couple of minutes?

  • Reynaldo Boulogne

    With over 15 years of experience, Reynaldo has worked within the intersection of business and technology across multiple sectors, most recently at Klarna and Spotify. He is passionate about innovation, leadership, and building things from scratch. Reynaldo is also a former Vice-chairman of the Stockholm based AI forum, Stockholm AI.
