The curse of AI demos is that at first glance it's often hard to recognize that they are actually solving an extremely complex problem. After all, many demos seem to do things that are just "obvious".
With that in mind, let's dissect this demo a little to understand the hidden complexities that the Multilingual BERT model driving it is tackling.
- It can classify books / text it has never seen before - This might be obvious, but in case it isn't: the model was never told beforehand how to classify our pre-prepared book excerpts, yet it still does so correctly. Find a book excerpt in one of the supported genres and try it yourself!
- It can classify books / text in languages it has never seen before - The model feeding this demo was fine-tuned on book excerpts in English, Finnish, French, German, Italian, and Swedish only. However, it works equally well on the examples in Spanish, Russian, and Mandarin. In fact, because the underlying Multilingual BERT encoder was pretrained on text in over 100 languages, the same model will work, without any additional tweaking or configuration, in any of them (see the sketch after this list). Try it!
- It can classify books / text into different categories by only looking at a few sentences - The model understands the text in context and likely picks up on subtle semantic and syntactic differences between genres, using this information to tell them apart. All from just 1-3 (random) sentences of a book.
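To make this concrete, here is a minimal sketch of how a classifier like the one behind this demo could be queried with the Hugging Face transformers library. Note that the model ID below is a hypothetical placeholder, not the demo's actual checkpoint; any Multilingual BERT model fine-tuned for genre classification would be called the same way.

```python
from transformers import pipeline

# Hypothetical fine-tuned checkpoint; substitute your own model ID.
classifier = pipeline(
    "text-classification",
    model="your-username/mbert-book-genres",
)

# The same classifier handles excerpts in languages it was never fine-tuned
# on, because the underlying bert-base-multilingual-cased encoder was
# pretrained on over 100 languages.
excerpts = [
    "The detective stared at the body sprawled across the library floor.",
    "El detective contempló el cuerpo tendido en el suelo de la biblioteca.",
]

for text in excerpts:
    # Each call returns a list like [{"label": <genre>, "score": <confidence>}]
    print(classifier(text))
```

No language flag or per-language configuration is needed: the multilingual tokenizer and encoder accept text in any of the pretraining languages out of the box.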
This is mind-boggling! Picture being asked to write a set of rules (in other words, code) that would tell a computer how to perform this classification task, for any possible (random) text, in any of the 100+ supported languages. It would be a huge undertaking in the best of cases, and more likely an infeasible one.
This is the complexity that the Multilingual BERT model and this demo are tackling, and they do so in a way that is accessible to all of us. Try it yourself! Build your own model in just a couple of clicks by following our tutorial at the end of this article.