What is BERT?

October 22, 2020 · 10 min read

  • Reynaldo Boulogne

If you have been following us, or any AI news for that matter, you might have come across the term BERT, usually accompanied by a lot of excitement around what it’s capable of doing. But what is BERT? And why is the AI community excited about it? We will try to answer those questions in this article.

If this is the first time you've heard about NLP, we highly recommend reading our non-technical introduction, Making sense of NLP - Part I, before continuing, to get a more complete and nuanced picture of what AI models in this field can and can't do.

BERT is an AI model within the field of Natural Language Processing or NLP. In short, it aims to make written human language understandable to a computer, so that it can carry out tasks that depend on the content of the text. 

BERT isn't the first AI model created for this purpose, but the reason the AI community is excited about it is that it brought a significant improvement in how well an AI model is able to understand text in context.

Since this is an introductory article to BERT, we will skip the technical details, but suffice it to say that unlike prior models, BERT is able to correctly understand sequences of words in different contexts.

This is very powerful! It allows BERT to "understand", for example, that the word bank can be used in two completely different contexts: a financial institution, or the bank of a river.

BERT doesn't really know what bank means, but it knows that the word bank is often associated with two distinct groups of words that are rarely used in the same sequence.

The fact that it can make this distinction means that, for example, we can confidently ask a model to go through a large number of documents and return only the instances where the documents are talking about banks in a financial context, and not in the context of a river bank.
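To make the idea concrete, here is a deliberately simplified sketch of context-based disambiguation. This is NOT how BERT works internally (BERT learns its context patterns automatically from data); the cue-word lists and function names below are invented purely for illustration:

```python
# Toy illustration of "meaning from context" (NOT BERT's actual mechanism).
# The cue-word sets below are hand-picked stand-ins for the word groups
# that BERT learns to associate with each sense of "bank".
FINANCE_CUES = {"loan", "deposit", "account", "interest", "savings", "money"}
RIVER_CUES = {"river", "water", "fishing", "shore", "muddy"}

def bank_sense(sentence: str) -> str:
    """Guess which sense of 'bank' a sentence uses by counting cue words."""
    words = set(sentence.lower().split())
    finance_hits = len(words & FINANCE_CUES)
    river_hits = len(words & RIVER_CUES)
    return "finance" if finance_hits >= river_hits else "river"

docs = [
    "She opened a savings account at the bank to earn interest",
    "They went fishing on the muddy bank of the river",
]

# The filtering task from the text: keep only the financial-bank documents.
financial_docs = [d for d in docs if bank_sense(d) == "finance"]
```

The crucial difference is that BERT needs no hand-written cue lists: it infers which word groups signal which sense from the text it was trained on.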

And this is just one example of what BERT can do:

To see specific sample applications for these tasks, make sure to read our article Making sense of NLP - Part II.

Prior to BERT, each of these tasks would have required a separate (and often less well-performing) model.

It’s precisely the combination of a better understanding of text in context and the versatility of the kind of task that BERT can be used for which has made it so popular in the AI community.

What do I need to use BERT?

BERT was originally a model that came out of Google’s research efforts and which was made publicly available.

One look at that GitHub repository makes it obvious that, while freely available, there is actually a lot of machine learning / software engineering work that has to go into converting the research code into production-ready code that can be used in an application.

This is precisely the work that we have done to get BERT onto the Peltarion Platform. But we’ve gone beyond just the engineering and also made it a lot more accessible to use:

The model is ready to be trained in just a couple of clicks! No need to worry about dealing with software bugs or getting / managing the hardware to run and deploy the model on. Everything is already taken care of for you.

With the Peltarion Platform getting all the engineering aspects out of your way, that only leaves the question of the data you need in order to run the model. 

Let’s talk data

You might have read that BERT has been pre-trained on a large corpus of text already, and if so, you’re perhaps wondering why you need more data before you can use the model.

In short, training a BERT model has two phases:

  • pre-training: this is the process during which BERT gets to automatically build an internal representation of the language structure. This is very data and compute heavy, as it requires the model to analyze billions of text samples to automatically find the patterns that make up the language structure. Luckily, this is a one-off process that has already been done for you.
  • fine-tuning: this is the process during which BERT learns specifically what you want it to do with the text that you will feed to it. For example, if you want BERT to classify your support tickets into predefined categories, fine-tuning is the process where you show it examples of support tickets and the corresponding categories you want BERT to classify them as. 
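The two phases can be sketched conceptually as follows. Everything here (the word vectors, the training data, the update rule) is invented for illustration: the frozen vectors stand in for what BERT learns during pre-training, and the loop trains only a small classification head on top, which is the essence of fine-tuning:

```python
# Conceptual sketch of the two phases (all data and numbers are invented).

# Phase 1 stand-in: frozen "pre-trained" representations. In real BERT,
# these come from analyzing billions of text samples; here they are toy
# 2-dimensional vectors.
PRETRAINED = {
    "refund": [1.0, 0.0], "charge": [0.9, 0.1],   # billing-ish words
    "crash": [0.0, 1.0], "error": [0.1, 0.9],     # technical-ish words
}

def embed(text):
    """Average the frozen vectors of known words (mean pooling)."""
    vecs = [PRETRAINED[w] for w in text.lower().split() if w in PRETRAINED]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

# Phase 2: "fine-tune" a tiny linear head on labeled support tickets.
tickets = [("refund my charge", 0), ("app crash error", 1)]  # 0=billing, 1=technical
w, b = [0.0, 0.0], 0.0
for _ in range(200):                      # simple perceptron-style updates
    for text, label in tickets:
        x = embed(text)
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        if pred != label:
            sign = 1 if label == 1 else -1
            w = [w[0] + sign * x[0], w[1] + sign * x[1]]
            b += sign

def classify(text):
    x = embed(text)
    return "technical" if w[0] * x[0] + w[1] * x[1] + b > 0 else "billing"
```

Note how lopsided the work is: the expensive part (the representations) is reused as-is, while fine-tuning only adjusts a small head for your specific task, which is why it needs far less data and compute.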

Here are some diagrams to elaborate on this:

Where do I go from here?

If you've read up to this point, that was certainly a lot of information to take in.

We are strong believers that the best way of learning is by doing some hands-on tinkering yourself. So with that in mind, we would highly recommend trying the two tutorials below, to put the information of this article into context (no pun intended 🙂).

  • Reynaldo Boulogne

    With over 15 years of experience, Reynaldo has worked within the intersection of business and technology across multiple sectors, most recently at Klarna and Spotify. He is passionate about innovation, leadership, and building things from scratch. Reynaldo is also a former Vice-chairman of the Stockholm based AI forum, Stockholm AI.