BERT (Bidirectional Encoder Representations from Transformers) pushed the state of the art in NLP by combining two key ideas:
- A Transformer encoder, a network architecture that processes a text passage efficiently because self-attention relates all of the words to each other in parallel, rather than one at a time.
- Bidirectional training, meaning the model uses the whole passage, the words both before and after each position, to build its representation of each word (as illustrated in the sketch below).
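A concrete way to see the bidirectionality is that BERT produces a different vector for the same word depending on its surrounding context. The following is a minimal sketch, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (neither is named in the text above); it compares the embedding of "bank" in two different sentences.

```python
# Minimal sketch: contextual embeddings from a pretrained BERT encoder.
# Assumes the Hugging Face `transformers` library and `bert-base-uncased`
# (an illustrative choice, not prescribed by the text above).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# "bank" means something different in each sentence; because BERT attends
# to the whole passage in both directions, its vector differs per context.
sentences = ["He sat by the river bank.", "She deposited cash at the bank."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)  # last_hidden_state: (batch, seq_len, hidden)

# Find the position of the "bank" token in each sentence and grab its vector.
vectors = []
for i in range(len(sentences)):
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][i].tolist())
    idx = tokens.index("bank")
    vectors.append(outputs.last_hidden_state[i, idx])

# A static (non-contextual) embedding would give similarity 1.0 here;
# BERT's contextual vectors are noticeably less similar.
sim = torch.nn.functional.cosine_similarity(vectors[0], vectors[1], dim=0)
print(f"Cosine similarity of the two 'bank' vectors: {sim:.3f}")
```

If the two occurrences of "bank" were represented by a single fixed vector, the printed similarity would be exactly 1.0; with BERT it is well below that, reflecting the two different senses.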