XLM-R Embedding

The XLM-R network (published in Unsupervised Cross-lingual Representation Learning at Scale) delivers excellent performance on Natural Language Processing (NLP) tasks, in particular similarity tasks.
The XLM-R Embedding snippet allows you to quickly get started with your language-based model.

Why use a multilingual model?

A multilingual model allows you to deploy a single model that can work with any of the 100 languages it was pre-trained on.

Figure 1. Example of sentiment classification. The training data combines examples from English and French, which are easily available. The model predicts the sentiment of a sentence in any language.

More than a simple convenience, multilingual models often perform better than monolingual ones.
One reason is that the training data available in any single language is generally limited. In addition, many languages share common patterns that the model picks up more easily when it is trained on a variety of languages.

The XLM-R Embedding snippet

The XLM-R Embedding snippet includes:

  • An Input block.
    Select an input Feature that has the Text encoding in the Datasets view.

  • An XLM-R Tokenizer block.

  • An XLM-R Encoder block with pre-trained weights.

  • An Output block.
    Allows you to extract the encoded text feature as a sentence embedding when the model is deployed.

  • A Dense block.
Adjust the number of Nodes of this block to match the size of the Target block's feature when training.

  • A Target block, which may be linked to any categorical or numeric feature when training.
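To make the Dense block's role concrete, here is a minimal sketch in plain Python. The sizes are hypothetical examples, not values taken from the snippet: a 768-dimensional sentence embedding mapped to a 3-class Target feature requires a Dense block with 3 nodes.

```python
# Minimal sketch of the Dense block's role (pure Python, hypothetical sizes).
# The embedding size (768) and class count (3) are assumptions for
# illustration; set the Nodes to match your own Target feature.

EMBEDDING_SIZE = 768   # size of the sentence embedding
NUM_CLASSES = 3        # e.g. negative / neutral / positive sentiment

def dense(embedding, weights, bias):
    """One dense layer: logits[j] = sum_i embedding[i] * weights[i][j] + bias[j]."""
    return [
        sum(e * w for e, w in zip(embedding, column)) + b
        for column, b in zip(zip(*weights), bias)
    ]

# Dummy embedding and parameters, just to show the shapes involved.
embedding = [0.0] * EMBEDDING_SIZE
weights = [[0.0] * NUM_CLASSES for _ in range(EMBEDDING_SIZE)]
bias = [0.0] * NUM_CLASSES

logits = dense(embedding, weights, bias)
print(len(logits))  # one logit per class
```

If the Target feature were numeric instead, the Dense block would typically have a single node.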

How to train the XLM-R Embedding snippet

Please note that datasets, machine-learning models, weights, topologies, research papers and other content, including open source software, (collectively referred to as “Content”) provided and/or suggested by Peltarion for use in the Platform and otherwise, may be subject to separate third party terms of use or license terms. You are solely responsible for complying with the applicable terms. Peltarion makes no representations or warranties about Content. You expressly relieve us from any and all liability, loss or risk arising (directly or indirectly) from Your use of any third party content.

The provided weights were pre-trained on 100 languages, which makes the model particularly well suited for similarity tasks without further training.
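For instance, sentence embeddings extracted through the Output block can be compared with cosine similarity, a common choice for similarity tasks. A minimal sketch in plain Python, with made-up embedding values for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: a.b / (|a| |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional embeddings standing in for real XLM-R outputs
# (a real sentence embedding has many more dimensions).
emb_en = [0.8, 0.1, 0.3, 0.5]   # e.g. "The movie was great."
emb_fr = [0.7, 0.2, 0.4, 0.5]   # e.g. "Le film etait genial."

print(cosine_similarity(emb_en, emb_fr))  # high value: the sentences are similar
```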

If you want to fine-tune the model for your own data, you can follow this procedure for fine-tuning pre-trained snippets.

Fine-tuning an XLM-R model

XLM-R is a powerful model that can learn most fine-tuning datasets very easily. This also means that it is prone to catastrophic forgetting and to overfitting the new dataset when trained with inappropriate settings.

To avoid these issues, train your model with a very low Learning rate, on the order of 10⁻⁵ to 10⁻⁶.
In addition, train for only a few Epochs, between 1 and 3.
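The effect of an overly large learning rate can be seen on a toy problem, unrelated to XLM-R itself: gradient descent on f(w) = w² converges with small steps but blows up with large ones. The same intuition motivates the very low rates recommended above.

```python
# Toy illustration of learning-rate sensitivity (not XLM-R itself):
# gradient descent on f(w) = w**2, whose gradient is 2*w.

def gradient_descent(lr, steps, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(abs(gradient_descent(lr=0.1, steps=10)))  # small steps: w shrinks toward the minimum
print(abs(gradient_descent(lr=1.5, steps=10)))  # large steps: w overshoots and grows
```

With a highly expressive model such as XLM-R, even a stable but too-large rate can overwrite the pre-trained weights, hence the very low recommended range.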

Available weights

The XLM-R Encoder block uses the xlm-roberta-base model with weights pre-trained by Hugging Face on 100 languages of CommonCrawl data.


When using pretrained snippets, additional terms apply: XLM-R with weights licence.


Alexis Conneau, Kartikay Khandelwal, et al.: Unsupervised Cross-lingual Representation Learning at Scale, 2020.

Guillaume Wenzek, Marie-Anne Lachaux, et al.: CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, 2019.
