Text encoding lets you use text written in natural language, that is, one or more sentences that would normally be spoken or written.
How long should text features be?
If a feature contains textual keywords or tags, e.g., mountain, the Categorical encoding is likely the better choice.
Text encoding is most powerful when processing complete sentences, including grammatical constructions.
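The distinction above can be sketched as a simple heuristic. This is a hypothetical helper, not part of the platform; the two-word threshold and punctuation check are illustrative assumptions only.

```python
# Hypothetical helper: suggest an encoding based on feature content.
# A single keyword or tag suits Categorical encoding; full sentences
# with grammatical structure suit Text encoding.
def suggest_encoding(value: str) -> str:
    """Return "Categorical" for short keyword-like values, else "Text"."""
    words = value.split()
    # Assumption: keywords/tags are one or two words with no sentence
    # punctuation; anything longer is treated as natural language.
    if len(words) <= 2 and not any(ch in value for ch in ".!?,"):
        return "Categorical"
    return "Text"

print(suggest_encoding("mountain"))                           # → Categorical
print(suggest_encoding("We hiked up the mountain at dawn."))  # → Text
```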
Different language processing models have different upper limits on the text length that they can process:
Models using one of the BERT blocks support up to 512 tokens, roughly 300 to 500 words.
Models using the Universal sentence encoder don’t have a limit on text feature length.
You can set the exact Sequence length you want to use in the block parameters when you design your model.
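As a rough pre-check, you can estimate whether a text fits a BERT-style 512-token budget before training. This sketch uses a word count with an assumed words-per-token ratio; it is not the platform's actual tokenizer, and real subword tokenizers will produce different counts.

```python
# Rough sketch: estimate whether texts fit a BERT-style 512-token limit.
# Assumption: about 0.75 words per token, consistent with 512 tokens
# covering roughly 300 to 500 words. Real subword tokenizers vary.
MAX_TOKENS = 512
WORDS_PER_TOKEN = 0.75

def fits_bert_limit(text: str, max_tokens: int = MAX_TOKENS) -> bool:
    estimated_tokens = len(text.split()) / WORDS_PER_TOKEN
    return estimated_tokens <= max_tokens

short = "The quick brown fox jumps over the lazy dog."
long_text = " ".join(["word"] * 600)  # ~800 estimated tokens

print(fits_bert_limit(short))      # → True
print(fits_bert_limit(long_text))  # → False
```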
What languages are supported?
Working with many languages
The models' multilingual support means that you can:
Mix examples from different languages in the same training dataset.
Fine-tune your models with data in languages that are readily available, then use them for predictions in any other supported language.
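The first point above, mixing languages in one training dataset, can look like this in practice. The (text, label) pairs are made-up examples, and the dataset shape is an assumption for illustration; the key point is simply that examples in different languages sit side by side in the same dataset.

```python
# Illustrative sketch: a single training dataset mixing languages.
# With a multilingual text encoder, these examples can be trained
# together; the sentences and labels below are invented for the demo.
training_data = [
    ("The view from the summit was stunning", "positive"),      # English
    ("La vue depuis le sommet était magnifique", "positive"),   # French
    ("Der Aufstieg war anstrengend und gefährlich", "negative"),  # German
]

texts = [text for text, _ in training_data]
labels = [label for _, label in training_data]
print(len(texts), sorted(set(labels)))  # → 3 ['negative', 'positive']
```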