Target audience: Data scientists and developers
Use this cheat sheet:
If you want to kick-start your use of text classification.
Text classification aims to assign text, e.g., tweets, messages, or reviews, to one or more categories. Such categories can be review scores or positive vs. negative. To build such classifiers, we need labeled data, which consists of texts and their corresponding labels.
Positive vs. negative
On Quora, people can ask questions and connect with others who contribute unique insights and quality answers. A key challenge is to weed out insincere questions. This Quora dataset consists of questions labeled as sincere or insincere.
HappyDB is a corpus of more than 100,000 happy moments crowd-sourced via Amazon’s Mechanical Turk.
Datasets with text for text classification need to include:
1 column with text to be used as input
1 column as label to be used as predicted category in the Target block.
In the Datasets view, set the encoding of the text feature to Text and the label feature to Categorical.
Select a Language model that matches your input data language.
Example: If you use the Quora questions dataset Quora train.csv, unzip it and upload it to the Platform. Make sure to set the encoding of the question_text feature to Text. Set the encoding of the target feature to Categorical.
Build this straightforward model to get started. It’s not cutting edge and it will not solve all your problems, but it will show you what basic parts need to go into a text classification model.
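Before uploading, it can help to confirm that the file really has one usable text column and one usable label column. The sketch below is a minimal check using only the Python standard library. The helper name check_text_dataset and the sample rows are made up for illustration, and the column names question_text and target are assumed to match your file.

```python
import csv
import io


def check_text_dataset(csv_file, text_column, label_column):
    """Return label counts, or raise ValueError if a column or value is missing."""
    reader = csv.DictReader(csv_file)
    missing = {text_column, label_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    counts = {}
    for record_number, row in enumerate(reader, start=1):
        # Short rows yield None for absent fields, so fall back to "".
        text = (row[text_column] or "").strip()
        label = (row[label_column] or "").strip()
        if not text or not label:
            raise ValueError(f"empty text or label in record {record_number}")
        counts[label] = counts.get(label, 0) + 1
    return counts


# Tiny made-up sample standing in for an uploaded CSV file.
sample = io.StringIO(
    "question_text,target\n"
    "How do I learn Python?,0\n"
    "Why is the sky blue?,0\n"
    "A made-up insincere question?,1\n"
)
label_counts = check_text_dataset(sample, "question_text", "target")
print(label_counts)
```

For a real file, pass an open file object instead of the in-memory sample, e.g. open("train.csv", newline="", encoding="utf-8"). The label counts also give a quick view of how balanced the categories are.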
Select your text encoded feature as input feature.
Make sure that the Language model you select matches the Language model selected when you set the encoding. Select Prebuilt embedding and then fastText. (Alternatively, you can choose Randomly initialized and see if that fits your model better.)
Select Softmax as Activation and set the number of Nodes to 2.
Select your Categorical encoded feature as target feature and set the Loss function to Categorical classification.
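To see what the last two settings do together, here is a minimal sketch of a softmax over 2 nodes followed by a categorical cross-entropy loss, written in plain Python. It assumes that the Categorical classification loss corresponds to standard categorical cross-entropy; the example logits are made up, and this is not the Platform's internal implementation.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() from overflowing.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Standard categorical cross-entropy for a single example."""
    eps = 1e-12  # avoids log(0)
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(one_hot_target, probabilities)
    )


# Made-up scores from the 2 output nodes for one text.
logits = [2.0, 0.5]
probs = softmax(logits)

# The loss is small when the true class got the high probability,
# and large when it got the low one.
loss_if_class_0 = categorical_crossentropy([1, 0], probs)
loss_if_class_1 = categorical_crossentropy([0, 1], probs)
print(probs, loss_if_class_0, loss_if_class_1)
```

With 2 nodes, each node stands for one category, which is why the number of nodes has to match the number of categories in the target feature.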
In the Evaluation view, you can see in real time how the AI model is performing as it learns from the data. You want the loss to be as low as possible, and you want the training error to be slightly lower than the test error.
Read more on how to evaluate your model in Classification loss metrics.
To make it easier to test your model on new sentences quickly, we have provided a Text Classifier web app.
Open the Text Classifier at bit.ly/Text_classifier, define your deployment URL, token, and input parameter, enter a new text, and press ▶️ to predict!
Testing over cURL works as for any other deployment. Read more about how to use cURL here.
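The same three pieces of information (deployment URL, token, and input parameter) are what a scripted request needs. The sketch below only builds the request and does not send it. The bearer-token header and the JSON payload shape with a "rows" list are assumptions, not taken from this page, so check the values shown for your own deployment. The URL, token, and helper name build_prediction_request are placeholders.

```python
import json
import urllib.request


def build_prediction_request(deployment_url, token, input_parameter, text):
    """Build (but do not send) a POST request for one text."""
    # Assumed payload shape: one row keyed by the input parameter name.
    payload = {"rows": [{input_parameter: text}]}
    return urllib.request.Request(
        deployment_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with those of your own deployment.
request = build_prediction_request(
    "https://example.com/deployment/predict",
    "YOUR_TOKEN",
    "question_text",
    "How do I learn Python?",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```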