Target audience: Data scientists and developers
Before following this tutorial, it is strongly recommended that you complete the Deploy an operational AI model tutorial if you have not done so already.
Word embeddings are an important concept in NLP. For an introduction and overview of different types of word embeddings, check out the links below:
Text classification aims to assign text, e.g., tweets, messages, or reviews, to one or multiple categories. Such categories can be linked to user sentiment. In this tutorial, the sentiment expressed in each review is classified as either positive or negative.
To build your classifier, you'll need labeled data, which consists of text samples and their corresponding labels.
CNNs are generally used in computer vision, but are also suitable for specific NLP tasks such as sentiment analysis. Unlike neural networks that are designed to process image data, the CNN that you will build is based on 1D layers and depends on text embeddings, which provide a vector representation of the input words and their use.
This dataset contains movie reviews along with their associated binary sentiment polarity labels (positive or negative). It is intended to serve as a benchmark for sentiment classification.
The core dataset contains 50,000 reviews split evenly into a training and test subset.
The overall distribution of labels is balanced, i.e., there are 25,000 positive and 25,000 negative reviews.
The raw dataset also includes 50,000 unlabeled reviews for unsupervised learning; these will not be used in this tutorial.
In the entire collection, no more than 30 reviews are allowed for any given movie because reviews for the same movie tend to have correlated ratings.
In the labeled train/test sets, a negative review has a score of 4 or less out of 10, and a positive review has a score of 7 or more. Reviews with more neutral ratings are not included in the dataset.
Each review is stored in a separate text file, located in a folder named either “positive” or “negative”.
The preprocessed dataset
The dataset that you will upload has been preprocessed so that all the reviews and their respective sentiments are stored in a single CSV file with two fields, “review” and “sentiment”.
The review text may include commas, which would otherwise be interpreted as field delimiters on the platform. To escape these commas, the text is surrounded by double quotes.
The processed dataset only includes the training data.
If you are familiar with Python and want to learn how the raw data was processed, you may want to try to generate it yourself using this Jupyter notebook.
Among other things, it will give you some insight into why certain values are used later on in the tutorial.
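If you just want a feel for what that notebook does, the sketch below shows the gist of it, assuming the folder layout of the raw aclImdb archive; the file and column names are assumptions for illustration.

import csv
from pathlib import Path

# Assumed layout of the raw archive: aclImdb/train/pos and aclImdb/train/neg,
# with one review per text file.
rows = []
for label, folder in (("positive", "pos"), ("negative", "neg")):
    for path in (Path("aclImdb/train") / folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8").replace("<br />", " ")
        rows.append((text, label))

# csv.writer quotes any field containing a comma, which is what keeps
# the commas inside the review text from being read as field delimiters.
with open("imdb_train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["review", "sentiment"])
    writer.writerows(rows)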
First, create a project and name it so you know what kind of project it is.
Click the review column and set Encoding to Text (Beta). Two new parameter settings will appear.
Sequence length corresponds to the expected number of words in each review sample. If the actual number of words in a sample is fewer than indicated by this parameter, the text is padded. If it is longer, the text is truncated, i.e., words are cut from the end so that the text fits the sequence.
If you run the preprocessing notebook or evaluate the text by other means, you will notice that the longest review is approximately 2,500 words, but that is an outlier. A sequence length of approximately 600-650 makes it possible to process 95% of the reviews without any truncation. This matters because you want to set this value as low as possible to reduce training time and minimize memory use. Based on the information that we have about this dataset, set the Sequence length to 650.
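If you want to verify the 650 figure yourself, a quick check of the length distribution is enough; this sketch assumes the single-file CSV described above.

import csv
import numpy as np

# Count whitespace-separated words per review and inspect the distribution.
with open("imdb_train.csv", newline="", encoding="utf-8") as f:
    lengths = [len(row["review"].split()) for row in csv.DictReader(f)]

print("longest review:", max(lengths))                 # roughly 2,500 words (an outlier)
print("95th percentile:", np.percentile(lengths, 95))  # roughly 600-650 words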
The Language model should match the language of the input data, in this case English. Read here for more information about this parameter.
In the top right corner, you’ll see the subsets. All samples in the dataset are by default split into 20% validation and 80% training subsets. Keep these default values in this project.
You’ve now created a dataset ready to be used in the platform. Click Save version and navigate to the Modeling view.
The model data you will create is based on a text classification example provided in the Keras documentation.
The example suggests that you use a Text embedding block followed by a Dropout block and then a 1D Convolution block. The output from these blocks is then bridged to a fully connected layer, a Dense block, via a 1D Global max pooling block. Max pooling is required here since the input to the Dense block must be one-dimensional.
You can see a representation of this model below, but don't start building it just yet.
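For reference, the same single-channel model expressed as a minimal Keras sketch; the vocabulary size, embedding dimension, filter count, and kernel size are illustrative assumptions, not the platform's defaults.

from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 20000  # assumed vocabulary size
seq_len = 650       # the Sequence length set earlier
embed_dim = 128     # assumed embedding dimension

model = keras.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, embed_dim),   # Text embedding block
    layers.Dropout(0.5),                       # Dropout block
    layers.Conv1D(128, 7, activation="relu"),  # 1D Convolution block
    layers.GlobalMaxPooling1D(),               # 1D Global max pooling block
    layers.Dense(1, activation="sigmoid"),     # Dense block
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])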
Before you go ahead and create a new experiment, you may want to consider some improvements that will give you a better performing model.
In this contribution to a competition on Kaggle, a similar classification model is used, but there are four parallel 1D Convolution blocks instead of one. Each 1D Convolution block has a different filter size and is followed by a 1D Global max pooling block. The outputs from the four "arms" are then concatenated. This "multi-channel CNN" allows the document to be processed at different resolutions and will give you somewhat better results.
This approach was first described by Yoon Kim in his 2014 paper titled Convolutional Neural Networks for Sentence Classification.
This is the model you are going to build:
You can use either the default ReLU activation function or TanH in the 1D Convolution blocks. The TanH activation function will produce slightly better results.
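To make the structure concrete, here is a hedged Keras sketch of the multi-channel model; the four kernel sizes (2, 3, 5, and 7) and the other hyperparameters are assumptions for illustration.

from tensorflow import keras
from tensorflow.keras import layers

vocab_size, seq_len, embed_dim = 20000, 650, 128  # assumed values

inputs = keras.Input(shape=(seq_len,))
x = layers.Embedding(vocab_size, embed_dim)(inputs)  # Text embedding block
x = layers.Dropout(0.5)(x)                           # Dropout block

# Four parallel 1D Convolution "arms" with different filter sizes,
# each followed by its own 1D Global max pooling block.
arms = []
for kernel_size in (2, 3, 5, 7):
    arm = layers.Conv1D(128, kernel_size, activation="tanh")(x)
    arms.append(layers.GlobalMaxPooling1D()(arm))

merged = layers.Concatenate()(arms)  # concatenate the four arms
outputs = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])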
Click the Settings tab and change Batch Size to 1024 and Epochs to 5. Keep the default settings for all other parameters.
Navigate to the Evaluation view and watch the model train. It will only take a few minutes, thanks to the combination of data size, model complexity, and number of epochs.
The training loss will continue to decrease for each epoch, but the evaluation loss will start to increase again after the first two epochs. This means that the model is starting to overfit to the training data.
You can read more about the loss metrics here.
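On the platform you handle this simply by keeping the number of epochs low, but if you experiment with the Keras sketches above on your own, an EarlyStopping callback achieves the same effect.

from tensorflow import keras

# Stop training as soon as the validation loss stops improving and
# roll back to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=1, restore_best_weights=True
)

# model.fit(x_train, y_train, validation_split=0.2,
#           batch_size=1024, epochs=5, callbacks=[early_stop])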
To evaluate the performance of the model, you can look at the overall accuracy, which is displayed in the Experiment info section to the right. It should be approximately 85-90%. For comparison, a "baseline classifier" that simply predicts that all samples belong to the most common class would have an accuracy of approximately 50%. This is because there is an even distribution of positive and negative labels in the validation subset.
Since the model solves a classification problem, a confusion matrix is displayed. The top-left to bottom-right diagonal shows correct predictions. Everything outside this diagonal represents errors.
The recall per class corresponds to the percentage values in the confusion matrix diagonal. You can display the same metric by hovering over the horizontal bars to the right of the confusion matrix. You can also view the precision per class by hovering over the vertical bars on top of the confusion matrix.
High recall and low precision mean that the model "catches" most of the examples in a class, but many other predictions will be false positives.
High precision and low recall is the opposite. The model underpredicts the class but is accurate in the positive predictions that it does make. In other words, it is too picky.
The key metric here depends on how you will apply the results in your deep learning application.
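To make the two metrics concrete, here is how they fall out of a hypothetical confusion matrix for one class; the counts are made up, not taken from this model.

tp = 2300  # positive reviews predicted as positive
fn = 200   # positive reviews predicted as negative
fp = 600   # negative reviews predicted as positive

recall = tp / (tp + fn)     # share of actual positives that were caught
precision = tp / (tp + fp)  # share of positive predictions that were correct

print(f"recall:    {recall:.2f}")     # 0.92 -- catches most positives...
print(f"precision: {precision:.2f}")  # 0.79 -- ...at the cost of false positives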
The total number of correct vs. incorrect predictions is displayed in the error graph below the confusion matrix.
Now that we know the accuracy of the model on the validation data, let's deploy the model and try it with some new data.
Open our web app, the Text classifier API tester, in your preferred browser: https://bit.ly/2LL73fm
Copy the URL in the Deployment view and paste it into the URL field of the web app.
Copy the Token in the Deployment view. The API is called by sending an HTTP POST to the endpoint shown in the interface. The token is required to authenticate the calls. Paste the copied token into the Token field in the web app.
Type review (case sensitive) in the Input name field. This value must match the name of the input feature in the Deployment view.
Copy a recent review from, e.g., IMDB, paste it into the text field or write your own, then click the Play button to get a result.
I don’t want to complain about the movie, it was really just ok. I would consider it an epilogue to Endgame as opposed to a middle film in what I’m assuming is a trilogy (to match the other MCU character films). Anyhow, I was just meh about this one. I will say that mid-credit scene was one of the best across the MCU movies.
To see what an actual request from the application and the response from the model may look like, you can run the example CURL command that is provided in the Code examples section of the Deployment view. Replace the VALUE parameter with review text and run the command in a terminal.
curl -X POST \
-F "review=A fun brain candy movie...good action...fun dialog. A genuinely good day" \
-u "<Token>" \
<Deployment URL>
The output will look something like this:
The predicted result is positive since it gets the highest value, 0.82801074.
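If you prefer Python to curl, the equivalent call could look like the sketch below; the URL and token placeholders stand for the values you copy from the Deployment view, and the basic-auth usage mirrors the -u "<Token>" flag in the curl example.

import requests

URL = "<Deployment URL>"  # copied from the Deployment view
TOKEN = "<Token>"         # copied from the Deployment view

response = requests.post(
    URL,
    # Send the review as a multipart form field, like curl's -F flag.
    files={"review": (None, "A fun brain candy movie...good action...fun dialog.")},
    # Pass the token as the basic-auth user, like curl's -u flag.
    auth=(TOKEN, ""),
)
print(response.json())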
In this tutorial, you’ve created a text classification model that you first evaluated and then deployed.
Continue to experiment with different hyperparameters and tweak your experiments to see if you can improve the accuracy of the model further.
You may also want to experiment with datasets from different sources but note that the model in this tutorial works best with short text samples (approximately 1,000 words or less).
The web app you've just used can also be used to test other single-label text classification models.