
Building a whiskey classifier with Bubble and Peltarion

March 13 / 4 min read
By Vanessa Tang & Ziyou Zhang, winners of the No-Code AI Challenge

It has been nearly 20 years since Whisky Classified was first published, a book that revolutionized our appreciation of whiskey. David Wishart guided readers through identifying the aroma and taste of whiskey using simple everyday adjectives, opening the door to understanding whiskey for a much wider audience. The global whiskey market was valued at USD 57.96 billion in 2018 and is projected to grow at a CAGR of 6.4% over the forecast period.

02/ The use case

Nowadays, whiskey lovers form communities on forums such as Reddit (for example, /r/whiskey has 172k members and /r/Scotch has 140k), where they review and discuss whiskey. These communities are a valuable data source for whiskey sellers who want to understand customers' views on their products.

We, Whiskey Lens, use Natural Language Processing to understand the sentiment of these reviews and to score each product on each taste category, such as Sweetness, Spiciness, Fruitiness and Smokiness, as well as giving an overall score. Providing this data to whiskey sellers allows them to check whether they are making the whiskey they intend to make.

Take Erik and his distillery as an example. He wanted to make a single malt whiskey that is sweet but not smoky. After launching the product, he went on Reddit to see if anyone was talking about his whiskey. Reading a few threads, he saw a wide range of comments describing his whiskey with words such as "dark chocolate", "caramel" and "dried fruit". Erik wanted to understand further what his customers think, so he logged into the Whiskey Lens dashboard. The dashboard showed him a score of 3/5 for sweetness and 2/5 for smokiness, which gave him better insight for modifying the current product or developing future ones.
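One way per-review scores could be rolled up into dashboard figures like Erik's is a simple average per product and category. The following is only a minimal sketch: the category names, the 0 to 5 scale, the field names and the use of a plain mean are all assumptions for illustration, not a description of how Whiskey Lens actually computes its scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical category names and a 0-5 scale; the article does not specify
# the exact fields or scale used by the model.
CATEGORIES = ("sweet", "spicy", "fruity", "smoky")


def product_scores(reviews):
    """Average per-review category scores into one score per product.

    `reviews` is an iterable of dicts such as
    {"product": "Erik's Single Malt", "sweet": 3.5, "smoky": 2.0}.
    Categories missing from a review are skipped, not counted as zero.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        for category in CATEGORIES:
            if category in review:
                collected[review["product"]][category].append(review[category])
    return {
        product: {cat: round(mean(vals), 1) for cat, vals in by_cat.items()}
        for product, by_cat in collected.items()
    }


# Made-up review scores, for illustration only.
reviews = [
    {"product": "Erik's Single Malt", "sweet": 3.5, "smoky": 2.0},
    {"product": "Erik's Single Malt", "sweet": 2.5, "smoky": 2.0},
]
scores = product_scores(reviews)
```

A median or a recency-weighted average would be equally plausible choices if a few extreme reviews should not dominate the score.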

The business will follow a B2B subscription model. The most basic tier gives a client access to whiskey details for a limited number of their own products. Moving up the tiers, the client gains access to details for a wider range of their own products, or even for competitors' products.

Existing uses of AI in the whiskey industry include creating new recipes and using ML to analyze people's flavor preferences and match them with their perfect whiskey. Researchers at Virginia Polytechnic Institute have attempted to use NLP to standardize whiskey notes. To our knowledge, there has been no previous attempt to use AI and NLP to quantify reviews, i.e. to score each taste category based on comments or articles.

03/ The dataset

The data was downloaded from Kaggle, and we added artificially generated columns in order to build a model for this competition. It includes features such as bottle name, category, price, review text and review rating (out of 100). The generated columns are scores for sweet, spicy, smoky, medicinal, honey and fruity. In total, the dataset includes 2,247 reviews.
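To make the shape of the dataset concrete, here is a small sketch of how such rows could be read and sanity-checked. The column headers (`name`, `category`, `price`, `text`, `rating` and the six score columns) are assumed names, since the exact headers in the Kaggle file are not given here, and the sample row is made up.

```python
import csv
import io
from dataclasses import dataclass

# Column names are assumptions for illustration; the article lists the
# features but not the exact headers in the file.
SCORE_COLUMNS = ("sweet", "spicy", "smoky", "medicinal", "honey", "fruity")


@dataclass
class Review:
    name: str
    category: str
    price: float
    text: str
    rating: int    # review rating out of 100
    scores: dict   # generated taste scores keyed by SCORE_COLUMNS


def load_reviews(handle):
    """Parse CSV rows into Review records, rejecting out-of-range ratings."""
    reviews = []
    for row in csv.DictReader(handle):
        rating = int(row["rating"])
        if not 0 <= rating <= 100:
            raise ValueError(f"rating out of range: {rating}")
        reviews.append(Review(
            name=row["name"],
            category=row["category"],
            price=float(row["price"]),
            text=row["text"],
            rating=rating,
            scores={col: float(row[col]) for col in SCORE_COLUMNS},
        ))
    return reviews


# A made-up sample row standing in for the real file.
sample = io.StringIO(
    "name,category,price,text,rating,sweet,spicy,smoky,medicinal,honey,fruity\n"
    "Example Malt,Single Malt Scotch,60,Caramel and dried fruit.,92,4,2,1,0,3,3\n"
)
reviews = load_reviews(sample)
```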

04/ Building the model

We built the model on the Peltarion Platform using NLP. More specifically, the model consists of a BERT Tokenizer, an English BERT encoder, and two dense layers before the final output layer.
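The model itself was assembled without code, but the data flow after the encoder can be illustrated in plain Python. This is a rough sketch under several assumptions: the 768-wide embedding (typical for BERT-base), the hidden layer size, the ReLU activations and a single output layer predicting all six taste scores are illustrative choices, not settings taken from the actual model. The weights are random, so the numbers it produces are meaningless; only the shapes matter.

```python
import random

# All sizes here are assumptions: the article does not give the encoder
# output width, the dense layer sizes, or the activations that were used.
EMBEDDING_SIZE = 768   # typical width of a BERT-base sentence embedding
HIDDEN_SIZE = 32
TARGETS = ("sweet", "spicy", "smoky", "medicinal", "honey", "fruity")


def make_dense(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=False):
    """Apply y = W x + b, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


rng = random.Random(0)
hidden_1 = make_dense(EMBEDDING_SIZE, HIDDEN_SIZE, rng)
hidden_2 = make_dense(HIDDEN_SIZE, HIDDEN_SIZE, rng)
output = make_dense(HIDDEN_SIZE, len(TARGETS), rng)

# Stand-in for the BERT encoder output of one tokenized review.
embedding = [rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_SIZE)]

# Two dense layers, then the final output layer with one value per target.
h = dense(dense(embedding, hidden_1, relu=True), hidden_2, relu=True)
prediction = dict(zip(TARGETS, dense(h, output)))
```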

The model has limited accuracy at this stage, owing to the small dataset and to the artificially generated labels, which introduce human error.

[Figure: How the workflow was set up]

[Figure: Model performance]

05/ Deployment

The workflow looked like this:

  1. Whiskey Lens hosts a service that continuously scrapes review text from the web (outside the scope of this project).
  2. The data is stored in Google Sheets.
  3. We generate the scores for each category using the deployed model on Peltarion, via the Google Sheets Peltarion add-on.
  4. Each new row in Google Sheets triggers the Zapier workflow, which pushes the data to the Bubble.io database.
  5. The data is calculated and displayed on Bubble.io.
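The steps above were wired together with no-code tools, but the same flow can be sketched in a few lines to show what each stage hands to the next. In this sketch `score_review` stands in for the deployed model (step 3) and `push_record` for the Zapier-to-Bubble hand-off (step 4). Both are hypothetical callables, and the row field names are assumptions; none of this is a real Peltarion, Zapier or Bubble API.

```python
def process_new_rows(rows, score_review, push_record):
    """Score each unscored sheet row and forward it to the app database.

    `score_review` and `push_record` are hypothetical stand-ins for the
    deployed model and the Zapier-to-Bubble hand-off respectively.
    Returns the number of rows that were scored and pushed.
    """
    pushed = 0
    for row in rows:
        if row.get("scores"):   # already scored on an earlier run
            continue
        row["scores"] = score_review(row["review_text"])
        push_record({"product": row["product"], **row["scores"]})
        pushed += 1
    return pushed


# Toy stand-in for the model so the sketch runs without any external service.
def fake_model(text):
    return {"sweet": 3.0 if "caramel" in text.lower() else 1.0}


database = []   # stands in for the app database
sheet = [
    {"product": "Example Malt", "review_text": "Caramel and dried fruit."},
    {"product": "Example Malt", "review_text": "Peaty.", "scores": {"sweet": 1.0}},
]
count = process_new_rows(sheet, fake_model, database.append)
```

Passing the scorer and the sink in as arguments keeps the flow testable without network access; in the real setup those two roles are filled by the Google Sheets add-on and the Zapier workflow.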

    Vanessa and Ziyou were the winners in the "Best Data Science" and "Best Deployment" categories in our No-Code AI Challenge. The competition took place in February 2021.
