Making soundtracks easy to tag & find, through AI


Epidemic Sound
Founded: 2009, Stockholm, Sweden

AI method

Multi-label classification



Four years ago, digital marketing expert Gary Vaynerchuk boldly stated[1] that “The single most important strategy in content marketing today is video.” Since then, video has only become a bigger force in creative work, from small content creators to giant productions. This seemingly unstoppable trend has profoundly impacted the music industry, because nearly all of these videos are overlaid with songs. And choosing the right soundtrack for a video can be an enormous undertaking for creators.

To address this need, Epidemic Sound has built a platform where people can search for music and license it for videos. Key to that mission is making the match between what people are looking for and what they find as consistent and accurate as possible.

02/ Summary

To help Epidemic Sound in this quest, we put our heads together to find a better way of classifying music. We hoped to streamline the process, improve quality, and get away from personal opinions. We found that we could transform the sound waves of songs into spectrograms and treat them as images. This way, we could easily create models to classify different moods: Smooth, Hopeful, Romantic, Eccentric, and more. The model we created achieved 91.8% accuracy, and by combining skills from the music industry with experience from our AI sphere, we could save an immense amount of time and get more consistent results.

Want to try this for yourself? Build a music tagger model on the Peltarion Platform with this tutorial. Or watch the episode where our Andreas joins Seth Juarez on Microsoft’s AI Show and talks through this case.

03/ Which problem could be solved with AI?

Understanding the mood of a song is central to capturing the essence of a melody. When you hear a song on the radio, you probably inherently understand whether it’s happy, sad, melancholy, or energizing. But in scientific terms, what makes it take on that emotional tenor?

Remember the days when we placed vinyl records in different trays?

In the old record shops, classifying music was a manual task: record-store workers assigned vinyl to different trays based on subjective judgment. Much later, this technique evolved when companies like Spotify and Pandora started classifying digital versions of songs, still completely manually. Manual classification is not just a ridiculously time-consuming task; it is also filled with personal bias, which leads to inconsistency.

To improve the quality of classification of its music library, and to decrease the time it took to classify the entire library, Epidemic Sound worked with data scientists at Peltarion and created deep learning models on the Peltarion Platform.

04/ Sound transformed to an image

The first step was to prepare the data, in this case songs. To make the input useful and easy to work with, each song’s sound wave was captured as a spectrogram, a picture showing sound in three dimensions: frequency, intensity, and time. The songs were split into parts representing different moods, such as:

  • Epic
  • Laidback
  • Mysterious
  • Sexy
  • Casino
  • Quirky
  • ...

These classifications were initially made by humans in order to provide training data. At this stage, it’s important to label the input data as accurately as possible, because the labels directly influence the end result: any mistake will only propagate and escalate along the way.
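The waveform-to-spectrogram step can be sketched in pure Python. This toy short-time DFT is an illustration only; real pipelines use optimized FFT libraries, and the window and hop sizes here are assumptions:

```python
import math

def spectrogram(samples, window=64, hop=32):
    """Toy short-time DFT: slide a window over the waveform and
    compute the magnitude of each frequency bin per frame."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        magnitudes = []
        for k in range(window // 2):  # keep positive-frequency bins only
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / window)
                     for n in range(window))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / window)
                      for n in range(window))
            magnitudes.append(math.hypot(re, im))
        frames.append(magnitudes)
    return frames  # time x frequency grid of intensities

# A pure tone (8 cycles per 64-sample window) should concentrate
# its energy in frequency bin 8 of every frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])  # → 8
```

Plotting such a grid as pixel intensities gives exactly the kind of image the models below consume.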

Songs transformed to spectrograms as input data

05/ Creating models & tweaking to improve

Spectrograms, fortunately, can be treated as normal images, which makes them an accessible data type for deep learning. As a base for building models on the platform, the FCN-6 model[1] was chosen, which consists of a number of convolutional layers followed by a final set of densely connected layers. This convolutional neural network (CNN) was then evaluated and tweaked over several versions to find the best result. All these iterations were done with the training data, which by default on the Peltarion Platform is 80% of the total available data, leaving the remaining 20% for validation of the model.
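The 80/20 split described above can be illustrated with a minimal sketch. The ratio comes from the text; the shuffling, seed, and file names are assumptions for the example:

```python
import random

def train_validation_split(examples, train_fraction=0.8, seed=42):
    """Shuffle labeled examples and split them into training and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (spectrogram, mood) pairs standing in for the real dataset.
dataset = [(f"song_{i:03}.png", "Epic" if i % 2 else "Laidback")
           for i in range(100)]
train, validation = train_validation_split(dataset)
# → 80 training examples, 20 validation examples
```

Keeping the validation set untouched during all the tweaking iterations is what makes the final accuracy figure trustworthy.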

06/ What result did Epidemic Sound & Peltarion achieve?

By calling the model with a song, we could figure out which moods the song should be tagged with. So in a second, we could create a tag that would otherwise take time, and that would also be biased depending on who was doing the work.
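Because the AI method is multi-label classification, a song can receive more than one mood tag. A minimal sketch of the tagging step, assuming the model returns one score per mood (the mood names, scores, and threshold below are illustrative, not Epidemic Sound's actual values):

```python
def tag_moods(scores, threshold=0.5):
    """Multi-label tagging: keep every mood whose score clears the threshold."""
    return sorted(mood for mood, score in scores.items() if score >= threshold)

# Hypothetical per-mood model scores for one song.
scores = {"Epic": 0.91, "Mysterious": 0.62, "Laidback": 0.08, "Quirky": 0.33}
tags = tag_moods(scores)  # → ['Epic', 'Mysterious']
```

This is what makes the search experience richer: one track can surface under every mood it genuinely fits.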

Which of the 46 moods do you think applies?

The biggest initial concern for Epidemic was achieving a higher level of accuracy when searching for songs. The better the accuracy, the easier it would be to find the right song for a particular need, and the bigger the chance of more licenses and better business. The models created on the Peltarion Platform achieved an accuracy of 91.8%. Of course, this number doesn’t mean much without knowing how accurately a human might categorize songs. While we don’t have data on that, we do know that in the medical field, error rates even among highly skilled radiologists can be as high as 30%.
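For reference, the accuracy figure is simply the fraction of validation examples the model labels correctly. The predictions below are made up for illustration:

```python
def accuracy(predicted, actual):
    """Fraction of examples where the predicted mood matches the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

predicted = ["Epic", "Sexy", "Quirky", "Epic", "Laidback"]
actual    = ["Epic", "Sexy", "Quirky", "Mysterious", "Laidback"]
acc = accuracy(predicted, actual)  # 4 of 5 correct → 0.8
```

At 91.8%, roughly 11 out of every 12 validation songs receive the same mood a human labeler chose.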

Beyond the accuracy and consistency now possible when searching for songs that match moods, using deep learning to categorize music will be a game-changer for Epidemic Sound. Höglund’s rough estimate is that the company will perform the same task ten times faster from now on, giving them a chance to sell many more licenses.

07/ Advice on your AI journey

Top tips from Oscar Höglund, CEO Epidemic Sound:

  1. Be brave and make AI a part of your core business
  2. Make sure you have your data in order
  3. Be certain you define a clear problem to solve

You must be brave. It will make your business 10x better. Do not dismiss [AI] as a niche business area or as R&D

Oscar Höglund, CEO / Epidemic Sound

08/ Try for yourself