Making soundtracks easy to tag & find, through AI

  • Company

    Epidemic Sound
    Founded: 2009, Stockholm, Sweden

  • AI method

    Classification

  • Industry

    Retail

Four years ago, digital marketing expert Gary Vaynerchuk boldly stated[1] that “The single most important strategy in content marketing today is video.” Since then, video has only become a bigger and bigger force in creative productions, from small content producers to giant productions. This trend, which seems unstoppable, has profoundly impacted the music industry, because overlaid on nearly all these videos are songs. And choosing the right soundtrack for a video can be an enormous undertaking for creators.

To address this need, Epidemic Sound has built a platform where people can search for music and license it for videos. Key to that mission is making the match between what people are looking for and what they find as consistent and accurate as possible.

02/ Summary

By translating songs into pictures of sound waves, Epidemic Sound found they could use existing sound data as input. The data was categorized into groups by mood, such as Happy, Romantic and Sad. The exploratory AI model, built on the Peltarion Platform, consisted of a number of convolutional layers followed by a final set of densely connected layers. The model achieved 91.8% accuracy and proved to be ten times faster and much more consistent than a human doing the same categorization, according to CEO Oscar Höglund.

03/ Which problem could be solved with AI?

Understanding the mood of a song is central to capturing the essence of a melody. When you hear a song on the radio, you probably inherently understand whether it’s happy, sad, melancholy, or energizing. But in scientific terms, what makes it take on that emotional tenor?

In the old record shops, classifying music was a manual task. Record-store workers assigned vinyl to different trays depending on a subjective classification. Much later, this technique evolved when companies like Spotify and Pandora started classifying digital versions of songs, still completely manually. Manual classification is not just a ridiculously time-consuming task. It is also prone to personal bias, which leads to inconsistency.

To help improve the quality in classification of their music library, and to decrease the time it took to classify the entire library, Epidemic Sound worked with data scientists at Peltarion and created deep learning models on the Peltarion Platform.

04/ Sound transformed to an image

The first step was to prepare the data, in this case songs. To make the input useful and easy to work with, the songs were captured as spectrograms, pictures showing the sound waves in three dimensions: frequency, intensity and time. The songs were split into parts representing different moods like funny, romantic, hopeful, sexy, quirky, etc. These classifications were initially made by a human in order to provide training data. At this stage, it’s important to choose the input data as accurately as possible, because it strongly influences the end result. Any problem or mistake will only escalate along the way.

Spectrograms show sound waves in three dimensions: frequency, intensity and time
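To make the idea of a spectrogram concrete, here is a minimal sketch of how one can be computed with a short-time Fourier transform. It is an illustration only, not the pipeline Epidemic Sound or Peltarion used: the function name, the frame length, the hop size and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a (frequency_bins, time_frames) array of intensities in decibels.

    frame_len and hop are illustrative defaults, not values from the case study.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one with the window.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame gives the intensity of each frequency at that moment.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so rows are frequencies and columns are time, like an image.
    return 20.0 * np.log10(magnitudes.T + 1e-10)


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256  # centre frequency of the strongest bin
```

The resulting array holds the three dimensions from the caption: the row index is frequency, the column index is time, and the cell value is intensity. That is why it can be handled like an ordinary grayscale image.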

05/ Creating models and tweaking to improve

Spectrograms, fortunately, can be used as normal images, which makes them an accessible data type for deep learning. The FCN-6 model[1], which consists of a number of convolutional layers followed by a final set of densely connected layers, was chosen as the base for building models on the platform. The model was created with the help of convolutional neural networks (CNNs), and then evaluated and tweaked in several steps through different versions to find the best result. All these iterations were done with training data, which by default in the Peltarion Platform is 80% of the total available data, leaving the final 20% for the real deployment.
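The general shape described above, convolutional layers followed by densely connected layers, can be sketched as a forward pass in plain NumPy. This is not the FCN-6 model and not the Peltarion implementation: the layer counts, kernel sizes, the 32x32 input and the random, untrained weights are all assumptions made to keep the example small, so the output probabilities are meaningless beyond showing how the data flows.

```python
import numpy as np

rng = np.random.default_rng(0)
MOODS = ["happy", "romantic", "sad"]  # example moods named in the summary


def conv2d(x, kernels):
    """Valid 2-D convolution. x: (H, W, C_in); kernels: (kH, kW, C_in, C_out)."""
    kh, kw, _, c_out = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw, :]
            # Contract the patch against every kernel at once.
            out[i, j] = np.tensordot(patch, kernels, axes=3)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Non-overlapping max pooling; drops rows/columns that do not fit."""
    h, w, c = x.shape
    h, w = h - h % size, w - w % size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size, c)
    return blocks.max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def forward(spec, conv_kernels, dense_layers):
    """Convolutional blocks first, then densely connected layers."""
    x = spec[:, :, None]  # add a single channel, as for a grayscale image
    for kernels in conv_kernels:
        x = max_pool(relu(conv2d(x, kernels)))
    x = x.reshape(-1)  # flatten the feature maps into one vector
    for weights, bias in dense_layers[:-1]:
        x = relu(weights @ x + bias)
    weights, bias = dense_layers[-1]
    return softmax(weights @ x + bias)  # one probability per mood


# Untrained, randomly initialised parameters (assumed sizes).
conv_kernels = [rng.normal(0, 0.1, (3, 3, 1, 4)), rng.normal(0, 0.1, (3, 3, 4, 8))]
dense_layers = [
    (rng.normal(0, 0.1, (16, 6 * 6 * 8)), np.zeros(16)),
    (rng.normal(0, 0.1, (len(MOODS), 16)), np.zeros(len(MOODS))),
]

fake_spectrogram = rng.random((32, 32))  # stand-in for a real spectrogram
probs = forward(fake_spectrogram, conv_kernels, dense_layers)
predicted_mood = MOODS[int(probs.argmax())]
```

In a real project the weights would be learned from the human-labelled spectrograms, and a deep learning framework would replace the hand-written loops.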

06/ What result did Epidemic Sound & Peltarion achieve?

The biggest initial concern for Epidemic was to achieve a higher level of accuracy when searching for songs. The better the accuracy, the easier it would be to find the right song to fulfill a particular need, and the bigger the chance for more licenses and better business. The models created in the Peltarion Platform achieved an accuracy rate of 91.8%.

Still, this number doesn’t mean much if you don’t know how accurately a human might be able to assign song categories. While we don’t have data on that, we do know that in the medical field, error rates can be as high as 30% even among highly skilled radiologists.
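For reference, an accuracy figure like the 91.8% above is simply the share of songs for which the model's predicted mood matches the human label. The sketch below shows that calculation; the function name and the five example labels are made up for illustration and are not data from the case study.

```python
def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five songs (not real data).
actual = ["happy", "sad", "romantic", "sad", "happy"]
predicted = ["happy", "sad", "romantic", "happy", "happy"]

score = accuracy(predicted, actual)  # 4 of the 5 predictions match
```

The same function could score a human annotator against a reference labelling, which is the kind of baseline the paragraph above notes is missing.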

Besides making song searches by mood more accurate and consistent, using deep learning to categorize music genres will be a game changer for Epidemic Sound. Höglund’s rough estimate is that the company will be able to complete the same task ten times faster from now on, giving it a chance to sell many more licenses.

07/ Advice for the AI journey

Top tips from Oscar Höglund, CEO of Epidemic Sound:

  1. Be brave and make AI a part of your core business
  2. Make sure you have your data in order
  3. Be certain you define a clear problem to solve

08/ Try for yourself

You must be brave. It will make your business 10x better. Do not dismiss [AI] as a niche business area or as R&D

Oscar Höglund, CEO / Epidemic Sound