Audio analysis for industrial maintenance

Apply deep learning to real business problems

This tutorial shows you how to process audio data and train a deep learning model that can suggest maintenance needs.

Target audience: Intermediate Users

Estimated Time:
  • Setup - 15 minutes

  • Training - 30 minutes

The problem

A key part of smart manufacturing and a modern factory approach involves real-time monitoring of machinery operating conditions. This is a perfect use case for deep learning. The goal of this tutorial is to build a deep learning model that can identify whether a machine is operating under normal or abnormal conditions.

Once trained, a model like this could be used to detect failures and automatically create work schedules for maintenance workers.

You will learn to

  • Transform audio files to mel spectrograms

  • Solve a binary image classification problem

  • Analyze the performance of your model

  • Test the deployment of your model

The data

The audio recordings are produced by Hitachi and the original dataset can be found here. The dataset contains sounds from four types of industrial machines: valves, pumps, fans, and slide rails.

To analyze audio files on the Peltarion Platform, the recordings are converted to mel spectrograms. A spectrogram displays the strength of a signal at different frequencies and how this varies over time.

Image classification models can then be used to gain insight into the complicated patterns in the audio data to predict if a machine is operating as it should.

If you are interested in seeing how this is done, or want to do it yourself, have a look at our Machinery spectrogram Colab notebook. By doing the data processing yourself, you can also choose which type of machine to build your model on.

Otherwise, you can import spectrograms that we made on the 6dB Valve data straight from our Data library.

Spectrogram examples

To get some insight into what a spectrogram shows, listen to the examples in the table below and compare them to the corresponding spectrograms.

The recordings are of solenoid valves opening and closing repeatedly. The exact nature of the abnormality is unknown. Low frequencies are shown at the top and the time progresses from left to right. The lighter section at the top represents the low background noise from the factory and the sharp vertical lines are the distinct clicks from the valves.

Normal                    Abnormal
[normal spectrogram]      [abnormal spectrogram]

Create a project

Let’s begin!

Navigate to the Peltarion Platform and click New project.

Add the spectrograms to your project

You will then be taken to the Dataset view.

  • If you are using our ready-made spectrograms, click the Data library option and look for the Industrial machinery operating conditions dataset.

  • Otherwise, click Choose files to upload the file you made using the Colab notebook.

Use the default data subsets

  • By default, your data is split into two subsets: Training (80%) and Validation (20%).
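
The platform performs this split for you, but if you prepare your own subsets (for instance in the Colab notebook), an equivalent 80/20 split can be sketched with scikit-learn. The file names below are hypothetical:

```python
from sklearn.model_selection import train_test_split

# Hypothetical spectrogram file names and labels (0 = normal, 1 = abnormal).
filenames = [f"spectrogram_{i}.png" for i in range(100)]
labels = [i % 2 for i in range(100)]

# 80/20 split, stratified so both subsets keep the same class balance.
train_files, val_files, train_labels, val_labels = train_test_split(
    filenames, labels, test_size=0.2, stratify=labels, random_state=42
)
print(len(train_files), len(val_files))  # 80 20
```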

Set binary encoding

  • This problem is an example of binary image classification. This means that the model assigns each image one of two possible labels: normal or abnormal.

  • Binary encoding with the positive class set to true should be the default setting, but you can verify or change this by clicking the spanner icon next to the Abnormal feature.

You are now ready to begin building the model. Click on Save version, then Use in new experiment.

Build the model

You will be greeted by the Experiment Wizard, which will take you through the following steps:

  1. Define dataset

    • Double-check that you are using the latest dataset version and that you have the appropriate subsets selected as training and validation subsets.

  2. Choose snippet

    • Make sure that your input feature is the images and that your output feature is the label.

    • Select the snippet called EfficientNetB0.

  3. Initialize weights

    • The EfficientNet snippet can be initialized with pretrained weights, which can transfer knowledge learned from a different dataset. This helps to get better results with less training time and data.
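
Outside the platform, the same idea can be sketched in Keras: an EfficientNetB0 backbone initialized with ImageNet-pretrained weights, topped with a single sigmoid output for the binary prediction. The input size below is an assumption; match it to your spectrograms:

```python
import tensorflow as tf

# EfficientNetB0 backbone with ImageNet-pretrained weights; include_top=False
# drops the original 1000-class head so we can attach our own.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg",
)

# Single sigmoid unit: probability that the machine sounds abnormal.
output = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
model = tf.keras.Model(inputs=base.input, outputs=output)

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["binary_accuracy", tf.keras.metrics.Recall()],
)
```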

Run the experiment

It is now time to run the experiment. The default settings work well here, but feel free to experiment with the optimization settings to see how they impact performance.

You have now finished setting up the experiment! Once you click Run, the model will take a while to train, so take a break and check back later to see how well it is doing.

Evaluating the experiment

In the Evaluation view, you will find several ways of looking at how your model has performed.

Binary accuracy and recall

For this particular situation, we are interested in binary accuracy and recall:

  • Binary accuracy measures the proportion of correct predictions.

  • Recall measures the proportion of actual positives (abnormal operation) that were marked as anomalous.

The reason for looking at both of these metrics is that while it is important to have a high accuracy overall, in this case, false negatives are more harmful than false positives.

A false positive means performing maintenance on a machine that is not actually malfunctioning. A false negative, however, means allowing a malfunctioning machine to continue operating, which will likely lead to defective production or a more severe failure of the machine down the line.
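
With hypothetical labels and predictions (1 = abnormal), the two metrics can be computed as:

```python
import numpy as np

# Hypothetical ground truth and model predictions (1 = abnormal).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Binary accuracy: fraction of all predictions that are correct.
accuracy = (y_true == y_pred).mean()

# Recall: fraction of actual positives that were flagged as positive.
recall = ((y_true == 1) & (y_pred == 1)).sum() / (y_true == 1).sum()

print(accuracy, recall)  # 0.75 0.75
```

Note that the one false negative (second position) and the one false positive (seventh position) hurt accuracy equally, but only the false negative lowers recall.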

Confusion matrix and ROC curve

To inspect the predictions of your model, select the subset and checkpoint (epoch) you wish to inspect and click Inspect.

The confusion matrix is a good way of understanding exactly how your model has performed. It shows how many predictions fall into each of the four possible outcomes: true positive, true negative, false positive, and false negative.

The threshold slider allows you to change the sensitivity for making a positive anomaly prediction.

Since we consider that false negatives are more harmful than false positives, decrease the Threshold until few false negative predictions remain.

The ROC curve shows at a glance the proportion of anomalous examples that are positively identified, as a function of the false positive rate.
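
The threshold trade-off described above can be sketched with scikit-learn on hypothetical validation probabilities; lowering the threshold trades false negatives for false positives:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical validation labels (1 = abnormal) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
probs = np.array([0.1, 0.2, 0.4, 0.35, 0.6, 0.45, 0.7, 0.8, 0.9, 0.3])

for threshold in (0.5, 0.25):
    preds = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, preds).ravel()
    print(f"threshold={threshold}: TP={tp} FP={fp} FN={fn} TN={tn}")

# The area under the ROC curve summarizes this trade-off over all thresholds.
print("AUC:", roc_auc_score(y_true, probs))
```

At threshold 0.5 two abnormal examples slip through as false negatives; at 0.25 none do, at the cost of extra false positives.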

Deploy your model

The last step in this process is to deploy your model and see how it might perform if used in the real world.

  1. In the Deployment view, click New deployment. The Create Deployment popup will appear.

  2. Select your last Experiment and your best epoch as your Checkpoint.

  3. Click the Enable button to deploy the experiment.

Test how it works using our web app

To test your model, you can either use the Test deployment button on the right-hand side of the Deployment view or use the web app we made specifically for this tutorial below.

  1. If you accessed the data from the data library, download and unzip these demo images.
    If you used the Colab notebook to create the dataset, use the file created by the notebook.

    The demo images are spectrograms of 5 normal and 5 anomalous audio recordings. These 10 examples are not in the dataset used for training, meaning that they have never been seen by the model before and represent new, unknown examples we can use to test the model.

  2. Copy the URL and Token from the right-hand side of the deployment page and paste them into the corresponding fields in the app.

  3. Click the image icon to upload one of the test images. Once you have tried it, you can upload another image by clicking the image again.
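
If you prefer to call the deployment yourself instead of using the web app, the request can be sketched with Python's requests library. The Bearer-token header and the "image" field name below are assumptions based on typical REST deployments, not the confirmed Peltarion API format; check the deployment view for the exact call:

```python
import requests

def classify_spectrogram(url: str, token: str, image_path: str) -> dict:
    """POST a spectrogram image to the deployment endpoint and return the
    JSON prediction. The "image" field name and Bearer auth are assumptions."""
    with open(image_path, "rb") as f:
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {token}"},
            files={"image": f},
        )
    response.raise_for_status()
    return response.json()
```

Calling classify_spectrogram with the URL and Token from your deployment page and the path to one of the demo images would return the model's abnormality prediction as JSON.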


Congratulations, you have completed the Audio analysis for industrial maintenance tutorial! In this experiment, you have solved a binary image classification problem with important real-life applications.

A model like this can be used to improve how maintenance is done, reducing costs and machine downtime by creating tailor-made maintenance schedules.

To take this a step further, you would need to create models that link machinery condition data (like the model built in this tutorial) to other data, including input parameters and other sensors such as temperature, weather conditions, etc., depending on the specific application. This would in turn enable predictive models that can tell ahead of time whether a machine is likely to fail.