Create new dataset / Example workflow

Image recognition is a common use case for deep learning. You can train models to recognize numbers (this tutorial) or car damage (this tutorial).
This example workflow will show you how to collect images for an image recognition use case and perform preprocessing from scratch.

Say you want to build a deep learning model that predicts whether an image contains a cigarette-end. First, take a lot of photos both with and without a cigarette-end. You need many images, ideally the same number from each category, because it takes a lot of data to train a neural network. Then preprocess the images and bundle them in a zip file for easy upload to the Platform.

Step 1: Data acquisition

  • Take pictures of ground that has a cigarette-end on it. Try to position the cigarette in the middle of the image.
    When we did this, we took 197 images.

  • Take pictures of ground that has no cigarette-end on it.
    We took 64 images.

  • Import all images to your computer in a folder named CigaImages.

Samples of the images

[Image: sample photos]

Step 2: Data preprocessing

Back up your original images before preprocessing. The resize script below overwrites each image in place, so a backup is good practice in case something goes wrong.
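One way to make the backup is to copy the whole image folder to a sibling folder. The sketch below is only an illustration: the function name backup_images and the _backup suffix are assumptions, not part of the original workflow.

```shell
#!/bin/bash
# Copy a folder to a sibling "<name>_backup" folder.
# backup_images and the "_backup" suffix are illustrative choices.
backup_images() {
    local src="${1%/}"          # source folder, trailing slash removed
    local dest="${src}_backup"  # e.g. CigaImages -> CigaImages_backup
    if [ ! -d "$src" ]; then
        echo "No such folder: $src" >&2
        return 1
    fi
    # Refuse to touch an existing backup so it is never overwritten by accident.
    if [ -e "$dest" ]; then
        echo "Backup already exists: $dest" >&2
        return 1
    fi
    cp -R "$src" "$dest"
}
```

For example, running backup_images CigaImages from the parent folder would create CigaImages_backup next to it.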

Resize images to 224x224 pixels

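  • In the CigaImages folder, create a bash file named resize.sh with the following script. Note that the script crops each image to its central 224x224 region rather than scaling it, so anything outside the center of the photo is discarded:

#!/bin/bash
# Crop every .jpg in the current folder to its central 224x224 region, overwriting the original.
for i in *.jpg
do
	convert "$i" -gravity Center -crop 224x224+0+0 "$i"
	echo "$i"
done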
  • Navigate to the CigaImages folder via the Terminal, and run resize.sh to resize all images:

$ ./resize.sh

NOTE: If you get a ‘bash: ./resize.sh: Permission denied’ error, run the following command and then run the script again:

$ chmod u+x resize.sh
  • Delete the file resize.sh.
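After the script has run, it can be worth confirming that every image really is 224x224, since cropping cannot enlarge a photo that was smaller than that to begin with. The following is a minimal sketch that assumes ImageMagick's identify tool is installed alongside convert; check_image_sizes is a hypothetical helper name.

```shell
#!/bin/bash
# List every .jpg in a folder whose size differs from the expected WIDTHxHEIGHT.
# check_image_sizes is an illustrative helper; it relies on ImageMagick's identify.
check_image_sizes() {
    local dir="$1" expected="${2:-224x224}" bad=0 f size
    for f in "$dir"/*.jpg; do
        [ -e "$f" ] || continue              # skip the literal pattern if no .jpg exists
        size=$(identify -format '%wx%h' "$f") || return 1
        if [ "$size" != "$expected" ]; then
            echo "$f: $size"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]                         # exit status 0 only if every image matched
}
```

Calling check_image_sizes CigaImages prints one line per image with an unexpected size and returns a non-zero exit status if there is at least one.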

Label all images

Create two folders named 1 and 0 in the CigaImages folder.
Manually separate the images: move the images that contain a cigarette-end to folder 1 and the images that do not to folder 0.

Verify that all resized cigarette-end images still contain the cigarette-end. It is always good to be sure that the ground truth is correct :).
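Since the same number of images per category is the ideal, a quick count per folder shows how balanced the dataset is. This is a small sketch; count_per_class is a hypothetical helper name, and it assumes the 0 and 1 folders sit directly under the folder you pass in.

```shell
#!/bin/bash
# Print the number of .jpg files in each class folder (0 and 1) under a base folder.
# count_per_class is an illustrative helper name.
count_per_class() {
    local base="${1:-.}" label n
    for label in 0 1; do
        n=$(find "$base/$label" -maxdepth 1 -type f -name '*.jpg' | wc -l)
        echo "$label: $((n))"   # $((n)) strips the padding some wc versions add
    done
}
```

Run as count_per_class CigaImages. If every photo from Step 1 was kept, the counts would match the 64 and 197 images mentioned there, which is noticeably unbalanced.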

Rename the images in folder 0

  • In the 0 folder, create a bash file named rename0.sh with the following script:

#!/bin/bash
# Rename every .jpg in the current folder to 0_1.jpg, 0_2.jpg, ...
COUNTER=1
for file in *.jpg
do
	mv "$file" "0_$COUNTER.jpg"
	COUNTER=$((COUNTER + 1))
done
  • Navigate to the CigaImages/0 folder via the Terminal, and run rename0.sh to rename all images in the 0 folder. The script only renames .jpg files in the folder it is run from:

$ ./rename0.sh
  • Delete the file rename0.sh.

Rename the images in folder 1

  • In the 1 folder, create a bash file named rename1.sh with the following script:

#!/bin/bash
# Rename every .jpg in the current folder to 1_1.jpg, 1_2.jpg, ...
COUNTER=1
for file in *.jpg
do
	mv "$file" "1_$COUNTER.jpg"
	COUNTER=$((COUNTER + 1))
done
  • Navigate to the CigaImages/1 folder via the Terminal, and run rename1.sh to rename all images in the 1 folder. The script only renames .jpg files in the folder it is run from:

$ ./rename1.sh
  • Delete the file rename1.sh.
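The two rename scripts differ only in the prefix, so a single parameterized helper could cover both folders. The sketch below is one possible variant, not the tutorial's own method: rename_with_prefix is a hypothetical name, and it moves files through a temporary subfolder so that a new name can never overwrite an image that is still waiting to be renamed.

```shell
#!/bin/bash
# Rename every .jpg in a folder to <prefix>_<n>.jpg, numbering from 1.
# rename_with_prefix is an illustrative helper name.
rename_with_prefix() {
    local dir="$1" prefix="$2" counter=1 file tmpdir
    # Temporary subfolder inside the target folder; removed again at the end.
    tmpdir=$(mktemp -d "$dir/rename.XXXXXX") || return 1
    for file in "$dir"/*.jpg; do
        [ -e "$file" ] || continue           # nothing to do if no .jpg exists
        mv "$file" "$tmpdir/${prefix}_${counter}.jpg"
        counter=$((counter + 1))
    done
    # Move the renamed files back into the original folder.
    for file in "$tmpdir"/*.jpg; do
        [ -e "$file" ] || continue
        mv "$file" "$dir/"
    done
    rmdir "$tmpdir"
}
```

From the CigaImages folder, this would be called as rename_with_prefix 0 0 and rename_with_prefix 1 1.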

Result

[Image: the CigaImages folder containing the 0 and 1 folders]

Step 3: Create csv file

Create a text file with all image names

  • In the Terminal, navigate to the CigaImages folder.

  • Run the following command to find all files in the folder and its subfolders and save only the .jpg file paths to a text file named files-all-jpg.txt.

$ find . -type f -name "*.jpg" > files-all-jpg.txt

The output looks like this:

[Image: contents of files-all-jpg.txt]
  • Run the following command to remove ‘./’ from files-all-jpg.txt and get a new file named files-all-jpg-001.txt.

$ sed 's~\./~~g' files-all-jpg.txt > files-all-jpg-001.txt

Result

[Image: contents of files-all-jpg-001.txt]

Create a file with all labels

  • Run the following command to store all labels in a text file named label-001.txt.
    Each label is the folder name (0 or 1) taken from the first part of the image path, so the labels come out in the same order as the image names in files-all-jpg-001.txt.

$ cut -d'/' -f1 files-all-jpg-001.txt > label-001.txt

Result

[Image: contents of label-001.txt]
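A stray .jpg left directly in the CigaImages folder would show up here as a label that is not 0 or 1, because cut prints the whole line when it finds no '/' in it. The sketch below checks for that and confirms that both files have the same number of lines; check_labels is a hypothetical helper name.

```shell
#!/bin/bash
# Check that the name file and the label file have the same number of lines
# and that every label is exactly 0 or 1. check_labels is an illustrative helper name.
check_labels() {
    local names="$1" labels="$2" n_names n_labels
    n_names=$(( $(wc -l < "$names") ))     # $(( )) strips padding from wc output
    n_labels=$(( $(wc -l < "$labels") ))
    if [ "$n_names" -ne "$n_labels" ]; then
        echo "Line count mismatch: $n_names names, $n_labels labels" >&2
        return 1
    fi
    # grep -v selects lines that are NOT exactly 0 or 1; any such line is an error.
    if grep -qvx -e 0 -e 1 "$labels"; then
        echo "Unexpected label found in $labels" >&2
        return 1
    fi
}
```

For the files created above, the call would be check_labels files-all-jpg-001.txt label-001.txt, which prints nothing when both checks pass.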

Combine image names and labels in one file - index.csv

Next, put the image path and the image label for each image on the same row of a single file named index.csv.

  • Combine the files files-all-jpg-001.txt and label-001.txt line by line, separated by a comma, into index.csv.

$ paste -d',' files-all-jpg-001.txt label-001.txt > index.csv
  • Shuffle the rows in the index.csv file.

$ shuf -o index.csv < index.csv
  • Add a header row to the index.csv file.

$ echo 'image,label' | cat - index.csv > temp && mv temp index.csv

Result

[Image: contents of index.csv]
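Before bundling, index.csv can be sanity-checked against the image folders. The following is a rough sketch under the assumption that the image paths in index.csv are relative to the folder that contains it; validate_index is a hypothetical helper name.

```shell
#!/bin/bash
# Validate an index.csv: the header must be "image,label", each row must be
# "<path>,<0 or 1>", and each path must exist relative to the CSV's folder.
# validate_index is an illustrative helper name.
validate_index() {
    local csv="$1" base header image label errors=0
    base=$(dirname "$csv")
    {
        IFS= read -r header
        if [ "$header" != "image,label" ]; then
            echo "Unexpected header: $header" >&2
            errors=$((errors + 1))
        fi
        while IFS=, read -r image label; do
            if [ "$label" != 0 ] && [ "$label" != 1 ]; then
                echo "Bad label for $image: $label" >&2
                errors=$((errors + 1))
            elif [ ! -f "$base/$image" ]; then
                echo "Missing file: $image" >&2
                errors=$((errors + 1))
            fi
        done
    } < "$csv"
    [ "$errors" -eq 0 ]   # exit status 0 only if no problem was found
}
```

Run from the CigaImages folder as validate_index index.csv. Problems are reported on standard error, and the exit status is non-zero if any were found.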

Step 4: Create the zip-file

Zip the index.csv together with the two image folders (0 and 1).

  • Navigate to the CigaImages folder and create a new folder named ciga_0_1.

$ mkdir ciga_0_1
  • Move index.csv and the two image folders (0 and 1) to ciga_0_1.

$ mv 0 1 index.csv ciga_0_1/
  • Zip the folder ciga_0_1 and save it as ciga_0_1.zip.

$ zip -r ciga_0_1.zip ciga_0_1 -x "*.DS_Store"

Done! The zip file is now ready to be imported into the Platform.

What’s next

Upload ciga_0_1.zip (the zipped ciga_0_1 bundle) to the Peltarion Platform and carry on with your experiments.