# Create new dataset / Example workflow

Image recognition is a common use case for deep learning. You can train models to recognize numbers (this tutorial) or car damage (this tutorial).
This example workflow will show you how to collect images for an image recognition use case and perform preprocessing from scratch.

Say you want to build a deep learning model that predicts whether an image contains a cigarette-end. The first thing to do is take photos with and without a cigarette-end. You need many images, ideally the same number from each category, because it takes a lot of data to train a neural network. Then you preprocess the images and bundle them in a zip file for easy upload to the Platform.

## Step 1: Data acquisition

• Take pictures of ground with a cigarette-end on it. Try to position the cigarette-end in the middle of the image, because the preprocessing step keeps only the central part of each image.
When we did this, we took 197 images.

• Take pictures of ground without a cigarette-end.
We took 64 images.

• Import all images to your computer in a folder named CigaImages.

## Step 2: Data preprocessing

Back up your original images before preprocessing. This is good practice in case something goes wrong, especially because the scripts in this step overwrite and rename the image files in place.
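A backup can be as simple as copying the whole folder before running any script. The sketch below assumes the images are in a folder such as CigaImages; the function name backup_images and the default backup folder name are illustrative and not part of the original workflow.

```shell
# Copy an image folder to a sibling backup folder before preprocessing.
# backup_images is a hypothetical helper; adjust the names to your setup.
backup_images() {
  src=${1%/}                      # drop a trailing slash, if any
  dst=${2:-"${src}_backup"}       # default target: <folder>_backup
  # Refuse to overwrite an existing backup.
  if [ -e "$dst" ]; then
    echo "backup target already exists: $dst" >&2
    return 1
  fi
  # -R copies recursively; -p preserves timestamps and permissions.
  cp -Rp "$src" "$dst"
}

# Usage: backup_images CigaImages
```

Calling it a second time fails on purpose, so an earlier backup is never overwritten by already-processed images.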

### Resize images to 224x224 pixels

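• Create a bash file named resize.sh in the CigaImages folder with the following script. It uses ImageMagick's convert command to crop each .jpg image to its central 224x224 region and overwrites the original file:

    #!/bin/bash
    for i in *.jpg
    do
      convert "$i" -gravity Center -crop 224x224+0+0 "$i"
      echo "$i"
    done

• Navigate to the CigaImages folder via the Terminal, and run the resize.sh command to resize all images:

    $ ./resize.sh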
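Note that -crop on its own cuts out the central 224x224 pixels at the original scale, so most of a large photo is discarded. If you would rather scale the whole image down first, one possible variant, assuming ImageMagick is installed, is to resize until the image covers 224x224 and then trim it to a square. This is a sketch and not the original workflow; the function name resize_to_square and the separate output folder are illustrative.

```shell
# Scale each image so it covers 224x224, then trim the overflow around the center.
# resize_to_square is a hypothetical helper; it writes copies to an output
# folder instead of overwriting the originals.
resize_to_square() {
  in_dir=$1
  out_dir=$2
  mkdir -p "$out_dir"
  for img in "$in_dir"/*.jpg; do
    [ -e "$img" ] || continue   # skip when the glob matches nothing
    # 224x224^ resizes until the image covers 224x224;
    # -extent then cuts it to exactly 224x224 around the center.
    convert "$img" -resize '224x224^' -gravity Center -extent 224x224 \
      "$out_dir/$(basename "$img")"
    echo "$img"
  done
}

# Usage: resize_to_square CigaImages CigaImages_224
```

Writing to a second folder keeps the originals untouched, at the cost of extra disk space.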

NOTE: If running resize.sh fails with a ‘Permission denied’ error, run the following command to make the script executable, then run it again:

    $ chmod u+x resize.sh

• Delete the file resize.sh.

### Label all images

Create two folders, 1 and 0, in the CigaImages folder.

Manually separate the images. Move the images that have a cigarette-end to folder 1 and the images that have no cigarette-end to folder 0.

Verify that all resized cigarette-end images still contain the cigarette-end, since the crop may have cut it out. It is always good to be sure that the ground truth is correct :).

Rename the images in folder 0

• Create a bash file named rename0.sh in the 0 folder with the following script:

    #!/bin/bash
    COUNTER=1
    for file in *.jpg
    do
      mv "$file" "0_$COUNTER.jpg"
      COUNTER=$((COUNTER + 1))
    done

• Navigate to the 0 folder inside CigaImages via the Terminal, and run the rename0.sh command to rename all images in the 0 folder:

    $ ./rename0.sh
• Delete the file rename0.sh.
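The rename scripts for the two folders differ only in the prefix, so a single parameterized function could cover both. This is a sketch rather than the original workflow: rename_with_label is a hypothetical name, and it assumes the folder name (0 or 1) is also the label prefix.

```shell
# Rename every .jpg in a label folder to <label>_<n>.jpg, counting from 1.
# rename_with_label is a hypothetical helper; it assumes the files do not
# already follow the <label>_<n>.jpg pattern, otherwise names could collide.
rename_with_label() {
  dir=${1%/}
  label=$(basename "$dir")
  counter=1
  for file in "$dir"/*.jpg; do
    [ -e "$file" ] || continue   # skip when the folder has no .jpg files
    mv "$file" "$dir/${label}_${counter}.jpg"
    counter=$((counter + 1))
  done
}

# Usage, from inside CigaImages:
#   rename_with_label 0
#   rename_with_label 1
```

Because the function takes the folder as an argument, no script file has to be created inside the image folders and deleted afterwards.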

Rename the images in folder 1

• Create a bash file named rename1.sh in the 1 folder with the following script:

    #!/bin/bash
    COUNTER=1
    for file in *.jpg
    do
      mv "$file" "1_$COUNTER.jpg"
      COUNTER=$((COUNTER + 1))
    done

• Navigate to the 1 folder inside CigaImages via the Terminal, and run the rename1.sh command to rename all images in the 1 folder:

    $ ./rename1.sh

• Delete the file rename1.sh.

Result: the images in folder 0 are named 0_1.jpg, 0_2.jpg, and so on, and the images in folder 1 are named 1_1.jpg, 1_2.jpg, and so on.

## Step 3: Create csv file

### Create a text file with all image names

• In the Terminal, navigate to the CigaImages folder.

• Run the following command to search the folder and its subfolders and save only the .jpg file paths to a text file named files-all-jpg.txt.

    $ find . -type f -name "*.jpg" > files-all-jpg.txt

The file contains one path per line, each starting with ‘./’, for example ./0/0_1.jpg.

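• Remove the leading ‘./’ from each line of files-all-jpg.txt and save the result as a new file named files-all-jpg-001.txt.

Result: each line of files-all-jpg-001.txt is a path relative to the CigaImages folder, for example 0/0_1.jpg.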
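The command for this step is not shown in this workflow. One common way to strip the prefix, offered here as an assumption rather than as the original command, is sed. The helper name strip_dot_slash is illustrative.

```shell
# Remove a leading "./" from every line of a path list.
# strip_dot_slash is a hypothetical helper name.
strip_dot_slash() {
  # ^\./ matches only at the start of a line; | is used as the sed
  # delimiter so the slash needs no escaping.
  sed 's|^\./||' "$1"
}

# Usage:
#   strip_dot_slash files-all-jpg.txt > files-all-jpg-001.txt
```

Anchoring the pattern to the start of the line leaves any later ‘./’ in a path untouched.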

### Combine image names and labels in one file - index.csv

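Combine the files files-all-jpg-001.txt and label-001.txt so that the image path and the image label end up in one file named index.csv. The file label-001.txt must hold one label per line, in the same order as the paths in files-all-jpg-001.txt, because the two files are joined line by line.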
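How label-001.txt is created is not shown in this workflow. Since every image sits in a folder named after its label, one way to derive it, stated here as an assumption, is to take the first path component of each line. The helper name labels_from_paths is illustrative.

```shell
# Derive one label per line from paths such as "1/1_7.jpg".
# labels_from_paths is a hypothetical helper; it assumes the first path
# component (the folder name, 0 or 1) is the label.
labels_from_paths() {
  cut -d'/' -f1 "$1"
}

# Usage:
#   labels_from_paths files-all-jpg-001.txt > label-001.txt
```

Deriving the labels from the same path list guarantees that both files have the same number of lines in the same order.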

• Join the two files with a comma as the separator and write the result to index.csv.

    $ paste -d',' files-all-jpg-001.txt label-001.txt > index.csv

• Shuffle the rows in the index.csv file.

    $ shuf -o index.csv < index.csv

• Add the header row image,label at the top of index.csv.

    $ echo 'image,label' | cat - index.csv > temp && mv temp index.csv

Result: index.csv starts with the header image,label, followed by one row per image in shuffled order.

## Step 4: Create the zip-file

Zip the index.csv together with the two image folders (0 and 1).

• Navigate to the folder CigaImages, and create a new folder named ciga_0_1.

    $ mkdir ciga_0_1

• Move the two image folders and index.csv into ciga_0_1.

    $ mv 0 1 index.csv ciga_0_1/

• Zip the folder ciga_0_1 and save it as ciga_0_1.zip.

    $ zip -r ciga_0_1.zip ciga_0_1 -x "*.DS_Store"
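Before uploading, it can be worth checking that index.csv and the image folders agree. The sketch below is not part of the original workflow: check_index is a hypothetical helper that assumes the image,label header and image paths relative to the folder that holds index.csv.

```shell
# Check that index.csv has the expected header and that every listed image
# exists. Prints the number of rows per label and returns non-zero on any
# problem. check_index is a hypothetical helper.
check_index() {
  dir=${1%/}
  csv="$dir/index.csv"
  status=0
  if [ "$(head -n 1 "$csv")" != "image,label" ]; then
    echo "unexpected header in $csv" >&2
    status=1
  fi
  # Read every row after the header; the here-document keeps the loop in
  # the current shell so that changes to status survive.
  while IFS=, read -r image label; do
    [ -n "$image" ] || continue
    if [ ! -f "$dir/$image" ]; then
      echo "missing image: $image" >&2
      status=1
    fi
  done <<EOF
$(tail -n +2 "$csv")
EOF
  # Count rows per label to expose class imbalance.
  tail -n +2 "$csv" | cut -d',' -f2 | sort | uniq -c
  return "$status"
}

# Usage, from inside CigaImages:
#   check_index ciga_0_1
```

The per-label counts make an imbalance between the two categories visible at a glance.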