Imbalanced data

Imbalanced data occurs when the samples in a dataset are unevenly distributed across groups, so that one or more groups are represented by far fewer samples than the others.

Imbalanced data is not necessarily a problem, but it is important to understand how to handle it and when it might cause bias. Because learning algorithms mostly fit the patterns that dominate the data, a model can look accurate overall while performing poorly on minority classes, so we should check its outcomes on those classes separately.
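As a rough illustration of why overall results can hide problems on a minority class, the sketch below scores a deliberately naive predictor that always outputs the majority label. The labels, the 90/10 split, and the helper name per_class_recall are all assumptions made up for this example, not part of any particular library.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    correct = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical dataset: 90 majority samples and 10 minority samples.
y_true = ["majority"] * 90 + ["minority"] * 10

# A naive model that ignores the input and always predicts the majority class.
y_pred = ["majority"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")  # high, despite never finding the minority
print(f"recall per class: {recall}")
```

Here the overall accuracy is 90% even though no minority sample is ever identified, which is why per-class metrics are worth reporting alongside an aggregate score.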

Example of imbalanced data

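Consider a survey about diversity at the management level in which only 10% of the managers are women. Women form the minority group in this data, so conclusions drawn from the survey as a whole will mostly reflect the answers of men.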
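A quick way to spot this kind of imbalance is to compute the share of each group before doing any analysis. The sketch below assumes a hypothetical sample of 100 managers matching the 10% figure above; the sample size, the label strings, and the helper name group_shares are illustrative only.

```python
from collections import Counter

def group_shares(labels):
    """Return the proportion of samples belonging to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}

# Hypothetical survey sample: 100 managers, 10 of whom are women.
respondents = ["man"] * 90 + ["woman"] * 10

shares = group_shares(respondents)

# Imbalance ratio: size of the largest group divided by the smallest.
imbalance_ratio = max(shares.values()) / min(shares.values())

for group, share in shares.items():
    print(f"{group}: {share:.0%}")
print(f"imbalance ratio: {imbalance_ratio:.1f}")
```

A ratio far above 1 signals that results for the smaller group rest on few samples and deserve separate attention.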

