Aggregation bias

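Aggregation bias occurs when data from distinct groups is combined and the conclusions drawn from the combined data do not hold for the individual groups. It typically arises when datasets that follow different patterns are merged and treated as a single population.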

Example: Combining datasets of women and men who have diabetes. Because diabetes can affect the two genders differently, a single model fit to the combined data can misrepresent one or both groups.

Example: In general, income tends to rise as the years pass. For dancers, however, the trend might be the opposite. A single aggregated income graph would hide this difference, so it should not be used to describe both groups.

[Figure: Income change over the years]
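The income example can be sketched in a few lines of Python. This is a minimal illustration, not real data: the two groups, their income values, and the helper names `slope` and `trend` are all hypothetical. It compares the least-squares trend of the pooled data with the trend of each group.

```python
from statistics import mean


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def trend(rows):
    """Slope of income against years for a list of (years, income) rows."""
    xs, ys = zip(*rows)
    return slope(xs, ys)


# Hypothetical data: (years passed, income in thousands).
office_workers = [(1, 40), (5, 50), (10, 62), (15, 75), (20, 88)]
dancers = [(1, 45), (5, 40), (10, 33), (15, 27), (20, 20)]

pooled_trend = trend(office_workers + dancers)
office_trend = trend(office_workers)
dancer_trend = trend(dancers)

print(f"pooled:  {pooled_trend:+.2f} per year")
print(f"office:  {office_trend:+.2f} per year")
print(f"dancers: {dancer_trend:+.2f} per year")
```

With these made-up numbers the pooled trend is positive even though the dancers' trend is negative, which is the pattern an aggregated graph would hide.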

How to prevent aggregation bias

Start by understanding your dataset and what happens when you combine two datasets. Look at the relationship between the features and the outcome within each group, then check whether that relationship changes once the groups are combined.

For instance, in the diabetes example, you should look at how combining the women's and men's data affects the model outcomes compared with treating each group separately.
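One way to run such a check is to fit a model on the combined data and a separate model per group, then compare the error each produces on every group. The sketch below assumes a single numeric feature and a simple straight-line model; the data values and the helpers `fit_line` and `mean_abs_error` are hypothetical and chosen only to illustrate the comparison.

```python
from statistics import mean


def fit_line(rows):
    """Least-squares line through (x, y) rows; returns (slope, intercept)."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den
    return slope, y_bar - slope * x_bar


def mean_abs_error(model, rows):
    """Average absolute difference between predictions and observed y."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in rows)


# Hypothetical (feature, outcome) pairs for each group.
groups = {
    "women": [(1, 2.0), (2, 4.1), (3, 5.9), (4, 8.0)],
    "men": [(1, 1.0), (2, 1.4), (3, 2.1), (4, 2.5)],
}

# One model for everyone versus one model per group.
pooled_model = fit_line([row for rows in groups.values() for row in rows])

report = {}
for name, rows in groups.items():
    report[name] = {
        "pooled": mean_abs_error(pooled_model, rows),
        "separate": mean_abs_error(fit_line(rows), rows),
    }

for name, errors in report.items():
    print(f"{name}: pooled={errors['pooled']:.3f}, separate={errors['separate']:.3f}")
```

If the pooled model's error on a group is much larger than that group's own model, as it is with these made-up values, that suggests the groups follow different patterns and combining them may introduce aggregation bias.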
