Reporting bias occurs when the frequency of events in a dataset does not reflect the frequency of events in real life.
Example: In a user survey, people without strong opinions are more likely to skip the questions, so most responses come from users with polarized views. A sentiment analysis model trained on those responses might then predict mostly extremely positive or extremely negative sentiment, even though most people hold neutral opinions in real life.
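To make the survey example concrete, here is a minimal sketch of how unequal response rates distort what a dataset records. The opinion shares, the response rates, and the helper name observed_shares are all hypothetical, chosen only to illustrate the effect.

```python
# Hypothetical numbers, chosen only to illustrate the effect.
# TRUE_SHARES: how opinions are actually distributed in the population.
# RESPONSE_RATES: assumed probability that a person with that opinion answers.
TRUE_SHARES = {"negative": 0.15, "neutral": 0.70, "positive": 0.15}
RESPONSE_RATES = {"negative": 0.80, "neutral": 0.10, "positive": 0.80}


def observed_shares(true_shares, response_rates):
    """Return the opinion shares among the people who actually respond."""
    # Fraction of the whole population that holds each opinion AND responds.
    responding = {k: true_shares[k] * response_rates[k] for k in true_shares}
    total = sum(responding.values())
    # Renormalize so the shares among respondents sum to 1.
    return {k: v / total for k, v in responding.items()}


if __name__ == "__main__":
    shares = observed_shares(TRUE_SHARES, RESPONSE_RATES)
    for opinion, share in shares.items():
        print(f"{opinion}: true {TRUE_SHARES[opinion]:.0%}, observed {share:.0%}")
```

Under these assumed numbers, neutral opinions make up 70% of the population but only about 23% of the collected responses, so a model trained on the responses sees far more polarized examples than exist in reality.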
How can you prevent reporting bias?
To prevent reporting bias, you can ask yourself:
Who will the model affect, and who is represented or underrepresented in the data?
Does the data reflect reality?
Does the model perform equally well across different groups?
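The last two questions can be checked with simple counts. The sketch below is one possible way to do that, not a complete audit: label_shares gives the label distribution of a dataset, which you can compare against whatever reference distribution you trust, and accuracy_by_group reports accuracy separately for each group. Both function names, the group labels, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def label_shares(labels):
    """Return the fraction of examples carrying each label (non-empty input assumed)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def accuracy_by_group(y_true, y_pred, groups):
    """Return the accuracy computed separately for each group."""
    correct = defaultdict(int)
    seen = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        seen[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / seen[group] for group in seen}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = ["pos", "neg", "neu", "neu", "pos", "neu"]
    y_pred = ["pos", "neg", "pos", "neg", "pos", "neu"]
    groups = ["a", "a", "b", "b", "a", "b"]

    # Does the data reflect reality? Compare these shares with a trusted reference.
    print(label_shares(y_true))

    # Does the model perform equally well across groups?
    print(accuracy_by_group(y_true, y_pred, groups))
```

A large gap between the dataset's label shares and the reference distribution, or between the per-group accuracies, is a signal to look more closely at how the data was collected.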