Micro F1-score

Micro F1-score (short for micro-averaged F1-score) is used to assess the quality of predictions on multi-label binary problems.
It is the F1-score computed from the aggregated contributions of all classes.

If you are looking to select a model based on a balance between precision and recall, don’t miss out on assessing your F1-scores!

Micro F1-score = 1 is the best value (perfect micro-precision and micro-recall), and the worst value is 0. Note that precision and recall have the same relative contribution to the F1-score.

Emphasis on common labels
Micro-averaging puts more emphasis on the common labels in the dataset since it gives each sample the same importance. This may be the preferred behavior for multi-label classification problems. You may not want labels that are very rare in the dataset, e.g., a genre that only represents 0.01% of the data examples, to influence the overall F1-score heavily if the model is performing well on the other, more common genres.


Micro F1-score is defined as the harmonic mean of the micro-precision and micro-recall:

\[\text{Micro F1-score} = 2 \times \dfrac{\text{Micro-precision} \times \text{Micro-recall}}{\text{Micro-precision} + \text{Micro-recall}}\]
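
As a sketch of where the two terms in this formula come from: writing \(\text{TP}_l\), \(\text{FP}_l\), and \(\text{FN}_l\) for the true positives, false positives, and false negatives of label \(l\) (notation introduced here for illustration only), micro-precision and micro-recall are computed from the counts summed over all labels:

\[\text{Micro-precision} = \dfrac{\sum_{l} \text{TP}_l}{\sum_{l} \text{TP}_l + \sum_{l} \text{FP}_l}, \qquad \text{Micro-recall} = \dfrac{\sum_{l} \text{TP}_l}{\sum_{l} \text{TP}_l + \sum_{l} \text{FN}_l}\]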


Micro-averaging is used when a problem has 2 or more labels that can be true, for example, in our tutorial Build your own music critic.

The micro-averaged F1-score is computed by first calculating the sum of all true positives, false positives, and false negatives over all the labels. Then we compute the micro-precision and micro-recall from the sums.
Finally, we compute the harmonic mean of the two to get the micro F1-score.
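
The three steps above can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: the function name `micro_f1`, the 0/1 list-of-lists input format, and the choice to return 0.0 when a denominator is zero are all assumptions made for this example.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for multi-label data.

    y_true and y_pred are lists of rows, one row per sample, where each row
    holds one 0/1 value per label (an assumed input format).
    """
    tp = fp = fn = 0
    # Step 1: sum true positives, false positives and false negatives
    # over all samples and all labels.
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1

    # Step 2: micro-precision and micro-recall from the summed counts.
    # Returning 0.0 for an empty denominator is a convention chosen here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Step 3: harmonic mean of micro-precision and micro-recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: 3 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]
score = micro_f1(y_true, y_pred)
print(score)
```

In this made-up example the summed counts are 3 true positives, 1 false positive, and 2 false negatives, so micro-precision is 0.75, micro-recall is 0.6, and the micro F1-score is 2/3.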

Micro-averaged values can be high even if the model is performing very poorly on a rare label, since micro-averaging gives more weight to the common labels.
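
To make this effect concrete, here is a small sketch with invented per-label counts: one common label the model handles well and one rare label it never predicts. The label names and all numbers are hypothetical. The helper uses the identity F1 = 2TP / (2TP + FP + FN), which is algebraically the same as the harmonic mean of precision and recall.

```python
def f1_from_counts(tp, fp, fn):
    """F1 from raw counts; returns 0.0 if all counts are zero (a convention chosen here)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (tp, fp, fn) counts per label, invented for illustration.
counts = {
    "common_genre": (900, 50, 50),
    "rare_genre": (0, 0, 10),  # the model never predicts this label
}

# F1-score of each label on its own.
per_label_f1 = {label: f1_from_counts(*c) for label, c in counts.items()}

# Micro F1-score: sum the counts over all labels first, then compute F1.
total_tp = sum(c[0] for c in counts.values())
total_fp = sum(c[1] for c in counts.values())
total_fn = sum(c[2] for c in counts.values())
micro = f1_from_counts(total_tp, total_fp, total_fn)

print(per_label_f1)
print(micro)
```

With these numbers the rare label has an F1-score of 0, yet the micro F1-score is 1800/1910, roughly 0.94, because the common label contributes almost all of the summed counts.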
