# Introduction

This article describes how the Deep View Validator detection metrics are calculated: the mean average metrics (mAP, mAR, mACC), the overall metrics (overall precision, recall, and accuracy), and the timings. Both the mean average metrics and the overall metrics rely on the fundamental equations for precision, recall, and accuracy.

# Prerequisites

- Familiarity with the definitions of true positives, false positives, and false negatives from Deep View Validator Detection Classifications

# Precision, Recall, and Accuracy

This section will show the basic equations used to determine the precision, recall, and accuracy. These equations are fundamental in calculating the mean average metrics and the overall metrics which are reported in the Deep View Validator validation reports.

Consider the following image:

In this image there are two true positives, one false negative, and one classification false positive.

Precision: Of all the model predictions, what fraction was correct.

Recall: Of all the ground truth annotations, what fraction was correctly found.

Accuracy: Of all the model predictions and ground truth annotations, what fraction was correct.

A very precise model is one that makes correct predictions. A model with high recall is one that detects all of the ground truth labels.

There is a tradeoff between precision and recall: increasing precision will reduce recall, and vice versa. This is because increasing precision means decreasing the number of false positives, but this allows for missed predictions, which increases the number of false negatives.
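The three equations can be sketched in a few lines of Python. This is a minimal illustration, not Deep View Validator's implementation; the counts come from the example image (two true positives, one classification false positive, one missed ground truth, four ground truth annotations in total), and the convention that a ground truth matched by a classification false positive counts toward the ground-truth total is inferred from the reported fractions (2/3, 2/4, 2/4).

```python
def precision(tp, fp):
    """Correct detections over all detections; 0 if there are no detections."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, num_gt):
    """Correct detections over all ground truth annotations; 0 if there are none."""
    return tp / num_gt if num_gt else 0.0

def accuracy(tp, fp, fn):
    """Correct detections over all detections plus missed ground truth."""
    total = tp + fp + fn
    return tp / total if total else 0.0

# Example image: tp=2, classification fp=1, missed ground truth fn=1,
# four ground truth annotations in total.
print(precision(2, 1))   # 0.666...
print(recall(2, 4))      # 0.5
print(accuracy(2, 1, 1)) # 0.5
```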

# Overall Metrics

This section will show how the overall metrics (Overall Precision, Overall Recall, Overall Accuracy) are calculated. The overall metrics are based on calculating the precision, recall, and accuracy regardless of the class. In other words, these metrics only care about the total true positives, false positives, and false negatives gathered and using the equations for precision, recall, and accuracy to calculate the overall metrics.

In this image there are two true positives, one false negative, and one classification false positive.

- Overall Precision = 2/3 = 66.7%
- Overall Recall = 2/4 = 50%
- Overall Accuracy = 2/4 = 50%

The accuracy metric provides a more complete picture than precision or recall alone. Precision considers only the detections: it is the ratio of the number of correct detections to the total number of detections. In other words, out of all the detections, how many were correct. It is not fair to conclude that a model is performing well from precision alone. Consider a case where the model makes 9 detections, all of them correct, yielding a precision of 100%, but there are 200 ground truth annotations: the model missed the other 191 annotations, which yields a recall of 4.5%.

A similar idea applies to recall: this metric considers only the ratio of correct detections to the total number of ground truth annotations. In other words, out of all the ground truth annotations, how many were correctly found. Recall accounts for correct detections, but it never accounts for localization false positives. It is possible for a model to correctly find every ground truth annotation while also generating large numbers of localization false positives.

The accuracy metric combines both precision and recall by considering correct detections (TP), false detections (localization FP and classification FP), and missed detections (FN). Accuracy is the ratio of correct detections against all model detections and all ground truth annotations. This metric measures how well the model aligns its detections to the ground truth; perfect alignment means no missed annotations and no false detections.
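The 9-of-200 scenario above can be checked numerically. This is a toy calculation with made-up counts, using the precision and recall definitions from earlier in the article:

```python
# Hypothetical model: 9 detections, all correct, against 200 ground truths.
tp, fp = 9, 0
num_gt = 200

precision = tp / (tp + fp)                    # 1.0 -> looks perfect
recall = tp / num_gt                          # 0.045 -> 191 annotations missed
# Accuracy: correct detections over all detections plus missed ground truth.
accuracy = tp / ((tp + fp) + (num_gt - tp))   # also 0.045

print(precision, recall, accuracy)  # 1.0 0.045 0.045
```

Precision alone suggests a perfect model, while recall and accuracy expose the missed annotations.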

# Mean Average Metrics

This section will show how the mean average metrics (mAP, mAR, mACC) are calculated. The mean average metrics are based on calculating the sum of the precision, recall, and accuracy per class and dividing by the number of classes.

Consider the following image again.

*Note: the classes to be considered are 'three' and 'ace'.*

For the class 'three' there are two true positives and two false negatives. For the class 'ace' there is one false positive.

*Note: The average metrics are denoted by an IoU threshold of 0.5, 0.75, or 0.5-0.95. For example: mAP 0.5, mAP 0.75, mAP 0.5-0.95.*

- For mAP 0.5, only consider the predictions whose IoU is greater than or equal to 0.5. Since all model predictions in the example have an IoU greater than 0.5, all predictions are included in the calculation.
- For mAP 0.75, only consider the predictions whose IoU is greater than or equal to 0.75.
- For mAP 0.5-0.95, calculate the sum of mAP 0.5, mAP 0.55, ..., mAP 0.95 and divide by 10.

The same concept is applied for mAR and mACC.

### mAP

This metric is the mean of the average precision values of each class at IoU thresholds 0.50-0.95. The average precision values are calculated by finding the area under the precision recall curve. For each class, there exists average precision values from 0.50-0.95 in 0.05 steps. The mAP 0.50-0.95 is then the average of the mAP values across the thresholds.

### mAR

mAR = (1/2) [Recall(three) + Recall(ace)] = (1/2) [2/4 + undefined] = 25%

*Note: Undefined occurs when there is a division by zero; in this case, the value is treated as zero.*

### mACC

mACC = (1/2) [Accuracy(three) + Accuracy(ace)] = (1/2) [2/4 + 0/1] = 25%
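The per-class averaging for mAR and mACC, including the undefined-as-zero convention from the note above, can be sketched like this. It is a minimal illustration, not Deep View Validator's implementation, using the per-class values from the example image:

```python
def mean_average(per_class):
    """Mean of a per-class metric; None marks an undefined (division-by-zero)
    value, which is counted as zero."""
    values = [0.0 if v is None else v for v in per_class.values()]
    return sum(values) / len(values)

# mAR for the example image: Recall('three') = 2/4, Recall('ace') is undefined.
print(mean_average({"three": 2 / 4, "ace": None}))   # 0.25
# mACC: Accuracy('three') = 2/4, Accuracy('ace') = 0/1.
print(mean_average({"three": 2 / 4, "ace": 0 / 1}))  # 0.25
```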

# False Positive Ratios

As mentioned in the previous article, false positives fall into two categories: classification and localization. Classification false positives are matched to a ground truth, but the model's predicted label does not match the ground truth label. Localization false positives are arbitrary model predictions on an image that do not correspond to any ground truth.

## Localization False Positive Error

The ratio of the localization false positives to the total number of false positives.

## Classification False Positive Error

The ratio of the classification false positives to the total number of false positives.

These errors reveal information about the model's behavior: we can see where the model makes the most mistakes, whether through misidentifying labels (classification) or detecting objects that do not correspond to any ground truth (localization).
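Both ratios can be computed together. This is a sketch with hypothetical counts (three localization and one classification false positive); only the two ratio definitions above are taken from the article:

```python
def false_positive_errors(localization_fp, classification_fp):
    """Return (localization error, classification error) as fractions of the
    total false positives; both are 0 when there are no false positives."""
    total = localization_fp + classification_fp
    if total == 0:
        return 0.0, 0.0
    return localization_fp / total, classification_fp / total

loc_error, cls_error = false_positive_errors(localization_fp=3, classification_fp=1)
print(loc_error, cls_error)  # 0.75 0.25
```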

# Timings

The timings represent how fast a model processes a given image in milliseconds:

- Load Time: How long it takes to load an image from file.
- Inference Time: How long it takes for ModelRunner to detect objects in an image.
- Box (Decode) Time: How long it takes to draw bounding boxes on an image.

Mean Load Time is the sum of the load times across all images divided by the total number of images.

Mean Inference Time is the sum of the inference times across all images divided by the total number of images.

Mean Box Time is the sum of the box times across all images divided by the total number of images.
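The three mean timings are plain averages over the per-image measurements. A minimal sketch, assuming the timings are collected as one (load, inference, box) tuple per image in milliseconds; the sample values are made up:

```python
def mean_timings(timings_ms):
    """Average per-image (load, inference, box) timings, in milliseconds."""
    n = len(timings_ms)
    loads, inferences, boxes = zip(*timings_ms)
    return sum(loads) / n, sum(inferences) / n, sum(boxes) / n

# Hypothetical timings for three images.
samples = [(4.0, 20.0, 1.0), (6.0, 22.0, 1.5), (5.0, 21.0, 2.0)]
print(mean_timings(samples))  # (5.0, 21.0, 1.5)
```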

# Conclusion

This article has shown how to calculate the metrics reported in Deep View Validator. The basic equations for precision, recall, and accuracy were shown and using these fundamental equations, the article described how to calculate the mean average metrics, the overall metrics, false positive error ratios, and the mean timings.

For a much more in-depth example showing the metric computations on a small dataset, visit this link.
