A *confusion matrix* (also known as a prediction-accuracy table) is a contingency table between two variables that are meant to measure the same thing, typically the observed and predicted values of an outcome variable from a predictive model.

For a categorical outcome variable, each row of the table counts the cases with a specific observed class, and each column counts the cases with a specific predicted class. Each cell therefore counts the cases with a given observed class and a given predicted class, which may be the same (on the diagonal of the table) or different (off the diagonal). If most cases fall on the diagonal, as in the example below, the model's predictions are closely aligned with the observed values.
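The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function name `confusion_matrix`, the alphabetical ordering of classes, and the sample data are all hypothetical.

```python
from collections import Counter


def confusion_matrix(observed, predicted):
    """Count cases for each (observed class, predicted class) pair.

    Returns the sorted list of classes and a nested dict where
    matrix[obs][pred] is the number of cases observed as `obs`
    and predicted as `pred`.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Sorting the classes is an arbitrary choice; any fixed order works.
    classes = sorted(set(observed) | set(predicted))
    counts = Counter(zip(observed, predicted))
    # Rows are observed classes, columns are predicted classes.
    return classes, {o: {p: counts[(o, p)] for p in classes} for o in classes}


observed = ["cat", "cat", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "dog", "dog", "cat", "bird"]
classes, matrix = confusion_matrix(observed, predicted)
for o in classes:
    print(o, [matrix[o][p] for p in classes])
# bird [1, 0, 0]
# cat [0, 1, 1]
# dog [0, 1, 2]
```

Here 4 of the 6 cases lie on the diagonal (the counts 1, 1 and 2).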

Hovering over a cell in the matrix displays the number of cases in that cell, together with that number as a percentage of all cases, of the cases with the same observed class (the cell's row), and of the cases with the same predicted class (the cell's column).
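The three percentages can be computed from the cell counts alone. The sketch below assumes a nested-dict layout with rows as observed classes and columns as predicted classes; `cell_summary` is a hypothetical name, and returning 0 for an empty row or column is an assumption, not documented behavior.

```python
def cell_summary(matrix, classes, obs, pred):
    """Return the count in one cell and its share of all cases,
    of its row (same observed class) and of its column (same predicted class)."""
    count = matrix[obs][pred]
    total = sum(matrix[o][p] for o in classes for p in classes)
    row_total = sum(matrix[obs][p] for p in classes)
    col_total = sum(matrix[o][pred] for o in classes)

    def pct(part, whole):
        # Guard against an empty row or column to avoid division by zero.
        return 100.0 * part / whole if whole else 0.0

    return {
        "count": count,
        "pct_of_all": pct(count, total),
        "pct_of_observed": pct(count, row_total),
        "pct_of_predicted": pct(count, col_total),
    }


classes = ["bird", "cat", "dog"]
matrix = {
    "bird": {"bird": 1, "cat": 0, "dog": 0},
    "cat": {"bird": 0, "cat": 1, "dog": 1},
    "dog": {"bird": 0, "cat": 1, "dog": 2},
}
summary = cell_summary(matrix, classes, "cat", "dog")
print(summary)
```

For the cell observed "cat" / predicted "dog", the single case is 1 of 6 overall, 1 of the 2 observed cats, and 1 of the 3 cases predicted as "dog".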

If an outcome variable is numeric and a count (non-negative integers), it is treated as a categorical variable as described above. Predicted values that are not integers are mapped to the nearest integer.
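A minimal sketch of the mapping step for a count outcome. How ties such as 2.5 are resolved is not stated, so rounding half up is an assumption here (Python's built-in `round()` would instead round ties to the nearest even integer). The function name `to_count_class` is hypothetical.

```python
import math


def to_count_class(value):
    """Map a numeric prediction to the nearest integer, rounding ties (x.5) up.

    The tie-breaking rule is an assumption, not documented behavior.
    """
    return math.floor(value + 0.5)


observed = [0, 1, 2, 2, 3]
predicted = [0.2, 1.4, 1.6, 2.5, 2.9]
predicted_classes = [to_count_class(v) for v in predicted]
print(predicted_classes)  # [0, 1, 2, 3, 3]

# After mapping, the (observed, predicted) pairs are counted like categories.
on_diagonal = sum(o == p for o, p in zip(observed, predicted_classes))
print(on_diagonal)  # 4
```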

If an outcome variable is numeric but not a count, the observed and predicted values are grouped into buckets, and each cell shows the number of cases that fall in a specific pair of observed and predicted buckets. This is shown in the example below. The row and column labels are the upper bounds of the bucket ranges.
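The bucketing can be sketched as follows, assuming equal-width buckets over the combined range of observed and predicted values, with each bucket covering the range up to and including its upper bound. Both choices are assumptions; the actual rule for choosing bucket ranges may differ. `make_upper_bounds` and `bucket_label` are hypothetical names.

```python
import bisect
from collections import Counter


def make_upper_bounds(values, n_buckets):
    """Upper bounds of equal-width buckets spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets or 1.0  # avoid zero width when all values are equal
    return [lo + width * (i + 1) for i in range(n_buckets)]


def bucket_label(value, upper_bounds):
    """Label a value with the upper bound of the first bucket that contains it."""
    i = bisect.bisect_left(upper_bounds, value)
    # Values just above the last bound (e.g. from float rounding) go to the last bucket.
    return upper_bounds[min(i, len(upper_bounds) - 1)]


observed = [1.0, 2.5, 4.0, 7.5, 10.0]
predicted = [1.4, 5.1, 3.9, 6.2, 9.0]
# Shared bounds so rows and columns use the same buckets.
bounds = make_upper_bounds(observed + predicted, n_buckets=3)
print(bounds)  # [4.0, 7.0, 10.0]

pairs = Counter(
    (bucket_label(o, bounds), bucket_label(p, bounds))
    for o, p in zip(observed, predicted)
)
print(pairs)
```

Computing the bounds from the combined values keeps the row and column labels identical, so the diagonal still means "predicted bucket equals observed bucket".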

For a numeric outcome variable, how close a case lies to the diagonal indicates the quality of the prediction: a case just off the diagonal was predicted in a bucket next to its observed bucket. However, for a categorical outcome variable whose categories have an arbitrary order, the only thing that matters is whether a case is on the diagonal or not.
