Photo by Jon Tyson on Unsplash

Diving into the Confusion Matrix: DS-ML-1

rajath cs
Sep 3, 2020


Have you ever looked at something and been unable to decide what it is? All of us have. Data science has the same problem: a model's predictions are not always right, and neither the developer nor the model can tell at a glance how often, or in what way, those predictions go wrong. Already confusing, right? Welcome to the world of the confusion matrix.

Introduction

As hinted above, huge datasets with many features (columns that remain even after dimensionality reduction) only add to the confusion about whether a model produced the outcome we wanted. What we need is a compact way to compare what the model predicted with what is actually true. A confusion matrix does exactly that: it tabulates a classifier's predictions against the actual labels, and measures such as precision and recall are computed from it.

Let’s say you have a matrix of pixels from an image and your main aim is to detect red-colored objects only. If the model's output is a matrix that is supposed to contain only the pixels that represent red, you cannot blindly assume that it contains all of the red pixels, and nothing but red pixels. You need to measure several aspects of how the red pixels were handled, such as:

  • How many pixels were predicted as red and really are red.
  • How many red pixels were missed, i.e. captured as non-red.
  • How many pixels were marked red in the output even though they are not red in the image (for example, the image has no red pixels at all, yet the output matrix still contains some).
  • How many non-red pixels were correctly left out (for example, the image has no red objects and, correspondingly, neither does the output matrix).
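The four measures above can be counted directly once both the true image and the model output are available. The sketch below is a minimal illustration, assuming (hypothetically) that each one is a same-sized 2D grid of booleans where `True` means "this pixel is red"; the names `Mask` and `count_outcomes` are made up for this example and are not from any library.

```python
from typing import Dict, List

# Hypothetical representation: each mask is a 2D grid of booleans,
# where True means "this pixel is red".
Mask = List[List[bool]]


def count_outcomes(actual_red: Mask, predicted_red: Mask) -> Dict[str, int]:
    """Compare the true red-pixel mask with the model's output, pixel by pixel."""
    same_shape = len(actual_red) == len(predicted_red) and all(
        len(a_row) == len(p_row) for a_row, p_row in zip(actual_red, predicted_red)
    )
    if not same_shape:
        raise ValueError("both masks must have the same dimensions")

    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a_row, p_row in zip(actual_red, predicted_red):
        for is_red, predicted_as_red in zip(a_row, p_row):
            if is_red and predicted_as_red:
                counts["TP"] += 1  # red pixel correctly captured
            elif is_red:
                counts["FN"] += 1  # red pixel missed (captured as non-red)
            elif predicted_as_red:
                counts["FP"] += 1  # non-red pixel wrongly marked red
            else:
                counts["TN"] += 1  # non-red pixel correctly left out
    return counts


# A tiny made-up 2 x 3 image, for illustration only.
actual = [
    [True, True, False],
    [False, True, False],
]
predicted = [
    [True, False, True],
    [False, True, False],
]
print(count_outcomes(actual, predicted))  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```

Every pixel lands in exactly one of the four buckets, so the counts always add up to the total number of pixels.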

These four measures are placed in a new matrix; in the order listed above they are the True Positives (TP), False Negatives (FN), False Positives (FP) and True Negatives (TN). From these counts we can calculate precision, TP / (TP + FP), and recall, TP / (TP + FN), and then combine the two with a harmonic mean to get the F1 score. Note that TN does not appear in any of these three formulas.
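A minimal sketch of how precision, recall and their harmonic mean (the F1 score) follow from the TP, FP and FN counts. Returning 0.0 when a denominator is zero is an assumption made here for simplicity; libraries handle that edge case in different ways.

```python
def precision(tp: int, fp: int) -> float:
    # Of everything predicted positive, what fraction really was positive?
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    # Of everything actually positive, what fraction did the model find?
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall: 2PR / (P + R).
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Made-up counts for illustration: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4
print(round(precision(tp, fp), 3), round(recall(tp, fn), 3), round(f1_score(tp, fp, fn), 3))
# 0.8 0.667 0.727
```

The harmonic mean is pulled toward the smaller of the two values, so a model cannot score a high F1 by doing well on only one of precision or recall. Algebraically, F1 also equals 2TP / (2TP + FP + FN), which here is 16/22.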

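A confusion matrix is an "actual × predicted" matrix (read it like "m cross n", m × n): one axis holds the actual classes, the other holds the predicted classes, and each cell counts the samples that fall into that combination. Which axis is which varies between sources. scikit-learn, the source of the image below, puts the actual (true) classes on the rows and the predicted classes on the columns.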
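A small sketch of building such a matrix from two lists of class labels. It follows scikit-learn's layout (rows = actual class, columns = predicted class); some texts use the transposed layout. The function below is a hand-written illustration, assuming integer labels `0 .. n_classes - 1`, not scikit-learn's own implementation.

```python
from typing import List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int], n_classes: int) -> List[List[int]]:
    """Cell [i][j] counts samples whose actual class is i and predicted class is j."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1  # row = actual class, column = predicted class
    return matrix


# Binary case: 1 = positive (e.g. "red pixel"), 0 = negative. Made-up labels.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted, n_classes=2)
print(cm)  # [[2, 1], [1, 2]], i.e. [[TN, FP], [FN, TP]]
```

With labels 0 and 1 in this layout, the top-left cell is TN, the top-right is FP, the bottom-left is FN and the bottom-right is TP. If scikit-learn is installed, `sklearn.metrics.confusion_matrix(actual, predicted)` should produce the same arrangement for these inputs.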

Image Source: https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

Summary

True Positive: You have contracted COVID-19 and tested positive for it.

True Negative: You have not contracted COVID-19 and tested negative for it.

False Positive: You have not contracted COVID-19 and yet tested positive for it.

False Negative: You have contracted COVID-19 and yet tested negative for it.
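The four cases in the summary reduce to two yes/no questions: is the person actually infected, and did the test come back positive? A minimal sketch of that mapping, applied to a handful of made-up records (illustrative only, not real test data); the function name `outcome` is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def outcome(infected: bool, tested_positive: bool) -> str:
    """Name the confusion-matrix cell for one person's test result."""
    if infected:
        return "True Positive" if tested_positive else "False Negative"
    return "False Positive" if tested_positive else "True Negative"


# Made-up records: (actually infected?, test came back positive?)
records: List[Tuple[bool, bool]] = [
    (True, True),
    (False, False),
    (False, True),
    (True, False),
    (False, False),
]
counts = Counter(outcome(infected, positive) for infected, positive in records)
print(counts)
```

Tallying these labels over many people gives exactly the four cells of a binary confusion matrix.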

Conclusion

In my next article, I will describe the confusion matrix with sample code. Stay tuned, and keep data sciencing!

One of the primary reasons for describing these terms is that they lead directly to other core data science metrics such as precision, recall and the F-score, which I will cover in an upcoming blog post.
