Confusion Matrix and Cyber Attacks

Nishant Singh
5 min read · Jun 6, 2021

Hello everyone, today we are going through the confusion matrix, its importance, and how it is used in the industry, by looking at some real-world cyber attacks. Let's start with the confusion matrix.

What is a Confusion Matrix?

In the machine learning world, we focus heavily on data preprocessing and feature engineering so that features can be used properly as variables. Once the data is preprocessed, the most important step is feature selection, after which we finally fit our dataset and, depending on whether it is a regression or a classification problem, create our models.

The confusion matrix is one of the methods used in classification problems. It is a table that helps us calculate the accuracy of a prediction model.

In binary classification, each prediction of the model falls into one of four outcomes:

True Positive: The value is predicted to be positive and it really is positive.

False Positive: The value is predicted to be positive but it is actually negative. This is a Type 1 error.

False Negative: The value is predicted to be negative but it is actually positive. This is a Type 2 error.

True Negative: The value is predicted to be negative and it really is negative.
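The four outcomes can be counted directly by comparing what was predicted with what is actually true. The following is only a minimal sketch: the function name `tally_outcomes` and the sample labels are made up for illustration, and `True` stands for the positive class.

```python
def tally_outcomes(actual, predicted):
    """Count TP, FP, FN and TN for two equal-length lists of booleans."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for is_positive, said_positive in zip(actual, predicted):
        if said_positive and is_positive:
            counts["TP"] += 1
        elif said_positive and not is_positive:
            counts["FP"] += 1  # Type 1 error
        elif not said_positive and is_positive:
            counts["FN"] += 1  # Type 2 error
        else:
            counts["TN"] += 1
    return counts


# Made-up labels for illustration only (True = positive).
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
counts = tally_outcomes(actual, predicted)
print(counts)
```

The first word of each outcome (True/False) says whether the prediction was right, and the second word (Positive/Negative) says what the model predicted.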

Using all of these values, we calculate the accuracy of the model with the following equation:

Accuracy=(TN+TP)/(TN+FP+FN+TP)
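As a quick illustration, the equation can be written as a small Python function. The counts passed in below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TN + TP) / (TN + FP + FN + TP)."""
    total = tn + fp + fn + tp
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tn + tp) / total


# Hypothetical counts: 90 correct predictions out of 100.
example_accuracy = accuracy(tp=40, tn=50, fp=6, fn=4)
print(example_accuracy)  # 0.9
```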

Here is a good example that explains how the confusion matrix works, so have a look at the following picture. It is self-explanatory.

True Positive: He is predicted to be sleeping and is actually sleeping.

False Positive: He is predicted to be sleeping but is actually awake.

False Negative: He is predicted to be not sleeping but is actually sleeping.

True Negative: He is predicted to be not sleeping and is actually awake.

If the picture is split into four sections like the quadrants of a graph, quadrant II represents TP, quadrant I represents FP, quadrant III represents FN and quadrant IV represents TN.

Conclusions

  • TP and TN are the outcomes where the model's prediction matches the truth.
  • Of the two types of error, the more dangerous one here is the False Negative. It is equivalent to a person testing negative for COVID while actually suffering from COVID. This is why we must keep the Type 2 error as low as possible.
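Accuracy alone does not show how many positives were missed, so it can help to look at the false negative rate next to it. The sketch below uses two hypothetical models with made-up counts; the false negative rate FN / (TP + FN) is a standard metric that the article itself does not define.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TN + TP) / (TN + FP + FN + TP)."""
    return (tn + tp) / (tn + fp + fn + tp)


def false_negative_rate(tp, fn):
    """Share of actual positives the model missed: FN / (TP + FN)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("there are no actual positives")
    return fn / positives


# Two hypothetical models evaluated on the same 100 cases (50 positive).
model_a = {"tp": 45, "tn": 45, "fp": 5, "fn": 5}
model_b = {"tp": 40, "tn": 50, "fp": 0, "fn": 10}

for name, m in (("A", model_a), ("B", model_b)):
    acc = accuracy(**m)
    fnr = false_negative_rate(m["tp"], m["fn"])
    print(f"model {name}: accuracy={acc:.2f}, false negative rate={fnr:.2f}")
```

Both models reach the same accuracy, but model B misses twice as many real positives, which is exactly the Type 2 error we want to keep low.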

What are Cyber Attacks?

Cyber attacks are attacks carried out over the public internet, for example viruses, bank fraud, data theft and many other things that harm the victim financially or mentally.

Examples of Attacks:

  • Malware: Malware breaches a network through a vulnerability, typically when a user clicks a dangerous link or email attachment that then installs risky software.
  • Phishing: Phishing is the practice of sending fraudulent communications that appear to come from a reputable source, usually through email.
  • Man in the Middle: Man-in-the-middle (MitM) attacks, also known as eavesdropping attacks, occur when attackers insert themselves into a two-party transaction.
  • Denial of service: A denial-of-service attack floods systems, servers, or networks with traffic to exhaust resources and bandwidth.
  • DNS Tunneling: DNS tunneling uses the DNS protocol to carry non-DNS traffic. It can also be used for command and control callbacks from the attacker’s infrastructure to a compromised system.

The above graph shows how necessary it is to stop these attacks and keep the internet safe from hackers.

For the definitions above I have referred to the Cisco site. For more detailed information you may refer to the following link:

Cyber Attacks and Machine Learning

Machine learning is emerging in every field that involves computers. It is now very common to use machine learning for data analysis.

Cyber crimes such as DoS attacks follow particular patterns that an AI-based detector can recognize, which helps prevent hackers from accessing a set of files.

We give computers an artificial brain to make them more intelligent, interactive and smart enough to judge what is right or wrong, and to detect things such as cyber crime, faces, language and much more. In fact, the same techniques are also being used to carry out these cyber crimes.

Confusion Matrix in Binary Classification

In binary classification models, a prediction can have only two values, either TRUE or FALSE. Each prediction is also either positive or negative, and considering both is equally important.
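Datasets often carry more than two labels. One common way to get back to the two-value setup is to pick one class as positive and treat every other class as negative (often called one-vs-rest). The sketch below assumes labels are plain strings; the class names are the attack categories this article works with, while the sample rows and the helper name `to_binary` are made up for illustration.

```python
def to_binary(labels, positive_class):
    """Map each label to True if it is the positive class, else False."""
    return [label == positive_class for label in labels]


# Made-up sample rows, for illustration only.
actual_labels = ["R2L", "DOS", "U2R", "R2L", "PRB", "DOS"]
predicted_labels = ["R2L", "DOS", "R2L", "DOS", "PRB", "DOS"]

actual = to_binary(actual_labels, "R2L")
predicted = to_binary(predicted_labels, "R2L")

# 2x2 confusion matrix laid out as:
#   rows    = actual    (R2L, non-R2L)
#   columns = predicted (R2L, non-R2L)
matrix = [[0, 0], [0, 0]]
for is_r2l, said_r2l in zip(actual, predicted):
    row = 0 if is_r2l else 1
    col = 0 if said_r2l else 1
    matrix[row][col] += 1

print(matrix)  # [[TP, FN], [FP, TN]]
```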

Following is a confusion matrix for the attacks that come under R2L, among the DOS, R2L, U2R and PRB attacks:

The above confusion matrix represents:

  • TP: In total, 1092 attacks were predicted as R2L and really are R2L attacks.
  • FN: In total, 35 attacks were predicted as non-R2L but really are R2L attacks.
  • TN: In total, 1091 attacks were predicted as non-R2L and really are not R2L attacks.
  • FP: In total, 34 attacks were predicted as R2L but are not really R2L attacks.
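Plugging the four counts listed above into the accuracy equation from earlier gives a single figure for this model. The short script below does that arithmetic; precision and recall are standard extra metrics added here for context and are not part of the original analysis.

```python
# Counts taken from the R2L confusion matrix above.
tp, fn, tn, fp = 1092, 35, 1091, 34

total = tn + fp + fn + tp          # 2252 predictions in total
accuracy = (tn + tp) / total       # share of all predictions that were right
precision = tp / (tp + fp)         # share of predicted R2L that really are R2L
recall = tp / (tp + fn)            # share of real R2L attacks that were caught

print(f"accuracy  = {accuracy:.4f}")   # 0.9694
print(f"precision = {precision:.4f}")  # 0.9698
print(f"recall    = {recall:.4f}")     # 0.9689
```

The 35 false negatives are the R2L attacks that slipped through undetected, which is the Type 2 error discussed earlier.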

I hope you now have a proper understanding of the confusion matrix and its importance. Thank you for joining me 😀

Nishant Singh

I’m a student learning newer technologies day by day. I just wish to contribute somehow to this changing and evolving world 😊