Summer Task-5 :

Use cases confusion matrix or its two types of error in Cyber Security World

Confusion matrix is a fairly common term when it comes to machine learning. Today I will explain the importance of confusion matrix when considering the cyber crimes.

So confusion matrix is yet another classification metric that can be used to tell how good our model is performing. Yet it is more often used in various places which might not be using the confusion matrix.

This all gives us an idea that there is something more to confusion matrix than just being another classification metric.

What is Confusion Matrix?

It is a performance measurement for machine learning classification problem where output can be two or more classes. It is a table with 4 different combinations of predicted and actual values.

True Positive:

Interpretation: You predicted positive and it’s true.

True Negative:

Interpretation: You predicted negative and it’s true.

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

The most dangerous error is the False Positive [FP] error as the machine predicted false but it was not false it was true. For example, the machine predicted student fails but actually student was a pass.

This error causes problems in the cybersecurity world where the tools used are based on machine learning or ai, it may give a False Negative error that may cause dangerous impacts.

Therefore the role of the confusion matrix is important in the field of machine learning.

Cybercrime can be anything like:

  • Stealing of personal data
  • Identity stolen
  • For stealing organizational data
  • Steal bank card details.
  • Hack emails for gaining information.

Confusion Matrix’s implementation in monitoring Cyber Attacks:

The data set was used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between ``bad’’ connections, called intrusions or attacks, and ``good’’ normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

In the KDD99 dataset these four attack classes (DoS, U2R, R2L, and probe) are divided into 22 different attack classes that tabulated below:

In the KDD Cup 99, the criteria used for evaluation of the participant entries is the Cost Per Test
(CPT) computed using the confusion matrix and a given cost matrix.
• True Positive (TP): The amount of attack detected when it is actually attacked.
• True Negative (TN): The amount of normal detected when it is actually normal.
• False Positive (FP): The amount of attack detected when it is actually normal (False alarm).
• False Negative (FN): The amount of normal detected when it is actually attacked.

Trade off between type 1 and type 2 error is very critical in cyber security. Let’s take another example. Consider a face recognition system which is installed in front of the data warehouse which holds critical error. Consider that the manager comes and the recognition system is unable to recognize him. He tries to log in again and is allowed in.

This seems a pretty normal scenario. But let’s consider another condition. A new person comes and tries to log himself in. The recognition system makes and error and allows him in. Now this is very dangerous. An unauthorized person has made an entry. This could be very damaging to the whole company.

In both the cases there was an error made by the security system. But the tolerance for False Negative here is 0 although we can still bear False Positive.

This shows the critical nature that might vary from use case to use case where we want a tradeoff between the two types of error.

Thank You !!!