Comparison of Machine Learning Algorithms to Detect RPL-Based IoT Devices Vulnerability

Table of Contents

Experiments and Experiment Results

We made the datasets meaningful, and we classified as benign and malicious. Even before machine learning was performed, an equal number of rows were taken from normal and vulnerable data sets. For this, the number of rows of the data set with the least rows from the benign or malicious data sets obtained from the 60th second was taken as a basis, and the same number of rows was taken from the other data set. Thus, it is aimed to obtain a balanced amount of data. Source and destination IP addresses are extracted from the datasets before the normalization process so that the machine does not learn whether there is an attack based on the source and destination IP addresses. Subsequently, the data were normalized. The normalization process of the test and training data sets was done with equation 3.18. where z represents the normalized value, x is the unnormalized data, μ is the average of all x values, and σ is the standard deviation of x values. This has been done with the StandardScaler library in python.

z=\frac{x-\mu}{\sigma}\ \ and\ \mu=\frac{1}{N}\sum_{i=1}^{N}\left(x_i\right)\ \sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x-\mu\right)^2}

Equation 3.18

Subsequently, the dataset was split into test and training datasets in the amount of 2/3. (2/3 training, 1/3 testing).

After this stage, the data sets will be trained and tested with different machine learning algorithms and the most appropriate machine learning algorithm to be used in the detection of Flooding, Version Number Increase and Decreased Rank Attacks will be determined in the RPL protocol. For this purpose, six types of machine learning algorithms were tested. These are the algorithms Logistic Regression Classification, Decision Trees, Random Forest, Navie Bayes, KNN Classifier and Artificial Neural Networks. The experiments were carried out with the computer with 16 GB of RAM, Intel(R) Core(TM) i7-8565U CPU 1.80GHz and 1.99 GHz , specifications. The application of machine learning algorithms was implemented with Python 3. The parameters applied to machine learning are shown in Table-3.4.

Table 3-4 Parameters Used in Machine Learning Algorithms

AlgorithmParameter
Logistic Regression

Library: sklearn.linear_model, LogisticRegression

The existing parameters of the library are used.

Random Forest

Library: sklearn.ensemble, RandomForestClassifier

Parameters: n_estimators=8, criterion=’entropy’

Decision Trees

Library: sklearn.tree, DecisionTreeClassifier

Parameters: criterion=’entropy’

Navie Bayes

Library: sklearn.naive_bayes, GaussianNB

The existing parameters of the library are used.

KNN Classifier

Library: sklearn.neighbors, KNeighborsClassifier

The existing parameters of the library are used.

Artificial Neural Networks

Library: tensorflow, keras

A 6-layer structure was applied.

Neron numbers:26,52,56,13,7,1

Optimizer:”Nadam”; Again:60; Estimate threshold:0.7

After the execution of the experiments, the accuracy rate and training time will be compared on the values.

The accuracy rate (AR) is calculated as in equation 3.19, and TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.

AR=\frac{TP+TN}{TP+TN+FP+FN}

Equation 3.19

After the experiments are executed, the number of rows of data sets for Flooding Attack, Version Number Increase Attack, Decreased Rank Attack, accuracy rate and training time values are shown in Table 3.5. After training the data set with machine learning algorithms, of course, the detection of the error rate will take a shorter time. However, in this thesis study, it is aimed to analyze the data sets of the same attacks with different machine learning algorithms and to determine the fastest, most effective, uncomplicated, and reliable algorithm among these algorithms. Therefore, the algorithm that performs the training in a shorter time and can detect the attack with high accuracy will be more cost-effective. The results of the attacks are interpreted below.

You can access the python code for this experiment here.

Table 3-5 Experiment Results

Attack Type

Data Set Row Count

Test and Training Data Set Row Count

Algorithm

Accuracy Rate(%)

Training Time (ms)

Flooding Attacks

(Hello Flood Attack)

B:214

M:1259

S:1473

B:214

M:214

S:428

Logistic Regression

95,7

363

Decision Trees

93,6

0

Random Forest

95,0

168

Naive Bayes

69,7

13

K Nearest Neighbor

95,7

4

Artificial Neural Networks

97,2

1847

Version Number Increase Attack

B:168

M:1114

S:1282

B:168

M:168

S:336

Logistic Regression

74,8

185

Decision Trees

74,8

2

Random Forest

72,9

253

Naive Bayes

74,8

5

K Nearest Neighbor

81,0

2

Artificial Neural Networks

72,0

1688

Decreased Rank Attack

B:160

M:151

S:311

B:151

m:151

S:302

Logistic Regression

54,0

5

Decision Trees

57,0

2

Random Forest

58,0

10

Naive Bayes

54,0

1

K Nearest Neighbor

56,0

0

Artificial Neural Networks

58,0

1720

Blog summary

I continue to share how I did my master's thesis titled Comparison of Machine Learning Algorithms for the Detection of Vulnerability of RPL-Based IoT Devices, my experiences in this process, and the codes in this thesis in a series of articles on my blog.In this article, we will train the processed data and detect the attack with machine learning algorithms in the RPL protocol.

About the Author

Other Posts

My Thesis
Murat Ugur KIRAZ

Conclusion

In this blog post, the Flooding Attack, Decreased Rank Attack and Version Number Increase Attack in the RPL protocol were trained and detected by “Decision Tree”, “Logistic Regression”, “Random Forest”, “Naive Bayes”, “K Nearest Neighbor” and “Artificial Neural Networks” algorithms.

The test results for the attacks were compared, as a result of the comparison, the Artificial Neural Networks algorithm with an accuracy rate of 97.2% in the detection of Flooding Attacks, the K Nearest Neighbor algorithm with an accuracy rate of 81% in the detection of Version Number Increase Attacks, and the Artificial Neural Networks with an accuracy rate of 58% in the detection of Decreased Rank attacks algorithm has been found to show success.

Read More »
My Thesis
Murat Ugur KIRAZ

Interpretation of Machine Learning Values

I continue to share how I did my master’s thesis titled Comparison of Machine Learning Algorithms for the Detection of Vulnerability of RPL-Based IoT Devices, my experiences in this process, and the codes in this thesis in a series of articles on my blog.

So far, I have provided detailed information about the RPL protocol and the attacks that take place in the RPL protocol. Then, I experimented with Flooding Attacks, Version Number Increased Attack, and Decreased Rank Attack, extracting the raw data and making sense of that raw data. I compared the results of experiments with weak knots with statistical methods.

In this section, I will interpret the numerical results of the attacks we detect with machine learning algorithms.

Read More »

Share this post

LinkedIn
Twitter