Comparison of Machine Learning Algorithms to Detect RPL-Based IoT Devices Vulnerability

Interpretation of Machine Learning Values

Interpretation of Values of Flooding Attack

The data set created for the flooding attack, also known as the “Hello Flood” attack, contains 428 rows. When the different machine learning algorithms are trained and tested on this data set, the Artificial Neural Networks (deep learning) algorithm achieves the highest accuracy at 97.2%. It is followed by Logistic Regression and K Nearest Neighbor at 95.7%, Random Forest at 95%, Decision Trees at 93.6%, and Naive Bayes at 69.7%. In terms of training time, the Decision Trees classification algorithm was the fastest, training in under 1 millisecond. It is followed by K Nearest Neighbor with 4 milliseconds, Random Forest with 168 milliseconds, Logistic Regression with 363 milliseconds, and Artificial Neural Networks with 1847 milliseconds.
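To make clear what the two reported quantities mean, here is a minimal sketch of how accuracy and training time can be measured for any model. It is not the code from the thesis: the helper `evaluate`, the stand-in threshold model, and the tiny sample values are all assumptions made for illustration.

```python
import time

def evaluate(train, predict, X_train, y_train, X_test, y_test):
    """Return (accuracy in percent, training time in milliseconds)."""
    start = time.perf_counter()
    model = train(X_train, y_train)
    train_ms = (time.perf_counter() - start) * 1000.0
    correct = sum(predict(model, x) == y for x, y in zip(X_test, y_test))
    return 100.0 * correct / len(y_test), train_ms

# Stand-in model (assumption): a single threshold halfway between the class means.
def train_threshold(X, y):
    attack = [x for x, label in zip(X, y) if label == 1]
    normal = [x for x, label in zip(X, y) if label == 0]
    return (sum(attack) / len(attack) + sum(normal) / len(normal)) / 2

def predict_threshold(threshold, x):
    return int(x > threshold)

# Hypothetical one-feature samples (illustrative only, not thesis data).
X_train, y_train = [0.05, 0.10, 0.60, 0.70], [0, 0, 1, 1]
X_test, y_test = [0.08, 0.65, 0.30, 0.90], [0, 1, 0, 1]

accuracy, train_ms = evaluate(train_threshold, predict_threshold,
                              X_train, y_train, X_test, y_test)
print(f"accuracy = {accuracy:.1f}%, training time = {train_ms:.3f} ms")
```

Only the call to `train` is timed, so prediction cost does not inflate the training figure. Any classifier that offers a train step and a predict step could be passed in place of the stand-in model.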

In flooding attacks, network packets exhibit abnormal behavior. Flooding attacks generate large amounts of traffic and make nodes and connections unusable. The attack occurs when a vulnerable node sends unnecessary DIS messages. As Figure 3.10 shows, the DIS rate in the packets of an attacked network differs from the DIS rate in a normal network, so this attack is easy to detect. The same anomaly was also visible in the comparison with statistical methods.

Figure 3.10: Simulation with Benign Motes and Malicious Motes.
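The DIS rate discussed above can be illustrated with a short sketch. It assumes each captured packet has already been reduced to a message-type label. The function name `dis_rate`, the sample capture windows, and the 0.5 cut-off are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def dis_rate(message_types):
    """Fraction of packets in a capture window that are DIS messages."""
    if not message_types:
        return 0.0
    counts = Counter(message_types)
    return counts["DIS"] / len(message_types)

# Hypothetical capture windows (illustrative only, not thesis data).
normal_window = ["DIO", "DAO", "DAO-ACK", "DIS", "DIO", "DAO", "UDP", "UDP"]
flooded_window = ["DIS", "DIS", "DIO", "DIS", "DIS", "DAO", "DIS", "DIS"]

THRESHOLD = 0.5  # assumed cut-off, chosen only for this example

for name, window in [("normal", normal_window), ("flooded", flooded_window)]:
    rate = dis_rate(window)
    print(f"{name}: DIS rate = {rate:.3f}, flagged = {rate > THRESHOLD}")
```

A single feature like this separates the two sample windows cleanly, which is the intuition behind the high accuracy most of the algorithms reach on this attack.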

The artificial neural network is an advanced algorithm. It detected the anomaly in the data set we created for flooding attacks with the highest accuracy (97.2%), at the cost of the longest training time (1847 milliseconds).

Because the DIS rate directly determines whether a sample belongs to an attack or not, the relationship can be expressed as a linear equation. This is why the Logistic Regression algorithm reached an accuracy of over 95%.
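As a rough illustration of that linear relationship, the sketch below fits a one-feature logistic regression, p(attack) = sigmoid(w * x + b), with plain gradient descent. The DIS-rate values, the learning rate, and the number of epochs are assumptions picked for the example; the thesis data and training setup are not reproduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(attack) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical DIS rates per capture window (0 = normal, 1 = flooding).
dis_rates = [0.05, 0.10, 0.12, 0.15, 0.60, 0.70, 0.75, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(dis_rates, labels)

def predict(x):
    """Classify a DIS rate as attack (1) or normal (0)."""
    return int(sigmoid(w * x + b) >= 0.5)

print(predict(0.08), predict(0.80))
```

The learned weight `w` is positive, meaning a higher DIS rate pushes the prediction toward "attack", and the decision boundary sits where `w * x + b` equals zero.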

In flooding attacks, the DIS rates of the “categorical” data in the data packets differ, which makes this attack easy for the Decision Trees algorithm to detect, because the “information gain” of the DIS rate is higher than that of the other inputs. Training is short because building a decision tree on a data set of this size is computationally cheap.
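Information gain is the drop in label entropy obtained by splitting the data on a feature. The sketch below computes it for two hypothetical features, a DIS rate that separates the classes and a packet length that does not. The feature values and thresholds are invented for illustration and are not thesis data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the rows at `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

labels = ["normal"] * 4 + ["attack"] * 4
dis_rate = [0.05, 0.10, 0.12, 0.15, 0.60, 0.70, 0.75, 0.90]  # hypothetical
packet_len = [60, 82, 61, 80, 62, 81, 60, 83]                # hypothetical

gain_dis = information_gain(dis_rate, labels, threshold=0.4)
gain_len = information_gain(packet_len, labels, threshold=70)
print(f"DIS rate gain = {gain_dis:.2f}, packet length gain = {gain_len:.2f}")
```

In this toy data the DIS-rate split yields pure groups and therefore the maximum gain of 1 bit, while the packet-length split leaves both groups evenly mixed and gains nothing. A decision tree would choose the DIS rate for its first split.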

The KNN algorithm also detects the attack with high accuracy, again because of the anomaly in the DIS messages.

Interpretation of the Values of the Version Number Increase Attack

The data set generated for the Version Number Increase Attack contains 336 rows. When the different machine learning algorithms are trained and tested on this data set, the K Nearest Neighbor algorithm achieves the highest accuracy at 81%. It is followed by Logistic Regression, Decision Trees, and Naive Bayes at 74.8% each, Random Forest at 72.9%, and Artificial Neural Networks at 72%. In terms of training time, the KNN and Decision Trees algorithms were trained in the shortest time, 2 milliseconds. They are followed by Naive Bayes with 5 milliseconds, Logistic Regression with 185 milliseconds, Random Forest with 253 milliseconds, and Artificial Neural Networks with 1688 milliseconds.

The Version Number Increase Attack occurs when a vulnerable node changes the version number and forwards it to its neighbors. When this happens, the other nodes conclude that the normal DODAG structure has changed and request that the DODAG be rebuilt. Repeated unnecessary rebuilding of the DODAG significantly increases message overhead, consumes node resources, and clogs up the network. Simulation images of the Version Number Increase Attack are shown in Figure 3.11. In the experiment with 22 nodes in particular, Figure 3.11 shows that the DODAG structure fails to form.

Each time the DODAG is rebuilt, the number of DAO, DIO, and DAO-ACK messages and the data packet lengths increase.
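The rebuild cascade can be sketched with a very simplified model: every node that hears a version number higher than its own adopts it and resets, then passes it on. The five-node topology, the function name `flood_version`, and the choice to count one reset per adopting node are assumptions. Real RPL behaviour (timers, the root's role, parent selection) is not modelled.

```python
from collections import deque

def flood_version(versions, neighbours, attacker, advertised):
    """Spread an advertised version number; return how many nodes reset."""
    versions[attacker] = advertised
    resets = 0
    queue = deque([attacker])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            # A peer that sees a newer version adopts it and rejoins.
            if versions[node] > versions[peer]:
                versions[peer] = versions[node]
                resets += 1
                queue.append(peer)
    return resets

# Hypothetical topology: node 0 is the root, node 3 is the attacker.
neighbours = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
versions = {node: 1 for node in neighbours}
ATTACKER = 3

total_resets = 0
for _ in range(3):  # three attack rounds
    total_resets += flood_version(versions, neighbours, ATTACKER,
                                  versions[ATTACKER] + 1)
print("resets caused by three forged version numbers:", total_resets)
```

In this toy network each forged version number forces the four other nodes to reset, so three rounds cause twelve resets. Each reset stands for a fresh exchange of control messages, which is where the extra overhead comes from.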

The KNN algorithm detects this attack with the highest accuracy, again because of the anomaly in the DAO, DIO, and DAO-ACK messages and data packet lengths.
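A minimal K Nearest Neighbor sketch shows why such an anomaly helps: a new sample is labeled by a majority vote among the closest training rows. The feature layout (DAO count, DIO count, DAO-ACK count, mean packet length) and every value below are hypothetical and do not come from the thesis data set.

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Majority vote among the k training rows nearest to `query`."""
    ranked = sorted(zip(train_rows, train_labels),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Each row: (DAO count, DIO count, DAO-ACK count, mean packet length in bytes).
# Hypothetical values for illustration only.
train_rows = [
    (10, 12, 9, 70.0), (11, 13, 10, 72.0), (9, 11, 9, 68.0),     # normal
    (40, 55, 35, 95.0), (45, 60, 38, 99.0), (38, 50, 33, 92.0),  # attack
]
train_labels = ["normal"] * 3 + ["attack"] * 3

print(knn_predict(train_rows, train_labels, (42, 57, 36, 96.0)))
print(knn_predict(train_rows, train_labels, (10, 12, 10, 71.0)))
```

Because KNN relies on raw distances, features on very different scales are usually normalized first. That step is left out here to keep the example short.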

The Logistic Regression algorithm was also able to detect the attack because the DAO, DIO, and DAO-ACK messages and data packet lengths differed from the values obtained in the simulation with normal nodes, which again makes the relationship easier to express as a linear equation.

Artificial neural networks are advanced algorithms, but on the data set we created for this attack they reached an accuracy of only 72%, the lowest of the six algorithms.

Figure 3.11: Version Number Attack Images with 11 and 23 Motes.

Interpretation of Values of a Decreased Rank Attack

The data set generated for the Decreased Rank Attack contains 302 rows. When the different machine learning algorithms are trained and tested on this data set, the Random Forest and Artificial Neural Networks algorithms achieve the highest accuracy at 58%. They are followed by Decision Trees at 57%, K Nearest Neighbor at 56%, and Logistic Regression and Naive Bayes at 54%. In terms of training time, the K Nearest Neighbor algorithm was the fastest, training in under 1 millisecond. It is followed by Naive Bayes with 1 millisecond, Decision Trees with 2 milliseconds, Logistic Regression with 5 milliseconds, Random Forest with 10 milliseconds, and Artificial Neural Networks with 1720 milliseconds.

Compared to the other attacks, the detection rate of the Decreased Rank attack is quite low. This is because the intent of this attack is not to interfere with service or disrupt system operation; it is one of the attacks that invade privacy. When the vulnerable node announces an abnormally low rank, many normal nodes connect to the DODAG through the attacker. Since the attack does not harm the network, normal and malicious data packets show similar characteristics, which makes it more difficult to detect than the others.
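The effect of a forged rank can be sketched with a simplified parent-selection rule in which each node prefers the neighbor advertising the lowest rank. Real RPL objective functions are more involved, and the node names, rank values, and helper functions below are hypothetical.

```python
def preferred_parent(advertised_ranks):
    """Pick the neighbour advertising the lowest rank (simplified rule)."""
    return min(advertised_ranks, key=advertised_ranks.get)

# Ranks each node hears from its neighbours; "M" is the attacker,
# shown here with its honest rank. All values are hypothetical.
honest_view = {
    "n1": {"A": 512, "M": 1024},
    "n2": {"A": 512, "B": 768, "M": 1024},
    "n3": {"B": 768, "M": 1024},
}
FAKE_RANK = 300  # abnormally low rank announced by the attacker

def choose_parents(view, fake_rank=None):
    """Return the parent each node selects, optionally with M lying."""
    chosen = {}
    for node, ranks in view.items():
        ranks = dict(ranks)  # copy so the honest view stays untouched
        if fake_rank is not None:
            ranks["M"] = fake_rank
        chosen[node] = preferred_parent(ranks)
    return chosen

before = choose_parents(honest_view)
after = choose_parents(honest_view, FAKE_RANK)
via_attacker = sum(1 for parent in after.values() if parent == "M")
print("before:", before)
print("after: ", after, "| nodes routed through attacker:", via_attacker)
```

In this sketch the network keeps working after the attack; only the chosen parents change. That matches the observation above that traffic under this attack looks much like normal traffic.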

Blog summary

I continue to share how I did my master's thesis titled Comparison of Machine Learning Algorithms for the Detection of Vulnerability of RPL-Based IoT Devices, my experiences in this process, and the code from this thesis in a series of articles on my blog.

So far, I have provided detailed information about the RPL protocol and the attacks that take place in the RPL protocol. Then, I experimented with the Flooding Attack, the Version Number Increase Attack, and the Decreased Rank Attack, extracting the raw data and making sense of that raw data. I compared the results of the experiments with vulnerable nodes using statistical methods.

In this section, I will interpret the numerical results of the attacks we detect with machine learning algorithms.

About the Author

My Thesis
Murat Ugur KIRAZ

Conclusion

In this blog post, the “Decision Tree”, “Logistic Regression”, “Random Forest”, “Naive Bayes”, “K Nearest Neighbor” and “Artificial Neural Networks” algorithms were trained to detect the Flooding Attack, the Decreased Rank Attack, and the Version Number Increase Attack in the RPL protocol.

The test results for the attacks were compared. The most successful algorithms were Artificial Neural Networks, with an accuracy of 97.2% in detecting Flooding Attacks; K Nearest Neighbor, with an accuracy of 81% in detecting Version Number Increase Attacks; and Artificial Neural Networks (tied with Random Forest), with an accuracy of 58% in detecting Decreased Rank attacks.
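The comparison can be reproduced from the reported figures with a few lines of bookkeeping. The accuracy values are copied from the results earlier in this post; the abbreviations and helper names are only illustrative, and nothing here is the thesis code.

```python
# Accuracy (%) per attack and algorithm, copied from the results in this post.
results = {
    "Flooding": {"ANN": 97.2, "LR": 95.7, "KNN": 95.7,
                 "RF": 95.0, "DT": 93.6, "NB": 69.7},
    "Version Number Increase": {"KNN": 81.0, "LR": 74.8, "DT": 74.8,
                                "NB": 74.8, "RF": 72.9, "ANN": 72.0},
    "Decreased Rank": {"RF": 58.0, "ANN": 58.0, "DT": 57.0,
                       "KNN": 56.0, "LR": 54.0, "NB": 54.0},
}

def best_algorithms(scores):
    """Return every algorithm that reaches the top accuracy (ties included)."""
    top = max(scores.values())
    return sorted(name for name, value in scores.items() if value == top)

for attack, scores in results.items():
    print(f"{attack}: {best_algorithms(scores)} at {max(scores.values())}%")

# Mean accuracy of each algorithm across the three attacks.
mean_accuracy = {
    name: sum(scores[name] for scores in results.values()) / len(results)
    for name in results["Flooding"]
}
for name, value in sorted(mean_accuracy.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.1f}%")
```

On these figures K Nearest Neighbor has the highest mean accuracy across the three attacks, even though it is the top performer for only one of them.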
