Comparison of Machine Learning Algorithms to Detect RPL-Based IoT Devices Vulnerability

Making Raw Data Meaningful

In the previous section, we simulated the flooding attack, the decreased rank attack, and the version number increase attack with Contiki Cooja. We ran these simulations both with benign motes only and with malicious motes, and a total of 6 raw data sets emerged from the simulations.

The raw dataset has the following columns:

No : Row number

Time : Execution time (ms)

Source : Source IP address (IPv6)

Destination : Destination IP address (IPv6)

Protocol : Protocol

Length : Packet length

Info : Information (DIO, DIS, DAO, Ack messages)

Columns in raw data
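As a rough illustration of how these columns might be read, the sketch below parses a CSV export that uses the column names listed above and reduces the Info text to DIO, DIS, DAO, or other. The sample rows, the function names, and the Info phrases being matched are assumptions made for illustration; the exact text depends on the capture tool, and this is not the code used in the thesis.

```python
import csv
import io

# Assumed Info phrases. The acknowledgment phrase is checked first because
# it also contains the plain DAO phrase; acknowledgments count as "other".
INFO_MARKERS = (
    ("destination advertisement object acknowledgment", "other"),
    ("dodag information object", "dio"),
    ("dodag information solicitation", "dis"),
    ("destination advertisement object", "dao"),
)


def classify_info(info):
    """Reduce the free-text Info column to dio, dis, dao, or other."""
    text = info.lower()
    for marker, kind in INFO_MARKERS:
        if marker in text:
            return kind
    return "other"


def read_packets(csv_text):
    """Parse raw rows into dictionaries with typed fields."""
    packets = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        packets.append({
            "no": int(row["No"]),
            "time": float(row["Time"]),
            "src": row["Source"],
            "dst": row["Destination"],
            "protocol": row["Protocol"],
            "length": int(row["Length"]),
            "kind": classify_info(row["Info"]),
        })
    return packets


# Made-up sample rows, only to show the expected shape of the input.
SAMPLE = """No,Time,Source,Destination,Protocol,Length,Info
1,60.012,fe80::c30c:0:0:2,ff02::1a,ICMPv6,97,RPL Control (DODAG Information Object)
2,60.250,fe80::c30c:0:0:3,fe80::c30c:0:0:1,ICMPv6,76,RPL Control (Destination Advertisement Object)
3,60.300,fe80::c30c:0:0:1,fe80::c30c:0:0:3,ICMPv6,43,RPL Control (Destination Advertisement Object Acknowledgment)
"""

packets = read_packets(SAMPLE)
```

Mapping the Info text to a small set of message kinds at this stage keeps the later per-frame counting simple.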

The information in the raw data set is not sufficient on its own to apply machine learning. The raw data obtained from simulations containing vulnerable nodes differs markedly from the raw data obtained from simulations containing only normal motes. This difference shows up in the number of packets, the message types, the total packet lengths, and the ratios. To detect this anomaly, the raw data is divided into 1-second frames. Within each 1-second frame, the following values were calculated, and a new data set was created.

  • Source Mote: A unique number for each mote.
  • Destination Mote: The unique number of the destination mote, taken from the same numbering as the source mote.
  • Packet Count: The total number of packets in the 1-second frame.
  • Source Mote Ratio: (Source Mote Count / Packet Count).
  • Destination Mote Ratio: (Destination Mote Count / Packet Count).
  • Source Mote Duration: The sum of all packet durations sent from source to destination in the 1-second frame.
  • Destination Mote Duration: The sum of all packet durations received by the destination in the 1-second frame.
  • Total Packet Duration: The sum of all packet durations in the 1-second frame.
  • Total Packet Length: The sum of all packet lengths in the 1-second frame.
  • Source Packet Ratio: (Sum of Source Packet Lengths / Total Packet Length).
  • Destination Packet Ratio: (Sum of Destination Packet Lengths / Total Packet Length).
  • DIO Message Count: Count of DIO messages in the 1-second frame.
  • DIS Message Count: Count of DIS messages in the 1-second frame.
  • DAO Message Count: Count of DAO messages in the 1-second frame.
  • Other Message Count: Count of the messages other than DIO, DIS, and DAO.
  • Label: 0 or 1 (1 if the raw dataset contains one or more malicious motes, otherwise 0).
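The values listed above can be sketched for a single 1-second frame as follows. This is a minimal sketch under stated assumptions, not the thesis code: the function name, the dictionary keys, and the input shape (one dictionary per packet with `src`, `dst`, `length`, `duration`, and `kind`) are hypothetical. It also reads the "source" values as covering all packets in the frame sent by that source mote and the "destination" values as covering all packets received by that destination mote, and it emits one row per packet.

```python
from collections import Counter


def frame_features(frame, mote_ids, label):
    """Return one feature row per packet of a single 1-second frame."""
    packet_count = len(frame)
    total_duration = sum(p["duration"] for p in frame)
    total_length = sum(p["length"] for p in frame)
    kinds = Counter(p["kind"] for p in frame)  # missing kinds count as 0

    def ratio(part, whole):
        # Guard against division by zero in degenerate frames.
        return part / whole if whole else 0.0

    rows = []
    for p in frame:
        from_src = [q for q in frame if q["src"] == p["src"]]
        to_dst = [q for q in frame if q["dst"] == p["dst"]]
        rows.append({
            "source_mote": mote_ids[p["src"]],
            "destination_mote": mote_ids[p["dst"]],
            "packet_count": packet_count,
            "source_mote_ratio": ratio(len(from_src), packet_count),
            "destination_mote_ratio": ratio(len(to_dst), packet_count),
            "source_mote_duration": sum(q["duration"] for q in from_src),
            "destination_mote_duration": sum(q["duration"] for q in to_dst),
            "total_packet_duration": total_duration,
            "total_packet_length": total_length,
            "source_packet_ratio": ratio(
                sum(q["length"] for q in from_src), total_length),
            "destination_packet_ratio": ratio(
                sum(q["length"] for q in to_dst), total_length),
            "dio_message_count": kinds["dio"],
            "dis_message_count": kinds["dis"],
            "dao_message_count": kinds["dao"],
            "other_message_count": kinds["other"],
            "label": label,
        })
    return rows


# Made-up frame with three packets, for illustration only.
example_frame = [
    {"src": "a", "dst": "b", "length": 100, "duration": 0.2, "kind": "dio"},
    {"src": "a", "dst": "c", "length": 50, "duration": 0.3, "kind": "dao"},
    {"src": "b", "dst": "a", "length": 50, "duration": 0.5, "kind": "other"},
]
example_rows = frame_features(example_frame, {"a": 1, "b": 2, "c": 3}, label=1)
```

In this example the first row describes mote 1 sending to mote 2: two of the three packets in the frame come from mote 1, so its Source Mote Ratio is 2/3.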

 

The new dataset was created according to the following pseudocode.

				
START
	Dset = INPUT(RawDataset)
	Duration_list = []
	WHILE Dset has more rows
		Duration = time(current_row) - time(previous_row)
		Duration_list = APPEND(Duration)
	ENDWHILE

	Dset = Dset + Duration_list

	IP_dictionary = {IP_Address : unique_number}
	Crr_scnd = 60
	Counter = 0

	frame_second = FLOOR(MAX(Dset[Time])) - Crr_scnd

	WHILE Counter < frame_second
		osf = GET(Dset[Time] >= Crr_scnd AND Dset[Time] < Crr_scnd + 1)
		WHILE osf has more rows
			Osf_list = [ src = IP_dictionary[Source IP_Address],
				dst = IP_dictionary[Dest. IP_Address],
				pct_cnt = COUNT(rows),
				src_mote_rt = COUNT(src) / pct_cnt,
				dst_mote_rt = COUNT(dst) / pct_cnt,
				src_mote_dur = SUM(src_duration),
				dst_mote_dur = SUM(dst_duration),
				ttal_pckt_dur = SUM(duration),
				ttal_pckt_lngth = SUM(pckt_lngth),
				src_pckt_rt = SUM(src_pckt_lngth) / ttal_pckt_lngth,
				dst_pckt_rt = SUM(dst_pckt_lngth) / ttal_pckt_lngth,
				dio_msg_cnt = COUNT(dio_messages),
				dis_msg_cnt = COUNT(dis_messages),
				dao_msg_cnt = COUNT(dao_messages),
				other_msg_cnt = COUNT(other_messages),
				IF Dset = "Normal"
					Label = 0
				ELSE
					Label = 1
				ENDIF ]
		ENDWHILE
		New_dset = APPEND(Osf_list)
		Crr_scnd = Crr_scnd + 1
		Counter = Counter + 1
	ENDWHILE
END
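The first half of the pseudocode (durations, mote numbering, and splitting into frames) could look roughly like this in Python. It is only a sketch under assumptions: the function names and dictionary keys are hypothetical, the first packet is given a duration of 0 because it has no predecessor, and `start` and `frame_len` are taken to be in the same unit as the Time column.

```python
import math
from collections import defaultdict


def add_durations(packets):
    """Attach to each packet the time elapsed since the previous packet."""
    result = []
    previous_time = None
    for p in sorted(packets, key=lambda item: item["time"]):
        # Assumption: the first packet has no predecessor, so duration 0.
        duration = 0.0 if previous_time is None else p["time"] - previous_time
        result.append({**p, "duration": duration})
        previous_time = p["time"]
    return result


def number_motes(packets):
    """Give every address seen as source or destination a unique number."""
    ids = {}
    for p in packets:
        for address in (p["src"], p["dst"]):
            ids.setdefault(address, len(ids) + 1)
    return ids


def split_into_frames(packets, start=60.0, frame_len=1.0):
    """Group packets at or after `start` into consecutive frames.

    Keys are frame indexes: 0 covers [start, start + frame_len), and so on.
    """
    frames = defaultdict(list)
    for p in packets:
        if p["time"] < start:
            continue  # skip the traffic before the starting second
        index = math.floor((p["time"] - start) / frame_len)
        frames[index].append(p)
    return dict(frames)


# Made-up packets, for illustration only.
raw = [
    {"time": 59.5, "src": "a", "dst": "b"},
    {"time": 60.1, "src": "b", "dst": "a"},
    {"time": 60.6, "src": "a", "dst": "c"},
    {"time": 61.2, "src": "c", "dst": "a"},
]
timed = add_durations(raw)
mote_ids = number_motes(timed)
frames = split_into_frames(timed)
```

Keying the frames by index rather than looping over every second means empty seconds simply do not appear, which avoids dividing by a zero packet count later.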
				
			

An example of the newly created dataset is in the table below.

The above data was obtained from the raw data with Python code.

During the simulation, it was observed that a system consisting of 12 nodes fully formed the DODAG structure after the 30th second. Due to the nature of RPL, while the DODAG is forming, devices send DIO, DAO, and DAO-ACK messages to each other, and the packet traffic differs from the traffic after the DODAG has formed. To prevent the model from learning this difference, only the data after the 60th second of the raw data set is used to create the new data set.

The data sets created for each attack and labeled as vulnerable or normal are now ready to be compared with different machine learning algorithms. As a result, a total of 3 data sets were created: the Flooding Attack data set, the Decreased Rank Attack data set, and the Version Number Increase Attack data set.
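One way to picture how two raw simulations become one labeled data set per attack is the sketch below. The function name, the `seed` parameter, and the shuffling step are assumptions for illustration and are not taken from the thesis code; the input rows are made up.

```python
import random


def build_attack_dataset(normal_rows, attack_rows, seed=0):
    """Merge rows from a benign run (label 0) and an attack run (label 1)."""
    labeled = [{**row, "label": 0} for row in normal_rows]
    labeled += [{**row, "label": 1} for row in attack_rows]
    # Shuffle with a fixed seed so that the result is reproducible.
    random.Random(seed).shuffle(labeled)
    return labeled


# Made-up feature rows, for illustration only.
normal = [{"packet_count": 4}, {"packet_count": 5}]
attack = [{"packet_count": 40}, {"packet_count": 38}, {"packet_count": 41}]
dataset = build_attack_dataset(normal, attack)
```

Repeating this for each of the three attacks gives three labeled data sets from the six raw ones.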


About the Author

Murat Ugur KIRAZ
