Export Tables from Access to PostgreSQL
This article describes how to export a table from Access DB to postgresql via excel and python.
We made the datasets meaningful, and we classified as benign and malicious. Even before machine learning was performed, an equal number of rows were taken from normal and vulnerable data sets. For this, the number of rows of the data set with the least rows from the benign or malicious data sets obtained from the 60th second was taken as a basis, and the same number of rows was taken from the other data set. Thus, it is aimed to obtain a balanced amount of data. Source and destination IP addresses are extracted from the datasets before the normalization process so that the machine does not learn whether there is an attack based on the source and destination IP addresses. Subsequently, the data were normalized. The normalization process of the test and training data sets was done with equation 3.18. where z represents the normalized value, x is the unnormalized data, μ is the average of all x values, and σ is the standard deviation of x values. This has been done with the StandardScaler library in python.
z=\frac{x-\mu}{\sigma}\ \ and\ \mu=\frac{1}{N}\sum_{i=1}^{N}\left(x_i\right)\ \sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x-\mu\right)^2}Equation 3.18
Subsequently, the dataset was split into test and training datasets in the amount of 2/3. (2/3 training, 1/3 testing).
After this stage, the data sets will be trained and tested with different machine learning algorithms and the most appropriate machine learning algorithm to be used in the detection of Flooding, Version Number Increase and Decreased Rank Attacks will be determined in the RPL protocol. For this purpose, six types of machine learning algorithms were tested. These are the algorithms Logistic Regression Classification, Decision Trees, Random Forest, Navie Bayes, KNN Classifier and Artificial Neural Networks. The experiments were carried out with the computer with 16 GB of RAM, Intel(R) Core(TM) i7-8565U CPU 1.80GHz and 1.99 GHz , specifications. The application of machine learning algorithms was implemented with Python 3. The parameters applied to machine learning are shown in Table-3.4.
Algorithm | Parameter |
Logistic Regression | Library: sklearn.linear_model, LogisticRegression The existing parameters of the library are used. |
Random Forest | Library: sklearn.ensemble, RandomForestClassifier Parameters: n_estimators=8, criterion=’entropy’ |
Decision Trees | Library: sklearn.tree, DecisionTreeClassifier Parameters: criterion=’entropy’ |
Navie Bayes | Library: sklearn.naive_bayes, GaussianNB The existing parameters of the library are used. |
KNN Classifier | Library: sklearn.neighbors, KNeighborsClassifier The existing parameters of the library are used. |
Artificial Neural Networks | Library: tensorflow, keras A 6-layer structure was applied. Neron numbers:26,52,56,13,7,1 Optimizer:”Nadam”; Again:60; Estimate threshold:0.7 |
After the execution of the experiments, the accuracy rate and training time will be compared on the values.
The accuracy rate (AR) is calculated as in equation 3.19, and TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.
AR=\frac{TP+TN}{TP+TN+FP+FN}Equation 3.19
After the experiments are executed, the number of rows of data sets for Flooding Attack, Version Number Increase Attack, Decreased Rank Attack, accuracy rate and training time values are shown in Table 3.5. After training the data set with machine learning algorithms, of course, the detection of the error rate will take a shorter time. However, in this thesis study, it is aimed to analyze the data sets of the same attacks with different machine learning algorithms and to determine the fastest, most effective, uncomplicated, and reliable algorithm among these algorithms. Therefore, the algorithm that performs the training in a shorter time and can detect the attack with high accuracy will be more cost-effective. The results of the attacks are interpreted below.
You can access the python code for this experiment here.
Attack Type | Data Set Row Count | Test and Training Data Set Row Count | Algorithm | Accuracy Rate(%) | Training Time (ms) |
Flooding Attacks (Hello Flood Attack) | B:214 M:1259 S:1473 | B:214 M:214 S:428 | Logistic Regression | 95,7 | 363 |
Decision Trees | 93,6 | 0 | |||
Random Forest | 95,0 | 168 | |||
Naive Bayes | 69,7 | 13 | |||
K Nearest Neighbor | 95,7 | 4 | |||
Artificial Neural Networks | 97,2 | 1847 | |||
Version Number Increase Attack | B:168 M:1114 S:1282 | B:168 M:168 S:336 | Logistic Regression | 74,8 | 185 |
Decision Trees | 74,8 | 2 | |||
Random Forest | 72,9 | 253 | |||
Naive Bayes | 74,8 | 5 | |||
K Nearest Neighbor | 81,0 | 2 | |||
Artificial Neural Networks | 72,0 | 1688 | |||
Decreased Rank Attack | B:160 M:151 S:311 | B:151 m:151 S:302 | Logistic Regression | 54,0 | 5 |
Decision Trees | 57,0 | 2 | |||
Random Forest | 58,0 | 10 | |||
Naive Bayes | 54,0 | 1 | |||
K Nearest Neighbor | 56,0 | 0 | |||
Artificial Neural Networks | 58,0 | 1720 |
This article describes how to export a table from Access DB to postgresql via excel and python.
In this blog post, the Flooding Attack, Decreased Rank Attack and Version Number Increase Attack in the RPL protocol were trained and detected by “Decision Tree”, “Logistic Regression”, “Random Forest”, “Naive Bayes”, “K Nearest Neighbor” and “Artificial Neural Networks” algorithms.
The test results for the attacks were compared, as a result of the comparison, the Artificial Neural Networks algorithm with an accuracy rate of 97.2% in the detection of Flooding Attacks, the K Nearest Neighbor algorithm with an accuracy rate of 81% in the detection of Version Number Increase Attacks, and the Artificial Neural Networks with an accuracy rate of 58% in the detection of Decreased Rank attacks algorithm has been found to show success.
I continue to share how I did my master’s thesis titled Comparison of Machine Learning Algorithms for the Detection of Vulnerability of RPL-Based IoT Devices, my experiences in this process, and the codes in this thesis in a series of articles on my blog.
So far, I have provided detailed information about the RPL protocol and the attacks that take place in the RPL protocol. Then, I experimented with Flooding Attacks, Version Number Increased Attack, and Decreased Rank Attack, extracting the raw data and making sense of that raw data. I compared the results of experiments with weak knots with statistical methods.
In this section, I will interpret the numerical results of the attacks we detect with machine learning algorithms.