Author(s) | Collection number | Pages |
---|---|---|
Yasinska-Damri L. M., Durniak B. V. | № 1 (81) | 35-44 |
The paper presents a comparative analysis of data normalization techniques. The main criterion for evaluating a normalization method was the accuracy of the classification performed on the normalized data. Four datasets of different types, downloaded from the UCI Machine Learning Repository, served as the experimental data. The normalization techniques available in the clusterSim package for R were applied to each dataset. The quality of each normalization was measured by the classification accuracy, that is, the share of objects assigned to their correct classes. A multilayer perceptron neural network was used as the classifier at this step.
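A few of the normalization formulas can be sketched in plain Python to show what the method codes stand for. The mapping from code to formula (for example n1 as standardization and n11 as division by the root of the sum of squares) follows the clusterSim `data.Normalization` documentation as best recalled and should be treated as an assumption to verify against the package; the function name `normalize` is hypothetical, and only a subset of the methods is shown.

```python
import math
import statistics


def normalize(column, method):
    """Normalize one feature column (a list of floats).

    The code-to-formula mapping is an assumption based on the clusterSim
    documentation. Constant or all-zero columns would divide by zero and
    are not handled in this sketch.
    """
    mean = statistics.fmean(column)
    median = statistics.median(column)
    value_range = max(column) - min(column)

    if method == "n1":   # standardization: (x - mean) / sd (sample sd)
        sd = statistics.stdev(column)
        return [(x - mean) / sd for x in column]
    if method == "n3a":  # positional unitization: (x - median) / range
        return [(x - median) / value_range for x in column]
    if method == "n5":   # range <-1, 1>: (x - mean) / max|x - mean|
        scale = max(abs(x - mean) for x in column)
        return [(x - mean) / scale for x in column]
    if method == "n8":   # quotient transformation: x / max
        top = max(column)
        return [x / top for x in column]
    if method == "n9":   # quotient transformation: x / mean
        return [x / mean for x in column]
    if method == "n11":  # quotient transformation: x / sqrt(sum of x^2)
        norm = math.sqrt(sum(x * x for x in column))
        return [x / norm for x in column]
    raise ValueError(f"method not covered by this sketch: {method}")
```

Each method is applied to every feature column independently, so features measured on very different scales end up comparable before they reach the classifier.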
The four datasets were Iris Plants, Seeds, Wine and Glass. The simulation results show that the normalization stage significantly influences classification accuracy and that the best normalization method depends on the type of data. Consequently, the normalization technique should be selected separately for each case.
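The comparison procedure (normalize, classify, score by accuracy, repeat per method) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: a nearest-centroid rule stands in for the multilayer perceptron so that the example needs only the standard library, the full table is normalized before the train/test split, and all function names are hypothetical. The formula used for "n4" is assumed from the clusterSim documentation.

```python
import math
import statistics


def accuracy(y_true, y_pred):
    """Share of objects assigned to their true class."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def unitize_zero_min(rows):
    """Column-wise (x - min) / range; assumed to correspond to clusterSim 'n4'."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    return [[(x - lo) / sp for x, lo, sp in zip(row, lows, spans)] for row in rows]


def fit_nearest_centroid(rows, labels):
    """Stand-in classifier (NOT the paper's perceptron): one centroid per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]

    def predict(row):
        # Assign the row to the class whose centroid is closest (Euclidean).
        return min(centroids, key=lambda c: math.dist(row, centroids[c]))

    return predict


def compare(normalizers, rows, labels, is_test):
    """Return {method name: test accuracy}; each normalizer maps rows -> rows."""
    scores = {}
    for name, normalizer in normalizers.items():
        scaled = normalizer(rows)  # assumption: normalize first, then split
        train = [(r, l) for r, l, t in zip(scaled, labels, is_test) if not t]
        test = [(r, l) for r, l, t in zip(scaled, labels, is_test) if t]
        predict = fit_nearest_centroid([r for r, _ in train], [l for _, l in train])
        scores[name] = accuracy([l for _, l in test], [predict(r) for r, _ in test])
    return scores
```

Passing `{"n0": lambda rows: rows, "n4": unitize_zero_min}` as `normalizers` compares raw features against scaled ones on the same split; the method with the highest score is then chosen per dataset.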
The analysis of the results also shows that the normalization methods yielding the highest classification accuracy differ between datasets. For the Iris dataset, methods n1, n6a, n8, n9 and n11 are optimal, each giving 100% classification accuracy on the test set. For the Seeds data, method n11 is optimal and gives the highest (almost maximal) classification accuracy. For the more complex Wine data, methods n3a and n5a are optimal. For the Glass dataset, methods n1 and n5 are optimal.
Keywords: data normalization, classification quality criteria, multilayer perceptron, data processing, classification accuracy.
doi: 10.32403/0554-4866-2021-1-81-35-44