Author(s) | Collection number | Pages |
---|---|---|
Yasinska-Damri L. M., Durniak B. V., Babichev S. A. | № 2 (82) | 141-150 |
The paper presents a methodological basis for building a hybrid inductive model for clustering big data. It describes the architecture and a stepwise conceptual procedure for forming the cluster structure, based on the principles of the objective clustering inductive technology. The practical implementation of the proposed concept uses two equivalent data subsets containing the same number of pairwise similar objects. To assess the quality of the distribution of objects within a cluster structure, three types of criteria are proposed: an internal criterion, which characterizes the distribution of objects within each cluster relative to that cluster's median; an external criterion, calculated as the normalized difference of the corresponding internal quality criteria; and a balance criterion, which combines the internal and external criteria and is calculated using the Harrington desirability function. The final decision on the cluster structure is made by assessing the classification accuracy of the studied objects. The proposed technique can be used with various existing clustering algorithms. Its main advantage is that it minimizes the reproducibility error while forming the cluster structure that is optimal for the goal of the task being solved.
Further research by the authors will focus on the practical implementation of the proposed technique using various current data clustering algorithms.
Keywords: clustering, inductive methods, clustering quality criteria, Harrington desirability function, classification accuracy.
doi: 10.32403/0554-4866-2021-2-82-141-150