Methodological basis of hybrid inductive models development for big data clustering

Author(s): Yasinska-Damri L. M., Durniak B. V., Babichev S. A.
Collection number: No. 2 (82)
Pages: 141-150

The paper presents research on a methodological basis for creating a hybrid inductive model for big data clustering. It describes the architecture and a conceptual stepwise procedure for forming a cluster structure based on the principles of the objective clustering inductive technology. The practical implementation of the proposed concept uses two equivalent data subsets that contain the same number of pairwise similar objects. To assess the quality of the distribution of objects within a cluster structure, three types of criteria are proposed: an internal criterion, which characterizes how objects are distributed within clusters relative to the median of the corresponding cluster; an external criterion, which is calculated as the normalized difference of the corresponding internal quality criteria; and a balance criterion, which combines the internal and external criteria and is calculated using the Harrington desirability function. Within the proposed concept, the final decision on the cluster structure is made by assessing the classification accuracy of the studied objects. The technique can be implemented with various existing clustering algorithms. The main advantage of the proposed concept is that it minimizes the reproducibility error while forming a cluster structure that is optimal with respect to the goal of the task being solved.
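The three criteria can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the internal criterion is taken as the mean absolute deviation of one-dimensional objects from their cluster median, the external criterion as |QC_A - QC_B| / (QC_A + QC_B) computed on the two equivalent subsets, and the balance criterion as the geometric mean of one-sided Harrington desirabilities d = exp(-exp(-Y)). The function names, the linear scaling onto Y, and the bounds y_min and y_max are hypothetical.

```python
import math
from statistics import median


def internal_criterion(clusters):
    """Assumed form: mean absolute deviation of objects from their cluster median."""
    total, count = 0.0, 0
    for members in clusters:
        centre = median(members)
        total += sum(abs(x - centre) for x in members)
        count += len(members)
    return total / count


def external_criterion(qc_a, qc_b):
    """Assumed form: normalized difference of the internal criteria of two subsets."""
    denom = qc_a + qc_b
    return abs(qc_a - qc_b) / denom if denom else 0.0


def harrington(y):
    """One-sided Harrington desirability function, d = exp(-exp(-Y))."""
    return math.exp(-math.exp(-y))


def to_scale(value, worst, best, y_min=-2.0, y_max=5.0):
    """Hypothetical linear map of a criterion onto the dimensionless Y scale."""
    frac = (value - worst) / (best - worst)
    return y_min + frac * (y_max - y_min)


def balance_criterion(internal, external, worst_internal, worst_external):
    """Assumed form: geometric mean of the two desirabilities (0 is the best value)."""
    d_int = harrington(to_scale(internal, worst_internal, 0.0))
    d_ext = harrington(to_scale(external, worst_external, 0.0))
    return math.sqrt(d_int * d_ext)


if __name__ == "__main__":
    # The same clustering applied to two equivalent subsets A and B (toy data).
    subset_a = [[1.0, 2.0, 3.0], [10.0, 12.0, 14.0]]
    subset_b = [[1.5, 2.0, 2.5], [11.0, 12.0, 15.0]]
    qc_a = internal_criterion(subset_a)
    qc_b = internal_criterion(subset_b)
    ext = external_criterion(qc_a, qc_b)
    # Average internal criterion; worst-case bounds are chosen by hand here.
    bal = balance_criterion((qc_a + qc_b) / 2, ext, 5.0, 1.0)
    print(qc_a, qc_b, ext, bal)
```

In this sketch a larger balance value indicates a more desirable clustering, so candidate cluster structures could be compared by this single number before the final check of classification accuracy.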

Further research by the authors will focus on the practical implementation of the proposed technique using various current data clustering algorithms.

Keywords: clustering, inductive methods, clustering quality criteria, Harrington desirability function, classification accuracy.

doi: 10.32403/0554-4866-2021-2-82-141-150
