Volume 20 No 8 (2022)
A NOVEL PERFORMANCE ENHANCED AND OUTLIER RESISTANT HYBRIDIZED GINI_HDBSCAN DEEP CLUSTERING ALGORITHM FOR BIG DATA ANALYSIS
N. Valarmathy, Dr. Krishnaveni Sakkarapan
Abstract
HDBSCAN is a prominent density-based clustering algorithm that constructs a cluster hierarchy (tree)
and extracts flat clusters from that tree using cluster-stability measures. Most hierarchical
clustering algorithms in use today require a large number of computations to obtain pairwise
dissimilarities. This cost can be reduced with a clustering algorithm based on single linkage, but
single linkage has problems of its own: it is very prone to outliers and can produce extremely skewed
dendrograms. To overcome these limitations, a hierarchical clustering linkage criterion known as Genie
is used. Genie links clusters subject to a chosen inequality measure of the cluster sizes (the Gini
index or the Bonferroni index), so that this inequality does not exceed an assigned threshold value.
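The Genie rule summarized above can be illustrated with a small, hypothetical sketch. It assumes Euclidean distances, uses the normalized Gini index of the cluster sizes, G = sum over i < j of |c_i - c_j|, divided by (k - 1) times the sum of all c_i, and recomputes single-linkage distances naively at every merge (roughly O(n^3) time), so it is only suitable for toy inputs. The function names `gini_index` and `genie` and the threshold of 0.3 are illustrative choices, not the authors' GINI_HDBSCAN implementation.

```python
import math
from itertools import combinations


def gini_index(sizes):
    """Normalized Gini index of cluster sizes: 0 when all sizes are equal."""
    k = len(sizes)
    total = sum(sizes)
    if k <= 1 or total == 0:
        return 0.0
    abs_diffs = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return abs_diffs / ((k - 1) * total)


def genie(points, n_clusters, gini_threshold=0.3):
    """Naive Genie-style agglomeration: single linkage plus a Gini constraint.

    Hypothetical sketch for illustration only (Euclidean distances assumed).
    """
    n = len(points)
    labels = list(range(n))  # every point starts as its own cluster
    dist = [[math.dist(p, q) for q in points] for p in points]
    while len(set(labels)) > n_clusters:
        sizes = {}
        for lab in labels:
            sizes[lab] = sizes.get(lab, 0) + 1
        min_size = min(sizes.values())
        # If cluster sizes are too unequal, only merges that involve
        # a smallest cluster are allowed at this step.
        restrict = gini_index(list(sizes.values())) > gini_threshold
        best = None  # (single-linkage distance, label kept, label dropped)
        for i, j in combinations(range(n), 2):
            a, b = labels[i], labels[j]
            if a == b:
                continue
            if restrict and min_size not in (sizes[a], sizes[b]):
                continue
            if best is None or dist[i][j] < best[0]:
                best = (dist[i][j], a, b)
        _, keep, drop = best
        labels = [keep if lab == drop else lab for lab in labels]
    # Relabel clusters as 0..n_clusters-1 in order of first appearance.
    remap = {}
    return [remap.setdefault(lab, len(remap)) for lab in labels]
```

For example, `genie([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the two groups and returns `[0, 0, 0, 1, 1, 1]`. While the size inequality stays at or below the threshold, each step is an ordinary single-linkage merge; once it exceeds the threshold, the smallest clusters must be merged first, which tends to stop isolated points from surviving as tiny clusters near the top of the hierarchy.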
A potential benefit of adding the Gini index and threshold value in this hybrid approach is the ability
to cluster data with variable densities. The hybrid GINI_HDBSCAN algorithm is suitable for applications
that require small minimum cluster sizes and that need to avoid the large number of small clusters
which otherwise appear in high-density regions. To increase speed, the proposed algorithm supports
parallel execution using multiple threads. Its memory overhead is small, and the distance matrix need
not be pre-computed to obtain the clustering results. The proposed algorithm was tested experimentally
on an educational dataset, and the results show that the approach is efficient for clustering large
datasets in terms of all the metrics considered.
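The point that the distance matrix need not be pre-computed can be illustrated with a hypothetical sketch of the HDBSCAN side of such a hybrid: core distances, the mutual reachability distance max(core(a), core(b), d(a, b)), and Prim's minimum spanning tree with every distance computed on the fly, so only O(n) working arrays are kept instead of an n x n matrix. Assumptions: Euclidean distances, the point itself counted as its own first neighbour when computing core distances (conventions differ between implementations), a single-threaded O(n^2) loop, and hypothetical function names. It is not the authors' implementation.

```python
import math


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour.

    The point itself (distance 0) is counted as the first neighbour.
    """
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_samples - 1])
    return cores


def mutual_reachability(i, j, points, cores):
    """Mutual reachability distance between points i and j."""
    return max(cores[i], cores[j], math.dist(points[i], points[j]))


def mst_prim(points, min_samples):
    """Prim's MST over mutual reachability distances, computed on the fly.

    No n-by-n distance matrix is stored; only O(n) arrays are kept.
    Returns (i, j, weight) edges sorted by increasing weight.
    """
    n = len(points)
    cores = core_distances(points, min_samples)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking v to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u, computing each distance only when needed.
        for v in range(n):
            if not in_tree[v]:
                d = mutual_reachability(u, v, points, cores)
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return sorted(edges, key=lambda e: e[2])
```

Processing the sorted edges in order reproduces single-linkage merging on mutual reachability distances; applying a Genie-style Gini constraint while consuming these edges is one plausible way to combine the two methods, stated here as an assumption rather than as the paper's procedure.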
Copyright
Copyright © Neuroquantology
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in Neuroquantology are available under the Creative Commons Attribution-NonCommercial-NoDerivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE the right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.