Volume 20 No 8 (2022)
A NOVEL PERFORMANCE ENHANCED AND OUTLIER RESISTANT HYBRIDIZED GINI_HDBSCAN DEEP CLUSTERING ALGORITHM FOR BIG DATA ANALYSIS
N. Valarmathy, Dr. Krishnaveni Sakkarapan
Abstract
HDBSCAN is a prominent density-based clustering algorithm that builds a cluster hierarchy tree and extracts flat clusters from that tree using specific stability measures. Most hierarchical clustering algorithms in use today require a very large number of computations to obtain pairwise dissimilarity measures. This limitation can be overcome with a clustering algorithm based on single linkage, but single linkage has problems of its own: it is highly sensitive to outliers and can produce extremely skewed dendrograms. To overcome these limitations, a hierarchical clustering linkage criterion commonly known as Genie is used, which links two clusters subject to a chosen inequity measure (Gini index or Bonferroni index) so that cluster sizes do not grow beyond an assigned threshold value. A potential benefit of additionally using the Gini index and a threshold value in this hybrid approach is the ability to cluster data with variable densities. The hybrid GINI_HDBSCAN algorithm is suitable for applications where low minimum cluster sizes are required and where a large number of small clusters in high-density regions must be avoided. To increase speed, the proposed hybrid algorithm supports parallel execution using multiple threads. Its memory overhead is small, and the distance matrix need not be pre-computed to obtain the desired clustering results. The proposed algorithm is experimentally tested on an educational dataset, and the results indicate that the approach is efficient for clustering huge datasets across all evaluated metrics.
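The role of the inequity measure described above can be illustrated with a minimal sketch. It assumes the normalized Gini index over cluster sizes, as commonly used with the Genie linkage criterion; the function names and the threshold value 0.3 are illustrative assumptions and are not taken from the paper.

```python
def gini_index(sizes):
    """Normalized Gini index of cluster sizes.

    Returns 0.0 when all clusters have equal size and 1.0 when a single
    cluster holds every point. Assumed form: sum of pairwise absolute
    differences divided by (n - 1) * total.
    """
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0
    pairwise = sum(
        abs(sizes[i] - sizes[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return pairwise / ((n - 1) * total)


def restrict_to_smallest(sizes, threshold=0.3):
    """Hypothetical helper: True when the inequity of the current cluster
    sizes exceeds the threshold, i.e. when a Genie-style rule would force
    the next merge to involve one of the smallest clusters."""
    return gini_index(sizes) > threshold
```

For example, `restrict_to_smallest([97, 1, 1, 1])` is true because one cluster dominates, whereas `restrict_to_smallest([25, 25, 25, 25])` is false because the sizes are perfectly balanced.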
Copyright
Copyright © Neuroquantology

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.