Volume 20 No 12 (2022)
An Approach For Mining Large Dataset Using Clustering Algorithm
S Archana, Dr. Neeraj Sharma, Dr. Pradosh Chandra Patnaik
Abstract
Clustering is an unsupervised machine learning technique that discovers and groups related data points in large datasets without relying on a predefined outcome. It is valuable in data mining because it reveals groups and relevant distributions within the underlying data. Traditional clustering techniques either favor spherical clusters of similar size or are highly sensitive to outliers. The approach considered here combines random sampling and partitioning to handle large databases. A random sample of the data set is partitioned, and each partition is partially clustered. The partial clusters are then clustered again to obtain the desired clusters. Numerous parallel methods based on the MapReduce architecture have recently been proposed to address the scalability issues caused by growing data sizes. When large data is clustered in parallel using the KMeans algorithm, however, the data is read repeatedly in every iteration, which considerably increases both I/O and network costs. In this study we propose a new Collection-based KMeans clustering technique, called CBKMeans, that effectively reduces data size while improving clustering accuracy through representative verification. Our experimental results show that CBKMeans is more efficient, scalable, and accurate than k-means.
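The sample, partition, partially cluster, then re-cluster pipeline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' CBKMeans and does not include representative verification. It assumes plain Lloyd's k-means as a stand-in for both the partial and the final clustering step, and the names `kmeans`, `sample_partition_cluster`, `sample_size`, `n_partitions`, and `partial_k` are hypothetical, chosen only for illustration.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (stand-in clusterer)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers are drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            idx = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[idx].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def sample_partition_cluster(data, k, sample_size, n_partitions, partial_k, seed=0):
    """Assumed pipeline: sample -> partition -> partial clustering -> final clustering."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)  # 1. random sample of the data set
    # 2. split the sample into n_partitions roughly equal parts
    partitions = [sample[i::n_partitions] for i in range(n_partitions)]
    partial_centers = []
    for part in partitions:
        # 3. partially cluster each partition into partial_k groups
        partial_centers.extend(kmeans(part, partial_k, seed=seed))
    # 4. cluster the partial cluster centers again to obtain the k final clusters
    return kmeans(partial_centers, k, seed=seed)
```

For example, `sample_partition_cluster(data, k=2, sample_size=200, n_partitions=4, partial_k=4)` returns two center tuples for a list `data` of 2-D points. Only the sampled points are scanned repeatedly, which is the reason such a scheme reduces the data that must be read in each iteration.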
Keywords
Data mining, knowledge discovery, clustering algorithms, sampling
Copyright
Copyright © Neuroquantology

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.