Volume 20 No 12 (2022)
Improvement of Data Classification Based on K-Value Selection Clustering Algorithm with Incomplete Data Clustering
P.S.Deshmukh, Dr. M. Sivakkumar, Dr. Varsha Namdeo
Abstract
In today’s world, most data is generated by computer applications, and this data can be used to analyse and predict future outcomes. To do so, machines are trained to read data and make predictions accordingly, a process known as machine learning; the machine is trained to respond to different data inputs. Many unsupervised learning approaches and algorithms have been introduced over the last decade, several of which are now well known and widely used. Growing interest in unsupervised learning techniques has led to considerable success in fields such as computer vision, natural language processing, speech recognition, and the development of autonomous self-driving cars. Unsupervised learning eliminates the need for labelled data and manual, handcrafted feature engineering, enabling more general, flexible, and automated ML methods. In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world holds a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, and health data. The K-means algorithm is one of the best-known unsupervised machine learning algorithms. It partitions the data into distinct, non-overlapping clusters, with each point assigned to exactly one cluster (subgroup): the minimum squared distance criterion assigns each point to its nearest cluster. One of the main concerns with the K-means algorithm is finding good initial cluster centroids. Evaluation is performed on the Iris data set and a novel power plant fan data set, with missing values induced at missingness rates of 5% to 20%. We show that both missForest and the k-nearest-neighbour method can successfully handle missing values, and we offer some possible directions for future research.
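The assignment and update steps of K-means described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the hand-picked initial centroids are assumptions made purely for illustration, and the missing-value handling discussed in the paper is not included.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign_clusters(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if it has none)."""
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            updated.append(old)
            continue
        updated.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return updated


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]

# Initial centroids are chosen by hand here (one point from each group).
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The initial centroids are passed in explicitly because, as the abstract notes, their choice is one of the main concerns with K-means: a different starting point can change both the number of iterations and the final clusters.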
Keywords
Data Mining Tools, Machine Learning, Unsupervised learning, Clustering algorithms, Neural Networks, Time Complexity, big data.
Copyright
Copyright © Neuroquantology

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.