Volume 20 No 15 (2022)
Big data analysis with parallel and distributed computing in cloud environment
Mahboob Alam, Pradeep Kumar Harsha K.G., Vinooth P, Mukesh Raj
Abstract
Big data analysis has become an active research area. With current methodologies and data analysis software tools, it is very difficult for a single PC to handle extremely large datasets efficiently. Cloud computing and parallel computing platforms are considered a better solution for big data analysis. Parallel computing relies on dividing a large problem into smaller parts, each of which is handled individually by a single processor. These processes run simultaneously in a distributed and parallel manner. Several conventional strategies exist for dealing with big data problems. One is the distributed procedure based on the data parallelism paradigm, in which a given large dataset is manually divided into n subsets and n algorithms are run on the respective subsets. The final result is obtained by aggregating the outputs of the n algorithms. Another is the MapReduce-based procedure, which runs on a cloud computing platform. This procedure consists of a map step and a reduce step: the former performs filtering and sorting, while the latter performs a summary operation to produce the final result. In this paper, we compare the performance of the MapReduce and distributed procedures on large datasets in terms of analysis efficiency and accuracy. The experiments are based on four large datasets used for data classification problems.
The results show that the classification performance of the MapReduce-based procedure is fairly stable regardless of the number of computer nodes used, and is better than that of the baseline single-machine and distributed procedures except on the imbalanced dataset. In addition, the MapReduce procedure requires the lowest computational cost to process these large datasets.
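The two procedures described in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the toy task (counting class labels), the sample `records`, and every function name below are assumptions made purely for illustration, and nothing here runs on a real cluster or cloud platform.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (features, class_label) pairs.
records = [((1.0,), "a"), ((2.0,), "b"), ((3.0,), "a"), ((4.0,), "a")]


# --- Distributed procedure (data parallelism) ---
def split_into_subsets(data, n):
    """Divide the dataset into n subsets (round-robin split)."""
    return [data[i::n] for i in range(n)]


def count_labels(subset):
    """Stand-in for the algorithm run on one subset."""
    return Counter(label for _, label in subset)


def distributed_counts(data, n):
    """Run one algorithm per subset, then aggregate the n outputs."""
    partial_results = [count_labels(s) for s in split_into_subsets(data, n)]
    total = Counter()
    for partial in partial_results:
        total += partial
    return total


# --- MapReduce-style procedure ---
def map_phase(record):
    """Map: emit a (key, value) pair for each record."""
    _, label = record
    return (label, 1)


def shuffle(pairs):
    """Sort the emitted pairs and group their values by key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarise all values that share a key."""
    return key, sum(values)


def mapreduce_counts(data):
    pairs = [map_phase(r) for r in data]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


distributed_result = distributed_counts(records, n=2)
mapreduce_result = mapreduce_counts(records)
```

Both functions produce the same label counts on this toy input. In the distributed version the split and the final aggregation are written by hand, whereas in the MapReduce version only the map and reduce functions are task-specific and the grouping step is generic.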
Keywords
Big Data, Parallel Computing, Distributed Computing, Cloud, MapReduce
Copyright
Copyright © Neuroquantology

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.