


Volume 20 No 15 (2022)
Big data analysis with parallel and distributed computing in cloud environment
Mahboob Alam, Pradeep Kumar Harsha K.G., Vinooth P, Mukesh Raj
Abstract
Big data analysis has become an active research area. It is very difficult for a single personal computer
to handle extremely large datasets efficiently using current methodologies and data analysis software
tools. Cloud computing and parallel computing platforms are considered a better solution for big data
analysis. Parallel computing relies on dividing a large problem into smaller parts, each of which is
carried out by an individual standalone processor. These processes run simultaneously in a distributed
and parallel manner. Two conventional strategies for dealing with big data problems are considered. The
first is the distributed procedure based on the data parallelism paradigm, where a large dataset is
manually divided into n subsets and n algorithms are run, one on each subset. The final result is
obtained by aggregating the outputs generated by the n algorithms. The second is a procedure based on
MapReduce, which runs on a cloud computing platform. This procedure consists of the map and reduce
processes: the former is responsible for filtering and sorting, while the latter performs a summary
operation to generate the final result. In this paper, we compare the performance of the MapReduce and
distributed procedures over large datasets in terms of analysis efficiency and accuracy. The experiments
are based on four large datasets used for data classification problems. The results show that the
classification performance of the MapReduce-based procedure is robust irrespective of the number of
computer nodes used, and is preferable to the baseline single-machine and distributed approaches except
on the highly imbalanced dataset. In addition, the MapReduce procedure requires the least computational
cost to process these large datasets.
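
The two procedures described in the abstract can be sketched in a few lines of Python. This is only an illustrative, single-process sketch under stated assumptions, not the authors' implementation: the nearest-centroid classifier, the majority-vote aggregation, the round-robin split, and all names (centroid_fit, distributed_predict, mapreduce_fit, TOY_ROWS) are hypothetical choices made for the example, and a real deployment would run the subsets or map tasks on separate nodes.

```python
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical toy dataset: (feature vector, class label) pairs.
TOY_ROWS = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((0.1, 0.2), "a"), ((0.0, 0.0), "a"),
    ((1.0, 0.9), "b"), ((0.9, 1.0), "b"), ((1.1, 1.0), "b"), ((1.0, 1.1), "b"),
]


def centroid_fit(rows):
    """Train a nearest-centroid classifier: one mean vector per label."""
    sums, counts = {}, Counter()
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def centroid_predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda lab: dist2(model[lab]))


# --- Distributed procedure (data parallelism) ---------------------------
def split(rows, n):
    """Divide the dataset into n subsets (round-robin, an assumption)."""
    return [rows[i::n] for i in range(n)]


def distributed_predict(rows, n, features):
    """Fit one model per subset, then aggregate outputs by majority vote."""
    models = [centroid_fit(subset) for subset in split(rows, n)]
    votes = Counter(centroid_predict(m, features) for m in models)
    return votes.most_common(1)[0][0]


# --- MapReduce-style procedure ------------------------------------------
def map_phase(row):
    """Emit (key, value): the label keyed to (feature vector, count of 1)."""
    features, label = row
    return label, (list(features), 1)


def shuffle(pairs):
    """Group the mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Summarize one key: add vectors and counts, then take the mean."""
    def add(a, b):
        return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]
    total, count = reduce(add, values)
    return [x / count for x in total]


def mapreduce_fit(rows):
    """Build a single centroid model from all rows via map, shuffle, reduce."""
    groups = shuffle(map(map_phase, rows))
    return {label: reduce_phase(vals) for label, vals in groups.items()}
```

For example, distributed_predict(TOY_ROWS, 2, (0.1, 0.1)) trains two models on two subsets and votes, whereas centroid_predict(mapreduce_fit(TOY_ROWS), (0.1, 0.1)) classifies with one model assembled by the reduce step. The sketch makes the structural difference visible: the distributed procedure combines n independent models, while the MapReduce-style procedure combines partial summaries into one model.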
Keywords
Big Data, Parallel Computing, Distributed Computing, Cloud, MapReduce
Copyright
Copyright © Neuroquantology
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in the Neuroquantology are available under Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.