Volume 20 No 8 (2022)
Data Splitting Techniques to Reduce False-Positive and False-Negative Cases in Breast Cancer Prediction
Vijay Birchha, Bhawna Nigam
Abstract
Breast cancer affects a massive number of women worldwide and is the most common and
severe cause of women’s high mortality rate. False diagnosis can be considered the most
significant cause of the late discovery of breast cancer. The chances of curing breast cancer increase
if the number of false-positive (FP) and false-negative (FN) predictions is reduced. The research questions are:
can the dataset splitting techniques used to train machine learning classifiers affect classifier
performance, and do they help to minimize false-positive and false-negative predictions of breast
cancer? In this work, artificial neural network (NN), support vector machine (SVM), logistic
regression (LR) and decision forest (DF) machine learning (ML) classifiers were used with the Breast
Cancer Wisconsin (Original) dataset (WBC). The classifiers’ false-positive and false-negative
predictions were compared across different dataset splitting techniques: train-test (TT), train-test-validation (TTV) and k-fold cross-validation. The neural network classifier scored zero FP predictions
with the train-test-validation dataset splitting method. The support vector machine recorded zero
FN predictions with the k-fold cross-validation dataset splitting method. The results show that the
selection of dataset splitting technique significantly impacts machine learning
classifier performance. The results will help implement a computer-aided system to diagnose breast
cancer more accurately.
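The three splitting techniques named in the abstract, and the FP/FN counts used to compare them, can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the 20% test and validation fractions, the value k = 5, the fixed seed, and the convention that label 1 marks the positive (malignant) class are all assumptions, since the abstract states none of them.

```python
import random


def train_test_split(indices, test_frac=0.2, seed=0):
    """Train-test (TT): shuffle once, hold out a fraction for testing."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    return shuffled[n_test:], shuffled[:n_test]


def train_test_val_split(indices, test_frac=0.2, val_frac=0.2, seed=0):
    """Train-test-validation (TTV): three disjoint parts of the data."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_frac))
    n_val = int(round(len(shuffled) * val_frac))
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_splits(indices, k=5, seed=0):
    """k-fold cross-validation: each sample is tested exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def count_fp_fn(y_true, y_pred, positive=1):
    """Return (false positives, false negatives) for binary labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return fp, fn
```

In a comparison of this kind, a classifier would be fitted on the training indices of each split and `count_fp_fn` applied to its predictions on the held-out indices; with k-fold cross-validation the counts would be accumulated over all k test folds.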
Keywords
Breast cancer, Wisconsin dataset, machine learning, false-positive, false-negative, support vector machine, decision forest, neural network, logistic regression, dataset split
Copyright
Copyright © Neuroquantology
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in the Neuroquantology are available under Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.