


Volume 20 No 22 (2022)
Classifying Text and Image of Social Media Post Using Hybrid Feature Selection Technique
Sumit Jain, Dr. Hare Ram Sah
Abstract
Social media platforms accumulate large amounts of user-generated data with little effective control, which poses a potential threat to individuals and communities. This paper makes three contributions: (1) it surveys feature selection techniques used in text- and image-based analysis of social media data, (2) it reports experiments showing how different feature selection models affect classifier performance on text and images, and (3) it develops a novel approach that combines text and image features for social media data classification. We used a dataset of Twitter posts and text-based images from Kaggle. We first applied Optical Character Recognition (OCR) to extract the text embedded in the images and aligned each image with its corresponding post text. We then used TF-IDF and chi-square tests to select features from the combined image and text data. The experimental results show that the proposed approach outperforms the other techniques evaluated, reaching an accuracy of up to 89%.
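The feature selection step described in the abstract (TF-IDF weighting followed by a chi-square test over the combined post and OCR text) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the example posts, OCR strings, labels, and the function names `tfidf`, `chi2_scores`, and `select_top_k` are all hypothetical, the OCR step is assumed to have already produced plain strings, and the chi-square score is computed on class-wise sums of TF-IDF weights, which is one common formulation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows) of unnormalised TF-IDF weights.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, and whitespace tokens.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * idf[t] for t in vocab])
    return vocab, rows


def chi2_scores(rows, labels):
    """Chi-square score per feature from observed vs expected class-wise sums."""
    classes = sorted(set(labels))
    n = len(rows)
    class_prob = {c: labels.count(c) / n for c in classes}
    scores = []
    for j in range(len(rows[0])):
        total = sum(r[j] for r in rows)
        score = 0.0
        for c in classes:
            observed = sum(r[j] for r, y in zip(rows, labels) if y == c)
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(vocab, scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(zip(vocab, scores), key=lambda p: (-p[1], p[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical data: post text, text assumed to come from OCR, and a binary label.
posts = ["great win for the team", "you are a stupid loser",
         "lovely sunny day", "stupid idiots everywhere"]
ocr_text = ["congrats champions", "loser loser",
            "enjoy the sunshine", "hate these idiots"]
labels = [0, 1, 0, 1]

# Align each post with the text recovered from its image, then select features.
combined = [f"{p} {o}" for p, o in zip(posts, ocr_text)]
vocab, rows = tfidf(combined)
scores = chi2_scores(rows, labels)
top_terms = select_top_k(vocab, scores, 3)
print(top_terms)
```

In practice the same two steps are often done with scikit-learn's `TfidfVectorizer` and `SelectKBest(chi2)`; the hand-written version above is only meant to make the arithmetic visible.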
Keywords
Text Feature selection, Image Feature selection, machine learning algorithm, heterogeneous data, social media data.
Copyright
Copyright © Neuroquantology
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.