Volume 20 No 8 (2022)
Deep Learning Model for Human Emotion Detection using Speech Recognition and Facial Expression
Appala Sravan Kumar, Vuppula Manohar, M. Shashidhar
Abstract
Purpose: This study presents a multimodal emotion recognition framework that integrates facial and vocal cues through deep learning. Motivated by the complexities of human-computer interaction (HCI), the work develops an end-to-end desktop application for real-time classification of affective states.
Methodology: The system employs two specialized Convolutional Neural Networks (CNNs). The visual pipeline utilizes 32×32 normalized facial images, while the acoustic pipeline processes a feature fusion of Mel-frequency cepstral coefficients (MFCCs), chroma, and mel-spectrograms. A unified Tkinter-based GUI facilitates the entire lifecycle from dataset preprocessing and model training to performance visualization through automated accuracy/loss plotting. The architecture utilizes a two-stage convolution-pooling backbone with a 256-unit dense layer, optimized for seven facial and eight vocal emotion classes.
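As a rough sketch of the shape arithmetic behind the two-stage convolution-pooling backbone described above: the abstract does not state kernel sizes, padding, or filter counts, so the 3×3 valid convolutions, 2×2 pooling, and 64 filters below are illustrative assumptions only.

```python
# Trace spatial dimensions through a two-stage conv-pool backbone on a
# 32x32 normalized face image, as described in the methodology.
# ASSUMPTIONS (not stated in the abstract): 3x3 kernels, 'valid' padding,
# 2x2 non-overlapping max pooling, 64 filters in the final stage.

def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with a square kernel."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def backbone_shapes(input_size=32, stages=2, filters=64):
    """Return final spatial size and flattened feature width."""
    size = input_size
    for _ in range(stages):
        size = pool_out(conv_out(size))
    return size, size * size * filters

spatial, flat = backbone_shapes()
# 32 -> conv 30 -> pool 15 -> conv 13 -> pool 6; flatten = 6*6*64 = 2304
# The flattened vector would then feed the 256-unit dense layer and a
# 7-way (facial) or 8-way (vocal) softmax output.
```

Under these assumptions, the 256-unit dense layer sits between a 2304-dimensional flattened feature map and the per-modality softmax classifier.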
Findings: Empirical results demonstrate high classification reliability, with both models achieving over 96% training accuracy. The modular design ensures efficient persistence of weights and architectures, allowing for low-latency inference in mental health monitoring and adaptive learning environments.
Originality: The integration of dual-modality CNNs into a lightweight, accessible desktop environment bridges the gap between high-complexity research models and practical, user-centric diagnostic tools.
Keywords
Multimodal Emotion Recognition, CNN, MFCC, Tkinter GUI, Affective Computing, Human-Computer Interaction.
Copyright
Copyright © Neuroquantology
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in Neuroquantology are available under the Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE the right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.