Volume 21 No 6 (2023)
An Efficient Speech Enhancement Approach for Punjabi Language Using Acoustic and Tonal Features
JASPREET KAUR SANDHU, AMITOJ SINGH, MUNISH KUMAR
Abstract
In this article, we present our implementation of the Punjabi Speech Enhancement System (PSES), together with the Punjabi Speech Presence Estimator (PSPE) algorithm, to improve the performance of the Automatic Speech Recognition (ASR) system for the Punjabi language. We used Ideal Ratio Mask (IRM) and Log Power Spectrum (LPS) features as the training targets in combination with PSPE. These training targets were used to train deep learning models, namely Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM). The study extracted both acoustic and tonal features. The acoustic features were Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Gammatone Frequency Cepstral Coefficients (GFCC), and Bark Frequency Cepstral Coefficients (BFCC); the tonal features, specific to Punjabi, were Pitch, Loudness, and Intensity. We calculated the Word Error Rate (WER) for four types of background noise (Babble, Factory, Street, and White) using LSTM and BLSTM, both on individual acoustic features with tonal features and on hybrid combinations of acoustic features with tonal features. For all experiments, LPS-IRM was used as the training target. Among the individual features, MFCC performed best for both LSTM and BLSTM: LPS-IRM-MFCC+tonal features gave a WER of 25.37% with LSTM and 24.85% with BLSTM. The hybrid of MFCC+BFCC+tonal features gave a WER of 24.47% with LSTM and 23.82% with BLSTM. Among the hybrids of three acoustic features, the MFCC+GFCC+BFCC combination performed best: with MFCC+GFCC+BFCC+tonal features, LSTM achieved a WER of 23.83%, while BLSTM achieved a lower WER of 19.81%.
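All results above are reported as Word Error Rate, so it may help to see how that metric is commonly computed. The abstract does not say which scoring tool the authors used; the sketch below assumes the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, obtained from a word-level edit distance. The function name `word_error_rate` and the English sample sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumes whitespace tokenisation; a real evaluation would apply
    language-specific text normalisation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder sentences, not data from the study.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.2%}")
```

In the sample call, one substitution and one deletion against a six-word reference give a WER of 2/6. Note that WER can exceed 100% when the hypothesis contains many inserted words.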
Keywords
Punjabi Automatic Speech Recognition System, Speech Enhancement System, Deep Neural Network, Long Short-Term Memory, Bidirectional Long Short-Term Memory, Ideal Ratio Mask, Log Perceptual Spectra.
Copyright
Copyright © Neuroquantology

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Articles published in Neuroquantology are available under the Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJECSE right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.