Deep learning has become a powerful tool for classifying pathological voices, particularly in the GRBAS (Grade, Roughness, Breathiness, Asthenia, Strain) scale assessment. The GRBAS scale is a standardized method clinicians use to evaluate voice disorders based on auditory-perceptual judgment. Traditional methods for classifying pathological voices often rely on manual feature extraction and subjective analysis, which can be time-consuming and inconsistent. Deep learning techniques such as 1D convolutional neural networks (1D-CNNs) offer significant advantages by automatically learning relevant features from raw audio data, capturing complex patterns and nuances indicative of specific pathological conditions.
However, noise can significantly impact the accuracy of these models. Since they rely on extracting subtle features from voice signals, any background noise or distortion can obscure important characteristics, leading to misclassification. Noise from recording environments, equipment, or background sounds poses a critical challenge in developing reliable voice pathology detection systems. Preprocessing techniques like noise reduction and signal enhancement are often employed, but they are not always sufficient to eliminate the effects of noise on classification performance.
In this context, a paper recently published in the journal The Laryngoscope assesses the impact of background noise on machine learning models used to evaluate the GRBAS scale in voice disorder assessments.
In this study, the authors created a unique dataset from clinical patients’ voice samples recorded in a soundproof room. These samples were rated according to the GRBAS scale by otolaryngologists and an expert speech and language therapist. The ratings’ median values were adopted as the correct answers, and inter-rater agreement was evaluated using Krippendorff’s alpha.
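The labeling step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the ratings are made up, the function names are hypothetical, and the interval distance metric used for Krippendorff's alpha is an assumption, since the article does not say which metric the paper used.

```python
from statistics import median


def median_labels(ratings):
    """Return one reference label per sample: the median of its rater scores."""
    return [median(unit) for unit in ratings]


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval metric (squared difference).

    `ratings` is a list of samples, each a list of scores from the raters.
    Assumption: the interval metric is chosen here for simplicity only.
    """
    # Only samples scored by at least two raters contribute pairable values.
    units = [unit for unit in ratings if len(unit) >= 2]
    values = [v for unit in units for v in unit]
    n = len(values)

    # Observed disagreement: squared differences within each sample.
    d_o = 0.0
    for unit in units:
        pair_sum = sum(
            (a - b) ** 2
            for i, a in enumerate(unit)
            for j, b in enumerate(unit)
            if i != j
        )
        d_o += pair_sum / (len(unit) - 1)
    d_o /= n

    # Expected disagreement: squared differences across all pooled values.
    d_e = sum(
        (a - b) ** 2
        for i, a in enumerate(values)
        for j, b in enumerate(values)
        if i != j
    ) / (n * (n - 1))

    return 1.0 - d_o / d_e


# Hypothetical Grade scores (0 to 3) from three raters for five voice samples.
grade_ratings = [[0, 0, 1], [2, 2, 2], [3, 2, 3], [1, 1, 0], [2, 3, 3]]
print(median_labels(grade_ratings))
print(round(krippendorff_alpha_interval(grade_ratings), 3))
```

An alpha of 1 means perfect agreement, while values near 0 mean the raters agree no more than chance would predict. With an odd number of raters, the median is always one of the scores actually given.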
The machine learning model was a 5-layer 1D-CNN, constructed and evaluated using TensorFlow. The dataset was divided into 80% training, 10% validation, and 10% test data. The training process was conducted without noise data. Gaussian noise of various intensities was added to the test samples to assess noise resilience. The model’s performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen’s kappa score under different noise conditions. The study highlights the significance of noise as a challenge in applying machine learning models to real-world scenarios like examination rooms.
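To make the noise-injection step concrete, here is a minimal sketch of adding white Gaussian noise to a test signal. The article does not say how the paper parameterized noise intensity, so the signal-to-noise ratio (SNR) in decibels used here is an assumption, as are the function names and the synthetic sine wave that stands in for a voice recording.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return a copy of `signal` with white Gaussian noise at the target SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, given the clean and noisy signals."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
sample_rate = 16000     # assumed sampling rate, not taken from the paper
clean = [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(sample_rate)]

# Evaluate the same clean sample under progressively heavier noise.
for target_snr in (30, 20, 10, 0):
    noisy = add_gaussian_noise(clean, target_snr, rng)
    print(target_snr, round(measured_snr_db(clean, noisy), 2))
```

Because the noise is added only to the test samples, a model trained on clean data sees it for the first time at evaluation, which is the setting the study describes.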
The dataset was balanced for age and gender. On noise-free data, the deep learning model performed well. As the intensity of the added Gaussian noise increased, performance metrics dropped significantly, with accuracy falling dramatically at the highest noise level. This degradation was observed across all GRBAS parameters, with some components declining more than others.
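Two of the metrics mentioned above, accuracy and the quadratic weighted Cohen's kappa, can be computed as in the sketch below. It is a plain-Python illustration with made-up labels, not the paper's evaluation code. It assumes integer scores from 0 to 3, the range of each GRBAS component, and omits the F1 score for brevity.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1.

    Undefined (division by zero) if both label lists contain a single,
    identical class, because the expected disagreement is then zero.
    """
    n = len(y_true)
    # Confusion matrix: rows are reference labels, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Errors are penalized by the squared distance between the scores.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n

    return 1.0 - weighted_observed / weighted_expected


# Hypothetical reference scores and model predictions for eight samples.
reference = [0, 1, 2, 3, 2, 1, 0, 3]
predicted = [0, 1, 2, 2, 2, 0, 0, 3]
print(accuracy(reference, predicted))
print(round(quadratic_weighted_kappa(reference, predicted), 3))
```

Unlike accuracy, the quadratic weighting penalizes a prediction that is off by two grades four times as heavily as one that is off by a single grade, which suits an ordinal scale like GRBAS.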
The study found that background noise severely affects the model’s accuracy and performance metrics. The model’s effectiveness decreased as noise levels increased, highlighting its vulnerability to real-world conditions. Certain GRBAS components were more sensitive to noise. The study suggests incorporating noise-resilient techniques such as data augmentation and noise reduction to improve model robustness. Limitations include the small number of evaluators and using only one type of vocal sample, which may not fully capture the variability in voice disorders. Future work should address these issues to enhance the model’s generalizability and performance in noisy environments.
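The data augmentation idea suggested above could look like the following sketch, in which each training signal is kept and additional noisy copies are generated at randomly chosen noise levels. This is one possible interpretation rather than a method described in the paper. The function name, the SNR values, and the number of copies are all hypothetical.

```python
import math
import random


def augment_with_noise(signals, labels, snr_choices_db, copies, rng):
    """Return the original signals plus `copies` noisy variants of each.

    Each noisy variant uses an SNR (dB) drawn at random from `snr_choices_db`
    and inherits the label of the clean signal it was derived from.
    """
    aug_signals, aug_labels = [], []
    for signal, label in zip(signals, labels):
        # Keep the clean version so the model still sees noise-free data.
        aug_signals.append(list(signal))
        aug_labels.append(label)

        power = sum(x * x for x in signal) / len(signal)
        for _ in range(copies):
            snr_db = rng.choice(snr_choices_db)
            sigma = math.sqrt(power / (10 ** (snr_db / 10)))
            aug_signals.append([x + rng.gauss(0.0, sigma) for x in signal])
            aug_labels.append(label)
    return aug_signals, aug_labels


rng = random.Random(42)
# Two short synthetic "recordings" with hypothetical Grade labels.
train_signals = [
    [math.sin(0.1 * t) for t in range(200)],
    [math.sin(0.3 * t) for t in range(200)],
]
train_labels = [1, 3]

aug_signals, aug_labels = augment_with_noise(
    train_signals, train_labels, snr_choices_db=[20, 10, 5], copies=3, rng=rng
)
print(len(aug_signals), aug_labels)
```

Augmentation should be applied to the training split only, so that the validation and test sets still measure performance on data the model has not seen in any form.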
To conclude, the model’s performance significantly declined with increased background noise, impacting the evaluation metrics. Future research should focus on developing noise-tolerant methods, such as data augmentation, to enhance the model’s resilience in real-world conditions. Improving the GRBAS scale’s reliability can make it a valuable tool for both physicians and patients. Automated evaluations can facilitate earlier disease detection, leading to more effective treatments and better support for rehabilitation.
Check out the Paper. All credit for this research goes to the researchers of this project.
Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor’s degree in physical science and a master’s degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He produced several scientific articles about person re-
identification and the study of the robustness and stability of deep
networks.