Research in facial expression recognition (FER) focuses on categorizing human facial images by emotion using powerful deep neural networks (DNNs). However, accurately handling inputs the model has never learned, particularly non-face images, remains challenging. Open-set recognition (OSR) in FER addresses this by distinguishing facial images from non-face images, which is vital for enhancing FER accuracy.
Existing methods for OSR in FER face challenges in distinguishing facial images, including class-ambiguous ones, from non-face images. Some methods rely on classification outputs but struggle with class-ambiguous images, while others use image reconstruction, which is complex for facial images.
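A common instance of an output-based OSR score is the maximum softmax probability of the classifier. Whether the paper's baselines use exactly this score is not stated, so the sketch below is only an illustration of why such scores struggle with class-ambiguous faces.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def max_softmax_score(classifier: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability (MSP), a typical output-based OSR score.

    A confident, single-peaked prediction suggests a known (facial) input,
    while a flat distribution suggests an unknown input -- but a face whose
    expression is ambiguous between classes also produces a flat
    distribution, which is the failure mode described above.
    """
    logits = classifier(images)                        # (batch, num_classes)
    return F.softmax(logits, dim=1).max(dim=1).values  # (batch,)
```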
In this context, a recent article published by a Japanese research team proposes a new method that utilizes a modified projection discriminator within a class-conditional generative adversarial network (GAN) to address this challenge effectively.
Concretely, the method builds on the intuition that facial images align with distinct emotions while non-face images do not. A discriminator is therefore trained to determine whether an input matches any emotion, which enables the facial versus non-face decision. In addition, the method introduces OSR metrics that remove the class variable from class-conditioned probabilities, making class-ambiguous facial images easier to handle. The key component that performs this match-or-not discrimination is a modified projection discriminator integrated into a class-conditional GAN.
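The sketch below shows, in PyTorch, how a projection discriminator conditions its score on a class label: an unconditional score is combined with an inner product between the image features and a learned class embedding. Layer sizes and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ProjectionDiscriminator(nn.Module):
    """Minimal projection-discriminator head.

    The output is psi(phi(x)) + <V_y, phi(x)>: an unconditional score plus
    the inner product of a class embedding with the image features, so a
    high value means the image "matches" the given emotion class.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional score psi
        self.embed = nn.Embedding(num_classes, feat_dim)  # class embeddings V_y

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (batch, feat_dim), labels: (batch,) integer class ids
        uncond = self.psi(features).squeeze(-1)            # psi(phi(x))
        proj = (self.embed(labels) * features).sum(dim=1)  # <V_y, phi(x)>
        return uncond + proj
```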
Initially, a DNN-based facial expression classifier is trained on a dataset containing only facial images; this classifier predicts an emotion-class label for a given input image. Datasets containing both facial and non-face images are then prepared for OSR. Two OSR metrics, h_face(·) and h_non-face(·), determine whether an input image belongs to the facial or non-face category based on probability distributions associated with the two image categories. The model consists of a feature extractor, a class discriminator, and a match-or-not discriminator: the feature extractor produces image features, the class discriminator computes emotion-class labels from those features, and the match-or-not discriminator decides whether the image matches a given class.
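A minimal sketch of these components, assuming a small CNN backbone and a linear classification head; the actual networks, feature dimension, and number of emotion classes in the paper may differ. The match-or-not discriminator can be the projection head sketched earlier.

```python
import torch.nn as nn

NUM_EMOTIONS = 7   # e.g. basic-expression classes; illustrative only

class FeatureExtractor(nn.Module):
    """Tiny CNN backbone standing in for the paper's feature extractor."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)                # (batch, feat_dim)

class ClassDiscriminator(nn.Module):
    """Maps features to emotion-class logits; softmax gives p(class | image)."""
    def __init__(self, feat_dim: int = 128, num_classes: int = NUM_EMOTIONS):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, features):
        return self.fc(features)          # (batch, num_classes)
```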
During training, the feature extractor and class discriminator are trained together as a facial expression classifier by minimizing the prediction error with a suitable loss function, which equips the model to handle complex facial images. The match-or-not discriminator is trained separately for binary classification on a counterfactual dataset, yielding an OSR metric that copes with class-ambiguous images. Finally, the OSR metrics are computed using empirical and marginal estimates, allowing facial and non-face images to be distinguished accurately even in challenging scenarios.
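The sketch below shows one plausible reading of this training recipe: cross-entropy for the classifier, and binary cross-entropy for the match-or-not discriminator, where true labels serve as "match" pairs and shuffled labels stand in for counterfactual "no match" pairs. The exact loss weights, counterfactual construction, and the paper's empirical/marginal estimators are not reproduced here; the marginal-style score below simply averages the match probability over all classes.

```python
import torch
import torch.nn.functional as F

def train_step(feat_ext, cls_disc, match_disc, images, labels, optimizer):
    """One joint training step (illustrative losses, not the paper's exact ones)."""
    optimizer.zero_grad()
    feats = feat_ext(images)

    # (1) Facial expression classification: standard cross-entropy.
    ce_loss = F.cross_entropy(cls_disc(feats), labels)

    # (2) Match-or-not discriminator: true labels are matches, shuffled
    # labels act as counterfactual mismatches (our assumption).
    wrong = labels[torch.randperm(labels.size(0), device=labels.device)]
    match_logit = match_disc(feats, labels)
    mismatch_logit = match_disc(feats, wrong)
    bce_loss = (
        F.binary_cross_entropy_with_logits(match_logit, torch.ones_like(match_logit))
        + F.binary_cross_entropy_with_logits(mismatch_logit, torch.zeros_like(mismatch_logit))
    )

    loss = ce_loss + bce_loss
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def osr_score(feat_ext, match_disc, images, num_classes):
    """Marginal-style OSR score: average match probability over all classes.

    Facial images should match at least one emotion strongly; non-face
    images should match none, giving them a low score.
    """
    feats = feat_ext(images)
    scores = []
    for c in range(num_classes):
        labels = torch.full((images.size(0),), c, dtype=torch.long, device=images.device)
        scores.append(torch.sigmoid(match_disc(feats, labels)))
    return torch.stack(scores, dim=1).mean(dim=1)   # (batch,)
```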
In the experiments, the authors evaluated the proposed method's effectiveness for OSR in FER in two settings: RAF-DB (facial) vs. Stanford Dogs (non-face), and facial vs. non-face images within AffectNet. Performance was measured by the area under the receiver operating characteristic curve (AUROC), a standard metric for OSR. A comparison against five existing methods showed that the proposed approach performs best, handling complex and class-ambiguous images effectively.
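AUROC for OSR is typically computed by scoring every test image, labeling known (facial) images as positives and unknown (non-face) images as negatives, and measuring how well the scores rank positives above negatives. A minimal sketch with made-up scores, using scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical OSR scores; higher should mean "more likely a facial image".
scores_face = np.array([0.92, 0.85, 0.77, 0.95])      # known / facial images
scores_nonface = np.array([0.30, 0.12, 0.41, 0.08])   # unknown / non-face images

y_true = np.concatenate([np.ones_like(scores_face), np.zeros_like(scores_nonface)])
y_score = np.concatenate([scores_face, scores_nonface])

print("AUROC:", roc_auc_score(y_true, y_score))  # 1.0 = perfect separation, 0.5 = chance
```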
In an additional ablation study, the authors compared different class-conditioning methods within the proposed approach. Among the three variants tested, the projection discriminator performed best, confirming its suitability for the method and its ability to improve OSR performance in FER.
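For contrast with the projection head sketched earlier, one common alternative conditioning scheme concatenates a class embedding to the features before scoring; whether this matches the paper's ablated variants is an assumption.

```python
import torch
import torch.nn as nn

class ConcatConditionalHead(nn.Module):
    """Concatenation-based conditioning: append a class embedding to the
    features and score the joint vector with a linear layer."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.embed = nn.Embedding(num_classes, feat_dim)
        self.fc = nn.Linear(feat_dim * 2, 1)

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([features, self.embed(labels)], dim=1)
        return self.fc(joint).squeeze(-1)
```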
In conclusion, the study introduces an innovative approach that uses a modified projection discriminator in a class-conditional GAN to address open-set recognition in facial expression recognition. By exploiting the fact that facial images, unlike non-face images, correspond to distinct emotions, the method effectively discriminates between facial and non-face images. The experiments demonstrate superior performance over existing methods, emphasizing its potential to enhance FER accuracy.