Person re-identification (ReID) aims to identify individuals across multiple non-overlapping cameras. The challenge of obtaining comprehensive datasets has driven the need for data augmentation, with generative adversarial networks (GANs) emerging as a promising solution.
GANs and variants such as the deep convolutional generative adversarial network (DCGAN) have been used to generate pedestrian images for data augmentation. Camera style adaptation (CamStyle) uses CycleGAN to address differences in image style between cameras, while the pose-normalized GAN (PNGAN) focuses on capturing different pedestrian postures. The primary challenge remains matching persons across varying camera styles. GAN-based methods usually produce images without identity labels, and techniques that reduce camera style differences can introduce noise and redundancy. The diversity of pedestrian postures across cameras makes matching harder still.
A research team from China recently published a paper that addresses these challenges. The authors introduce an improved CycleGAN for ReID data augmentation. Their method integrates a pose constraint sub-network that keeps posture consistent while the model learns camera style and identity. They also employ the Multi-pseudo regularized label (MpRL) for semi-supervised learning, which assigns label weights dynamically. The reported results show improved performance on three ReID datasets.
The complete system comprises two generator networks, two discriminator networks, and two semantic segmentation networks. The segmentation networks, termed pose constraint networks, keep pedestrian postures consistent across images. As in a standard GAN, each generator creates fake images and the corresponding discriminator judges whether they are real. Through this iterative process, the generated images are progressively refined until they closely resemble real ones. A key feature of the approach is the pose constraint loss, which keeps a pedestrian's posture consistent when an image is translated from one domain (X) to the other (Y). This loss is computed by measuring the pixel distance between the fake and real images.
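To make the idea of a pixel-distance pose loss concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the pose constraint networks output a per-pixel map for each image and that the distance is a mean absolute (L1) difference; the `Mask` format and the function name `pose_constraint_loss` are hypothetical.

```python
from typing import List

# Assumption: a pose map is an H x W grid of per-pixel scores, standing in
# for the output of a pose constraint (segmentation) network.
Mask = List[List[float]]


def pose_constraint_loss(real_mask: Mask, fake_mask: Mask) -> float:
    """Mean absolute pixel distance between two pose maps of equal size."""
    if len(real_mask) != len(fake_mask):
        raise ValueError("masks must have the same height")
    total, count = 0.0, 0
    for real_row, fake_row in zip(real_mask, fake_mask):
        if len(real_row) != len(fake_row):
            raise ValueError("masks must have the same width")
        for r, f in zip(real_row, fake_row):
            total += abs(r - f)
            count += 1
    if count == 0:
        raise ValueError("masks must not be empty")
    return total / count


# Toy 2 x 2 pose maps: 1.0 marks a body pixel, 0.0 marks background.
real = [[0.0, 1.0], [1.0, 0.0]]
same_pose = [[0.0, 1.0], [1.0, 0.0]]
shifted_pose = [[1.0, 0.0], [1.0, 0.0]]

loss_same = pose_constraint_loss(real, same_pose)        # posture preserved
loss_shifted = pose_constraint_loss(real, shifted_pose)  # top row changed
print(loss_same, loss_shifted)
```

A generated image that keeps the original posture gives a loss of zero, while any shift in the body region is penalized in proportion to the number of pixels that changed.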
Additionally, the CycleGAN uses cycle consistency: a generated image is mapped back to its source domain and compared with the original, which keeps the transformation from discarding content. The authors also outline a training strategy for the improved CycleGAN. It involves annotating images with an annotation tool, pre-training specific sub-networks, and then optimizing the total loss function.
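The following sketch illustrates cycle consistency and a combined objective on toy data. It assumes an L1 reconstruction distance and a weighted sum of adversarial, cycle, and pose terms; the weights `lambda_cyc` and `lambda_pose`, the function names, and the toy "generators" are illustrative assumptions, not values or code from the paper.

```python
from typing import Callable, List

Image = List[float]  # flattened pixel values, a toy stand-in for an image tensor
Generator = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Image, y: Image,
                           g_xy: Generator, g_yx: Generator) -> float:
    """X -> Y -> X and Y -> X -> Y should both reproduce their inputs."""
    forward = l1(g_yx(g_xy(x)), x)
    backward = l1(g_xy(g_yx(y)), y)
    return forward + backward


def total_loss(adv: float, cyc: float, pose: float,
               lambda_cyc: float = 10.0, lambda_pose: float = 1.0) -> float:
    """Weighted sum of the loss terms (weights are hypothetical)."""
    return adv + lambda_cyc * cyc + lambda_pose * pose


# Toy generators: camera Y is modelled as camera X plus a brightness offset.
def brighten(img: Image) -> Image:
    return [p + 0.5 for p in img]


def darken(img: Image) -> Image:
    return [p - 0.5 for p in img]


def identity(img: Image) -> Image:
    return list(img)


x = [0.25, 0.5]
y = [0.75, 1.0]

cyc_good = cycle_consistency_loss(x, y, brighten, darken)   # exact inverses
cyc_bad = cycle_consistency_loss(x, y, brighten, identity)  # no way back
print(cyc_good, cyc_bad)
print(total_loss(adv=1.0, cyc=0.5, pose=0.25))
```

When the two generators are exact inverses the cycle term vanishes; when the return mapping ignores the style change, the term grows, which is what pushes the generators to preserve image content.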
Lastly, the paper introduces the Multi-pseudo regularized label (MpRL) method, designed to assign labels to generated images more effectively than traditional semi-supervised learning techniques. MpRL assigns different weights to different training classes, which allows more refined labeling of generated images and improves re-identification results. This contrasts with the label smoothing regularization for outliers (LSRO) strategy, which gives every training class the same weight and often yields less accurate predictions.
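The difference between a uniform label and a class-weighted label can be sketched as follows. The uniform label matches the LSRO idea described above. The weighted label here is only an illustration of non-uniform weights: it uses softmax scores from a hypothetical pre-trained classifier and is not the weighting formula published for MpRL.

```python
import math
from typing import List


def lsro_label(num_classes: int) -> List[float]:
    """Uniform soft label: every training class gets weight 1/K."""
    return [1.0 / num_classes] * num_classes


def weighted_pseudo_label(logits: List[float]) -> List[float]:
    """Non-uniform soft label from classifier scores (softmax).

    Assumption: an illustrative stand-in for class-dependent weights,
    not the MpRL formula from the paper.
    """
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def soft_cross_entropy(logits: List[float], target: List[float]) -> float:
    """Cross-entropy between predicted logits and a soft target label."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(t * (z - log_norm) for t, z in zip(target, logits))


# Hypothetical scores of one generated image over four training identities.
scores = [2.0, 0.5, 0.1, -1.0]

uniform = lsro_label(len(scores))
weighted = weighted_pseudo_label(scores)
print(uniform)
print(weighted)
print(soft_cross_entropy(scores, uniform), soft_cross_entropy(scores, weighted))
```

With the uniform label, a generated image that clearly resembles one identity contributes the same target as one that resembles none. A weighted label lets the training signal reflect which identities the generated image is closest to.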
To evaluate the effectiveness of the proposed method, the authors tested it on three person re-identification datasets: Market-1501, DukeMTMC-reID, and CUHK03-NP. These datasets pose challenges such as color differences between cameras and data imbalance. Rank-n accuracy and mean average precision (mAP) were the primary evaluation metrics. The experiments were implemented in Python 3 with PyTorch on a Linux server. The improved CycleGAN was trained first to translate between camera styles, and the ReID network was trained afterwards. For validation, the authors conducted an ablation study. The improved CycleGAN yielded better rank-1 and mAP scores than the standard CycleGAN, and its best hyperparameters were determined experimentally. In a comparison of the two labeling strategies, MpRL outperformed LSRO. Combining MpRL with several popular loss functions had varying effects on performance. Overall, the improved CycleGAN with MpRL outperformed conventional data augmentation techniques, bridging camera style differences and improving re-identification accuracy. A comparison against other state-of-the-art methods further supported the authors' approach.
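For readers unfamiliar with the two metrics, the sketch below computes rank-k accuracy and mAP from ranked gallery lists. It is a simplified illustration: standard ReID evaluation protocols also exclude gallery images taken by the same camera as the query, which is omitted here. The function names and the toy rankings are hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_k_hit(ranked_ids: Sequence[int], query_id: int, k: int) -> bool:
    """True if the query identity appears among the top-k gallery results."""
    return query_id in ranked_ids[:k]


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """Mean of the precision values at each position with a correct match."""
    hits, precision_sum = 0, 0.0
    for position, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def evaluate(rankings: List[Sequence[int]], query_ids: List[int],
             k: int = 1) -> Tuple[float, float]:
    """Return (rank-k accuracy, mAP) averaged over all queries."""
    n = len(query_ids)
    rank_k = sum(rank_k_hit(r, q, k) for r, q in zip(rankings, query_ids)) / n
    mean_ap = sum(average_precision(r, q) for r, q in zip(rankings, query_ids)) / n
    return rank_k, mean_ap


# Each list holds gallery identities sorted by similarity to one query.
rankings = [
    [1, 2, 1, 3],  # query identity 1: correct matches at positions 1 and 3
    [3, 2, 2, 1],  # query identity 2: correct matches at positions 2 and 3
]
query_ids = [1, 2]

rank1, mean_ap = evaluate(rankings, query_ids, k=1)
print(rank1, mean_ap)
```

Rank-1 only asks whether the top result is correct, so it rewards a single good match. mAP takes every correct gallery image into account, so it also reflects how well the remaining matches are ranked.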
To conclude, the research team introduced an improved CycleGAN for person re-identification that embeds a pose constraint sub-network and reduces camera style differences. The pose constraint loss maintains posture consistency during identity learning, and MpRL assigns labels to the generated images, improving re-identification accuracy. Evaluations on three ReID datasets support the method's effectiveness. Future work will address domain differences to make the model better suited to real-world scenarios.
Check out the Paper. All credit for this research goes to the researchers on this project.
Mahmoud is a PhD researcher in machine learning. He also holds a
bachelor’s degree in physical science and a master’s degree in
telecommunications and networking systems. His current areas of
research concern computer vision, stock market prediction and deep
learning. He has produced several scientific articles on person
re-identification and on the robustness and stability of deep
networks.