Data augmentation is a critical technique in deep learning that involves creating new training data by modifying existing samples. It is essential because it diversifies the training data, improving the model’s ability to generalize to new, unseen examples. Creating variations of existing samples prevents overfitting and helps the model learn more robust and adaptable features, which is crucial for accurate predictions in real-world scenarios.
One popular family is single-image-based data augmentation, in which sections of an image are randomly erased or altered. Methods such as CutOut, Random Erasing (RE), Hide and Seek (HS), and GridMask modify individual images to increase robustness, but they risk removing key features. The broader landscape also includes dropout methods such as adaptive dropout and spatial dropout, which aim to curb overfitting. Multi-image-based methods such as MixUp, CutMix, RICAP, and IMEDA instead blend multiple images to diversify the dataset and improve model performance.
In this context, researchers from Dublin City University, UCD, and the University of Galway have proposed a new technique called Random Slices Mixing Data Augmentation (RSMDA). RSMDA aims to overcome the main weakness of single-image-based techniques, the potential loss of key features, by mixing image slices vertically, horizontally, or in a combination of both. It combines slices of one image with another to generate a third image, thereby diversifying the training dataset. The labels of the two original images are mixed as well to produce a label for the new image, so the resulting soft labels act as a form of label smoothing during training.
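One way to write this mixing down, borrowing the mask-based notation commonly used for CutMix-style methods, is sketched below. The symbols are assumptions made for illustration and are not quoted from the paper: $x_A, x_B$ are the two source images, $y_A, y_B$ their one-hot labels, $M$ is a binary mask that equals 1 on the slices taken from $x_B$, and $\lambda$ is the fraction of pixels where $M = 1$.

```latex
\tilde{x} = (\mathbf{1} - M) \odot x_A + M \odot x_B,
\qquad
\tilde{y} = (1 - \lambda)\, y_A + \lambda\, y_B,
\qquad
\lambda = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} M_{ij}
```

Under this reading, a row-wise strategy sets entire rows of $M$ to 1 and a column-wise strategy sets entire columns to 1.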
Concretely, RSMDA follows five steps:
- Selecting Training Samples: Two images and their corresponding labels are chosen.
- Blending Images: RSMDA combines parts of these images to create a new image. It uses a binary mask to select and merge sections from each image.
- Adjusting Labels: The labels of the combined images are also adjusted based on a chosen ratio, ensuring the labels align with the blended image.
- Slicing and Mixing: Parts of the images are randomly selected and mixed to form the combined image. RSMDA offers three strategies for this mixing process: row-wise, column-wise, or a combination of both.
- Creating Augmented Samples: Selected portions from one image are pasted onto another image according to the chosen mixing strategy. This process generates new image-label pairs used for training.
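The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `rsmda_mix`, the parameter `lam` (the fraction of slices taken from the second image), and the choice to pick slices uniformly at random are all assumptions. Only the row-wise and column-wise strategies are shown, because the article does not specify how the combined strategy selects its slices.

```python
import numpy as np


def rsmda_mix(img_a, img_b, label_a, label_b, lam, mode="row", rng=None):
    """Paste randomly chosen rows or columns of img_b onto img_a.

    Hypothetical sketch: `lam` is the assumed fraction of slices taken
    from img_b; labels are expected to be one-hot vectors.
    """
    rng = np.random.default_rng() if rng is None else rng
    if img_a.shape != img_b.shape:
        raise ValueError("both images must have the same shape")
    if mode not in ("row", "col"):
        raise ValueError("mode must be 'row' or 'col'")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    height, width = img_a.shape[:2]
    n_slices = height if mode == "row" else width
    k = int(round(lam * n_slices))
    # Pick k distinct slice indices uniformly at random (an assumption).
    picked = rng.choice(n_slices, size=k, replace=False)

    # Binary mask: True where the pixel is taken from img_b.
    mask = np.zeros((height, width), dtype=bool)
    if mode == "row":
        mask[picked, :] = True
    else:
        mask[:, picked] = True

    # Broadcast the 2-D mask over the channel axis for color images.
    mask_nd = mask if img_a.ndim == 2 else mask[..., None]
    mixed_img = np.where(mask_nd, img_b, img_a)

    # Weight the labels by the fraction of pixels each image contributes.
    frac_b = mask.mean()
    mixed_label = (1.0 - frac_b) * np.asarray(label_a, dtype=float) \
        + frac_b * np.asarray(label_b, dtype=float)
    return mixed_img, mixed_label, mask


# Usage: mix an all-zeros image with an all-ones image, row-wise.
a = np.zeros((8, 8, 3))
b = np.ones((8, 8, 3))
img, label, mask = rsmda_mix(a, b, [1, 0], [0, 1], lam=0.25,
                             rng=np.random.default_rng(0))
# Two of the eight rows come from b, so the soft label puts
# 0.75 weight on class 0 and 0.25 on class 1.
```

Computing the label weight from the mask itself, rather than from `lam` directly, keeps the label consistent with the image even when `lam * n_slices` has to be rounded to a whole number of slices.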
The researchers evaluated RSMDA across diverse datasets and network architectures, comparing several mixing strategies, including RSMDA(R), which denotes Random Slices Mixing Row-wise. RSMDA(R) consistently achieved lower error rates than the baseline models and existing augmentation techniques. RSMDA was also more robust to adversarial attacks than traditional augmentation methods on both grayscale and color datasets. Visualizations of Class Activation Maps indicated that RSMDA learns discriminative features comparable to those learned with advanced augmentation techniques such as CutMix.
In summary, the paper introduces Random Slices Mixing Data Augmentation (RSMDA), which blends slices of two images to generate diverse training samples and addresses the limitations of single-image-based methods. Its row-wise variant, RSMDA(R), consistently outperformed existing techniques in the reported experiments, which makes RSMDA a promising augmentation technique for improving model performance, robustness, and feature learning in deep learning applications.
Check out the Paper. All credit for this research goes to the researchers of this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical science and a master’s degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction and deep learning. He produced several scientific articles about person re-identification and the study of the robustness and stability of deep networks.