Computer vision is one of the most significant subfields of artificial intelligence, and it has advanced rapidly alongside the broader boom in AI. One of its most important tasks is semantic segmentation, which involves assigning an object or region class to each pixel in an image. The technique is used in many applications, including autonomous driving, retail, and face recognition.
Semantic segmentation algorithms have traditionally relied on supervised learning, which requires a large amount of labeled training data. Acquiring and annotating such datasets is time-consuming and resource-intensive, because every pixel in every image must be labeled by a human with the corresponding object or region class. This need for dense, human-made annotations has made training segmentation networks costly.
Unsupervised learning has recently made significant strides on this problem, approaching the performance of supervised methods. The main goal of unsupervised semantic segmentation is to extract semantic information from a dataset without labels by identifying correlations between randomly sampled image features. In recent research, a team from Ulm University and TU Vienna has taken these advances a step further by using depth information to introduce knowledge of the scene's structure into the training process.
The approach, called DepthG, integrates spatial information, specifically depth maps, into the training process of STEGO. STEGO is a notable model that uses a Vision Transformer (ViT) to extract features from images and then applies a contrastive learning approach to distill these features across the dataset. Because STEGO operates solely in pixel space and ignores the scene's spatial layout, DepthG adds depth maps to its training process to supply that missing information.
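To make the idea of distilling feature correlations more concrete, here is a simplified, STEGO-style correlation loss written in plain Python. It is an illustrative sketch, not STEGO's implementation: the function names, the `bias` value of 0.2, and the use of plain lists instead of tensors are assumptions, and the published method includes refinements that are omitted here. Pixel pairs whose frozen backbone features are strongly correlated are pulled together in the segmentation head's code space, and weakly correlated pairs are pushed apart.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small epsilon avoids division by zero


def correlation(feats_a, feats_b):
    """Pairwise cosine-similarity matrix between two sets of vectors."""
    return [[cosine(u, v) for v in feats_b] for u in feats_a]


def sample_features(feature_map, k, rng=None):
    """Randomly pick k pixel feature vectors from a flattened feature map."""
    rng = rng or random.Random()
    return rng.sample(feature_map, k)


def stego_style_loss(backbone_a, backbone_b, codes_a, codes_b, bias=0.2):
    """Simplified correlation-distillation loss (illustrative assumption).

    backbone_*: frozen ViT features for sampled pixels of two images
    codes_*:    segmentation-head outputs for the same pixels
    Pairs with backbone correlation above `bias` are rewarded for similar
    codes; pairs below `bias` are penalized for similar codes.
    """
    f = correlation(backbone_a, backbone_b)
    s = correlation(codes_a, codes_b)
    total, n = 0.0, 0
    for f_row, s_row in zip(f, s):
        for f_ij, s_ij in zip(f_row, s_row):
            # Clamp code similarity at 0 so only positive agreement counts
            total += -(f_ij - bias) * max(s_ij, 0.0)
            n += 1
    return total / n
```

Minimizing this loss raises code similarity where the backbone already sees a strong correlation and lowers it where the correlation falls below the bias, which is how the feature correspondences are distilled into the segmentation codes.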
The research makes two primary contributions:
- Learning Depth-Feature Correlations: The network learns correlations between depth information and visual features by spatially correlating the depth maps with the feature maps extracted from the images. As a result, it learns more about the scene's underlying structure, essentially how objects are arranged relative to one another in three dimensions.
- Efficient Feature Selection with 3D Sampling: This contribution improves the selection of relevant features for segmentation using Farthest-Point Sampling. The method applies 3D sampling to the scene's depth data, choosing features that are spread out in 3D space so that the scene's structure is better represented.
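The depth-feature correlation idea can be illustrated with a hypothetical loss that pulls together the segmentation codes of pixels at similar depths and pushes apart those at very different depths. This is a minimal sketch under stated assumptions, not DepthG's actual loss: the `depth_similarity` definition (1 minus the absolute depth difference, with depths normalized to [0, 1]), the `bias` of 0.5, and the loss form, which mirrors a STEGO-style correlation objective, are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def depth_similarity(d_i, d_j):
    """Hypothetical depth correlation: depths are assumed normalized to
    [0, 1], so equal depths give 1 and maximally distant depths give 0."""
    return 1.0 - abs(d_i - d_j)


def depth_feature_loss(depths, codes, bias=0.5):
    """Encourage code similarity to follow depth similarity (assumed form).

    depths: normalized depth values of sampled pixels (used only in training)
    codes:  segmentation-head outputs for the same pixels
    """
    total, n = 0.0, 0
    for d_i, c_i in zip(depths, codes):
        for d_j, c_j in zip(depths, codes):
            # Similar depths (above bias) reward similar codes;
            # dissimilar depths (below bias) penalize similar codes.
            total += -(depth_similarity(d_i, d_j) - bias) * max(cosine(c_i, c_j), 0.0)
            n += 1
    return total / n


depths = [0.1, 0.9]
separated = depth_feature_loss(depths, [[1.0, 0.0], [0.0, 1.0]])
merged = depth_feature_loss(depths, [[1.0, 0.0], [1.0, 0.0]])
```

In this toy case, giving the near and far pixels different codes (`separated`) yields a lower loss than assigning them the same code (`merged`), so minimizing the loss nudges the segmentation toward the scene's depth layout.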
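Farthest-point sampling is a standard greedy algorithm: start from one point, then repeatedly add the point whose distance to its nearest already-chosen point is largest. The following sketch applies it to points lifted from a depth map. The lifting step is a deliberate simplification (pixel coordinates plus raw depth instead of a calibrated camera unprojection), and the function names are hypothetical; it illustrates the technique rather than reproducing DepthG's code.

```python
import math


def lift_to_3d(depth_map):
    """Turn an H x W depth map (list of rows) into (x, y, z) points.

    Simplification: pixel column/row are used as x/y and depth as z,
    rather than a proper camera unprojection.
    """
    return [(float(x), float(y), float(z))
            for y, row in enumerate(depth_map)
            for x, z in enumerate(row)]


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return indices of up to k points spread out in 3D."""
    k = min(k, len(points))
    chosen = [start]
    # dist[i] = distance from point i to its nearest chosen point
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen


depth_map = [[1.0, 1.0],
             [1.0, 5.0]]
picked = farthest_point_sampling(lift_to_3d(depth_map), k=2)
```

Here `picked` is `[0, 3]`: after the first pixel, the pixel lying on a different depth layer is selected, because it is farthest away in 3D even though it is adjacent in the image. Sampling this way favors points that cover the scene's 3D structure rather than clustering in large, uniform image regions.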
According to the team, DepthG is distinctive because it brings 3D scene knowledge into unsupervised learning on 2D images without requiring depth maps as network input. Depth is used only during training, so the model cannot come to rely on depth information at inference time, when it might not be available; predictions on new, unlabeled images are made from the images alone.
In conclusion, this study builds on recent developments in unsupervised learning to address the cost of human-made annotations in semantic segmentation. By incorporating depth information into training and learning depth-feature correlations, the model improves its understanding of the scene's structure, and 3D sampling improves the selection of relevant features. Together, these developments yield considerable performance gains on a range of benchmark datasets, demonstrating the method's potential to advance computer vision research.
Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical-thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.