Reconstructing human body parts in 3D from images is a difficult computer vision problem of great interest to the health, fashion, and fitness industries. This study tackles the reconstruction of the human foot. Accurate foot models are useful for shoe shopping, orthotics, and personal health monitoring, and recovering a 3D foot model from images has become increasingly attractive as the digital market for these businesses grows. Existing foot reconstruction solutions fall into four types: costly scanning equipment; reconstruction of noisy point clouds from depth maps or phone-based sensors such as a TrueDepth camera; Structure from Motion (SfM) followed by Multi-View Stereo (MVS); and fitting generative foot models to image silhouettes.
The authors conclude that none of these options is adequate for precise scanning in a home setting. Most people cannot afford expensive scanning equipment; phone-based sensors are not widely available or user-friendly; noisy point clouds are difficult to use for downstream tasks such as rendering and measurement; SfM depends on many input views to match dense features between images; and MVS can also produce noisy point clouds. In addition, generative foot models have been low quality and restrictive, and using only silhouettes limits the geometric information that can be extracted from the images, which is especially problematic in a few-view setting.
The limited availability of paired images and 3D ground-truth data for feet further constrains the performance of these approaches. To address these limitations, researchers from the University of Cambridge present FOUND (Foot Optimisation using Uncertain Normals for Surface Deformation). The algorithm improves on conventional multi-view reconstruction optimization by using per-pixel surface normals together with their uncertainties. The technique needs only a small number of calibrated RGB input photographs. Rather than relying on silhouettes alone, which carry little geometric information, it uses surface normals and keypoints as supplementary cues. To overcome data scarcity, the researchers also release a large collection of photorealistic synthetic images paired with ground-truth labels for these cues.
Their main contributions are outlined below:
• They release SynFoot, a large-scale synthetic dataset of 50,000 photorealistic foot images with precise silhouette, surface normal, and keypoint labels, to aid research on 3D foot reconstruction. Obtaining such labels for real photos requires costly scanning equipment, whereas their synthetic dataset scales easily. They demonstrate that the dataset captures enough variance in foot images for downstream tasks to generalize to real images, even though it is built from only 8 real-world foot scans. They also release an evaluation dataset of 474 photos of 14 real feet, each paired with high-resolution 3D scans and ground-truth per-pixel surface normals. Lastly, they release their own Python library for Blender, which allows large-scale synthetic datasets to be created efficiently.
• They show that an uncertainty-aware surface normal estimation network can generalize to real in-the-wild foot images after training only on their synthetic data, which is derived from 8 foot scans. To reduce the domain gap between synthetic and real foot photos, they apply aggressive appearance and perspective augmentation. The network predicts a surface normal and an associated uncertainty at each pixel. The uncertainty is useful in two ways: first, thresholding it yields precise silhouettes without training a separate network; second, using it to weight the surface normal loss in their optimization scheme increases robustness to inaccurate predictions in some views.
• They provide an optimization scheme that uses differentiable rendering to fit a generative foot model to a set of calibrated images with predicted surface normals and keypoints. Their pipeline is uncertainty-aware, outperforms state-of-the-art photogrammetry for surface reconstruction, and can reconstruct a watertight mesh from a limited number of views. It can also be applied to data captured with a consumer's mobile phone.
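The two uses of per-pixel uncertainty described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the assumption that low uncertainty marks foot pixels, the 1 / (1 + uncertainty) weighting, and the cosine-based error are all choices made here for illustration, and the paper's actual loss may differ.

```python
import numpy as np


def silhouette_from_uncertainty(uncertainty, threshold):
    """Binary mask from an (H, W) uncertainty map.

    Assumption for this sketch: pixels with uncertainty below the
    threshold are treated as foot, everything else as background.
    """
    return uncertainty < threshold


def weighted_normal_loss(rendered, predicted, uncertainty, mask, eps=1e-8):
    """Uncertainty-weighted normal error between two (H, W, 3) normal maps.

    The per-pixel error is 1 - cosine similarity. The weight
    1 / (1 + uncertainty) is a hypothetical choice that down-weights
    pixels where the network is less confident.
    """
    # Normalise both maps to unit length so the dot product is a cosine.
    r = rendered / (np.linalg.norm(rendered, axis=-1, keepdims=True) + eps)
    p = predicted / (np.linalg.norm(predicted, axis=-1, keepdims=True) + eps)
    cosine = np.sum(r * p, axis=-1)

    # Only pixels inside the silhouette contribute to the loss.
    weight = mask.astype(float) / (1.0 + uncertainty)
    return float(np.sum(weight * (1.0 - cosine)) / (np.sum(weight) + eps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predicted = rng.normal(size=(4, 4, 3))       # stand-in for network output
    rendered = rng.normal(size=(4, 4, 3))        # stand-in for rendered mesh normals
    uncertainty = rng.uniform(0.0, 2.0, (4, 4))  # stand-in for predicted uncertainty

    mask = silhouette_from_uncertainty(uncertainty, threshold=1.0)
    print(weighted_normal_loss(rendered, predicted, uncertainty, mask))
```

In a full pipeline, the rendered normals would come from a differentiable renderer so that this loss could be back-propagated to the foot model parameters. NumPy is used here only to keep the arithmetic visible.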
Check out the Paper and Project. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.