Researchers from FNii CUHKSZ and SSE CUHKSZ have introduced MVHumanNet, a large-scale dataset of multi-view human action sequences with extensive annotations: human masks, camera parameters, 2D and 3D keypoints, SMPL/SMPLX parameters, and textual descriptions. MVHumanNet supports work on action recognition, human NeRF reconstruction, text-driven view-unconstrained human image generation, and 2D/3D avatar creation, with the aim of driving innovation in large-scale 3D human-centric tasks.
The dataset is designed to overcome the limitations of existing human datasets, and the authors anticipate that its release will spur further work across 2D and 3D human-centric tasks.
The study acknowledges the role that large-scale datasets have played in advancing AI, especially language and text-to-image models, and notes that progress on human-centric tasks has lagged because no comparably extensive human dataset exists. Existing 3D human datasets lack diversity in identities and clothing. MVHumanNet is introduced to close this gap and to support 2D and 3D human-centric vision tasks at scale.
The dataset was captured with a scalable multi-view human capture system and serves a range of 2D and 3D visual tasks, including action recognition, NeRF-based human reconstruction, text-driven image generation, and avatar creation. To take advantage of the dataset’s scale, the researchers trained generative models such as StyleGAN2 and GET3D for 2D and 3D human synthesis.
MVHumanNet captures multi-view human sequences covering 4,500 identities and 9,000 outfits, all with extensive annotations. Pilot studies using MVHumanNet report performance gains across 2D and 3D visual tasks, including action recognition, NeRF-based reconstruction, text-driven image generation, and avatar creation. According to the researchers, the large-scale, real-captured multi-view data also improves text-driven realistic human image generation, yielding more diverse human image synthesis.
In conclusion, MVHumanNet is a valuable resource for researchers and developers working on human-centric visual tasks. With its comprehensive multi-view captures, extensive annotations, and large-scale real-captured data, it is expected to drive further innovation in areas such as action recognition, human NeRF reconstruction, text-driven image generation, and avatar creation. Its coverage of varied poses supports more diverse and realistic human image generation, which makes it a useful tool for large-scale 3D human-centric tasks.
Looking ahead, the researchers plan to publicly release the MVHumanNet dataset with its annotations as a foundational resource for the 3D digital human community. They also intend to use all of the data to explore the effect of scaling up training datasets. To address potential negative social impacts, they plan to implement strict regulations governing the use of the data.
Hello, my name is Adnan Hassan. I am a consulting intern at Marktechpost and will soon be a management trainee at American Express. I am currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I am passionate about technology and want to create new products that make a difference.