Dynamic view synthesis reconstructs dynamic 3D scenes from captured videos to enable immersive virtual playback. It has been a long-standing research problem in computer vision and graphics and holds significant promise for VR/AR, sports broadcasting, and artistic performance capture.
Traditional methods for representing dynamic 3D scenes use textured mesh sequences, but these methods are complex and computationally expensive, making them impractical for real-time applications.
Recent methods have achieved impressive rendering quality for dynamic view synthesis, but they still fall short on rendering latency at high resolutions. This research paper introduces 4K4D, a 4D point cloud representation that supports hardware rasterization and enables fast rendering.
4K4D represents the dynamic scene with a 4D feature grid, i.e., a space-time grid in which every point is assigned a feature vector. This representation naturally regularizes the points and makes them easier to optimize. The model first derives a coarse point cloud of the scene geometry from the input videos using a space-carving algorithm, then trains a neural network to model the 3D scene's geometry and appearance from that point cloud. A differentiable depth peeling algorithm is developed for rendering the point cloud representation, and a hardware rasterizer is leveraged to improve rendering speed.
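To build intuition for what depth peeling produces, the sketch below composites the point samples covering a single pixel front-to-back, one depth layer per peeling pass. This is a minimal illustrative helper, not the paper's implementation; the function name, inputs, and the toy data are all assumptions.

```python
import numpy as np

def composite_depth_peeled(depths, colors, alphas, num_passes=4):
    """Front-to-back alpha compositing of depth-sorted point samples
    for one pixel (hypothetical helper; each sorted sample stands in
    for the layer extracted by one depth peeling pass)."""
    order = np.argsort(depths)[:num_passes]  # fewer passes = fewer layers kept
    out = np.zeros(3)
    transmittance = 1.0  # fraction of light still passing through
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return out, transmittance

# Toy example: three semi-transparent points at different depths.
depths = np.array([2.0, 1.0, 3.0])
colors = np.array([[1.0, 0.0, 0.0],   # red
                   [0.0, 1.0, 0.0],   # green (nearest)
                   [0.0, 0.0, 1.0]])  # blue (farthest)
alphas = np.array([0.5, 0.5, 0.5])
rgb, t = composite_depth_peeled(depths, colors, alphas)
# the nearest (green) point dominates the composited color
```

Capping `num_passes` is also why reducing the number of peeling passes (one of the accelerations below) trades a small amount of quality for speed: far, mostly occluded layers are simply dropped.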
To boost the rendering speed, the following acceleration techniques are applied:
- Some model parameters are precomputed and stored in memory, allowing the graphics card to render the scene faster.
- The precision of the model is reduced from 32-bit to 16-bit floating point, which increases the frame rate by 20 FPS with no visible loss in quality.
- Lastly, the number of rendering passes required by the depth peeling algorithm is reduced, which adds another 20 FPS with no visible change in quality.
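The half-precision step above can be sketched in a few lines. This is a toy demonstration of the memory effect of the conversion, not the paper's code; the array shape and values are assumptions.

```python
import numpy as np

# Toy stand-in for precomputed per-point features; the shape
# (100k points, 32 channels) is hypothetical, not from the paper.
rng = np.random.default_rng(0)
features_fp32 = rng.random((100_000, 32), dtype=np.float32)

# Converting to half precision halves the memory footprint and the
# bandwidth needed to stream the data, which is typically where the
# rendering speedup comes from.
features_fp16 = features_fp32.astype(np.float16)

# For values in [0, 1), the quantization error is tiny, which is why
# the switch causes no visible quality loss.
max_err = np.abs(features_fp32 - features_fp16.astype(np.float32)).max()
```

In practice the conversion is applied to the precomputed buffers consumed by the rasterizer, so the saving shows up directly in per-frame memory traffic.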
The researchers evaluated 4K4D on multiple datasets, including DNA-Rendering and ENeRF-Outdoor. Their method renders at over 400 FPS at 1080p on the former and at 80 FPS at 4K on the latter, roughly 30 times faster than ENeRF, the state-of-the-art real-time dynamic view synthesis method, while delivering superior rendering quality. The ENeRF-Outdoor dataset is particularly challenging, featuring multiple actors; 4K4D still produced better results than competing models, which yielded blurry renderings and, in some cases, black artifacts around the image edges.
In conclusion, 4K4D is a new method that tackles the slow rendering speed of real-time view synthesis of dynamic 3D scenes at 4K resolution. It is a neural point cloud-based representation that achieves state-of-the-art rendering quality while rendering more than 30× faster than prior work. Some limitations remain, such as high storage requirements for long videos and the difficulty of establishing point correspondences across frames, which the researchers plan to address in future work.
Check out the Paper and Project.