The website content discusses advancements in machine learning for creating photorealistic 3D human reconstructions from monocular RGB images, highlighting the state-of-the-art method known as PHORHUM.
Abstract
The provided web content delves into the cutting-edge techniques for generating 3D scans of clothed humans, emphasizing the significance of such technology in enhancing applications like AR/VR, gaming, and virtual try-ons. It introduces PHORHUM, an end-to-end trainable deep neural network approach that reconstructs photorealistic 3D human models from single 2D images. This method not only predicts the geometry and surface color but also accounts for scene lighting and shading. The article underscores the limitations of current datasets and suggests that a more diverse and comprehensive dataset could improve the model's performance, especially in reconstructing various clothing styles and body types. The authors of the PHORHUM technique have demonstrated its effectiveness in capturing detailed textures and shapes, although it currently struggles with non-Western or loose-fitting attire due to dataset limitations.
Opinions
The authors believe that their method, PHORHUM, is superior to previous techniques due to its ability to predict albedo surface color and shading, providing a more photorealistic reconstruction.
The authors acknowledge the need for a more diverse dataset to improve the model's performance across different clothing styles, body types, and cultural backgrounds.
There is an emphasis on the practical applications of accurate 3D human models in various industries, indicating a strong demand for such technology.
The article suggests that rendering losses are essential for reconstructing perceptually correct surface color, in addition to sparse 3D supervision for geometry.
The authors encourage readers to engage with the full essay and explore future ideas for 3D scans of humans, indicating a forward-thinking approach and an invitation for collaboration within the field.
Demand for 3D scans of clothed humans is growing because they have many applications. For example, accurate 3D models of people would enhance immersive AR and VR applications, gaming, telepresence, virtual try-on, free-viewpoint photorealistic visualization, and creative image editing.
Traditionally, human models are obtained by automatic scanning with multi-camera setups, manual creation by an artist, or a combination of both. Artists are often needed to clean up scanning errors. Because such approaches are difficult to scale, designers are looking for automated alternatives that are less expensive and easier to deploy.
PHORHUM is a state-of-the-art, end-to-end trainable deep neural network approach for photorealistic 3D human reconstruction from a monocular RGB image.
The pixel-aligned method is the first to jointly estimate the full 3D geometry, the unshaded surface color (albedo), and the scene lighting. After observing that 3D supervision alone is insufficient for high-quality color reconstruction, the authors introduce patch-based rendering losses. These enable reliable color reconstruction on visible areas of the person and plausible, complete color estimates for non-visible regions. The technique also addresses methodological and practical limitations of previous work by modeling geometry, albedo, and lighting effects in an end-to-end model whose components can be successfully disentangled.
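To make the idea of a patch-based rendering loss more concrete, here is a minimal sketch in plain Python. It is not the authors' loss: the article does not say how patches are rendered or compared, so this example assumes grayscale images stored as nested lists and uses a mean absolute difference as a stand-in. The names `crop_patch` and `patch_l1_loss` are hypothetical.

```python
import random

def crop_patch(image, top, left, size):
    """Cut a size x size window out of a 2D image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]

def patch_l1_loss(rendered, target, size, rng):
    """Compare one randomly placed patch of the rendered image with the same
    patch of the target image (assumed stand-in: mean absolute difference)."""
    height, width = len(target), len(target[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    rendered_patch = crop_patch(rendered, top, left, size)
    target_patch = crop_patch(target, top, left, size)
    diffs = [
        abs(r - t)
        for rendered_row, target_row in zip(rendered_patch, target_patch)
        for r, t in zip(rendered_row, target_row)
    ]
    return sum(diffs) / len(diffs)

# Toy 4x4 images: the target is uniformly 0.5, the rendering is slightly too bright.
target_image = [[0.5] * 4 for _ in range(4)]
rendered_image = [[0.75] * 4 for _ in range(4)]
loss = patch_l1_loss(rendered_image, target_image, size=2, rng=random.Random(0))
```

The point of the sketch is only that the comparison happens on rendered image patches rather than on isolated 3D sample points, which is what lets an image-space loss supervise color.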
The attributes of single-image 3D human reconstruction approaches are summarized in the image below. Only PHORHUM predicts both albedo surface color and shading. It also offers the most practical training setup, does not require image matting at test time, and outputs signed distances rather than binary occupancy, which is a more useful representation.
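The difference between signed distances and binary occupancy can be illustrated with a toy shape. The sketch below uses an analytic unit sphere in place of a learned network, so every function name here is hypothetical. It shows that occupancy can be derived from a signed distance, while the signed distance additionally tells you how far a point is from the surface and, through its gradient, the surface normal.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center))) - radius

def occupancy(p):
    """Binary occupancy derived from the signed distance: 1 inside, 0 outside."""
    return 1 if sphere_sdf(p) < 0.0 else 0

def sdf_normal(p, eps=1e-4):
    """Approximate the surface normal as the normalized central-difference gradient of the SDF."""
    grad = []
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += eps
        minus[axis] -= eps
        grad.append((sphere_sdf(plus) - sphere_sdf(minus)) / (2.0 * eps))
    length = math.sqrt(sum(g * g for g in grad))
    return tuple(g / length for g in grad)

outside_distance = sphere_sdf((2.0, 0.0, 0.0))   # one unit away from the surface
inside_flag = occupancy((0.0, 0.0, 0.0))         # the center is inside
normal_on_surface = sdf_normal((1.0, 0.0, 0.0))  # points along +x
```

Occupancy alone only answers "inside or outside", so the distance and the normal above would not be recoverable from it at a single query point.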
From an input image I, the feature extractor network G generates pixel-aligned features z_x for every location x in space. Given a location and its feature, the implicit signed distance function network f computes the distance d to the closest surface (see the image below).
In addition, f returns albedo colors for surface locations. Given the surface normal n_x and the light l, the shading network s predicts the shading for surface points. On the right of the figure, the authors show the shaded 3D geometry and the reconstruction of geometry and albedo colors.
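A pixel-aligned feature lookup can be sketched as follows. This is an illustration under stated assumptions, not the PHORHUM implementation: the feature map is a tiny nested list instead of the output of G, the camera is assumed to be orthographic with X and Y in [-1, 1] covering the image, and the names `bilinear_sample` and `pixel_aligned_feature` are hypothetical.

```python
import math

def bilinear_sample(feature_map, u, v):
    """Bilinearly interpolate a grid of feature vectors at continuous pixel
    coordinates (u = column, v = row)."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp so the 2x2 neighbourhood always stays inside the grid.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    u0, v0 = int(math.floor(u)), int(math.floor(v))
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    channels = len(feature_map[0][0])
    return [
        (1 - du) * (1 - dv) * feature_map[v0][u0][c]
        + du * (1 - dv) * feature_map[v0][u1][c]
        + (1 - du) * dv * feature_map[v1][u0][c]
        + du * dv * feature_map[v1][u1][c]
        for c in range(channels)
    ]

def pixel_aligned_feature(feature_map, x):
    """Project the 3D point x = (X, Y, Z) onto the image (assumed orthographic
    camera) and sample the feature at that pixel. Depth Z does not change the pixel."""
    height, width = len(feature_map), len(feature_map[0])
    u = (x[0] + 1.0) * 0.5 * (width - 1)
    v = (1.0 - (x[1] + 1.0) * 0.5) * (height - 1)  # image rows grow downward
    return bilinear_sample(feature_map, u, v)

# Toy 2x2 feature map with one channel per pixel.
toy_features = [[[0.0], [1.0]],
                [[2.0], [3.0]]]
center_feature = pixel_aligned_feature(toy_features, (0.0, 0.0, 0.0))
```

Under this assumed projection, all points along the same viewing ray receive the same image feature, so a downstream network needs the point's location as an extra input to tell them apart; this matches the text's "a location and its feature".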
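The separation of albedo and shading can be illustrated with a small sketch. Two things here are assumptions rather than statements from the article: the shaded color is taken to be the product of albedo and shading, and the learned shading network s is replaced by a simple Lambertian formula with a light direction vector. The names `lambert_shading` and `shaded_color` are hypothetical.

```python
import math

def normalize(vector):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in vector))
    return tuple(c / length for c in vector)

def lambert_shading(normal, light_dir, ambient=0.2):
    """Stand-in for the shading network s: ambient term plus clamped n . l diffuse term."""
    n = normalize(normal)
    l = normalize(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    return ambient + (1.0 - ambient) * diffuse

def shaded_color(albedo, normal, light_dir):
    """Assumed composition: shaded color = albedo * shading, per color channel."""
    shading = lambert_shading(normal, light_dir)
    return tuple(channel * shading for channel in albedo)

albedo = (0.8, 0.4, 0.2)  # unshaded surface color
lit = shaded_color(albedo, normal=(0.0, 0.0, 1.0), light_dir=(0.0, 0.0, 1.0))
unlit = shaded_color(albedo, normal=(0.0, 0.0, 1.0), light_dir=(0.0, 0.0, -1.0))
```

Because the albedo is stored separately from the shading, the same surface can be re-lit by changing only the light input, which is the practical benefit of disentangling the two.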
The approach is the first to compute the 3D geometry, surface albedo, and shading from a single image as part of an end-to-end model prediction. PHORHUM works well with a broad range of clothing, body types, and skin tones, and its reconstructions capture most of the detail in the input image. However, according to the authors, while sparse 3D supervision works well for constraining geometry, rendering losses are required to reconstruct perceptually correct surface color.
Limitation of the dataset
When the clothing or pose of the person in the input image deviates too much from the training dataset distribution, the limits of the technique become obvious. The training data does not adequately cover loose, oversized, or non-Western clothing. In addition, the reconstructed back of a person does not always match the front semantically. These issues might be solved with a larger, more geographically and culturally diverse dataset. I strongly recommend reading the full paper listed below, which also offers some fascinating future ideas for 3D scans of humans.
Title: Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
Authors: Thiemo Alldieck, Mihai Zanfir, Cristian Sminchisescu
Google Research