Dariusz Gross #DATAsculptor

Summary

Researchers have developed a new text-guided diffusion model for photorealistic 3D object generation and editing, which significantly improves 3D consistency and enables local editing and one-shot novel view synthesis.

Abstract

The article discusses a groundbreaking method for 3D generation and editing using text-guided diffusion models. This approach ensures 3D consistency by combining neural fields with a two-stream asynchronous diffusion process. It also introduces a novel technique for 3D local editing and extends the model for one-shot novel view synthesis. The method represents a significant advancement in the field of 3D modeling, with applications in gaming, entertainment, architecture, and robotics simulation. It demonstrates the potential of AI in creating detailed and controlled 3D objects from textual descriptions, marking a transition from traditional 3D modeling to AI-driven design.

Opinions

  • The researchers believe that their proposed NeRF-based Condition Module and Two-stream Asynchronous Diffusion Module, along with new diffusion training and sampling strategies, make their 3DDesigner model superior to existing methods.
  • The article suggests that the transition from 2D to 3D modeling is complex, but the new method simplifies the process by allowing the generation of realistic 3D models from text.
  • The author invites readers to explore the concept of Machine Learning Art, indicating enthusiasm for the intersection of AI and creativity.
  • The researchers have not released the code for their method; the author suggests trying a similar tool that is available for free.
  • The author expresses a passion for AI, as seen in the invitation to join Medium and follow their work, as well as the promotion of their Instagram and LinkedIn profiles for further collaboration and content.

New 3D generation method: 2× faster

From 3ds Max to 3D AI Designer

Diffusion models for 3D generation and inpainting

Generate realistic 3D models from text

What kind of work is done in 3D modeling?

Artists create 3D models. They work in film and video production studios, game design, graphic design and advertising, web design, software, architecture, product design, or manufacturing. According to the researchers, their model, 3DDesigner, is the first text-guided generative model to handle 3D generation, 3D editing / inpainting, and one-shot novel view synthesis.

Next step in text-to-3D

3D inpainting

Text-guided diffusion models have shown strong results in generating and editing images and videos, but they have rarely been explored in 3D. In this work, the researchers address three important and interesting problems on this subject.

First, they use text-guided diffusion models to make generation consistent in 3D. Specifically, they integrate a NeRF-like neural field that produces low-resolution, coarse results for a given camera view. These coarse results provide 3D priors that serve as conditions for the subsequent diffusion process. During denoising, they further improve 3D consistency by modeling cross-view correspondences with a new two-stream asynchronous diffusion process, in which each stream handles a different view.
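To make the "asynchronous" part more concrete, here is a minimal sketch, assuming a standard DDPM-style forward process with a linear beta schedule. It only shows how two views of the same object could be noised with independently drawn timesteps during training. The function names, the schedule values, and the flattened toy "images" are my own assumptions for illustration, not the researchers' code.

```python
import math
import random

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

def sample_async_pair(view_a, view_b, alpha_bar, rng):
    """Noise two views of one object with independently drawn timesteps."""
    t_a = rng.randrange(len(alpha_bar))
    t_b = rng.randrange(len(alpha_bar))
    noisy_a, eps_a = add_noise(view_a, t_a, alpha_bar, rng)
    noisy_b, eps_b = add_noise(view_b, t_b, alpha_bar, rng)
    # A two-stream denoiser would receive both noisy views and both timesteps
    # and be trained to predict both added noises.
    return {"t": (t_a, t_b), "noisy": (noisy_a, noisy_b), "target": (eps_a, eps_b)}

rng = random.Random(0)
alpha_bar = make_alpha_bar(1000)
view_a = [0.1, 0.5, -0.3, 0.8]   # flattened toy "image" for camera view A
view_b = [0.2, 0.4, -0.1, 0.7]   # the same object seen from camera view B
batch = sample_async_pair(view_a, view_b, alpha_bar, rng)
print(batch["t"])
```

Because the two timesteps are drawn separately, one stream usually sees a cleaner view than the other, which is one plausible reading of why the streams can help each other through cross-view interaction.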

Second, they study 3D local editing and propose a two-step solution that produces 360-degree edited results from an edit made in a single view. In step 1, they perform 2D local editing by blending the predicted noises. In step 2, they run a “noise-to-text inversion” that maps the blended 2D noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, 360-degree images of the edited object can be generated.
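The idea behind noise-to-text inversion can be sketched with a toy example: keep the denoiser frozen and optimize only the text embedding until the predicted noise matches the blended noise. In the sketch below the "denoiser" is just a small linear map, and the weights, learning rate, and function names are all hypothetical. The real model is a diffusion network, so treat this as an illustration of the optimization pattern only.

```python
def predict_noise(weights, embedding):
    """Toy frozen 'denoiser': a linear map from text embedding to predicted noise."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]

def invert_noise_to_text(weights, target_noise, embedding, lr=0.1, steps=500):
    """Gradient descent on the embedding only; the denoiser weights stay fixed."""
    emb = list(embedding)
    for _ in range(steps):
        pred = predict_noise(weights, emb)
        residual = [p - t for p, t in zip(pred, target_noise)]
        # Gradient of 0.5 * ||W e - target||^2 with respect to e is W^T residual.
        grad = [sum(weights[i][j] * residual[i] for i in range(len(weights)))
                for j in range(len(emb))]
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

weights = [[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]]   # hypothetical frozen model
true_embedding = [0.7, -0.2]                      # embedding we hope to recover
blended_noise = predict_noise(weights, true_embedding)  # stand-in for blended 2D noise
learned = invert_noise_to_text(weights, blended_noise, [0.0, 0.0])
loss = sum((p - t) ** 2
           for p, t in zip(predict_noise(weights, learned), blended_noise))
print(learned, loss)
```

The recovered embedding does not depend on any camera view, which is what would let the same edit be rendered from all directions afterwards.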

Last but not least, they extend the model to one-shot novel view synthesis by fine-tuning it on a single image. They present this as a first demonstration of how text guidance can be used for novel view synthesis.

Can an AI create a model in three dimensions?

Text-guided 3D-consistent generation framework (training phase).

(A) The NeRF-based Condition Module takes one coarse text description and two camera views as inputs and produces low-resolution coarse results. The coarse results are resized and combined with the noisy images to serve as conditions for denoising. (B) The Two-stream Asynchronous Diffusion Module takes one full text, two coarse results, two timesteps, and two noisy images as inputs and predicts the added noises. Each stream is a plain text-driven diffusion model, except for the feature interaction module after each attention block. The timesteps are sampled randomly, and the two streams share the same parameters.
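As a rough picture of the conditioning step in (A), the sketch below resizes a small coarse render to the resolution of the noisy image and stacks the two as channels. Both the nearest-neighbour resize and the channel stacking are my assumptions about how the coarse result and the noisy image could be combined, and the helper names are hypothetical.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D grid given as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def build_denoiser_input(noisy_image, coarse_result):
    """Stack the noisy image and the resized coarse render as two channels."""
    h, w = len(noisy_image), len(noisy_image[0])
    condition = resize_nearest(coarse_result, h, w)
    return [noisy_image, condition]   # layout: [channel][row][column]

coarse = [[0.0, 1.0],
          [2.0, 3.0]]                      # 2x2 low-resolution coarse render
noisy = [[0.5] * 4 for _ in range(4)]      # 4x4 noisy image
stacked = build_denoiser_input(noisy, coarse)
print(stacked[1])
```

In the two-stream setup this would be done once per view, so each stream gets its own noisy image plus its own coarse condition.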

Illustration of how 3D local editing / inpainting works.

The researchers blend the predicted noises at each sampling step to perform 2D local editing, then run a noise-to-text inversion to produce the edited 3D results.
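Here is a minimal sketch of noise blending during sampling, assuming the blend is controlled by a per-pixel mask that marks the region to edit. The mask, the two stand-in predictors, and the simplified update rule are all assumptions for illustration and do not reproduce a real diffusion sampler.

```python
def blend_noise(eps_edit, eps_keep, mask):
    """Per-pixel mix: edited noise where mask is 1, original noise where it is 0."""
    return [m * e + (1.0 - m) * k for e, k, m in zip(eps_edit, eps_keep, mask)]

def sample_with_blending(x, steps, predict_edit, predict_keep, mask, step_size=0.1):
    """Toy sampling loop that blends the two noise predictions at every step."""
    for t in reversed(range(steps)):
        eps = blend_noise(predict_edit(x, t), predict_keep(x, t), mask)
        # Simplified update for illustration; a real sampler follows its noise schedule.
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x

# Hypothetical stand-ins for the denoiser run with the edit prompt and the original prompt.
def predict_edit(x, t):
    return [1.0 for _ in x]

def predict_keep(x, t):
    return [0.0 for _ in x]

mask = [1.0, 1.0, 0.0, 0.0]      # edit only the first two "pixels"
result = sample_with_blending([0.0] * 4, 10, predict_edit, predict_keep, mask)
print(result)
```

Only the masked positions move under the edit prediction, while the rest of the image keeps following the original prediction, which is the point of a local edit.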

According to the researchers, 3DDesigner outperforms other methods thanks to the proposed NeRF-based Condition Module, Two-stream Asynchronous Diffusion Module, and new diffusion training, sampling, and blending strategies. As a result, it can produce highly realistic, detailed, and controllable 3D objects.

Unfortunately, the researchers have not published the code. I suggest trying a similar tool that is free and available right now: HERE

AI is everywhere. But the question is, how much do you love it?

I invite you to explore the concept of Machine Learning Art by reading and learning from the many articles found on 🔵 MLearning.ai 🟠

Check out my Instagram, with new material every week.

Keywords: computer vision, Artificial Intelligence, Machine Learning, AI art, art, wombo dream, digital art, Dalle 2, Imagen, wombo ai, Parti, 3D point cloud, diffusion models, generative art, wombo art, photographic quality, img by AI system, AI art generator, text to art generator, 3D, midjourney, dalle2, stablediffusion, 3D AI designer

PROJECT PAGE:

https://arxiv.org/pdf/2211.14108.pdf

TITLE: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
AUTHORS: Gang Li, Heliang Zheng, Chaoyue Wang, Chang Li, Changwen Zheng, Dacheng Tao.