From 3DS max to 3D AI Designer

Summary

Researchers have developed a new text-guided diffusion model for photorealistic 3D object generation and editing, which significantly improves 3D consistency and enables local editing and one-shot novel view synthesis.

Abstract

The article discusses a groundbreaking method for 3D generation and editing using text-guided diffusion models. This approach ensures 3D consistency by combining neural fields with a two-stream asynchronous diffusion process. It also introduces a novel technique for 3D local editing and extends the model for one-shot novel view synthesis. The method represents a significant advancement in the field of 3D modeling, with applications in gaming, entertainment, architecture, and robotics simulation. It demonstrates the potential of AI in creating detailed and controlled 3D objects from textual descriptions, marking a transition from traditional 3D modeling to AI-driven design.

Opinions

The researchers believe that their proposed NeRF-based Condition Module and Two-stream Asynchronous Diffusion Module, along with new diffusion training and sampling strategies, make their 3DDesigner model superior to existing methods.
The article suggests that the transition from 2D to 3D modeling is complex, but the new method simplifies the process by allowing the generation of realistic 3D models from text.
The author invites readers to explore the concept of Machine Learning Art, indicating enthusiasm for the intersection of AI and creativity.
The researchers have not released the code for their method, but they encourage the use of similar tools available for free.
The author expresses a passion for AI, as seen in the invitation to join Medium and follow their work, as well as the promotion of their Instagram and LinkedIn profiles for further collaboration and content.

What kind of work is done in 3D modeling?

Artists create 3D models. They work in film and video production studios, game design, graphics and advertising, web design, software, architecture, product design, and manufacturing. 3D AI Designer is presented as the first text-guided generative model to handle 3D generation, 3D editing / inpainting, and one-shot view synthesis.

3D inpainting

Text-guided diffusion models have become very good at generating and editing images and videos, but there has been little exploration in 3D. In this work, the researchers address three important questions on the subject.

First, they use text-guided diffusion models to generate results that are consistent in 3D. In particular, they incorporate a neural field such as NeRF to produce low-resolution, coarse results for a given camera view. These results provide 3D priors that serve as conditions for the subsequent diffusion process. During denoising, they improve 3D consistency further by modeling cross-view correspondences with a new two-stream asynchronous diffusion process, in which each stream represents a different view.
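To make the "asynchronous" part concrete, here is a minimal sketch in Python with NumPy. It assumes the standard DDPM forward noising rule and a linear noise schedule, neither of which is confirmed by the article. The function names, image sizes, and random stand-in views are hypothetical. The only point is that each view gets its own, independently drawn timestep.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Two images of the same object from different cameras (random stand-ins here).
view_a = rng.standard_normal((3, 64, 64))
view_b = rng.standard_normal((3, 64, 64))

# "Asynchronous": each stream draws its own timestep independently,
# so the two views are generally at different noise levels.
t_a, t_b = rng.integers(0, len(alpha_bar), size=2)
noisy_a, eps_a = add_noise(view_a, t_a, alpha_bar, rng)
noisy_b, eps_b = add_noise(view_b, t_b, alpha_bar, rng)
```

In a real model, a denoiser would be trained to predict `eps_a` and `eps_b` from the noisy views, exchanging features between the two streams so that the cleaner view can help the noisier one.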

Second, they study 3D local editing and propose a two-step solution that edits an object from a single view and propagates the change to all views. In the first step, they perform 2D local editing by blending the predicted noises. In the second step, they perform “noise-to-text inversion,” which maps the blended 2D noises into the view-independent text embedding space. Once the corresponding text embedding is found, 360-degree images can be generated.
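The two steps can be illustrated with a toy sketch in Python with NumPy. This is not the researchers' implementation. The mask-based blend is one common way to mix two noise predictions, and the "inversion" here is plain gradient descent against a hypothetical linear stand-in for the noise predictor. All names, shapes, and the 4-dimensional embedding are assumptions made for illustration.

```python
import numpy as np

def blend_noise(eps_original, eps_edited, mask):
    """Use the edited noise inside the mask and the original noise elsewhere."""
    return mask * eps_edited + (1.0 - mask) * eps_original

def invert_to_embedding(target_eps, predictor, dim, steps=500, lr=0.2):
    """Toy inversion: gradient descent on an embedding so that a linear
    stand-in predictor reproduces the blended noise as closely as possible."""
    emb = np.zeros(dim)
    for _ in range(steps):
        residual = predictor @ emb - target_eps          # prediction error
        emb -= lr * (predictor.T @ residual) / len(target_eps)
    return emb

rng = np.random.default_rng(0)
eps_original = rng.standard_normal((8, 8))   # noise predicted for the source text
eps_edited = rng.standard_normal((8, 8))     # noise predicted for the edit text
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0                         # region the user wants to change

# Step 1: 2D local editing by blending the two noise predictions.
blended = blend_noise(eps_original, eps_edited, mask)

# Step 2: search for an embedding that explains the blended noise.
predictor = rng.standard_normal((blended.size, 4))   # hypothetical linear model
embedding = invert_to_embedding(blended.ravel(), predictor, dim=4)
```

Because the recovered embedding does not refer to any camera, the same embedding could in principle be reused to render the edited object from other views, which is the idea behind the second step.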

Finally, they extend the model to one-shot novel view synthesis by fine-tuning on a single image, giving a first demonstration of how text guidance can be used for novel view synthesis.

Text-guided 3D-consistent generation framework (training phase).

(A) The NeRF-based Condition Module takes one text prompt and two camera views as inputs and produces low-resolution coarse results. The coarse results are upsampled and combined with the noisy images to form the conditions for denoising. (B) The Two-stream Asynchronous Diffusion Module takes one text prompt, two coarse results, two timesteps, and two noisy images as inputs and predicts the added noises. Apart from the feature interaction module after each attention block, each stream is a plain text-driven diffusion model. The timesteps are sampled randomly, and the two streams share the same parameters.
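As a rough illustration of how the inputs to each stream might be assembled, the sketch below (Python with NumPy) resizes a low-resolution coarse render to the image resolution and stacks it with the noisy image on the channel axis. Channel-wise stacking and nearest-neighbour resizing are assumptions, as are all names and shapes. The single weight matrix is only a hypothetical stand-in for a denoiser, used to show what parameter sharing between the two streams means.

```python
import numpy as np

def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array."""
    return image.repeat(factor, axis=1).repeat(factor, axis=2)

def build_stream_input(noisy_image, coarse_render):
    """Resize the coarse render and stack it with the noisy image (channel axis)."""
    factor = noisy_image.shape[1] // coarse_render.shape[1]
    condition = upsample_nearest(coarse_render, factor)
    return np.concatenate([noisy_image, condition], axis=0)

rng = np.random.default_rng(0)
coarse_a, coarse_b = rng.standard_normal((2, 3, 16, 16))   # stand-ins for coarse renders
noisy_a, noisy_b = rng.standard_normal((2, 3, 64, 64))     # stand-ins for noised views

input_a = build_stream_input(noisy_a, coarse_a)
input_b = build_stream_input(noisy_b, coarse_b)

# Shared parameters: one set of weights is applied to both streams.
shared_weights = rng.standard_normal((3, 6)) * 0.1   # hypothetical 1x1 "denoiser"
pred_a = np.einsum("oc,chw->ohw", shared_weights, input_a)
pred_b = np.einsum("oc,chw->ohw", shared_weights, input_b)
```

The feature interaction between the two streams is omitted here. In the described framework it sits after each attention block, which a one-layer stand-in cannot represent.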

A picture of how 3D local editing / inpainting works.
