Google’s DreamFusion AI Generates 3D Models From Text

AI is progressing fast, way too fast in my opinion. It’s becoming hard to keep up.
It’s only been 6 months since OpenAI released DALL-E 2.
It’s been less than 2 months since Stability AI released Stable Diffusion.
Just a few days ago, Meta announced a new AI tool that transforms text into video.
And now, Google has announced its new AI model that transforms text into 3D. It is called DreamFusion.

3D artists, be warned.
This technology could turn your world upside down, both in good and bad ways. Today, 3D models are still designed by hand, requiring hours of labor from the human artist using modeling software like Blender or ZBrush.
Text-to-3D generative models and AI tools could soon let people with little modeling experience produce high-quality 3D art from a simple text prompt.
It’s not all doom and gloom for professional artists, though. At least not yet.
Right now, the generated assets are low resolution and aren’t ready for commercial use.

How It Works
DreamFusion works by transferring a pre-trained 2D text-to-image diffusion model to 3D object synthesis, without needing any 3D training data.
A text-to-image diffusion model called Imagen acts as a prior that guides the optimization of a 3D scene. To make this possible, the folks from Google propose Score Distillation Sampling (SDS), a loss that generates samples by optimizing in an arbitrary parameter space (here, a 3D scene) instead of in pixel space.
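
To make the SDS idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not Google's code. Everything in it is an assumption made for illustration: the "renderer" is just the identity (the image is the parameter vector itself), `toy_noise_predictor` is a hypothetical stand-in for the frozen Imagen model that pretends all the data sits on a single `target`, and the noise schedule, timestep range, and weighting are placeholder choices. What it does show is the shape of the update: add noise to the current render, ask the frozen model to predict that noise, and push the parameters along the difference between predicted and actual noise.

```python
import math
import random


def toy_noise_predictor(z_t, alpha_t, sigma_t, target):
    """Hypothetical stand-in for the frozen diffusion model.

    Returns the ideal noise estimate for a data distribution whose
    entire mass sits on `target`.
    """
    return [(z - alpha_t * m) / sigma_t for z, m in zip(z_t, target)]


def sds_gradient(theta, target, rng):
    """One Monte Carlo sample of an SDS-style gradient.

    Assumes an identity 'renderer' (x = theta), so dx/dtheta is the
    identity and the gradient is simply w(t) * (eps_hat - eps).
    """
    t = rng.uniform(0.02, 0.98)              # random noise level (assumed range)
    alpha_t = math.cos(0.5 * math.pi * t)    # assumed cosine-style schedule
    sigma_t = math.sin(0.5 * math.pi * t)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    # Noise the current "render".
    z_t = [alpha_t * x + sigma_t * e for x, e in zip(theta, eps)]
    eps_hat = toy_noise_predictor(z_t, alpha_t, sigma_t, target)
    w_t = sigma_t ** 2                       # assumed weighting function
    # No gradient flows through the predictor; it is treated as frozen.
    return [w_t * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(target, steps=500, lr=0.1, seed=0):
    """Plain gradient descent on theta using sampled SDS-style gradients."""
    rng = random.Random(seed)
    theta = [0.0 for _ in target]
    for _ in range(steps):
        grad = sds_gradient(theta, target, rng)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta
```

Calling `optimize([1.0, -2.0, 0.5])` moves the parameters from zero toward the target. In the real system the identity renderer would be replaced by a differentiable 3D renderer and the toy predictor by the text-conditioned diffusion model.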

The link between the 3D scene and the 2D diffusion model is a differentiable renderer: the scene is parameterized much like a Neural Radiance Field (NeRF), and images rendered from random camera angles are scored by Imagen. SDS on its own makes decent-looking scenes, but DreamFusion adds extra regularizers and optimization strategies to improve the geometry. The resulting NeRFs are coherent and can be viewed from any angle.
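
For readers who have not met NeRFs before, the sketch below shows the standard volume rendering step that turns samples along one camera ray into a pixel color. It is a generic illustration in plain Python, not DreamFusion's renderer: the function name `render_ray` and its arguments are made up for this example, and a real NeRF would get the densities and colors from a neural network and run this on a GPU so that gradients can flow back to the scene parameters.

```python
import math


def render_ray(densities, colors, deltas):
    """Alpha-composite samples along a single ray, front to back.

    densities: volume density at each sample point
    colors:    (r, g, b) at each sample point
    deltas:    distance between consecutive samples
    Returns the pixel color and the per-sample compositing weights.
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for density, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        # Whatever was absorbed here cannot reach samples further back.
        transmittance *= 1.0 - alpha
    return pixel, weights
```

Every operation here is smooth in the densities and colors, which is what lets an image-space signal such as SDS update the 3D scene. A ray that passes through empty space and then hits a dense red region comes out red, because samples behind the dense region receive almost no weight.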
Here are a few more examples from the gallery.

No public access yet
Google did not put any sign-up links on its announcement page. This is because of ethical concerns. DreamFusion uses the Imagen diffusion model, whose training data includes the LAION-400M dataset, which is known to contain undesirable images.
Generative models, in the hands of bad actors, could be used to generate disinformation. Disinformation in the form of 3D objects may be more convincing than 2D images.

Final Thoughts
DreamFusion is just the beginning. It is probably only a matter of time before more optimized versions appear that can run quickly on a local machine with low VRAM.
This could have a profound impact on how VFX studios create advertising projects or even full-scale movies. Imagine, for example, being able to make a 3D animated film by simply describing the scenes to an AI.
This is not just pie in the sky. It is already here.
In education, it could provide more immersive and entertaining learning modules. In medicine, it could allow doctors to “see” inside the human body in ways that are not possible with current imaging technologies. And in business, it could enable companies to create virtual product demonstrations and simulations on demand. The possibilities are endless. And as AI continues to evolve, the potential for text-to-3D technology will only grow.






