Summary

Google's DreamFusion AI represents a significant leap in AI capabilities by generating 3D models from textual descriptions, potentially revolutionizing 3D art creation and various industries.

Abstract

Google has introduced DreamFusion, an AI model that can create 3D objects from text prompts, marking a rapid advancement in AI technology. This development follows closely after other AI breakthroughs, such as OpenAI's Dall-E 2 and Stability AI's Stable Diffusion, as well as Meta's text-to-video AI tool. DreamFusion transfers a pre-trained 2D text-to-image diffusion model to 3D object synthesis, employing Score Distillation Sampling (SDS) and a differentiable mapping based on Neural Radiance Fields (NeRFs) to produce coherent 3D scenes. While the technology is promising, it currently generates low-resolution assets unsuitable for commercial use. The implications for 3D artists and industries like VFX, education, medicine, and business are profound, with the potential to democratize 3D art creation and enhance various applications through more immersive and interactive content.

Opinions

The author suggests that AI technology, including Google's DreamFusion, is advancing at an alarming rate, making it challenging to keep pace.
There is a concern that generative models like DreamFusion could be misused by bad actors to create disinformation, which might be more convincing in 3D form.
The author believes that DreamFusion could disrupt the 3D art industry, enabling people with little experience to create high-quality 3D art through simple text prompts.
Despite the potential disruption, the author notes that professional artists may still find opportunities, as the current quality of AI-generated assets is not yet suitable for commercial use.
The author is optimistic about the future applications of DreamFusion, envisioning its use in VFX for movies, interactive learning modules in education, advanced medical imaging, and on-demand virtual product demonstrations in business.

Google’s DreamFusion AI Generates 3D Model From Text

AI is progressing fast, way too fast in my opinion. It’s becoming hard to keep up.

It’s only been 6 months since OpenAI released Dall-E 2.

It's been less than 2 months since Stability AI released Stable Diffusion.

Just a few days ago, Meta announced a new AI tool that transforms text into video.

And now, Google has announced its new AI model that transforms text into 3D. It is called DreamFusion.

3D artists, be warned.

This technology could turn your world upside down, in both good and bad ways. Today, 3D models are still designed by hand, which takes a human artist hours of work in modeling software like Blender or ZBrush.

Text-to-3D generative models and AI tools will soon make it possible for even people without a lot of experience to make high-quality 3D art with just a simple text prompt.

It’s not all doom and gloom for professional artists, though. At least not yet.

Right now, the assets themselves have a low resolution and aren’t ready for commercial use.

How It Works

DreamFusion works by transferring a pre-trained 2D text-to-image diffusion model to 3D object synthesis.

A text-to-image diffusion model called "Imagen" is used to optimize a 3D scene. The folks from Google also propose the Score Distillation Sampling (SDS) method to optimize samples in an arbitrary parameter (3D) space.
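To make the SDS idea a bit more concrete, here is a minimal toy sketch in plain Python. It is not Google's implementation. It assumes an identity "renderer" (the parameters are the image), a made-up closed-form noise predictor standing in for Imagen, and an assumed weighting w(t) equal to the noise variance. The names toy_noise_predictor, sds_gradient and optimize are hypothetical and exist only for this illustration.

```python
import math
import random


def toy_noise_predictor(x_t, alpha, sigma, target, data_std=0.1):
    """Hypothetical stand-in for a pretrained diffusion model.

    Assumes the "image" distribution is a Gaussian around `target`,
    for which the ideal noise prediction has a closed form.
    """
    var = alpha ** 2 * data_std ** 2 + sigma ** 2
    return [sigma * (x - alpha * m) / var for x, m in zip(x_t, target)]


def sds_gradient(theta, target, rng):
    """One stochastic SDS-style gradient estimate (toy version)."""
    t = rng.uniform(0.02, 0.98)  # random noise level
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    eps = [rng.gauss(0.0, 1.0) for _ in theta]
    x = theta  # assumed identity renderer: x = g(theta) = theta
    x_t = [alpha * xi + sigma * e for xi, e in zip(x, eps)]  # noised render
    eps_hat = toy_noise_predictor(x_t, alpha, sigma, target)
    w = sigma ** 2  # assumed weighting w(t)
    # The denoiser's own Jacobian is skipped; with an identity renderer
    # dx/dtheta is 1, so the gradient is w(t) * (eps_hat - eps).
    return [w * (eh - e) for eh, e in zip(eps_hat, eps)]


def optimize(target, steps=2000, lr=0.05, seed=0):
    """Plain gradient descent on theta using the SDS-style gradient."""
    rng = random.Random(seed)
    theta = [0.0 for _ in target]
    for _ in range(steps):
        grad = sds_gradient(theta, target, rng)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return theta
```

Calling optimize([1.0, -2.0, 0.5]) should pull the parameters close to the target the toy "model" considers likely. In the real system, the renderer would be a NeRF and the noise predictor would be the frozen, text-conditioned diffusion model.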

Image from Google Research and UC Berkeley paper

The differentiable mapping from 3D scene parameters to rendered 2D images is set up with a scene parameterization that works like Neural Radiance Fields, or NeRFs. SDS on its own already makes decent-looking scenes, but DreamFusion adds more regularizers and optimization strategies to improve the geometry. The resulting trained NeRFs are coherent, with better geometry.
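For readers who have not met NeRFs before, the core rendering step can be sketched in a few lines of plain Python. This is the standard NeRF-style volume rendering sum along a single camera ray, not DreamFusion's actual code. The function name render_ray and its inputs (per-sample densities, RGB colors and segment lengths) are assumptions made for this illustration.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: non-negative density at each sample point
    colors:    [r, g, b] color at each sample point
    deltas:    distance covered by each sample along the ray
    """
    transmittance = 1.0  # fraction of light that still reaches this sample
    pixel = [0.0, 0.0, 0.0]
    for density, color, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        # Whatever was absorbed here no longer reaches later samples.
        transmittance *= 1.0 - alpha
    return pixel
```

Every operation here is smooth in the densities and colors, which is what lets gradients from a 2D image loss flow back into the 3D scene. A fully opaque first sample hides everything behind it, while an empty ray renders as black.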

Here are a few more examples from the gallery.

No public access yet

Google did not put any sign-up links on its announcement page. This is because of ethical concerns. DreamFusion uses the Imagen diffusion model, which was trained in part on the LAION-400M dataset, which is known to contain undesirable images.

Generative models, in the hands of bad actors, could be used to generate disinformation. Disinformation in the form of 3D objects may be more convincing than 2D images.

Final Thoughts

DreamFusion is just the beginning. I expect that a more optimized version, one that can run quickly on your local machine with low VRAM, will follow before long.

This will have a profound impact on how VFX studios create advertising projects or even full-scale movies. Imagine, for example, being able to make a 3D animated film by simply describing the scenes to AI.

This is not just pie in the sky. It is already here.

In education, it could provide more immersive and entertaining learning modules. In medicine, it could allow doctors to “see” inside the human body in ways that are not possible with current imaging technologies. And in business, it could enable companies to create virtual product demonstrations and simulations on demand. The possibilities are endless. And as AI continues to evolve, the potential for text-to-3D technology will only grow.