Get Started with PyTorch3D in 4 Minutes with Google Colab
AI for 3D applications will be the next big thing. We think so, and so, apparently, does Facebook, which has just released a new add-on for its open-source deep learning framework PyTorch: the astonishingly named PyTorch3D. It’s surprisingly easy to get started with, so let me show you how.
If, like most people, you have yet to pick up a $4000 graphics card, you may feel somewhat left out of this frontier. But fortunately, there are a whole bunch of ways to get involved, have a play, and experiment. You may find something exciting! All you need is a computer with an internet connection and a Google account.
I’ll walk you through the process of setting up a Google Colab notebook, installing PyTorch3D in it, and then take you through a short tutorial to explain some of what’s going on.
At Kaedim we’re using these same tools to develop and test our product, an AI tool to help digital 3D designers do more of what they love by getting the computer to do the boring stuff.
Step 1: Creating a notebook
Follow the process in this tutorial to get up and running with a Google Colab Python 3 notebook with a GPU!
Step 2: Installing PyTorch3D
Now that you have a notebook running, it’s super simple to get PyTorch3D installed and running. Simply type:
!pip install 'git+https://github.com/facebookresearch/pytorch3d.git'
and hit the run button!
The notebook will now begin downloading and installing the framework so that it’s accessible for you to use.
Once it’s done, create a new code cell and run:
import pytorch3d as p3d
print(p3d.__version__)
This should print out the current version of PyTorch3D that you have in your notebook.
Congrats! You now have everything installed and ready to use!
Step 3: Doing Cool Stuff
The framework comes with a whole host of fun things to have a play around with. On their GitHub page you can find a list of features and tutorials; I’ll list and explain a few here.
Data structure for storing and manipulating triangle meshes
Meshes are pretty hard to interact with. In contrast to images, which have a nicely ordered structure in the form of a grid of pixels, meshes are much more complex and difficult to deal with. Each individual vertex can connect to multiple different vertices via a number of different edges. Different meshes, though similar looking, can have a wildly different underlying structure!
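To make that concrete, here’s a minimal sketch (plain NumPy, nothing PyTorch3D-specific) of how a triangle mesh is typically stored: a float array of vertex positions and an integer array of face indices into it. Note how, unlike pixels on a grid, vertices have irregular connectivity:

```python
import numpy as np

# A unit square made of two triangles: 4 vertices, 2 faces.
verts = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
], dtype=np.float32)

# Each face is a triple of indices into `verts`.
faces = np.array([
    [0, 1, 2],
    [0, 2, 3],
], dtype=np.int64)

# Count how many faces each vertex belongs to: already uneven
# on this tiny mesh, and wildly so on real meshes.
degree = np.bincount(faces.ravel(), minlength=len(verts))
print(degree)  # [2 1 2 1]
```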
PyTorch3D helps to simplify the loading and manipulation of 3D meshes with some inbuilt data structures to take the pain out of wrapping your head around how to do it. Instead of 300 lines of code, you only need one (plus an import):
from pytorch3d.io import load_obj
verts, faces, aux = load_obj('something.obj')
While limited to .obj files for now, it’s likely that more mesh types will be added in future.
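For intuition, here’s a toy sketch of what an .obj loader does under the hood. The real load_obj handles much more (normals, texture coordinates, materials), but at its core the format is just `v x y z` vertex lines and `f i j k` face lines with 1-based indices. The parse_obj function below is a hypothetical helper written for illustration, not part of PyTorch3D:

```python
def parse_obj(text):
    """Toy .obj parser: vertex positions and triangle faces only."""
    verts, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == 'v':       # vertex line: v x y z
            verts.append([float(x) for x in parts[1:4]])
        elif parts[0] == 'f':     # face line: f i j k (1-indexed)
            faces.append([int(p.split('/')[0]) - 1 for p in parts[1:4]])
    return verts, faces

obj_text = """
v 0 0 0
v 1 0 0
v 0 1 0
f 1 2 3
"""
verts, faces = parse_obj(obj_text)
print(verts)  # [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(faces)  # [[0, 1, 2]]
```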
Efficient operations on triangle meshes
Once you’ve loaded your mesh, the next problem is how to do things with it. Given the complexity of the data structure, having to hand-write methods for loss calculations (essential for any machine learning problem), sampling, transformations, or even convolutions is pretty tough.
Fortunately, these operations come bundled in with PyTorch3D. For example, one way to measure how similar two meshes are is to sample a number of points from the surface of each mesh, and then measure, for each sampled point, how close the nearest point sampled from the other mesh is!
You can do this in a few lines of code:
from pytorch3d.ops import sample_points_from_meshes
from pytorch3d.loss import chamfer_distance
# Sample 5k points from both the target and source mesh
sample_trg = sample_points_from_meshes(trg_mesh, 5000)
sample_src = sample_points_from_meshes(new_src_mesh, 5000)
# Compare the two sets of points with, e.g., the chamfer loss
loss_chamfer, _ = chamfer_distance(sample_trg, sample_src)
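To see what chamfer_distance is actually computing, here’s a minimal NumPy sketch of the symmetric chamfer distance between two point sets. PyTorch3D’s version is batched, differentiable, and GPU-accelerated; this brute-force version just shows the idea:

```python
import numpy as np

def chamfer(a, b):
    """Symmetric chamfer distance between point sets a (N, 3) and b (M, 3):
    mean squared distance from each point to its nearest neighbour in
    the other set, summed over both directions."""
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(chamfer(a, a))  # 0.0 — identical sets match perfectly
print(chamfer(a, b))  # 1.0
```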
A differentiable mesh renderer
Finally, supposing you’d like to take a look at what your mesh looks like within the notebook, PyTorch3D comes with a renderer that can display your meshes, complete with textures if desired!
By setting the camera, renderer and lighting you can produce a rendered image of your mesh in a small chunk of code.
First you set up the camera:
from pytorch3d.renderer import look_at_view_transform, OpenGLPerspectiveCameras
R, T = look_at_view_transform(2.7, 10, 20)  # distance, elevation, azimuth
cameras = OpenGLPerspectiveCameras(device=device, R=R, T=T)
Then you define how the image will be displayed:
raster_settings = RasterizationSettings(
    image_size=512,
    blur_radius=0.0,
    faces_per_pixel=1,
    bin_size=0
)
Place down a light:
lights = PointLights(device=device, location=[[1.0, 1.0, -2.0]])
Define how the renderer will work, in this case a textured Phong renderer, which is made by composing a rasterizer and a shader:
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=cameras,
        raster_settings=raster_settings
    ),
    shader=TexturedPhongShader(
        device=device,
        cameras=cameras,
        lights=lights
    )
)
And finally, putting it all together to get a lovely picture of your mesh!
import matplotlib.pyplot as plt
images = renderer(mesh)
plt.figure(figsize=(10, 10))
plt.imshow(images[0, ..., :3].cpu().numpy())
plt.grid(False)
plt.axis("off")
Step 4: Over to You
Now that you know the basics of how to set everything up in Google Colab, you’ve got all the tools you need to start having a play yourself! There’s a bunch of resources on the GitHub page for the framework, and it’s worth completing a few of their tutorials to get a more complete picture.
We hope you’ve found this useful, and we’ll keep you updated on any further developments in the 3D AI field! Check out what we’re doing at Kaedim.com!