Generate 3D models from text

Generfacto is our model for generating 3D assets from text, combining generative 3D techniques with our latest NeRF methods.


First install nerfstudio dependencies. Then run:

pip install -e .[gen]

Two options for text-to-image diffusion models are provided: Stable Diffusion and DeepFloyd IF. We use DeepFloyd IF by default because it trains faster and produces better results. Using it requires signing the license agreement on the DeepFloyd IF model card. Once the license is signed, log in to Hugging Face locally by running the following command:

huggingface-cli login
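For non-interactive environments (e.g. a remote training machine or CI), the same login can be done without a prompt; this sketch assumes your access token is available in an environment variable named HF_TOKEN, and uses the --token option of huggingface-cli login:

```shell
# Non-interactive login sketch. Assumption: HF_TOKEN is an environment
# variable you have set to your Hugging Face access token.
huggingface-cli login --token "$HF_TOKEN"
```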

If you do not want to sign the license agreement, you can use the Stable Diffusion model (instructions below).

Running Generfacto

Once installed, run:

ns-train generfacto --prompt "a high quality photo of a pineapple"

The first time you run this method, the diffusion model weights will be downloaded and cached from Hugging Face, which may take a couple minutes.
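Because the weights go through the standard Hugging Face cache, you can redirect the download to a larger disk before the first run. HF_HOME is Hugging Face's standard cache-location variable; the path below is a placeholder, not a required location:

```shell
# Optional: point the Hugging Face cache at a larger disk before the first
# run. HF_HOME is the standard Hugging Face cache-location variable; the
# path below is only a placeholder.
export HF_HOME=/path/to/big/disk/huggingface
```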

Specify which diffusion model to use with the --pipeline.model.diffusion_model flag:

ns-train generfacto --pipeline.model.diffusion_model ["stablediffusion", "deepfloyd"]
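Putting the two flags together, a full invocation that selects the Stable Diffusion backend might look like the following; the prompt is the sample prompt from above, and the flag values are the two options named on this page:

```shell
# Select a backend explicitly; valid values are "stablediffusion" and
# "deepfloyd" (the default). The prompt is the sample prompt from above.
DIFFUSION_MODEL="stablediffusion"
ns-train generfacto \
  --prompt "a high quality photo of a pineapple" \
  --pipeline.model.diffusion_model "$DIFFUSION_MODEL"
```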

Example Results

The following videos are renders of NeRFs generated by Generfacto. Each model was trained for 30k steps, which took around 1 hour with DeepFloyd and around 4 hours with Stable Diffusion.

“a high quality photo of a ripe pineapple” (Stable Diffusion)

“a high quality zoomed out photo of a palm tree” (DeepFloyd)

“a high quality zoomed out photo of a light grey baby shark” (DeepFloyd)