Generate 3D models from text
Generfacto is our model that combines generative 3D with our latest NeRF methods.
First install the nerfstudio dependencies. Then run:

```bash
pip install -e .[gen]
```
Two options for text-to-image diffusion models are provided: Stable Diffusion and DeepFloyd IF. We use DeepFloyd IF by default because it trains faster and produces better results. Using this model requires users to sign a license agreement for the DeepFloyd IF model card, which can be found here. Once the license is signed, log into Hugging Face locally by running the following command:
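A sketch of the login step, assuming the standard Hugging Face CLI that ships with the `huggingface_hub` package:

```bash
# Log in with an access token from your Hugging Face account settings;
# the token is cached locally so gated model weights can be downloaded.
huggingface-cli login
```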
If you do not want to sign the license agreement, you can use the Stable Diffusion model (instructions below).
Once installed, run:
```bash
ns-train generfacto --prompt "a high quality photo of a pineapple"
```
The first time you run this method, the diffusion model weights will be downloaded and cached from Hugging Face, which may take a couple of minutes.
Specify which diffusion model to use with the `--pipeline.model.diffusion_model` flag:

```bash
ns-train generfacto --pipeline.model.diffusion_model ["stablediffusion", "deepfloyd"]
```
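For example, to train with Stable Diffusion instead of the default (useful if you do not want to sign the DeepFloyd license), the flag can be combined with a prompt; the prompt below is illustrative:

```bash
# Select the Stable Diffusion backend explicitly; no DeepFloyd license needed.
ns-train generfacto --pipeline.model.diffusion_model stablediffusion \
    --prompt "a high quality photo of a pineapple"
```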
The following videos are renders of NeRFs generated by Generfacto. Each model was trained for 30K steps, which took around 1 hour with DeepFloyd and around 4 hours with Stable Diffusion.
“a high quality photo of a ripe pineapple” (Stable Diffusion)
“a high quality zoomed out photo of a palm tree” (DeepFloyd)
“a high quality zoomed out photo of a light grey baby shark” (DeepFloyd)