Spatial Distortions#

If you are trying to reconstruct an object floating in an empty void, you can stop reading. However, if you are trying to reconstruct a scene or object from images, you may want to consider adding a spatial distortion.

When rendering a target view of a scene, the camera emits a ray for each pixel and queries the scene at points along that ray. We can choose where to place these query points using different samplers. These samplers have some notion of bounds that define where the ray should start and terminate. If you know that everything in your scene exists within some predefined bounds (e.g., a cube that a room fits in), then the sampler will properly cover the entire space. If, however, the scene is unbounded (e.g., an outdoor scene), deciding where to stop sampling is challenging. One option is to increase the far sampling distance to a large value (e.g., 1 km). Alternatively, we can warp the space into a fixed volume. The supported distortions are described below.
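To make the sampling problem concrete, here is a minimal sketch in plain PyTorch (no nerfstudio API; all names are illustrative) of querying points along a single camera ray. Pushing the far bound out to something like 1 km leaves only a handful of samples near the camera, which is exactly the problem a spatial distortion avoids.

import torch

# One camera ray: points are queried at origin + t * direction.
origin = torch.tensor([0.0, 0.0, 0.0])
direction = torch.tensor([0.0, 0.0, 1.0])

near, far, num_samples = 0.1, 1000.0, 64  # far bound pushed out to "1 km"
t = torch.linspace(near, far, num_samples)  # evenly spaced distances along the ray
points = origin + t[:, None] * direction  # (num_samples, 3) query locations

# With this uniform spacing, almost every sample lands far from the camera.
print((t < 10.0).sum().item(), "of", num_samples, "samples within 10 units of the camera")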

Scene Contraction#

This distortion contracts unbounded space into a ball of radius 2. It was proposed in MipNeRF-360. Samples within the unit ball are not modified, whereas samples outside the unit ball are contracted to fit within the ball of radius 2.

We use the following contraction equation:

\[\begin{split} f(x) = \begin{cases} x & ||x|| \leq 1 \\ (2 - \frac{1}{||x||})(\frac{x}{||x||}) & ||x|| > 1 \end{cases} \end{split}\]
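To see what this mapping does numerically, here is a small self-contained PyTorch sketch of the same piecewise function (an illustrative re-implementation, not the nerfstudio SceneContraction class used below): a point inside the unit ball is left untouched, while a point far outside is squashed to just inside radius 2.

import torch


def contract(x: torch.Tensor) -> torch.Tensor:
    """Piecewise contraction f(x): identity inside the unit ball, squashed outside."""
    mag = torch.linalg.norm(x, dim=-1, keepdim=True)
    return torch.where(mag <= 1.0, x, (2.0 - 1.0 / mag) * (x / mag))


points = torch.tensor([
    [0.3, 0.4, 0.0],  # ||x|| = 0.5, inside the unit ball: unchanged
    [100.0, 0.0, 0.0],  # far away: mapped to approximately [1.99, 0, 0]
])
print(contract(points))
print(torch.linalg.norm(contract(points), dim=-1))  # all norms are < 2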

Below we visualize rays before and after scene contraction. Shown are the 95% confidence intervals of the multivariate Gaussian at each sample location (this guide explains why the samples are represented by Gaussians). We also draw the unit sphere and the sphere of radius 2 for reference.

Before Contraction#

# COLLAPSED
import plotly.graph_objects as go
import torch

from nerfstudio.cameras.rays import Frustums
from nerfstudio.utils import plotly as vis

num_rays = 4
num_samples = 15
pixel_area = 0.12
far = 5

# Random ray origins inside a unit cube centered at the origin and random directions.
origins = torch.rand((num_rays, 1, 3)) - 0.5
directions = torch.randn((num_rays, 1, 3))
# Evenly spaced bin edges along each ray, and a per-ray pixel area.
bins = torch.linspace(0.1, far, num_samples + 1)[None, ..., None]
pixel_area = torch.tensor([pixel_area])[None, None, ...]

directions = directions / directions.norm(dim=-1, keepdim=True)

frustums = Frustums(
    origins=origins, directions=directions, starts=bins[:, :-1, :], ends=bins[:, 1:, :], pixel_area=pixel_area
)

data = []
for i, frustum in enumerate(frustums):
    data += vis.get_gaussian_ellipsoids_list(frustum.get_gaussian_blob(), color=vis.get_random_color(idx=i))
data.append(vis.get_sphere(radius=1.0, color="#111111", opacity=0.05))
data.append(vis.get_sphere(radius=2.0, color="#111111", opacity=0.05))

fig = go.Figure(data=data, layout=webdocs_layout)  # webdocs_layout: plotly layout assumed defined in a hidden setup cell
fig.show()

After Contraction#

Using an \(L_2\) norm

# COLLAPSED
from nerfstudio.field_components.spatial_distortions import SceneContraction

data = []
for i, frustum in enumerate(frustums):
    contracted_gaussian = SceneContraction()(frustum.get_gaussian_blob())
    data += vis.get_gaussian_ellipsoids_list(contracted_gaussian, color=vis.get_random_color(idx=i))

data.append(vis.get_sphere(radius=1.0, color="#111111", opacity=0.05))
data.append(vis.get_sphere(radius=2.0, color="#111111", opacity=0.05))

fig = go.Figure(data=data, layout=webdocs_layout)
fig.show()

Using the \(L_{\infty}\) norm instead bounds the space to a cube of side length 4. This can be useful if downstream encoders operate on a grid (e.g., the hash encoding used in Instant-NGP).

# COLLAPSED
from nerfstudio.field_components.spatial_distortions import SceneContraction

data = []
for i, frustum in enumerate(frustums):
    contracted_gaussian = SceneContraction(order=float("inf"))(frustum.get_gaussian_blob())
    data += vis.get_gaussian_ellipsoids_list(contracted_gaussian, color=vis.get_random_color(idx=i))

data.append(vis.get_cube(side_length=2.0, color="#111111", opacity=0.05))
data.append(vis.get_cube(side_length=4.0, color="#111111", opacity=0.05))

fig = go.Figure(data=data, layout=webdocs_layout)
fig.show()
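If you want to check this bound numerically rather than visually, the contraction can also be applied to a plain tensor of positions (using a raw tensor here, instead of the Gaussians passed above, is an assumption about the SceneContraction API): every contracted coordinate should land in \([-2, 2]\), i.e. inside a cube of side length 4.

import torch

from nerfstudio.field_components.spatial_distortions import SceneContraction

positions = (torch.rand((1000, 3)) - 0.5) * 2000.0  # points spread over a 2 km cube
contracted = SceneContraction(order=float("inf"))(positions)

# With the L-infinity norm, every coordinate is bounded by 2 in absolute value.
print(contracted.abs().max())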