Encodings#
Encoding functions
- class nerfstudio.field_components.encodings.Encoding(in_dim: int)[source]#
Bases:
FieldComponent
Encodes an input tensor. Intended to be subclassed.
- Parameters:
in_dim – Input dimension of tensor
- class nerfstudio.field_components.encodings.FFEncoding(in_dim: int, basis: Float[Tensor, 'M N'], num_frequencies: int, min_freq_exp: float, max_freq_exp: float, include_input: bool = False)[source]#
Bases:
Encoding
Fourier Feature encoding. Supports integrated encodings.
- Parameters:
in_dim – Input dimension of tensor
basis – Basis matrix from which to construct the Fourier features.
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
include_input – Append the input coordinate to the encoding
- forward(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
- Calculates FF encoding. If covariances are provided, the encodings will be integrated as proposed in mip-NeRF.
- Parameters:
in_tensor – For best performance, the input tensor should be between 0 and 1.
covs – Covariances of input points.
- Returns:
Output values will be between -1 and 1
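As an illustration, the FF computation can be sketched in plain NumPy. This is a standalone reimplementation of the math, not the nerfstudio API; the basis orientation, the 2π scaling, and all shapes here are assumptions:

```python
import numpy as np

def ff_encode(x, basis, num_frequencies, min_freq_exp, max_freq_exp):
    """Fourier-feature encode x (shape [..., in_dim]) via a basis matrix."""
    proj = 2 * np.pi * x @ basis                         # [..., N] projected coords
    freqs = 2.0 ** np.linspace(min_freq_exp, max_freq_exp, num_frequencies)
    scaled = proj[..., None] * freqs                     # [..., N, F]
    scaled = scaled.reshape(*scaled.shape[:-2], -1)      # [..., N * F]
    # sin(t + pi/2) == cos(t), so one sin call covers both halves
    return np.sin(np.concatenate([scaled, scaled + np.pi / 2], axis=-1))

x = np.random.rand(8, 3)                                 # inputs in [0, 1]
out = ff_encode(x, np.eye(3), num_frequencies=4, min_freq_exp=0.0, max_freq_exp=3.0)
```

Each of the N projected coordinates expands into num_frequencies sin/cos pairs, so the output has N * num_frequencies * 2 features, all in [-1, 1].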
- class nerfstudio.field_components.encodings.HashEncoding(num_levels: int = 16, min_res: int = 16, max_res: int = 1024, log2_hashmap_size: int = 19, features_per_level: int = 2, hash_init_scale: float = 0.001, implementation: Literal['tcnn', 'torch'] = 'tcnn', interpolation: Optional[Literal['Nearest', 'Linear', 'Smoothstep']] = None)[source]#
Bases:
Encoding
Hash encoding
- Parameters:
num_levels – Number of feature grids.
min_res – Resolution of smallest feature grid.
max_res – Resolution of largest feature grid.
log2_hashmap_size – Size of hash map is 2^log2_hashmap_size.
features_per_level – Number of features per level.
hash_init_scale – Value to initialize hash grid.
implementation – Implementation of hash encoding. Fallback to torch if tcnn not available.
interpolation – Interpolation override for tcnn hashgrid. Not supported for torch unless linear.
- forward(in_tensor: Float[Tensor, '*bs input_dim']) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
- classmethod get_tcnn_encoding_config(num_levels, features_per_level, log2_hashmap_size, min_res, growth_factor, interpolation=None) dict [source]#
Get the encoding configuration for tcnn if implemented
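The per-level lookup behind the hash encoding can be sketched as follows. This is an illustrative standalone sketch, not the tcnn or nerfstudio implementation: a nearest-vertex lookup stands in for the real trilinear interpolation of eight corner features, and the hash primes follow the Instant-NGP convention:

```python
import numpy as np

PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes from Instant-NGP

def hash_lookup(x, tables, resolutions):
    """Nearest-vertex lookup into per-level hash tables.
    tables: one [2**log2_hashmap_size, features_per_level] array per level."""
    feats = []
    for table, res in zip(tables, resolutions):
        idx = np.floor(x * res).astype(np.int64)   # integer grid coords per level
        h = np.zeros(idx.shape[:-1], dtype=np.int64)
        for d, p in enumerate(PRIMES):
            h ^= idx[..., d] * p                   # XOR of prime-scaled coords
        feats.append(table[h % table.shape[0]])
    return np.concatenate(feats, axis=-1)          # [..., levels * features_per_level]

rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-3, size=(2**10, 2)) for _ in range(4)]
out = hash_lookup(rng.random((8, 3)), tables, resolutions=[16, 32, 64, 128])
```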
- class nerfstudio.field_components.encodings.Identity(in_dim: int)[source]#
Bases:
Encoding
Identity encoding (Does not modify input)
- class nerfstudio.field_components.encodings.KPlanesEncoding(resolution: Sequence[int] = (128, 128, 128), num_components: int = 64, init_a: float = 0.1, init_b: float = 0.5, reduce: Literal['sum', 'product'] = 'product')[source]#
Bases:
Encoding
Learned K-Planes encoding
A plane encoding supporting both 3D and 4D coordinates. With 3D coordinates this is similar to
TriplaneEncoding
. With 4D coordinates, the encoding at point [i,j,k,q] is an n-dimensional vector computed as the element-wise product of the six n-dimensional vectors at planes[i,j], planes[i,k], planes[i,q], planes[j,k], planes[j,q], and planes[k,q]. Unlike
TriplaneEncoding
, this class supports a different resolution along each axis.
This will return a tensor of shape (bs: …, num_components)
- Parameters:
resolution – Resolution of the grid. Can be a sequence of 3 or 4 integers.
num_components – The number of scalar planes to use (i.e., the output feature size)
init_a – The lower-bound of the uniform distribution used to initialize the spatial planes
init_b – The upper-bound of the uniform distribution used to initialize the spatial planes
reduce – Whether to use the element-wise product of the planes or the sum
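The pairwise-plane reduction described above can be sketched as follows. This is an illustrative standalone sketch, not the nerfstudio implementation: a nearest-cell lookup stands in for the real bilinear sampling, and all names and shapes are assumptions:

```python
import numpy as np
from itertools import combinations

def kplanes_encode(x, planes, resolutions, reduce="product"):
    """x: [..., 4] in [0, 1); one plane per coordinate pair (a, b)."""
    out = None
    for plane, (a, b) in zip(planes, combinations(range(x.shape[-1]), 2)):
        ia = np.floor(x[..., a] * resolutions[a]).astype(int)
        ib = np.floor(x[..., b] * resolutions[b]).astype(int)
        feat = plane[ia, ib]                       # [..., num_components]
        if out is None:
            out = feat
        else:
            out = out * feat if reduce == "product" else out + feat
    return out

rng = np.random.default_rng(0)
res = (32, 32, 32, 16)                             # different resolution per axis
planes = [rng.uniform(0.1, 0.5, size=(res[a], res[b], 64))
          for a, b in combinations(range(4), 2)]   # 6 planes for 4D input
out = kplanes_encode(rng.random((8, 4)), planes, res)
```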
- class nerfstudio.field_components.encodings.NeRFEncoding(in_dim: int, num_frequencies: int, min_freq_exp: float, max_freq_exp: float, include_input: bool = False, implementation: Literal['tcnn', 'torch'] = 'torch')[source]#
Bases:
Encoding
Multi-scale sinusoidal encodings. Supports
integrated positional encodings
if covariances are provided. Each axis is encoded with frequencies ranging from 2^min_freq_exp to 2^max_freq_exp.
- Parameters:
in_dim – Input dimension of tensor
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
include_input – Append the input coordinate to the encoding
- forward(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
- classmethod get_tcnn_encoding_config(num_frequencies) dict [source]#
Get the encoding configuration for tcnn if implemented
- pytorch_fwd(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
- Calculates NeRF encoding. If covariances are provided, the encodings will be integrated as proposed in mip-NeRF.
- Parameters:
in_tensor – For best performance, the input tensor should be between 0 and 1.
covs – Covariances of input points.
- Returns:
Output values will be between -1 and 1
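The sinusoidal expansion (omitting the integrated/covariance path) can be sketched in standalone NumPy. This is illustrative only, not the nerfstudio implementation; the 2π scaling and the frequency spacing are assumptions:

```python
import numpy as np

def nerf_encode(x, num_frequencies, min_freq_exp, max_freq_exp, include_input=False):
    """Per-axis sin/cos at frequencies 2**min_freq_exp .. 2**max_freq_exp."""
    freqs = 2.0 ** np.linspace(min_freq_exp, max_freq_exp, num_frequencies)
    scaled = 2 * np.pi * x[..., None] * freqs        # [..., in_dim, F]
    scaled = scaled.reshape(*x.shape[:-1], -1)       # [..., in_dim * F]
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return np.concatenate([x, enc], axis=-1) if include_input else enc

x = np.random.rand(8, 3)                             # inputs in [0, 1]
out = nerf_encode(x, num_frequencies=4, min_freq_exp=0.0, max_freq_exp=3.0)
```

The output width is in_dim * num_frequencies * 2 (plus in_dim if include_input is set), with values in [-1, 1].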
- class nerfstudio.field_components.encodings.PolyhedronFFEncoding(num_frequencies: int, min_freq_exp: float, max_freq_exp: float, basis_shape: Literal['octahedron', 'icosahedron'] = 'octahedron', basis_subdivisions: int = 1, include_input: bool = False)[source]#
Bases:
FFEncoding
Fourier Feature encoding using polyhedron basis as proposed by mip-NeRF360. Supports integrated encodings.
- Parameters:
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
basis_shape – Shape of polyhedron basis. Either “octahedron” or “icosahedron”
basis_subdivisions – Number of times to tessellate the polyhedron.
include_input – Append the input coordinate to the encoding
- class nerfstudio.field_components.encodings.RFFEncoding(in_dim: int, num_frequencies: int, scale: float, include_input: bool = False)[source]#
Bases:
FFEncoding
Random Fourier Feature encoding. Supports integrated encodings.
- Parameters:
in_dim – Input dimension of tensor
num_frequencies – Number of encoding frequencies
scale – Std of Gaussian to sample frequencies. Must be greater than zero
include_input – Append the input coordinate to the encoding
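Conceptually, the RFF encoding is an FF encoding whose basis is sampled from a zero-mean Gaussian. A standalone sketch (illustrative only; the seed, shapes, and 2π scaling are assumptions):

```python
import numpy as np

def rff_encode(x, num_frequencies, scale, seed=0):
    """Project onto frequencies drawn from N(0, scale^2), then sin/cos."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, scale, size=(x.shape[-1], num_frequencies))
    proj = 2 * np.pi * x @ b                         # [..., num_frequencies]
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

out = rff_encode(np.random.rand(8, 3), num_frequencies=16, scale=10.0)
```

Larger `scale` draws higher frequencies, biasing the downstream network toward higher-frequency detail.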
- class nerfstudio.field_components.encodings.SHEncoding(levels: int = 4, implementation: Literal['tcnn', 'torch'] = 'torch')[source]#
Bases:
Encoding
Spherical harmonic encoding
- Parameters:
levels – Number of spherical harmonic levels to encode.
- forward(in_tensor: Float[Tensor, '*bs input_dim']) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
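For the first two spherical-harmonic bands the encoding can be written out directly. This is a standalone sketch; the constants are the standard real-SH normalizations, but sign and component-ordering conventions differ between implementations, so treat the ordering here as an assumption:

```python
import numpy as np

def sh_encode_l2(d):
    """First two SH bands (4 components) of unit directions d: [..., 3]."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack([
        0.28209479177 * np.ones_like(x),   # l=0 (constant band)
        -0.48860251190 * y,                # l=1, m=-1
        0.48860251190 * z,                 # l=1, m=0
        -0.48860251190 * x,                # l=1, m=1
    ], axis=-1)

d = np.array([[0.0, 0.0, 1.0]])            # +z direction
out = sh_encode_l2(d)
```

With `levels` bands the output has levels² components, which is why SH encodings are a common fit for view-direction inputs.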
- class nerfstudio.field_components.encodings.ScalingAndOffset(in_dim: int, scaling: float = 1.0, offset: float = 0.0)[source]#
Bases:
Encoding
Simple scaling and offset to input
- Parameters:
in_dim – Input dimension of tensor
scaling – Scaling applied to tensor.
offset – Offset applied to tensor.
- class nerfstudio.field_components.encodings.TensorCPEncoding(resolution: int = 256, num_components: int = 24, init_scale: float = 0.1)[source]#
Bases:
Encoding
Learned CANDECOMP/PARAFAC (CP) decomposition encoding used in TensoRF
- Parameters:
resolution – Resolution of grid.
num_components – Number of components per dimension.
init_scale – Initialization scale.
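The CP lookup can be sketched as a product of per-axis line features. This is a standalone illustrative sketch; nearest lookup stands in for the real linear interpolation, and names and shapes are assumptions:

```python
import numpy as np

def cp_encode(x, lines):
    """CP lookup: multiply per-axis 1-D line features together.
    lines: one [resolution, num_components] array per spatial axis."""
    res = lines[0].shape[0]
    idx = np.floor(x * res).astype(int)              # [..., 3] integer coords
    out = np.ones(x.shape[:-1] + (lines[0].shape[1],))
    for axis, line in enumerate(lines):
        out = out * line[idx[..., axis]]             # [..., num_components]
    return out

rng = np.random.default_rng(0)
lines = [rng.normal(scale=0.1, size=(256, 24)) for _ in range(3)]
out = cp_encode(rng.random((8, 3)), lines)
```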
- class nerfstudio.field_components.encodings.TensorVMEncoding(resolution: int = 128, num_components: int = 24, init_scale: float = 0.1)[source]#
Bases:
Encoding
Learned vector-matrix encoding proposed by TensoRF
- Parameters:
resolution – Resolution of grid.
num_components – Number of components per dimension.
init_scale – Initialization scale.
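The vector-matrix lookup can be sketched as follows. This is a standalone illustrative sketch, not the nerfstudio implementation: nearest lookup replaces the real bilinear/linear interpolation, and the axis pairing is an assumption:

```python
import numpy as np

def vm_encode(x, planes, lines):
    """VM lookup: each decomposition pairs a 2-D plane feature with the
    matching 1-D line feature along the remaining axis; results are
    concatenated across the three decompositions."""
    res = lines[0].shape[0]
    idx = np.floor(x * res).astype(int)
    pairs = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]        # (plane axes, line axis)
    feats = []
    for (a, b, c), plane, line in zip(pairs, planes, lines):
        feats.append(plane[idx[..., a], idx[..., b]] * line[idx[..., c]])
    return np.concatenate(feats, axis=-1)            # [..., 3 * num_components]

rng = np.random.default_rng(0)
planes = [rng.normal(scale=0.1, size=(128, 128, 24)) for _ in range(3)]
lines = [rng.normal(scale=0.1, size=(128, 24)) for _ in range(3)]
out = vm_encode(rng.random((8, 3)), planes, lines)
```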
- class nerfstudio.field_components.encodings.TriplaneEncoding(resolution: int = 32, num_components: int = 64, init_scale: float = 0.1, reduce: Literal['sum', 'product'] = 'sum')[source]#
Bases:
Encoding
Learned triplane encoding
The encoding at [i,j,k] is an n-dimensional vector corresponding to the element-wise product of the three n-dimensional vectors at plane_coeff[i,j], plane_coeff[i,k], and plane_coeff[j,k].
This allows for marginally more expressivity than the TensorVMEncoding, and each component is self-standing and symmetrical, unlike the VM decomposition, which needs one component with a vector along each of the x, y, and z directions for symmetry.
This can be thought of as three planes of features, perpendicular to the x, y, and z axes respectively and intersecting at the origin; the encoding is the element-wise product of the elements at the projections of [i, j, k] onto these planes.
This is used to represent a tensor decomposition of a 4D embedding tensor of shape (x, y, z, feature_size).
This will return a tensor of shape (bs:…, num_components)
- Parameters:
resolution – Resolution of grid.
num_components – The number of scalar triplanes to use (i.e., the output feature size)
init_scale – The scale of the initial values of the planes
reduce – Whether to use the element-wise product of the planes or the sum
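The triplane combination described above can be sketched in standalone NumPy (illustrative only; nearest lookup stands in for the real bilinear sampling):

```python
import numpy as np

def triplane_encode(x, plane_coeff, reduce="sum"):
    """Project [i,j,k] onto the xy, xz, yz planes and combine the features.
    plane_coeff: [3, resolution, resolution, num_components]."""
    res = plane_coeff.shape[1]
    i, j, k = np.floor(x * res).astype(int).T        # per-axis integer coords
    feats = [plane_coeff[0, i, j],                   # xy plane
             plane_coeff[1, i, k],                   # xz plane
             plane_coeff[2, j, k]]                   # yz plane
    if reduce == "sum":
        return feats[0] + feats[1] + feats[2]
    return feats[0] * feats[1] * feats[2]

rng = np.random.default_rng(0)
coeff = rng.normal(scale=0.1, size=(3, 32, 32, 64))
out = triplane_encode(rng.random((8, 3)), coeff)
```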