Encodings#
Encoding functions
- class nerfstudio.field_components.encodings.Encoding(in_dim: int)[source]#
Bases:
FieldComponent
Encodes an input tensor. Intended to be subclassed.
- Parameters:
in_dim – Input dimension of tensor
- class nerfstudio.field_components.encodings.FFEncoding(in_dim: int, basis: Float[Tensor, 'M N'], num_frequencies: int, min_freq_exp: float, max_freq_exp: float, include_input: bool = False)[source]#
Bases:
Encoding
Fourier Feature encoding. Supports integrated encodings.
- Parameters:
in_dim – Input dimension of tensor
basis – Basis matrix from which to construct the Fourier features.
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
include_input – Append the input coordinate to the encoding
- forward(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
- Calculates FF encoding. If covariances are provided, the encodings will be integrated as proposed in mip-NeRF.
- Parameters:
in_tensor – For best performance, the input tensor should be between 0 and 1.
covs – Covariances of input points.
- Returns:
Output values will be between -1 and 1
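As an illustration, the FF computation can be sketched in plain NumPy. This is a standalone reimplementation of the math, not the nerfstudio API; the basis orientation, the 2π scaling, and all shapes here are assumptions:

```python
import numpy as np

def ff_encode(x, basis, num_frequencies, min_freq_exp, max_freq_exp):
    """Fourier-feature encode x (shape [..., in_dim]) via a basis matrix."""
    proj = 2 * np.pi * x @ basis                         # [..., N] projected coords
    freqs = 2.0 ** np.linspace(min_freq_exp, max_freq_exp, num_frequencies)
    scaled = proj[..., None] * freqs                     # [..., N, F]
    scaled = scaled.reshape(*scaled.shape[:-2], -1)      # [..., N * F]
    # sin(t + pi/2) == cos(t), so one sin call covers both halves
    return np.sin(np.concatenate([scaled, scaled + np.pi / 2], axis=-1))

x = np.random.rand(8, 3)                                 # inputs in [0, 1]
out = ff_encode(x, np.eye(3), num_frequencies=4, min_freq_exp=0.0, max_freq_exp=3.0)
```

Each of the N projected coordinates expands into num_frequencies sin/cos pairs, so the output has N * num_frequencies * 2 features, all in [-1, 1].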
- class nerfstudio.field_components.encodings.HashEncoding(num_levels: int = 16, min_res: int = 16, max_res: int = 1024, log2_hashmap_size: int = 19, features_per_level: int = 2, hash_init_scale: float = 0.001, implementation: Literal['tcnn', 'torch'] = 'tcnn', interpolation: Optional[Literal['Nearest', 'Linear', 'Smoothstep']] = None)[source]#
Bases:
Encoding
Hash encoding
- Parameters:
num_levels – Number of feature grids.
min_res – Resolution of smallest feature grid.
max_res – Resolution of largest feature grid.
log2_hashmap_size – Size of hash map is 2^log2_hashmap_size.
features_per_level – Number of features per level.
hash_init_scale – Value to initialize hash grid.
implementation – Implementation of hash encoding. Fallback to torch if tcnn not available.
interpolation – Interpolation override for tcnn hashgrid. Not supported for torch unless linear.
- forward(in_tensor: Float[Tensor, '*bs input_dim']) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
- classmethod get_tcnn_encoding_config(num_levels, features_per_level, log2_hashmap_size, min_res, growth_factor, interpolation=None) dict [source]#
Get the encoding configuration for tcnn if implemented
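The per-level lookup behind the hash encoding can be sketched as follows. This is an illustrative standalone sketch, not the tcnn or nerfstudio implementation: a nearest-vertex lookup stands in for the real trilinear interpolation of eight corner features, and the hash primes follow the Instant-NGP convention:

```python
import numpy as np

PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes from Instant-NGP

def hash_lookup(x, tables, resolutions):
    """Nearest-vertex lookup into per-level hash tables.
    tables: one [2**log2_hashmap_size, features_per_level] array per level."""
    feats = []
    for table, res in zip(tables, resolutions):
        idx = np.floor(x * res).astype(np.int64)   # integer grid coords per level
        h = np.zeros(idx.shape[:-1], dtype=np.int64)
        for d, p in enumerate(PRIMES):
            h ^= idx[..., d] * p                   # XOR of prime-scaled coords
        feats.append(table[h % table.shape[0]])
    return np.concatenate(feats, axis=-1)          # [..., levels * features_per_level]

rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-3, size=(2**10, 2)) for _ in range(4)]
out = hash_lookup(rng.random((8, 3)), tables, resolutions=[16, 32, 64, 128])
```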
- class nerfstudio.field_components.encodings.Identity(in_dim: int)[source]#
Bases:
Encoding
Identity encoding (Does not modify input)
- class nerfstudio.field_components.encodings.KPlanesEncoding(resolution: Sequence[int] = (128, 128, 128), num_components: int = 64, init_a: float = 0.1, init_b: float = 0.5, reduce: Literal['sum', 'product'] = 'product')[source]#
Bases:
Encoding
Learned K-Planes encoding
A plane encoding supporting both 3D and 4D coordinates. With 3D coordinates this is similar to
TriplaneEncoding
. With 4D coordinates, the encoding at point [i,j,k,q] is an n-dimensional vector computed as the element-wise product of the six n-dimensional vectors at planes[i,j], planes[i,k], planes[i,q], planes[j,k], planes[j,q], and planes[k,q]. Unlike
TriplaneEncoding
, this class supports a different resolution along each axis.
This will return a tensor of shape (bs: …, num_components)
- Parameters:
resolution – Resolution of the grid. Can be a sequence of 3 or 4 integers.
num_components – The number of scalar planes to use (i.e., the output feature size)
init_a – The lower-bound of the uniform distribution used to initialize the spatial planes
init_b – The upper-bound of the uniform distribution used to initialize the spatial planes
reduce – Whether to use the element-wise product of the planes or the sum
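The pairwise-plane reduction described above can be sketched as follows. This is an illustrative standalone sketch, not the nerfstudio implementation: a nearest-cell lookup stands in for the real bilinear sampling, and all names and shapes are assumptions:

```python
import numpy as np
from itertools import combinations

def kplanes_encode(x, planes, resolutions, reduce="product"):
    """x: [..., 4] in [0, 1); one plane per coordinate pair (a, b)."""
    out = None
    for plane, (a, b) in zip(planes, combinations(range(x.shape[-1]), 2)):
        ia = np.floor(x[..., a] * resolutions[a]).astype(int)
        ib = np.floor(x[..., b] * resolutions[b]).astype(int)
        feat = plane[ia, ib]                       # [..., num_components]
        if out is None:
            out = feat
        else:
            out = out * feat if reduce == "product" else out + feat
    return out

rng = np.random.default_rng(0)
res = (32, 32, 32, 16)                             # different resolution per axis
planes = [rng.uniform(0.1, 0.5, size=(res[a], res[b], 64))
          for a, b in combinations(range(4), 2)]   # 6 planes for 4D input
out = kplanes_encode(rng.random((8, 4)), planes, res)
```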
- class nerfstudio.field_components.encodings.NeRFEncoding(in_dim: int, num_frequencies: int, min_freq_exp: float, max_freq_exp: float, include_input: bool = False, implementation: Literal['tcnn', 'torch'] = 'torch')[source]#
Bases:
Encoding
Multi-scale sinusoidal encodings. Supports
integrated positional encodings
if covariances are provided. Each axis is encoded with frequencies ranging from 2^min_freq_exp to 2^max_freq_exp.
- Parameters:
in_dim – Input dimension of tensor
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
include_input – Append the input coordinate to the encoding
- forward(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
- classmethod get_tcnn_encoding_config(num_frequencies) dict [source]#
Get the encoding configuration for tcnn if implemented
- pytorch_fwd(in_tensor: Float[Tensor, '*bs input_dim'], covs: Optional[Float[Tensor, '*bs input_dim input_dim']] = None) Float[Tensor, '*bs output_dim'] [source]#
- Calculates NeRF encoding. If covariances are provided, the encodings will be integrated as proposed in mip-NeRF.
- Parameters:
in_tensor – For best performance, the input tensor should be between 0 and 1.
covs – Covariances of input points.
- Returns:
Output values will be between -1 and 1
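The sinusoidal expansion (omitting the integrated/covariance path) can be sketched in standalone NumPy. This is illustrative only, not the nerfstudio implementation; the 2π scaling and the frequency spacing are assumptions:

```python
import numpy as np

def nerf_encode(x, num_frequencies, min_freq_exp, max_freq_exp, include_input=False):
    """Per-axis sin/cos at frequencies 2**min_freq_exp .. 2**max_freq_exp."""
    freqs = 2.0 ** np.linspace(min_freq_exp, max_freq_exp, num_frequencies)
    scaled = 2 * np.pi * x[..., None] * freqs        # [..., in_dim, F]
    scaled = scaled.reshape(*x.shape[:-1], -1)       # [..., in_dim * F]
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return np.concatenate([x, enc], axis=-1) if include_input else enc

x = np.random.rand(8, 3)                             # inputs in [0, 1]
out = nerf_encode(x, num_frequencies=4, min_freq_exp=0.0, max_freq_exp=3.0)
```

The output width is in_dim * num_frequencies * 2 (plus in_dim if include_input is set), with values in [-1, 1].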
- class nerfstudio.field_components.encodings.PolyhedronFFEncoding(num_frequencies: int, min_freq_exp: float, max_freq_exp: float, basis_shape: Literal['octahedron', 'icosahedron'] = 'octahedron', basis_subdivisions: int = 1, include_input: bool = False)[source]#
Bases:
FFEncoding
Fourier Feature encoding using polyhedron basis as proposed by mip-NeRF360. Supports integrated encodings.
- Parameters:
num_frequencies – Number of encoded frequencies per axis
min_freq_exp – Minimum frequency exponent
max_freq_exp – Maximum frequency exponent
basis_shape – Shape of polyhedron basis. Either “octahedron” or “icosahedron”
basis_subdivisions – Number of times to tessellate the polyhedron.
include_input – Append the input coordinate to the encoding
- class nerfstudio.field_components.encodings.RFFEncoding(in_dim: int, num_frequencies: int, scale: float, include_input: bool = False)[source]#
Bases:
FFEncoding
Random Fourier Feature encoding. Supports integrated encodings.
- Parameters:
in_dim – Input dimension of tensor
num_frequencies – Number of encoding frequencies
scale – Std of Gaussian to sample frequencies. Must be greater than zero
include_input – Append the input coordinate to the encoding
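Conceptually, the RFF encoding is an FF encoding whose basis is sampled from a zero-mean Gaussian. A standalone sketch (illustrative only; the seed, shapes, and 2π scaling are assumptions):

```python
import numpy as np

def rff_encode(x, num_frequencies, scale, seed=0):
    """Project onto frequencies drawn from N(0, scale^2), then sin/cos."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, scale, size=(x.shape[-1], num_frequencies))
    proj = 2 * np.pi * x @ b                         # [..., num_frequencies]
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

out = rff_encode(np.random.rand(8, 3), num_frequencies=16, scale=10.0)
```

Larger `scale` draws higher frequencies, biasing the downstream network toward higher-frequency detail.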
- class nerfstudio.field_components.encodings.SHEncoding(levels: int = 4, implementation: Literal['tcnn', 'torch'] = 'torch')[source]#
Bases:
Encoding
Spherical harmonic encoding
- Parameters:
levels – Number of spherical harmonic levels to encode.
- forward(in_tensor: Float[Tensor, '*bs input_dim']) Float[Tensor, '*bs output_dim'] [source]#
Calls forward and returns the processed tensor.
- Parameters:
in_tensor – the input tensor to process
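For the first two spherical-harmonic bands the encoding can be written out directly. This is a standalone sketch; the constants are the standard real-SH normalizations, but sign and component-ordering conventions differ between implementations, so treat the ordering here as an assumption:

```python
import numpy as np

def sh_encode_l2(d):
    """First two SH bands (4 components) of unit directions d: [..., 3]."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack([
        0.28209479177 * np.ones_like(x),   # l=0 (constant band)
        -0.48860251190 * y,                # l=1, m=-1
        0.48860251190 * z,                 # l=1, m=0
        -0.48860251190 * x,                # l=1, m=1
    ], axis=-1)

d = np.array([[0.0, 0.0, 1.0]])            # +z direction
out = sh_encode_l2(d)
```

With `levels` bands the output has levels² components, which is why SH encodings are a common fit for view-direction inputs.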
- class nerfstudio.field_components.encodings.ScalingAndOffset(in_dim: int, scaling: float = 1.0, offset: float = 0.0)[source]#
Bases:
Encoding
Simple scaling and offset to input
- Parameters:
in_dim – Input dimension of tensor
scaling – Scaling applied to tensor.
offset – Offset applied to tensor.
- class nerfstudio.field_components.encodings.TensorCPEncoding(resolution: int = 256, num_components: int = 24, init_scale: float = 0.1)[source]#
Bases:
Encoding
Learned CANDECOMP/PARAFAC (CP) decomposition encoding used in TensoRF
- Parameters:
resolution – Resolution of grid.
num_components – Number of components per dimension.
init_scale – Initialization scale.
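The CP lookup can be sketched as a product of per-axis line features. This is a standalone illustrative sketch; nearest lookup stands in for the real linear interpolation, and names and shapes are assumptions:

```python
import numpy as np

def cp_encode(x, lines):
    """CP lookup: multiply per-axis 1-D line features together.
    lines: one [resolution, num_components] array per spatial axis."""
    res = lines[0].shape[0]
    idx = np.floor(x * res).astype(int)              # [..., 3] integer coords
    out = np.ones(x.shape[:-1] + (lines[0].shape[1],))
    for axis, line in enumerate(lines):
        out = out * line[idx[..., axis]]             # [..., num_components]
    return out

rng = np.random.default_rng(0)
lines = [rng.normal(scale=0.1, size=(256, 24)) for _ in range(3)]
out = cp_encode(rng.random((8, 3)), lines)
```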
- class nerfstudio.field_components.encodings.TensorVMEncoding(resolution: int = 128, num_components: int = 24, init_scale: float = 0.1)[source]#
Bases:
Encoding
Learned vector-matrix encoding proposed by TensoRF
- Parameters:
resolution – Resolution of grid.
num_components – Number of components per dimension.
init_scale – Initialization scale.
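The vector-matrix lookup can be sketched as follows. This is a standalone illustrative sketch, not the nerfstudio implementation: nearest lookup replaces the real bilinear/linear interpolation, and the axis pairing is an assumption:

```python
import numpy as np

def vm_encode(x, planes, lines):
    """VM lookup: each decomposition pairs a 2-D plane feature with the
    matching 1-D line feature along the remaining axis; results are
    concatenated across the three decompositions."""
    res = lines[0].shape[0]
    idx = np.floor(x * res).astype(int)
    pairs = [(0, 1, 2), (0, 2, 1), (1, 2, 0)]        # (plane axes, line axis)
    feats = []
    for (a, b, c), plane, line in zip(pairs, planes, lines):
        feats.append(plane[idx[..., a], idx[..., b]] * line[idx[..., c]])
    return np.concatenate(feats, axis=-1)            # [..., 3 * num_components]

rng = np.random.default_rng(0)
planes = [rng.normal(scale=0.1, size=(128, 128, 24)) for _ in range(3)]
lines = [rng.normal(scale=0.1, size=(128, 24)) for _ in range(3)]
out = vm_encode(rng.random((8, 3)), planes, lines)
```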
- class nerfstudio.field_components.encodings.TriplaneEncoding(resolution: int = 32, num_components: int = 64, init_scale: float = 0.1, reduce: Literal['sum', 'product'] = 'sum')[source]#
Bases:
Encoding
Learned triplane encoding
The encoding at [i,j,k] is an n-dimensional vector corresponding to the element-wise product of the three n-dimensional vectors at plane_coeff[i,j], plane_coeff[i,k], and plane_coeff[j,k].
This allows for marginally more expressivity than the TensorVMEncoding, and each component is self-standing and symmetrical, unlike the VM decomposition, which needs one component with a vector along each of the x, y, and z directions for symmetry.
This can be thought of as three planes of features, perpendicular to the x, y, and z axes respectively and intersecting at the origin; the encoding is the element-wise product of the elements at the projections of [i, j, k] onto these planes.
This is used to represent a tensor decomposition of a 4D embedding tensor of shape (x, y, z, feature_size).
This will return a tensor of shape (bs:…, num_components)
- Parameters:
resolution – Resolution of grid.
num_components – The number of scalar triplanes to use (i.e., the output feature size)
init_scale – The scale of the initial values of the planes
reduce – Whether to use the element-wise product of the planes or the sum
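The triplane combination described above can be sketched in standalone NumPy (illustrative only; nearest lookup stands in for the real bilinear sampling):

```python
import numpy as np

def triplane_encode(x, plane_coeff, reduce="sum"):
    """Project [i,j,k] onto the xy, xz, yz planes and combine the features.
    plane_coeff: [3, resolution, resolution, num_components]."""
    res = plane_coeff.shape[1]
    i, j, k = np.floor(x * res).astype(int).T        # per-axis integer coords
    feats = [plane_coeff[0, i, j],                   # xy plane
             plane_coeff[1, i, k],                   # xz plane
             plane_coeff[2, j, k]]                   # yz plane
    if reduce == "sum":
        return feats[0] + feats[1] + feats[2]
    return feats[0] * feats[1] * feats[2]

rng = np.random.default_rng(0)
coeff = rng.normal(scale=0.1, size=(3, 32, 32, 64))
out = triplane_encode(rng.random((8, 3)), coeff)
```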