ns-process-data#

Note

Make sure to have COLMAP and FFmpeg installed.
You may also want to install hloc (optional) for more feature detector and matcher options.

usage: ns-process-data [-h]
                       {images,video,polycam,metashape,realitycapture,insta360,record3d}

subcommands#

{images,video,polycam,metashape,realitycapture,insta360,record3d}

Possible choices: images, video, polycam, metashape, realitycapture, insta360, record3d

Sub-commands:#

images#

Process images into a nerfstudio dataset. This script does the following:

  1. Scales images to a specified size.

  2. Calculates the camera poses for each image using COLMAP.

ns-process-data images [-h] --data PATH --output-dir PATH
                       [--camera-type {perspective,fisheye}]
                       [--matching-method {exhaustive,sequential,vocab_tree}]
                       [--sfm-tool {any,colmap,hloc}]
                       [--feature-type {any,sift,superpoint,superpoint_aachen,superpoint_max,superpoint_inloc,r2d2,d2net-ss,sosnet,disk}]
                       [--matcher-type {any,NN,superglue,superglue-fast,NN-superpoint,NN-ratio,NN-mutual,adalam}]
                       [--num-downscales INT] [--skip-colmap]
                       [--colmap-cmd STR] [--no-gpu] [--verbose]
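
As an illustrative example (the paths below are placeholders, not part of the tool's output), a minimal invocation that processes a folder of photos with the default COLMAP pipeline might look like:

```shell
# Placeholder paths; point --data at your image folder.
ns-process-data images \
    --data ./my-photos \
    --output-dir ./processed/my-scene
```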

arguments#

--data

Path to the data, either a video file or a directory of images. (required)

--output-dir

Path to the output directory. (required)

--camera-type

Possible choices: perspective, fisheye

Camera model to use. (default: perspective)

--matching-method

Possible choices: exhaustive, sequential, vocab_tree

Feature matching method to use. Vocab tree is recommended for a balance of speed and accuracy. Exhaustive is slower but more accurate. Sequential is faster but should only be used for videos. (default: vocab_tree)

--sfm-tool

Possible choices: any, colmap, hloc

Structure-from-motion tool to use. COLMAP will use SIFT features; hloc can use many modern methods, such as SuperPoint features and the SuperGlue matcher. (default: any)

--feature-type

Possible choices: any, sift, superpoint, superpoint_aachen, superpoint_max, superpoint_inloc, r2d2, d2net-ss, sosnet, disk

Type of feature to use. (default: any)

--matcher-type

Possible choices: any, NN, superglue, superglue-fast, NN-superpoint, NN-ratio, NN-mutual, adalam

Matching algorithm. (default: any)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)
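
As a sketch of the arithmetic (illustrative only, not code from ns-process-data), the scale factors produced by --num-downscales are successive powers of two:

```shell
# Each downscale level halves the previous resolution,
# so 3 levels yield scale factors of 2x, 4x, and 8x.
num_downscales=3
factors=""
for i in $(seq 1 "$num_downscales"); do
  factors="${factors:+$factors }$((2 ** i))x"
done
echo "downscale factors: $factors"  # prints "downscale factors: 2x 4x 8x"
```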

--skip-colmap

If True, skips COLMAP and generates transforms.json if possible. (sets: skip_colmap=True)

--colmap-cmd

How to call the COLMAP executable. (default: colmap)

--no-gpu

If True, do not use the GPU. (sets: gpu=False)

--verbose

If True, print extra logging. (sets: verbose=True)

video#

Process videos into a nerfstudio dataset. This script does the following:

  1. Converts the video into images.

  2. Scales images to a specified size.

  3. Calculates the camera poses for each image using COLMAP.

ns-process-data video [-h] --data PATH --output-dir PATH
                      [--num-frames-target INT]
                      [--camera-type {perspective,fisheye}]
                      [--matching-method {exhaustive,sequential,vocab_tree}]
                      [--sfm-tool {any,colmap,hloc}]
                      [--feature-type {any,sift,superpoint,superpoint_aachen,superpoint_max,superpoint_inloc,r2d2,d2net-ss,sosnet,disk}]
                      [--matcher-type {any,NN,superglue,superglue-fast,NN-superpoint,NN-ratio,NN-mutual,adalam}]
                      [--num-downscales INT] [--skip-colmap]
                      [--colmap-cmd STR] [--percent-radius-crop FLOAT]
                      [--percent-crop FLOAT FLOAT FLOAT FLOAT] [--no-gpu]
                      [--verbose]
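
For example (the file paths are placeholders), a capture video could be processed while targeting a specific frame count:

```shell
# Placeholder paths; --num-frames-target samples roughly 300 frames.
ns-process-data video \
    --data ./capture.mp4 \
    --output-dir ./processed/my-scene \
    --num-frames-target 300
```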

arguments#

--data

Path to the data, either a video file or a directory of images. (required)

--output-dir

Path to the output directory. (required)

--num-frames-target

Target number of frames to use for the dataset, results may not be exact. (default: 300)

--camera-type

Possible choices: perspective, fisheye

Camera model to use. (default: perspective)

--matching-method

Possible choices: exhaustive, sequential, vocab_tree

Feature matching method to use. Vocab tree is recommended for a balance of speed and accuracy. Exhaustive is slower but more accurate. Sequential is faster but should only be used for videos. (default: vocab_tree)

--sfm-tool

Possible choices: any, colmap, hloc

Structure-from-motion tool to use. COLMAP will use SIFT features; hloc can use many modern methods, such as SuperPoint features and the SuperGlue matcher. (default: any)

--feature-type

Possible choices: any, sift, superpoint, superpoint_aachen, superpoint_max, superpoint_inloc, r2d2, d2net-ss, sosnet, disk

Type of feature to use. (default: any)

--matcher-type

Possible choices: any, NN, superglue, superglue-fast, NN-superpoint, NN-ratio, NN-mutual, adalam

Matching algorithm. (default: any)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--skip-colmap

If True, skips COLMAP and generates transforms.json if possible. (sets: skip_colmap=True)

--colmap-cmd

How to call the COLMAP executable. (default: colmap)

--percent-radius-crop

Create circle crop mask. The radius is the percent of the image diagonal. (default: 1.0)

--percent-crop

Percent of the image to crop. (top, bottom, left, right) (default: 0.0 0.0 0.0 0.0)

--no-gpu

If True, do not use the GPU. (sets: gpu=False)

--verbose

If True, print extra logging. (sets: verbose=True)

polycam#

Process Polycam data into a nerfstudio dataset. To capture data, use the Polycam app on an iPhone or iPad with LiDAR. The capture must be in LiDAR or ROOM mode. Developer mode must be enabled in the app settings; this enables a raw data export option in the export menus. The exported data folder is used as the input to this script.

This script does the following:

  1. Scales images to a specified size.

  2. Converts Polycam poses into the nerfstudio format.

ns-process-data polycam [-h] --data PATH --output-dir PATH
                        [--num-downscales INT] [--use-uncorrected-images]
                        [--max-dataset-size INT] [--min-blur-score FLOAT]
                        [--crop-border-pixels INT] [--use-depth] [--verbose]
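
As an illustration (the path is a placeholder), a zipped Polycam raw export can be passed directly:

```shell
# Placeholder path; --data accepts the export .zip or its unzipped folder.
ns-process-data polycam \
    --data ./polycam-export.zip \
    --output-dir ./processed/my-room
```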

arguments#

--data

Path to the Polycam export data folder. Can be a .zip file or a folder. (required)

--output-dir

Path to the output directory. (required)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--use-uncorrected-images

If True, use the raw images from the Polycam export. If False, use the corrected images. (sets: use_uncorrected_images=True)

--max-dataset-size

Max number of images to train on. If the dataset has more, images will be sampled approximately evenly. If -1, use all images. (default: 600)

--min-blur-score

Minimum blur score to use an image. If the blur score is below this value, the image will be skipped. (default: 25)

--crop-border-pixels

Number of pixels to crop from each border of the image. Useful as borders may be black due to undistortion. (default: 15)

--use-depth

If True, processes the generated depth maps from Polycam. (sets: use_depth=True)

--verbose

If True, print extra logging. (sets: verbose=True)

metashape#

Process Metashape data into a nerfstudio dataset. This script assumes that cameras have been aligned using Metashape. After alignment, it is necessary to export the camera poses as a .xml file. This option can be found under File > Export > Export Cameras.

This script does the following:

  1. Scales images to a specified size.

  2. Converts Metashape poses into the nerfstudio format.

ns-process-data metashape [-h] --data PATH --xml PATH --output-dir PATH
                          [--num-downscales INT] [--max-dataset-size INT]
                          [--verbose]
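
For example (placeholder paths), using a camera file exported from Metashape via File > Export > Export Cameras:

```shell
# Placeholder paths; --xml points at the exported Metashape cameras file.
ns-process-data metashape \
    --data ./images \
    --xml ./cameras.xml \
    --output-dir ./processed/my-scene
```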

arguments#

--data

Path to a folder of images. (required)

--xml

Path to the Metashape xml file. (required)

--output-dir

Path to the output directory. (required)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--max-dataset-size

Max number of images to train on. If the dataset has more, images will be sampled approximately evenly. If -1, use all images. (default: 600)

--verbose

If True, print extra logging. (sets: verbose=True)

realitycapture#

Process RealityCapture data into a nerfstudio dataset. This script assumes that cameras have been aligned using RealityCapture. After alignment, it is necessary to export the camera poses as a .csv file.

This script does the following:

  1. Scales images to a specified size.

  2. Converts RealityCapture poses into the nerfstudio format.

ns-process-data realitycapture [-h] --data PATH --csv PATH --output-dir PATH
                               [--num-downscales INT] [--max-dataset-size INT]
                               [--verbose]
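
For example (placeholder paths), using a camera poses CSV exported from RealityCapture after alignment:

```shell
# Placeholder paths; --csv points at the exported RealityCapture cameras file.
ns-process-data realitycapture \
    --data ./images \
    --csv ./cameras.csv \
    --output-dir ./processed/my-scene
```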

arguments#

--data

Path to a folder of images. (required)

--csv

Path to the RealityCapture cameras CSV file. (required)

--output-dir

Path to the output directory. (required)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--max-dataset-size

Max number of images to train on. If the dataset has more, images will be sampled approximately evenly. If -1, use all images. (default: 600)

--verbose

If True, print extra logging. (sets: verbose=True)

insta360#

Process Insta360 videos into a nerfstudio dataset. Currently this uses a center crop of the raw data, so data at the extreme edges of the video will be lost.

Expects data from a two-camera Insta360; single-camera or more-than-two-camera models will not work. (Tested with the Insta360 One X2.)

This script does the following:

  1. Converts the videos into images.

  2. Scales images to a specified size.

  3. Calculates the camera poses for each image using COLMAP.

ns-process-data insta360 [-h] --data PATH --output-dir PATH
                         [--num-frames-target INT]
                         [--matching-method {exhaustive,sequential,vocab_tree}]
                         [--num-downscales INT] [--skip-colmap]
                         [--colmap-cmd STR] [--no-gpu] [--verbose]
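
As an illustration (the filename is a placeholder), any one of the .insv files from a capture can be used as the input:

```shell
# Placeholder path; any of the .insv files from the capture works.
ns-process-data insta360 \
    --data ./capture.insv \
    --output-dir ./processed/my-scene
```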

arguments#

--data

Path to the data. It should be one of the three .insv files saved with each capture; any of them works. (required)

--output-dir

Path to the output directory. (required)

--num-frames-target

Target number of frames to use for the dataset, results may not be exact. (default: 400)

--matching-method

Possible choices: exhaustive, sequential, vocab_tree

Feature matching method to use. Vocab tree is recommended for a balance of speed and accuracy. Exhaustive is slower but more accurate. Sequential is faster but should only be used for videos. (default: vocab_tree)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--skip-colmap

If True, skips COLMAP and generates transforms.json if possible. (sets: skip_colmap=True)

--colmap-cmd

How to call the COLMAP executable. (default: colmap)

--no-gpu

If True, do not use the GPU. (sets: gpu=False)

--verbose

If True, print extra logging. (sets: verbose=True)

record3d#

Process Record3D data into a nerfstudio dataset. This script does the following:

  1. Scales images to a specified size.

  2. Converts Record3D poses into the nerfstudio format.

ns-process-data record3d [-h] --data PATH --output-dir PATH
                         [--num-downscales INT] [--max-dataset-size INT]
                         [--verbose]
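
For example (placeholder path), pointing the script at an exported Record3D folder:

```shell
# Placeholder path; --data is the exported Record3D capture folder.
ns-process-data record3d \
    --data ./record3d-export \
    --output-dir ./processed/my-scene
```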

arguments#

--data

Path to the Record3D data. (required)

--output-dir

Path to the output directory. (required)

--num-downscales

Number of times to downscale the images. Downscales by 2 each time. For example a value of 3 will downscale the images by 2x, 4x, and 8x. (default: 3)

--max-dataset-size

Max number of images to train on. If the dataset has more, images will be sampled approximately evenly. If -1, use all images. (default: 300)

--verbose

If True, print extra logging. (sets: verbose=True)