Dataloading Revamp #3216

Open: wants to merge 141 commits into base: main

Commits (141)
0471543
initial debugging and testing works
AntonioMacaronio Jun 11, 2024
c6dde7d
pwais changes with RayBatchStream to alleviate training
AntonioMacaronio Jun 12, 2024
a09ea0c
Merge branch 'main' into dataloading-revamp
AntonioMacaronio Jun 12, 2024
78453cd
few bugs to iron out with multiprocessing, specifically pickled colla…
AntonioMacaronio Jun 12, 2024
f2bd96f
working version of RayBatchStream
AntonioMacaronio Jun 13, 2024
d8b7430
additional docstrings
AntonioMacaronio Jun 13, 2024
a5425d4
cleanup
AntonioMacaronio Jun 13, 2024
604f734
much more documentation
AntonioMacaronio Jun 13, 2024
0143803
successfully trained AEA-script2_seq2 closed_loop without OOM
AntonioMacaronio Jun 13, 2024
d3527e2
porting over aria dataset-size feature
AntonioMacaronio Jun 13, 2024
25f5f27
added logic to handle eviction of a worker's cached_collated_batch
AntonioMacaronio Jun 14, 2024
3a8b63b
antonio's implementation of stream batches
AntonioMacaronio Jun 15, 2024
536c6ca
training on a dataset with 4000 images works!
AntonioMacaronio Jun 15, 2024
43a0061
some configuration speedups, loops aren't actually needed!
AntonioMacaronio Jun 15, 2024
fa7cf30
quick fix adjustment to aria
AntonioMacaronio Jun 15, 2024
927cb6a
removed unnecessary looping
AntonioMacaronio Jun 16, 2024
814f2c2
much faster training when adding i variable to collate every 5 ray bu…
AntonioMacaronio Jun 25, 2024
247ac3e
cleanup unnecessary variables in Dataloader
AntonioMacaronio Jul 7, 2024
55d0803
further cleanup
AntonioMacaronio Jul 11, 2024
b6979a4
adding caching of compressed images to RAM to reduce disk bottleneck
AntonioMacaronio Jul 20, 2024
81dbf7c
added caching to RAM for masks
AntonioMacaronio Jul 22, 2024
55ca71d
found fast way to collate - many tricks applied
AntonioMacaronio Jul 26, 2024
3b4f091
quick update to aria to test on different datasets
AntonioMacaronio Jul 26, 2024
7de1922
cleaned up the accelerated pil_to_numpy function
AntonioMacaronio Jul 26, 2024
9ceaad1
cleaning up PR
AntonioMacaronio Jul 26, 2024
4147a6a
this commit was used to generate the time metrics and profiling metrics
AntonioMacaronio Jul 26, 2024
5a55b7a
REAL commit used to run tests
AntonioMacaronio Jul 26, 2024
78f02e6
testing with nerfacto-big
AntonioMacaronio Aug 15, 2024
19bc4b5
generated RayBundle collate and converting images from uint8s to floa…
AntonioMacaronio Aug 15, 2024
9245d05
updating nerfacto to support uint8 easily, will need to figure out a …
AntonioMacaronio Aug 20, 2024
3124c14
datamanager updates, both splat and nerf
AntonioMacaronio Aug 20, 2024
afb0612
must use writeable arrays because torch requires them
AntonioMacaronio Aug 20, 2024
288a740
cleaned up base_dataset, added pickle to utils, more code in full_ima…
AntonioMacaronio Aug 22, 2024
2fd0862
lots of progress on a parallel FullImageDatamanager
AntonioMacaronio Aug 23, 2024
846e2f3
can train big splats with pre-assertion hack or ROI hack and 0 workers
AntonioMacaronio Aug 24, 2024
8fb0b4d
fixed all undistortion issues with ParallelImageDatamanager
AntonioMacaronio Aug 27, 2024
ce3f83f
adding some downsampling and parallel tests with splatfacto!
AntonioMacaronio Aug 31, 2024
8ab9963
deleted commented code in dataloaders.py and added bugfix to shuffling
AntonioMacaronio Aug 31, 2024
c9e16bf
testing splatfacto-big
AntonioMacaronio Sep 1, 2024
ddac38d
cleaned up base_pipeline.py
AntonioMacaronio Sep 1, 2024
443719a
cleaned up base_pipeline.py ACTUALLY THIS TIME, forgot to save last time
AntonioMacaronio Sep 1, 2024
d16e519
cleaned up a lot of code
AntonioMacaronio Sep 1, 2024
367d512
process_project_aria back to main branch and some cleanup in full_ima…
AntonioMacaronio Sep 1, 2024
d3d99b4
clarifying docstrings
AntonioMacaronio Sep 1, 2024
6f763dc
further PR cleanup
AntonioMacaronio Sep 3, 2024
a5191bd
updating models
AntonioMacaronio Sep 9, 2024
7db70dc
further cleanup
AntonioMacaronio Sep 9, 2024
5c3262b
removed caching of images into bytestrings
AntonioMacaronio Sep 9, 2024
ff2bda1
adding caching of compressed images to RAM, forgot that hardware matters
AntonioMacaronio Sep 9, 2024
f6dd7dd
removing oom methods, adding the ability to add a flag to dataloading
AntonioMacaronio Sep 15, 2024
a6602c7
removed CacheDataloader, moved RayBatchStream to dataloaders.py, new …
AntonioMacaronio Sep 15, 2024
3dc2031
fixing base_pipelines, deleting a weird datamanager_configs file that …
AntonioMacaronio Sep 15, 2024
89f3d98
cleaning up next_train
AntonioMacaronio Sep 15, 2024
14e60e5
replaced parallel datamanager with new datamanager
AntonioMacaronio Sep 19, 2024
204dfb2
reverted the original base_datamanager.py, new datamanager replaced p…
AntonioMacaronio Sep 19, 2024
5864bc9
modified VanillaConfig, but VanillaDataManager is the same as before
AntonioMacaronio Sep 19, 2024
6d97de3
cleaning up, 2 datamanagers now - original and new parallel one
AntonioMacaronio Sep 19, 2024
1f34017
able to train with new nerfstudio dataloader now
AntonioMacaronio Sep 19, 2024
99cf86a
side by side datamanagers, moved tons of logic into dataloaders.py an…
AntonioMacaronio Sep 23, 2024
4ebad85
added custom ray processing API to support implementations like LERF,…
AntonioMacaronio Sep 23, 2024
87921be
adding functionality for ns-eval by adding FixedIndicesEvalDataloader…
AntonioMacaronio Sep 24, 2024
b628c7c
adding both ray API and image-view API to datamanagers for custom par…
AntonioMacaronio Sep 27, 2024
d2785d1
updating splatfacto config for 4k tests
AntonioMacaronio Sep 30, 2024
436af9d
updating docstrings to be more descriptive
AntonioMacaronio Sep 30, 2024
dd4daaa
new datamanager API breaks when setup_eval() has multiple workers, no…
AntonioMacaronio Sep 30, 2024
43c66ae
adding custom_view_processor to ImageBatchStream
AntonioMacaronio Sep 30, 2024
ba81e11
merging with main!
AntonioMacaronio Sep 30, 2024
1922566
reverting full_images_datamanager to main branch
AntonioMacaronio Oct 1, 2024
beb74be
removing nn.Module inheritance from Datamanager class
AntonioMacaronio Oct 1, 2024
087cff0
don't need to move datamanager to device anymore since Datamanager is …
AntonioMacaronio Oct 1, 2024
48e6d15
finished integration test with nerfacto
AntonioMacaronio Oct 4, 2024
3f1799b
simplified config variables, integrated the parallelism/disk-data-loa…
AntonioMacaronio Oct 25, 2024
f46aa42
updated the splatfacto config to be simpler with the dataloading and …
AntonioMacaronio Oct 25, 2024
5aa51fb
style checks and some cleanup
AntonioMacaronio Oct 25, 2024
ec3c12a
new splatfacto test, cleaning up nerfacto integration test
AntonioMacaronio Oct 25, 2024
82bc5b2
removing redundant parallel_full_images_datamanager, as the OG full_i…
AntonioMacaronio Oct 26, 2024
377a56a
Merge branch 'main' into dataloading-revamp
AntonioMacaronio Oct 28, 2024
bbb5473
ruff linting and pyright fixing
AntonioMacaronio Oct 28, 2024
2e64120
further pyright fixing
AntonioMacaronio Oct 28, 2024
e9c2fd6
another pyright fixing
AntonioMacaronio Oct 28, 2024
e4dc9f9
fixing pyright error, camera optimization no longer part of datamanager
AntonioMacaronio Nov 1, 2024
8b0ec8e
fixing one pyright
AntonioMacaronio Nov 22, 2024
6349852
fixing dataloading error when camera is not undistorted with dataloader
AntonioMacaronio Dec 13, 2024
ad6b090
fixing comments and updating style
AntonioMacaronio Dec 21, 2024
8c678ee
undoing a style change i made
AntonioMacaronio Dec 21, 2024
64edabb
undoing another style change i made by accident
AntonioMacaronio Dec 21, 2024
cc63585
Merge branch 'main' into dataloading-revamp
AntonioMacaronio Dec 22, 2024
1e40aad
fixing slow runtime
AntonioMacaronio Dec 25, 2024
0012017
fixing a more general camera undistortion bug
AntonioMacaronio Dec 31, 2024
2fdba24
move images to device properly
kerrj Jan 2, 2025
d8ec706
minor improvements
kerrj Jan 2, 2025
51fc984
Merge branch 'main' into dataloading-revamp
kerrj Jan 3, 2025
36df9b3
add print statement about >500 images, cleanup method configs
kerrj Jan 3, 2025
a3fb46f
make method configs consistent across nerfacto models
kerrj Jan 3, 2025
3e06221
adding description comments
AntonioMacaronio Jan 4, 2025
c6f8094
Merge branch 'main' into dataloading-revamp
kerrj Jan 6, 2025
e72dd78
updating description
AntonioMacaronio Jan 6, 2025
f5024fc
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 6, 2025
d2af513
resolving some pyright issues with export.py, explained in PR desc
AntonioMacaronio Jan 6, 2025
1a02133
fixing pyright issues in base_pipeline.py
AntonioMacaronio Jan 6, 2025
b7bcb13
ran pyright on exporter and base_pipeline.py without issues
AntonioMacaronio Jan 6, 2025
603a5db
adding a git ignore to a clearly checked pyright issue
AntonioMacaronio Jan 6, 2025
eedda79
typo
kerrj Jan 7, 2025
3c5ab8e
merge
kerrj Jan 7, 2025
2f90812
fixing most ns-dev-test cases
AntonioMacaronio Jan 7, 2025
3a82351
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 7, 2025
4091694
cleanup, passing final ns-dev-test
AntonioMacaronio Jan 7, 2025
e7c99e4
oops, accidentally pushed the deletion of a docstring, undoing that
AntonioMacaronio Jan 7, 2025
bd6d1ae
another cleanup
AntonioMacaronio Jan 7, 2025
deb4d7f
some fixes to eval pipeline
kerrj Jan 7, 2025
a5f62aa
lint
kerrj Jan 7, 2025
97629a7
Merge branch 'main' into dataloading-revamp
kerrj Jan 8, 2025
e13525e
add asserts for spawn
kerrj Jan 8, 2025
94afc0b
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
kerrj Jan 8, 2025
c316a7b
lint
kerrj Jan 8, 2025
b8da37d
cleaning up import statements in parallel_datamanager.py
AntonioMacaronio Jan 9, 2025
3fafbc2
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 9, 2025
e4a7661
adding new developer documentation if users would like to migrate the…
AntonioMacaronio Jan 9, 2025
1b37fc4
removing unnecessary to_device no-op
AntonioMacaronio Jan 11, 2025
e1764a9
further updates to documentation
AntonioMacaronio Jan 11, 2025
a280ebc
Merge branch 'main' into dataloading-revamp
kerrj Jan 13, 2025
70d965e
lint
kerrj Jan 13, 2025
f4b0f28
more docs
kerrj Jan 13, 2025
7617a79
docs
kerrj Jan 13, 2025
b0fc764
remove comment
kerrj Jan 13, 2025
37f4ca4
add docs, fix depth dataset with parallel datamanager, fix mask sampl…
kerrj Jan 13, 2025
6270e6b
remove profiling
kerrj Jan 13, 2025
9972d21
more profile removal
kerrj Jan 13, 2025
14246c6
custom_view_processor->custom_image_processor
kerrj Jan 13, 2025
a1af58a
doc clarification
kerrj Jan 13, 2025
974200c
Merge branch 'main' of https://github.com/AntonioMacaronio/nerfstudio…
AntonioMacaronio Jan 14, 2025
c2e8a8b
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 15, 2025
ac1d4a8
datamanager doc nit
brentyi Jan 16, 2025
3092805
whitespace
brentyi Jan 16, 2025
8d42154
nits
brentyi Jan 16, 2025
cce5e47
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 16, 2025
01f40b4
remove stuff from __post_init__, tune num workers more, add random of…
kerrj Jan 16, 2025
61ef730
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 16, 2025
3e532f5
Merge branch 'main' into dataloading-revamp
kerrj Jan 17, 2025
c038632
Merge branch 'dataloading-revamp' of https://github.com/AntonioMacaro…
AntonioMacaronio Jan 18, 2025
7c4762e
removing unnecessary assertion, updating docstring because DataManage…
AntonioMacaronio Jan 19, 2025
110 changes: 107 additions & 3 deletions docs/developer_guides/pipelines/datamanagers.md
@@ -14,19 +14,28 @@

## What is a DataManager?

The DataManager returns RayBundle and RayGT objects. Let's first take a look at the most important abstract methods required by the DataManager.
The DataManager batches and returns two components from an input dataset:

1. A representation of viewpoint (either cameras or rays).
- For splatting methods (`FullImageDataManager`): a `Cameras` object.
- For ray sampling methods (`VanillaDataManager`): a `RayBundle` object.
2. A dictionary of ground truth data.
- For splatting methods (`FullImageDataManager`): dictionary contains complete images.
- For ray sampling methods (`VanillaDataManager`): dictionary contains per-ray information.

Behaviors are defined by implementing the abstract methods required by the DataManager:

```python
class DataManager(nn.Module):
"""Generic data manager's abstract class
"""

@abstractmethod
def next_train(self, step: int) -> Tuple[RayBundle, Dict]:
def next_train(self, step: int) -> Tuple[Union[RayBundle, Cameras], Dict]:
"""Returns the next batch of data for train."""

@abstractmethod
def next_eval(self, step: int) -> Tuple[RayBundle, Dict]:
def next_eval(self, step: int) -> Tuple[Union[RayBundle, Cameras], Dict]:
"""Returns the next batch of data for eval."""

@abstractmethod
@@ -94,3 +103,98 @@ See the code!
## Creating Your Own

We currently don't have other implementations because most papers follow the VanillaDataManager implementation. However, it should be straightforward to subclass VanillaDataManager with logic that progressively adds cameras, for instance by relying on the step and modifying the RayBundle and RayGT generation logic, as sketched below.
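
As an illustration, here is a minimal sketch of that idea. The class name, the unlock schedule, and the filtering step are hypothetical and not part of nerfstudio; it reuses the same helpers (`iter_train_image_dataloader`, `train_pixel_sampler`, `train_ray_generator`) that appear in the LERF example below.

```python
from typing import Dict, Tuple

import torch

from nerfstudio.cameras.rays import RayBundle
from nerfstudio.data.datamanagers.base_datamanager import VanillaDataManager


class ProgressiveDataManager(VanillaDataManager):
    """Hypothetical sketch: only sample rays from cameras unlocked so far."""

    def next_train(self, step: int) -> Tuple[RayBundle, Dict]:
        self.train_count += 1
        # Assumed schedule: unlock one additional camera every 500 steps.
        num_unlocked = min(len(self.train_dataset), 1 + step // 500)
        image_batch = next(self.iter_train_image_dataloader)
        assert self.train_pixel_sampler is not None
        batch = self.train_pixel_sampler.sample(image_batch)
        # batch["indices"] has shape (num_rays, 3): (camera index, row, col).
        keep = batch["indices"][:, 0] < num_unlocked
        batch = {k: v[keep] if isinstance(v, torch.Tensor) else v for k, v in batch.items()}
        ray_bundle = self.train_ray_generator(batch["indices"])
        return ray_bundle, batch
```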

## Disk Caching for Large Datasets

As of January 2025, the FullImageDatamanager and ParallelImageDatamanager implementations support parallelized dataloading and loading images from disk, which avoids out-of-memory (OOM) errors on very large datasets. To train a NeRF-based method on a dataset that does not fit in memory, add the `load_from_disk` flag to your `ns-train` command. For example, with nerfacto:
```bash
ns-train nerfacto --data {PROCESSED_DATA_DIR} --pipeline.datamanager.load-from-disk
```

To train splatfacto on a large dataset that does not fit in memory, set the `cache-images` option to `"disk"`. For example:
```bash
ns-train splatfacto --data {PROCESSED_DATA_DIR} --pipeline.datamanager.cache-images disk
```
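
These flags map onto datamanager config fields added in this PR (visible in the `VanillaDataManagerConfig` diff further down). A hedged sketch of setting them programmatically, with illustrative values:

```python
from nerfstudio.data.datamanagers.base_datamanager import VanillaDataManagerConfig

datamanager_config = VanillaDataManagerConfig(
    load_from_disk=True,            # stream images from disk instead of caching them all in RAM
    dataloader_num_workers=4,       # background workers for collating, pixel sampling, ray generation
    prefetch_factor=10,             # batches each worker may load ahead of the training loop
    cache_compressed_images=False,  # or keep compressed bytes in RAM to reduce disk reads
)
```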

## Migrating Your DataManager to the New DataManager

Many methods subclass a DataManager and add extra data to it. If you would like your custom datamanager to support the new parallel features, you can migrate any custom dataloading logic to the new `custom_ray_processor()` API. This function takes in a full training batch (either image or ray bundle) and allows the user to modify or add to it, providing an interface to attach new information to the RayBundle (for ray-based methods), the Cameras object (for splatting-based methods), or the ground truth dictionary. It runs in a background process if disk caching is enabled; otherwise it runs in the main process. Let's take a look at an example for the LERF method, which was built on Nerfstudio's VanillaDataManager.

Naively transferring code to `custom_ray_processor` may still OOM on very large datasets if your initialization code computes something over the whole dataset. To take full advantage of parallelization, make sure your subclassed datamanager computes new information inside `custom_ray_processor`, or caches only a subset of the whole dataset. Dataloading can also remain slow if pre-computation requires GPU-heavy steps on the same GPU used for training.

**Note**: Because the parallel DataManager uses background processes, any member of the DataManager needs to be *picklable* to be used inside `custom_ray_processor`.
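
As an illustration (with hypothetical attribute names), the pattern below keeps a subclassed datamanager picklable by storing plain data rather than process-local handles:

```python
from nerfstudio.data.datamanagers.parallel_datamanager import ParallelDataManager


class MyDataManager(ParallelDataManager):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Not picklable: lambdas and open file handles cannot be sent to the
        # background dataloading workers.
        # self.score_fn = lambda x: 0.5 * x
        # self.log_file = open("stats.txt", "w")

        # Picklable: plain values and paths survive the trip to worker
        # processes; open handles lazily inside custom_ray_processor instead.
        self.scale = 0.5
        self.log_path = "stats.txt"
```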

```python
class LERFDataManager(VanillaDataManager):
"""Subclass VanillaDataManager to add extra data processing

Args:
config: the DataManagerConfig used to instantiate class
"""

config: LERFDataManagerConfig

def __init__(
self,
config: LERFDataManagerConfig,
device: Union[torch.device, str] = "cpu",
test_mode: Literal["test", "val", "inference"] = "val",
world_size: int = 1,
local_rank: int = 0,
**kwargs,
):
super().__init__(
config=config, device=device, test_mode=test_mode, world_size=world_size, local_rank=local_rank, **kwargs
)
# Some code to initialize all the CLIP and DINO feature encoders.
self.image_encoder: BaseImageEncoder = kwargs["image_encoder"]
self.dino_dataloader = ...
self.clip_interpolator = ...

def next_train(self, step: int) -> Tuple[RayBundle, Dict]:
"""Returns the next batch of data from the train dataloader.

In this custom DataManager we need to add on the data that LERF needs, namely CLIP and DINO features.
"""
self.train_count += 1
image_batch = next(self.iter_train_image_dataloader)
assert self.train_pixel_sampler is not None
batch = self.train_pixel_sampler.sample(image_batch)
ray_indices = batch["indices"]
ray_bundle = self.train_ray_generator(ray_indices)
batch["clip"], clip_scale = self.clip_interpolator(ray_indices)
batch["dino"] = self.dino_dataloader(ray_indices)
ray_bundle.metadata["clip_scales"] = clip_scale
# assume all cameras have the same focal length and image width
ray_bundle.metadata["fx"] = self.train_dataset.cameras[0].fx.item()
ray_bundle.metadata["width"] = self.train_dataset.cameras[0].width.item()
ray_bundle.metadata["fy"] = self.train_dataset.cameras[0].fy.item()
ray_bundle.metadata["height"] = self.train_dataset.cameras[0].height.item()
return ray_bundle, batch
```

To migrate this custom datamanager to the new datamanager, we'll subclass the new ParallelDataManager and shift the data customization from `next_train()` to `custom_ray_processor()`. The function `custom_ray_processor()` is called with a fully populated ray bundle and ground truth batch, just like the subclassed `next_train` above; unlike `next_train`, however, it runs in a background process when disk caching is enabled.

```python
class LERFDataManager(ParallelDataManager, Generic[TDataset]):
"""
__init__ stays the same
"""

...

def custom_ray_processor(
self, ray_bundle: RayBundle, batch: Dict
) -> Tuple[RayBundle, Dict]:
"""An API to add latents, metadata, or other further customization to the RayBundle dataloading process that is parallelized."""
ray_indices = batch["indices"]
batch["clip"], clip_scale = self.clip_interpolator(ray_indices)
batch["dino"] = self.dino_dataloader(ray_indices)
ray_bundle.metadata["clip_scales"] = clip_scale

# Assume all cameras have the same focal length and image dimensions.
ray_bundle.metadata["fx"] = self.train_dataset.cameras[0].fx.item()
ray_bundle.metadata["width"] = self.train_dataset.cameras[0].width.item()
ray_bundle.metadata["fy"] = self.train_dataset.cameras[0].fy.item()
ray_bundle.metadata["height"] = self.train_dataset.cameras[0].height.item()
return ray_bundle, batch
```
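
Splatting-based methods get an analogous image-view hook, `custom_image_processor` (renamed from `custom_view_processor` in this PR's history). A minimal sketch, assuming its signature mirrors `custom_ray_processor` with a `Cameras` object in place of the `RayBundle`; the exact signature and the `embed_fn` helper are assumptions, not confirmed by the diff above:

```python
from typing import Dict, Tuple

from nerfstudio.cameras.cameras import Cameras
from nerfstudio.data.datamanagers.full_images_datamanager import FullImageDatamanager


class MySplatDataManager(FullImageDatamanager):
    """Hypothetical sketch: attach per-image data for a splatting method."""

    def custom_image_processor(self, camera: Cameras, data: Dict) -> Tuple[Cameras, Dict]:
        # Runs once per training view; with cache_images="disk" this executes
        # in a background worker, so any members it touches must be picklable.
        data["embedding"] = self.embed_fn(data["image"])  # embed_fn: assumed picklable helper
        return camera, data
```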
18 changes: 11 additions & 7 deletions nerfstudio/configs/method_configs.py
@@ -28,7 +28,7 @@
from nerfstudio.configs.external_methods import ExternalMethodDummyTrainerConfig, get_external_methods
from nerfstudio.data.datamanagers.base_datamanager import VanillaDataManager, VanillaDataManagerConfig
from nerfstudio.data.datamanagers.full_images_datamanager import FullImageDatamanagerConfig
from nerfstudio.data.datamanagers.parallel_datamanager import ParallelDataManagerConfig
from nerfstudio.data.datamanagers.parallel_datamanager import ParallelDataManager
from nerfstudio.data.datamanagers.random_cameras_datamanager import RandomCamerasDataManagerConfig
from nerfstudio.data.dataparsers.blender_dataparser import BlenderDataParserConfig
from nerfstudio.data.dataparsers.dnerf_dataparser import DNeRFDataParserConfig
@@ -37,6 +37,7 @@
from nerfstudio.data.dataparsers.phototourism_dataparser import PhototourismDataParserConfig
from nerfstudio.data.dataparsers.sdfstudio_dataparser import SDFStudioDataParserConfig
from nerfstudio.data.dataparsers.sitcoms3d_dataparser import Sitcoms3DDataParserConfig
from nerfstudio.data.datasets.base_dataset import InputDataset
from nerfstudio.data.datasets.depth_dataset import DepthDataset
from nerfstudio.data.datasets.sdf_dataset import SDFDataset
from nerfstudio.data.datasets.semantic_dataset import SemanticDataset
@@ -91,7 +92,8 @@
max_num_iterations=30000,
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=ParallelDataManagerConfig(
datamanager=VanillaDataManagerConfig(
_target=ParallelDataManager[InputDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=4096,
eval_num_rays_per_batch=4096,
@@ -127,7 +129,8 @@
max_num_iterations=100000,
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=ParallelDataManagerConfig(
datamanager=VanillaDataManagerConfig(
_target=ParallelDataManager[InputDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=8192,
eval_num_rays_per_batch=4096,
@@ -171,7 +174,8 @@
max_num_iterations=100000,
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=ParallelDataManagerConfig(
datamanager=VanillaDataManagerConfig(
_target=ParallelDataManager[InputDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=16384,
eval_num_rays_per_batch=4096,
@@ -220,7 +224,7 @@
mixed_precision=True,
pipeline=VanillaPipelineConfig(
datamanager=VanillaDataManagerConfig(
_target=VanillaDataManager[DepthDataset],
_target=ParallelDataManager[DepthDataset],
dataparser=NerfstudioDataParserConfig(),
train_num_rays_per_batch=4096,
eval_num_rays_per_batch=4096,
@@ -302,7 +306,7 @@
method_configs["mipnerf"] = TrainerConfig(
method_name="mipnerf",
pipeline=VanillaPipelineConfig(
datamanager=ParallelDataManagerConfig(dataparser=NerfstudioDataParserConfig(), train_num_rays_per_batch=1024),
datamanager=VanillaDataManagerConfig(dataparser=NerfstudioDataParserConfig(), train_num_rays_per_batch=1024),
model=VanillaModelConfig(
_target=MipNerfModel,
loss_coefficients={"rgb_loss_coarse": 0.1, "rgb_loss_fine": 1.0},
@@ -375,7 +379,7 @@
max_num_iterations=30000,
mixed_precision=False,
pipeline=VanillaPipelineConfig(
datamanager=ParallelDataManagerConfig(
datamanager=VanillaDataManagerConfig(
dataparser=BlenderDataParserConfig(),
train_num_rays_per_batch=4096,
eval_num_rays_per_batch=4096,
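To summarize the pattern in the config diff above: `ParallelDataManagerConfig` is removed, and methods opt into parallel loading by pointing the vanilla config's `_target` at the generic `ParallelDataManager`. A hedged sketch of the new registration, mirroring the nerfacto entry:

```python
from nerfstudio.data.datamanagers.base_datamanager import VanillaDataManagerConfig
from nerfstudio.data.datamanagers.parallel_datamanager import ParallelDataManager
from nerfstudio.data.dataparsers.nerfstudio_dataparser import NerfstudioDataParserConfig
from nerfstudio.data.datasets.base_dataset import InputDataset

datamanager = VanillaDataManagerConfig(
    _target=ParallelDataManager[InputDataset],  # generic over the dataset type, e.g. DepthDataset
    dataparser=NerfstudioDataParserConfig(),
    train_num_rays_per_batch=4096,
)
```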
75 changes: 32 additions & 43 deletions nerfstudio/data/datamanagers/base_datamanager.py
@@ -19,7 +19,6 @@
from __future__ import annotations

from abc import abstractmethod
from collections import defaultdict
from dataclasses import dataclass, field
from functools import cached_property
from pathlib import Path
@@ -42,7 +41,6 @@

import torch
import tyro
from torch import nn
from torch.nn import Parameter
from torch.utils.data.distributed import DistributedSampler
from typing_extensions import TypeVar
@@ -56,44 +54,19 @@
from nerfstudio.data.dataparsers.blender_dataparser import BlenderDataParserConfig
from nerfstudio.data.datasets.base_dataset import InputDataset
from nerfstudio.data.pixel_samplers import PatchPixelSamplerConfig, PixelSampler, PixelSamplerConfig
from nerfstudio.data.utils.dataloaders import CacheDataloader, FixedIndicesEvalDataloader, RandIndicesEvalDataloader
from nerfstudio.data.utils.dataloaders import (
CacheDataloader,
FixedIndicesEvalDataloader,
RandIndicesEvalDataloader,
variable_res_collate,
)
from nerfstudio.data.utils.nerfstudio_collate import nerfstudio_collate
from nerfstudio.engine.callbacks import TrainingCallback, TrainingCallbackAttributes
from nerfstudio.model_components.ray_generators import RayGenerator
from nerfstudio.utils.misc import IterableWrapper, get_orig_class
from nerfstudio.utils.rich_utils import CONSOLE


def variable_res_collate(batch: List[Dict]) -> Dict:
"""Default collate function for the cached dataloader.
Args:
batch: Batch of samples from the dataset.
Returns:
Collated batch.
"""
images = []
imgdata_lists = defaultdict(list)
for data in batch:
image = data.pop("image")
images.append(image)
topop = []
for key, val in data.items():
if isinstance(val, torch.Tensor):
# if the value has same height and width as the image, assume that it should be collated accordingly.
if len(val.shape) >= 2 and val.shape[:2] == image.shape[:2]:
imgdata_lists[key].append(val)
topop.append(key)
# now that iteration is complete, the image data items can be removed from the batch
for key in topop:
del data[key]

new_batch = nerfstudio_collate(batch)
new_batch["image"] = images
new_batch.update(imgdata_lists)

return new_batch


@dataclass
class DataManagerConfig(InstantiateConfig):
"""Configuration for data manager instantiation; DataManager is in charge of keeping the train/eval dataparsers;
Expand All @@ -111,7 +84,7 @@ class DataManagerConfig(InstantiateConfig):
"""Process images on GPU for speed at the expense of memory, if True."""


class DataManager(nn.Module):
class DataManager:
"""Generic data manager's abstract class

This version of the data manager is designed be a monolithic way to load data and latents,
@@ -164,16 +137,16 @@ class DataManager(nn.Module):
train_sampler: Optional[DistributedSampler] = None
eval_sampler: Optional[DistributedSampler] = None
includes_time: bool = False
test_mode: Literal["test", "val", "inference"] = "val"

def __init__(self):
"""Constructor for the DataManager class.

Subclassed DataManagers will likely need to override this constructor.

If you aren't manually calling the setup_train and setup_eval functions from an overriden
constructor, that you call super().__init__() BEFORE you initialize any
nn.Modules or nn.Parameters, but AFTER you've already set all the attributes you need
for the setup functions."""
If you aren't manually calling the setup_train() and setup_eval() functions from an overridden
constructor, please call super().__init__() in your subclass' __init__() method after
you've already set all the attributes you need for the setup functions."""
super().__init__()
self.train_count = 0
self.eval_count = 0
@@ -311,13 +284,17 @@ class VanillaDataManagerConfig(DataManagerConfig):
"""Target class to instantiate."""
dataparser: AnnotatedDataParserUnion = field(default_factory=BlenderDataParserConfig)
"""Specifies the dataparser used to unpack the data."""
cache_images_type: Literal["uint8", "float32"] = "float32"
"""The image type returned from manager, caching images in uint8 saves memory"""
train_num_rays_per_batch: int = 1024
"""Number of rays per batch to use per training iteration."""
train_num_images_to_sample_from: int = -1
train_num_images_to_sample_from: int = 50
"""Number of images to sample during training iteration."""
train_num_times_to_repeat_images: int = -1
train_num_times_to_repeat_images: int = 10
"""When not training on all images, number of iterations before picking new
images. If -1, never pick new images."""
images. If -1, never pick new images.
Note: decreasing train_num_images_to_sample_from and increasing train_num_times_to_repeat_images alleviates CPU bottleneck.
"""
eval_num_rays_per_batch: int = 1024
"""Number of rays per batch to use per eval iteration."""
eval_num_images_to_sample_from: int = -1
@@ -331,10 +308,20 @@ class VanillaDataManagerConfig(DataManagerConfig):
"""Specifies the collate function to use for the train and eval dataloaders."""
camera_res_scale_factor: float = 1.0
"""The scale factor for scaling spatial data such as images, mask, semantics
along with relevant information about camera intrinsics
"""
along with relevant information about camera intrinsics"""
patch_size: int = 1
"""Size of patch to sample from. If > 1, patch-based sampling will be used."""
load_from_disk: bool = False
"""If True, conserves RAM memory by loading images from disk.
If False, caches all the images as tensors to RAM and loads from RAM."""
dataloader_num_workers: int = 4
"""The number of workers performing the dataloading from either disk/RAM, which
includes collating, pixel sampling, unprojecting, ray generation etc."""
prefetch_factor: int = 10
"""The limit number of batches a worker will start loading once an iterator is created.
More details are described here: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader"""
cache_compressed_images: bool = False
"""If True, cache raw image files as byte strings to RAM."""

# tyro.conf.Suppress prevents us from creating CLI arguments for this field.
camera_optimizer: tyro.conf.Suppress[Optional[CameraOptimizerConfig]] = field(default=None)
@@ -451,13 +438,15 @@ def create_train_dataset(self) -> TDataset:
return self.dataset_type(
dataparser_outputs=self.train_dataparser_outputs,
scale_factor=self.config.camera_res_scale_factor,
cache_compressed_images=self.config.cache_compressed_images,
)

def create_eval_dataset(self) -> TDataset:
"""Sets up the data loaders for evaluation"""
return self.dataset_type(
dataparser_outputs=self.dataparser.get_dataparser_outputs(split=self.test_split),
scale_factor=self.config.camera_res_scale_factor,
cache_compressed_images=self.config.cache_compressed_images,
)

def _get_pixel_sampler(self, dataset: TDataset, num_rays_per_batch: int) -> PixelSampler: