[QST] Data loader with EmbeddingOperator using pretrained embeddings is very slow #1244

CarloNicolini opened this issue Jul 4, 2024 · 0 comments
❓ Questions & Help

I am experiencing a large performance degradation of the Loader when I add an EmbeddingOperator transform that looks up pretrained embeddings in a numpy array.
I have been following the approach shown in this tutorial notebook.

Without the transforms argument the entire dataset is consumed in about 6 seconds, while with the pretrained-embedding lookup enabled it takes almost 40 minutes!
My "validation.parquet" is a small NVTabular dataset with 16 partitions, totalling almost 200 MB.
With the transform enabled I also see very low CPU and GPU utilization and close to zero GPU memory consumption: neither the CPU nor the GPU goes above roughly 6%.
It seems very strange to me that simply gathering batch_size rows from a numpy array, even including the copy to the GPU, takes that much time.
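
To put that in perspective, a plain numpy gather of a batch of rows is essentially instantaneous. This is only a rough sketch (it assumes the lookup keys are integer indices into the array, as in the example below):

import numpy as np
from timeit import timeit

pretrained_array = np.zeros((1_000_000, 2), dtype=np.float32)
ids = np.random.randint(0, 1_000_000, size=4096)

# Gathering 4096 rows from the embedding table takes on the order of microseconds per call.
total = timeit(lambda: pretrained_array[ids], number=1000)
print(f"{total:.4f} s for 1000 gathers of 4096 rows")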

Details

Here is a minimal working example to reproduce this degradation.

from __future__ import annotations

from pathlib import Path

import numpy as np
from merlin.dataloader.ops.embeddings import EmbeddingOperator
from merlin.io.dataset import Dataset
from merlin.loader.tensorflow import Loader
from tqdm.auto import tqdm


def test_pretrained_loader():
    # Small NVTabular dataset: 16 partitions, ~200 MB in total.
    data_path = Path("validation.parquet")
    X = Dataset(data_path, engine="parquet")

    # Placeholder for the pretrained embedding table (1M rows, 2 dimensions).
    pretrained_array = np.zeros((1_000_000, 2), dtype=np.float32)

    loader = Loader(
        X,
        batch_size=4096,
        shuffle=True,
        transforms=[
            # Look up each row's "recruitment_id" in the pretrained array and
            # attach the result to the batch under the name "embeddings".
            EmbeddingOperator(
                pretrained_array,
                lookup_key="recruitment_id",
                embedding_name="embeddings",
            )
        ],
        device="gpu",
    )

    # With the transform enabled, iterating all batches takes ~40 minutes
    # instead of ~6 seconds.
    for batch in tqdm(loader, desc="Iterating batches..."):
        pass


if __name__ == "__main__":
    test_pretrained_loader()
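
For comparison, the 6-second figure above comes from iterating the same dataset with an otherwise identical Loader that simply omits the transforms argument:

from merlin.io.dataset import Dataset
from merlin.loader.tensorflow import Loader
from tqdm.auto import tqdm

# Identical configuration, but without the EmbeddingOperator transform:
# this consumes the whole dataset in roughly 6 seconds.
baseline_loader = Loader(
    Dataset("validation.parquet", engine="parquet"),
    batch_size=4096,
    shuffle=True,
    device="gpu",
)

for batch in tqdm(baseline_loader, desc="Baseline (no transforms)..."):
    pass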

Question

Is this behaviour intended? What are the likely bottlenecks here? Would something like data prefetching or asynchronous loading be applicable?
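
As a concrete example of what I mean by prefetching: would a simple background-thread wrapper like the sketch below be expected to help, or does the bottleneck sit somewhere such a wrapper cannot reach? (This is only a hypothetical sketch using the standard library, not a Merlin API.)

import queue
import threading


def prefetch(iterable, buffer_size=4):
    """Consume `iterable` in a background thread and yield items from a bounded queue."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in iterable:
            q.put(item)
        q.put(sentinel)  # signal end of iteration

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            break
        yield item


# Hypothetical usage: wrap the Merlin loader so batches are prepared ahead of time.
# for batch in prefetch(loader):
#     ...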
