Fix circular import for MM dataprep notebook #9292

Merged
4 changes: 3 additions & 1 deletion nemo/collections/nlp/parts/nlp_overrides.py
@@ -64,7 +64,6 @@
# since PyTorch 2.3 the path has changed
from torch.amp.grad_scaler import _refresh_per_optimizer_state

-from nemo.collections.multimodal.modules.stable_diffusion.attention import BasicTransformerBlock
from nemo.collections.nlp.modules.common.megatron.module import Float16Module
from nemo.collections.nlp.modules.common.megatron.transformer import AutocastTransformerLayer, ParallelTransformerLayer
from nemo.collections.nlp.parts import utils_funcs
@@ -662,6 +661,9 @@ def __init__(
# Use the default FSDP backward-prefetch policy for proper communication overlap.
kwargs['backward_prefetch'] = BackwardPrefetch.BACKWARD_PRE

+# import here to prevent circular imports
+from nemo.collections.multimodal.modules.stable_diffusion.attention import BasicTransformerBlock
+
# Set FSDP wrapping policy: use Transformer layer module as the FSDP sharding granularity.
self.fsdp_wrap_module = {
MCoreTransformerLayer,
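For context, this fix uses the standard deferred-import pattern: a module-level import that participates in an import cycle is moved inside the function that needs it, so both modules finish initializing before the import actually runs. A minimal sketch with illustrative module names (not NeMo's real layout):

```python
# pkg/a.py
# Importing pkg.b at module scope would fail if pkg.b (transitively)
# imports pkg.a back: Python would hit a partially initialized module.
# Deferring the import to call time breaks the cycle.

def build_wrap_policy():
    # Imported here, at call time, so pkg.a has finished loading before
    # pkg.b is touched -- the same pattern the diff above applies to
    # BasicTransformerBlock.
    from pkg.b import BasicTransformerBlock
    return {BasicTransformerBlock}
```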
21 changes: 6 additions & 15 deletions tutorials/multimodal/Multimodal Data Preparation.ipynb
@@ -67,7 +67,7 @@
"\n",
"This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
"\n",
"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Framework-Launcher on multiple processors/GPUs. NeMo-Framework-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
"\n",
"Depending on your use case, not all 5 stages need to be run. Please go to [NeMo Multimodal Documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/multimodal/text2img/datasets.html) for an overview of the 5 stages.\n",
" \n",
@@ -85,7 +85,7 @@
"source": [
"import os\n",
"\n",
"LAUNCHER_DIR = \"/opt/NeMo-Megatron-Launcher\"\n",
"LAUNCHER_DIR = \"/opt/NeMo-Framework-Launcher\" # formerly NeMo-Megatron-Launcher\n",
"SCRIPT_DIR = os.path.join(LAUNCHER_DIR, \"launcher_scripts/nemo_launcher/collections/dataprep_scripts/multimodal_dataprep\")\n",
"CONF_DIR = \"conf\"\n",
"DATA_DIR = \"dummy_data\"\n",
@@ -168,7 +168,7 @@
"\n",
"Script: download_images.py\n",
"\n",
"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to the number of parquet files in `$DATA_DIR/parquet/dummy_dataset50000.parquet_parts`. (i.e. `parquet_subpartitions` x `num_parquets_downloaded`)\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= SLURM_ARRAY_TASK_ID < SLURM_ARRAY_TASK_COUNT)\n",
"\n",
@@ -266,7 +266,7 @@
"\n",
"Script: reorganize_tar.py\n",
"\n",
"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to parquet_subpartitions x num_parquets_downloaded\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= `SLURM_ARRAY_TASK_ID` < `SLURM_ARRAY_TASK_COUNT`)\n",
"\n",
@@ -430,7 +430,7 @@
},
"outputs": [],
"source": [
"! wget https://raw.githubusercontent.com/NVIDIA/NeMo-Megatron-Launcher/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
"! wget https://raw.githubusercontent.com/NVIDIA/NeMo-Framework-Launcher/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
]
},
{
@@ -506,7 +506,7 @@
"\n",
"Script: precache_encodings.py\n",
"\n",
"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to parquet_subpartitions x num_parquets_downloaded\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= `SLURM_ARRAY_TASK_ID` < `SLURM_ARRAY_TASK_COUNT`)\n",
"\n",
@@ -533,15 +533,6 @@
" precache_config_path=$CONF_DIR/precache_sd_example.yaml"
]
},
-{
-"cell_type": "markdown",
-"source": [
-"If you encounter a nemo import problem with the cell above, please also running it in the terminal directly."
-],
-"metadata": {
-"collapsed": false
-}
-},
{
"attachments": {},
"cell_type": "markdown",