Fix circular import for MM dataprep notebook (#9287)
* update launcher name and fix mm circular import

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
2 people authored and web-flow committed May 23, 2024
1 parent 9d6e472 commit cebaebe
Showing 2 changed files with 9 additions and 16 deletions.
4 changes: 3 additions & 1 deletion nemo/collections/nlp/parts/nlp_overrides.py
@@ -64,7 +64,6 @@
# since PyTorch 2.3 the path has changed
from torch.amp.grad_scaler import _refresh_per_optimizer_state

-from nemo.collections.multimodal.modules.stable_diffusion.attention import BasicTransformerBlock
from nemo.collections.nlp.modules.common.megatron.module import Float16Module
from nemo.collections.nlp.modules.common.megatron.transformer import AutocastTransformerLayer, ParallelTransformerLayer
from nemo.collections.nlp.parts import utils_funcs
@@ -662,6 +661,9 @@ def __init__(
# Use the default FSDP backward-prefetch policy for proper communication overlap.
kwargs['backward_prefetch'] = BackwardPrefetch.BACKWARD_PRE

+# import here to prevent circular imports
+from nemo.collections.multimodal.modules.stable_diffusion.attention import BasicTransformerBlock

# Set FSDP wrapping policy: use Transformer layer module as the FSDP sharding granularity.
self.fsdp_wrap_module = {
MCoreTransformerLayer,
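The diff above moves the `BasicTransformerBlock` import from module scope into `__init__`, so it only runs at call time. A minimal, self-contained sketch of why deferring an import breaks a cycle (hypothetical modules `pkg_a`/`pkg_b`, built in-memory for illustration; not the actual NeMo modules):

```python
import sys
import types

SRC_A = """
def wrap_layers():
    # import here to prevent circular imports (same pattern as the commit)
    import pkg_b
    return pkg_b.BLOCK
"""

SRC_B = """
import pkg_a  # safe now: pkg_a no longer imports pkg_b at module scope
BLOCK = "BasicTransformerBlock"
"""

def load(name, src):
    # Register the module before executing it, so a partially-initialized
    # module is visible to importers -- the situation that makes a
    # module-level circular import fail.
    mod = types.ModuleType(name)
    sys.modules[name] = mod
    exec(src, mod.__dict__)
    return mod

pkg_a = load("pkg_a", SRC_A)
pkg_b = load("pkg_b", SRC_B)
print(pkg_a.wrap_layers())  # the deferred import resolves only at call time
```

If `SRC_A` instead imported `pkg_b` at module scope, loading `pkg_a` first would fail, because `pkg_b` does not exist yet; deferring the import sidesteps the load-order dependency.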
21 changes: 6 additions & 15 deletions tutorials/multimodal/Multimodal Data Preparation.ipynb
@@ -67,7 +67,7 @@
"\n",
"This notebook will show you how to prepare an image-text dataset into the [WebDataset](https://github.com/webdataset/webdataset) format. The Webdataset format is required to train all multimodal models in NeMo, such as Stable Diffusion and Imagen. \n",
"\n",
-"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Megatron-Launcher on multiple processors/GPUs. NeMo-Megatron-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
+"This notebook is designed to demonstrate the different stages of multimodal dataset preparation. It is not meant to be used to process large-scale datasets since many stages are too time-consuming to run without parallelism. For large workloads, we recommend running the multimodal dataset preparation pipeline with the NeMo-Framework-Launcher on multiple processors/GPUs. NeMo-Framework-Launcher packs the same 5 scripts in this notebook into one runnable command and one config file to enable a smooth and a streamlined workflow.\n",
"\n",
"Depending on your use case, not all 5 stages need to be run. Please go to [NeMo Multimodal Documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/multimodal/text2img/datasets.html) for an overview of the 5 stages.\n",
" \n",
@@ -85,7 +85,7 @@
"source": [
"import os\n",
"\n",
-"LAUNCHER_DIR = \"/opt/NeMo-Megatron-Launcher\"\n",
+"LAUNCHER_DIR = \"/opt/NeMo-Framework-Launcher\" # formerly NeMo-Megatron-Launcher\n",
"SCRIPT_DIR = os.path.join(LAUNCHER_DIR, \"launcher_scripts/nemo_launcher/collections/dataprep_scripts/multimodal_dataprep\")\n",
"CONF_DIR = \"conf\"\n",
"DATA_DIR = \"dummy_data\"\n",
@@ -168,7 +168,7 @@
"\n",
"Script: download_images.py\n",
"\n",
-"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
+"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to the number of parquet files in `$DATA_DIR/parquet/dummy_dataset50000.parquet_parts`. (i.e. `parquet_subpartitions` x `num_parquets_downloaded`)\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= SLURM_ARRAY_TASK_ID < SLURM_ARRAY_TASK_COUNT)\n",
"\n",
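The two SLURM variables above drive a simple sharding scheme across the parquet parts. A minimal sketch (hypothetical helper, not the actual download_images.py logic) of how an array task could select its slice of the files:

```python
import os

def shard_for_task(files, task_id, task_count):
    """Round-robin split: task i takes every task_count-th file."""
    return [f for i, f in enumerate(sorted(files)) if i % task_count == task_id]

# SLURM sets these per array task; fall back to a single local task (0 of 1).
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
task_count = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", 1))

# Stand-in names for the files under $DATA_DIR/parquet/...parquet_parts.
parts = [f"part_{i}.parquet" for i in range(6)]
print(shard_for_task(parts, task_id, task_count))
```

Because every task sorts the same file list and keeps a disjoint residue class, the tasks jointly cover all files exactly once without coordinating.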
@@ -266,7 +266,7 @@
"\n",
"Script: reorganize_tar.py\n",
"\n",
-"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
+"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to parquet_subpartitions x num_parquets_downloaded\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= `SLURM_ARRAY_TASK_ID` < `SLURM_ARRAY_TASK_COUNT`)\n",
"\n",
@@ -430,7 +430,7 @@
},
"outputs": [],
"source": [
-"! wget https://raw.githubusercontent.com/NVIDIA/NeMo-Megatron-Launcher/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
+"! wget https://raw.githubusercontent.com/NVIDIA/NeMo-Framework-Launcher/master/launcher_scripts/conf/data_preparation/multimodal/precache_sd.yaml -P $CONF_DIR/"
]
},
{
@@ -506,7 +506,7 @@
"\n",
"Script: precache_encodings.py\n",
"\n",
-"Environment variables (automatically set by SLURM if running with NeMo-Megatron-Launcher):\n",
+"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"Environment variables (automatically set by SLURM if running with NeMo-Framework-Launcher):\n",
"- `SLURM_ARRAY_TASK_COUNT`: total number of tasks, should be set to parquet_subpartitions x num_parquets_downloaded\n",
"- `SLURM_ARRAY_TASK_ID`: id of the current task (0 <= `SLURM_ARRAY_TASK_ID` < `SLURM_ARRAY_TASK_COUNT`)\n",
"\n",
@@ -533,15 +533,6 @@
" precache_config_path=$CONF_DIR/precache_sd_example.yaml"
]
},
-{
-"cell_type": "markdown",
-"source": [
-"If you encounter a nemo import problem with the cell above, please also running it in the terminal directly."
-],
-"metadata": {
-"collapsed": false
-}
-},
{
"attachments": {},
"cell_type": "markdown",
