How to change pytorch DataLoader/Dataset for nemo? #7769
Hello. I want to create a custom DataLoader or Dataset that changes the text of each sample into different tokens for a MarbleNet model on the fly (i.e. during training). To that end I studied how DataLoader and Dataset work in both PyTorch and PyTorch Lightning. Perhaps it has not fully sunk in yet, but I don't understand how to do this with NeMo models: NeMo uses config files to create the dataloaders automatically, and I can't see what to do with them so that custom ones work correctly with NeMo models.
To make my point clear, I should perhaps be a little verbose. I found only two threads on this topic, and neither does much to help me understand what should be done. Here is what I researched. PyTorch Lightning says in a succinct tutorial that it uses PyTorch's native classes for dataloaders. For vanilla PyTorch there is another article on how to create custom DataLoaders and Datasets. For PyTorch Lightning I also found a class for this purpose. But here comes the problem. Let's look at my code draft for reference:
In the code, the model is created from a config in the style NeMo requires, and I set the paths to the datasets used for training, testing, and validation in code. We can grab the dataloaders that NeMo creates. I could probably change how the dataloaders are created, but overriding that seems like the wrong decision. I also tried looking in PyTorch Lightning. The only option left was to pass the dataloaders as arguments (see the full snippet above).
What should I do to make it all work? I hope you can help me settle my confusion. If anything is unclear, ask me and I will answer to the best of my ability.
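To make the goal concrete, here is a minimal sketch, outside NeMo, of a map-style dataset that rewrites the text field each time a sample is fetched. The class name `OnTheFlyTextDataset`, the `text` key, the sample dictionaries, and the `digits_to_placeholder` transform are all hypothetical and only illustrate the idea. The class implements `__len__` and `__getitem__`, the protocol PyTorch expects from a map-style dataset, so an object like this could in principle be handed to `torch.utils.data.DataLoader`. PyTorch is not imported here, to keep the sketch dependency-free. How to plug such a dataset into a NeMo model built from a config is the open question.

```python
import re


class OnTheFlyTextDataset:
    """Hypothetical map-style wrapper that transforms text at access time."""

    def __init__(self, samples, transform, text_key="text"):
        self.samples = samples      # any indexable collection of dict samples
        self.transform = transform  # callable: str -> str
        self.text_key = text_key

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Copy the sample so the underlying data is left unchanged.
        sample = dict(self.samples[index])
        sample[self.text_key] = self.transform(sample[self.text_key])
        return sample


def digits_to_placeholder(text):
    # Illustrative transform: every run of digits becomes one placeholder token.
    return re.sub(r"\d+", "<num>", text)


samples = [
    {"audio_filepath": "a.wav", "text": "call 911 now"},
    {"audio_filepath": "b.wav", "text": "no digits here"},
]
dataset = OnTheFlyTextDataset(samples, digits_to_placeholder)
print(dataset[0]["text"])  # call <num> now
```

Because the transform runs inside `__getitem__`, it is applied every time a sample is read, so it can be swapped without rewriting the stored manifests.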
Replies: 1 comment 1 reply
Well, for anyone looking at this in the future, I have two pieces of advice on how to solve the problem.
The first is shown in a NeMo tutorial (https://github.com/NVIDIA/NeMo/blob/main/tutorials/01_NeMo_Models.ipynb), where they create a custom Dataset and model class.
The second is to create a custom model class by redefining its parent classes, as a colleague and I did. That means finding the function that processes what you need and redefining it to process it your way. In my case it was the class `ASRAudioText`: I changed how it processes text, via `my_text = re.sub('[a-zA-Zа-яА-Я]', '-', item['text'])`, to produce different tokens.
Then every class along the chain must inherit from this changed class, all the way up to our custom class. I hope what I said makes sense and helps some poor soul.
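The second approach can be sketched with stand-in classes. Nothing below is NeMo code: `BaseTextCollection`, `BaseDataset`, `BaseModel`, their `Masked*` subclasses, and `build_train_dataset` are hypothetical names that only mimic a chain in which a model builds a dataset and the dataset builds a text collection. Only the regular expression is taken from the reply. The stand-ins pick their collaborator through a class attribute to keep the example short. In a real library the class to instantiate is often hard-coded, in which case the method that creates it would have to be overridden at each level instead.

```python
import re


class BaseTextCollection:
    """Hypothetical stand-in for a library class that processes manifest text."""

    def __init__(self, items):
        self.texts = [self.process_text(item) for item in items]

    def process_text(self, item):
        return item["text"]


class MaskedTextCollection(BaseTextCollection):
    """Redefines only the text-processing step."""

    def process_text(self, item):
        # Substitution from the reply: Latin and Cyrillic letters become '-'.
        return re.sub("[a-zA-Zа-яА-Я]", "-", item["text"])


class BaseDataset:
    """Hypothetical stand-in for the dataset that builds the collection."""

    collection_cls = BaseTextCollection

    def __init__(self, items):
        self.collection = self.collection_cls(items)

    def __len__(self):
        return len(self.collection.texts)

    def __getitem__(self, index):
        return self.collection.texts[index]


class MaskedDataset(BaseDataset):
    collection_cls = MaskedTextCollection  # inherit the changed class


class BaseModel:
    """Hypothetical stand-in for the model that builds its own dataset."""

    dataset_cls = BaseDataset

    def build_train_dataset(self, items):
        self.train_ds = self.dataset_cls(items)


class MaskedModel(BaseModel):
    dataset_cls = MaskedDataset  # the change propagates up to the model


model = MaskedModel()
model.build_train_dataset([{"text": "abc 12 да"}])
print(model.train_ds[0])  # --- 12 --
```

Each level changes only the one thing it must, which is why the override has to be repeated up the chain until it reaches the custom model class.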