Questions about algorithm #5

Open
pzn666 opened this issue May 8, 2021 · 1 comment

@pzn666

pzn666 commented May 8, 2021

Thanks for your work! I have some questions about MER.
In Algorithm 1:

  1. Do we always need to train on all of the different datasets at once?
  2. If we only want to train on one dataset (say Dt) with a model already trained on the other datasets (say D1~Dt-1), how do we initialize the buffer M? (Is the buffer M necessary in this scenario?)

Thanks

@mattriemer
Owner

Thank you for your interest in our work! I am very sorry for the slow response; it is a particularly busy time of year for me. I am not sure that I fully understood your questions, so definitely let me know if any aspects of my explanation do not make sense. The experiments in our paper focus on a particular setting of continual learning where the model is trained continuously, one example at a time, on a series of tasks one after the other. However, the approach is in principle much more general than this setting.

One way to interpret your first question is that you are concerned about the manner in which the model will be deployed, and whether it is possible to essentially pause the training of the model and resume it later. This is definitely possible: all you need to do is save the state of the parameters of the model as well as the state of the buffer (including its age, which is needed for reservoir sampling). Another interpretation is that you are asking whether our strategy relies on training on actual data from past tasks in order to retain performance on those tasks, or whether it works without using this across-task data. We are definitely a replay-based approach, so we try to do well with a small buffer of past examples from old tasks, but we do not address a setting where these examples are assumed to be entirely unavailable.
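To make the pause/resume point concrete, here is a minimal sketch of a reservoir-sampling buffer whose full state (contents plus age) can be saved and restored; the class and method names are illustrative, not taken from the MER codebase:

```python
import pickle
import random


class ReservoirBuffer:
    """Fixed-size replay buffer maintained with reservoir sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []   # stored examples
        self.age = 0      # total number of examples seen so far

    def add(self, example):
        """Insert one streamed example, keeping a uniform sample of the stream."""
        self.age += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # With probability capacity / age, overwrite a random slot.
            j = random.randrange(self.age)
            if j < self.capacity:
                self.items[j] = example

    def save(self, path):
        # The age must be persisted along with the contents so that
        # reservoir sampling remains correct after training resumes.
        with open(path, "wb") as f:
            pickle.dump(
                {"capacity": self.capacity, "items": self.items, "age": self.age}, f
            )

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        buf = cls(state["capacity"])
        buf.items = state["items"]
        buf.age = state["age"]
        return buf
```

Saving the model parameters alongside this (e.g. with your framework's usual checkpointing) gives you everything needed to resume training exactly where you left off.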

For your second question, my interpretation is that you are interested in a setting that is not the full-blown online continual learning setting of our paper: perhaps you start from a pre-trained multi-task model, and your main goal is to transfer that model so it performs well on a single downstream task. Is the buffer necessary in this case? This is kind of a tricky question; here are my two cents. If you only care about performance on the downstream task and not about retention of previous tasks, that does take away a big part of the theoretical motivation for applying replay. In my experience, if we only care about forward transfer and not retention, and we are only training on a single downstream task online, replay goes from being a necessity for competitive performance to more of a luxury that is still likely to add benefit. However, the gains may not be huge, and it needs some extra parameter tuning in comparison to a vanilla fine-tuning based transfer learning approach. The reason replay may still yield big gains in some cases is that not forgetting knowledge of past tasks is a good regularizer: it guides the solution with a helpful inductive bias that its representation should perform well on all tasks. This is particularly likely to matter when the downstream task has a very small amount of data.

If you would like to apply an approach like MER in this kind of setting, you should ideally initialize the buffer with the full dataset of past tasks for the best performance, assuming memory constraints are not a practical issue for you. If you only have enough memory for a buffer smaller than the combined size of the past datasets, what would be most consistent with the approach of our paper is to take a random subsample of the allowable buffer size.
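As a rough sketch of that initialization, reusing the hypothetical ReservoirBuffer from the previous snippet (init_buffer_from_past_tasks is an illustrative helper, not part of the repo):

```python
import random


def init_buffer_from_past_tasks(past_datasets, capacity):
    """Initialize a replay buffer from previously seen task data.

    past_datasets: one list of examples per past task (D1 ... Dt-1).
    Keeps the full past data when it fits within `capacity`,
    otherwise takes a uniform random subsample of size `capacity`.
    """
    all_past = [ex for dataset in past_datasets for ex in dataset]
    buf = ReservoirBuffer(capacity)
    if len(all_past) <= capacity:
        buf.items = list(all_past)
    else:
        buf.items = random.sample(all_past, capacity)
    # Setting the age to the total amount of past data means later
    # reservoir updates behave as if the buffer had seen it all online.
    buf.age = len(all_past)
    return buf
```

After this, you can continue training online on the downstream task Dt, adding its examples to the buffer as usual.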

I hope this helps address your questions and definitely let me know if I can clarify further!
