Calling learner.feature_importance on larger than memory dataset causes OOM #310
Comments
Hi @scottcha,
I think option 3 would probably cover most scenarios as it's the most flexible.
@oguiza I agree 3 is the most flexible. My entire Chrome session running Jupyter crashes with this error:
Each of my samples is about 0.5 MB to 1 MB on disk, and I encounter this error even when computing with only 100 samples. Since I have ~900 features it goes through the calculation that many times, but it seems to hit this around iteration 50. Monitoring my system RAM shows it growing aggressively during the feature importance calculation, at approximately 1 GB per iteration, while my GPU RAM appears constant, so something seems to be leaking or growing out of control. My guess is it's related to some of the GPU-allocated objects not getting freed, but I wasn't sure how to debug that. Also, FWIW, I ran this outside of Jupyter in the VS Code Python debugger and got the same error, with one additional piece of information: it indicates "DataLoader worker (PID(s) 1618) exited unexpectedly". Thanks
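As a rough way to see where the growth comes from, one could log host and GPU memory once per feature iteration. This is only a debugging sketch (not tsai API); the loop shown in comments is hypothetical:

```python
import gc
import os
import psutil   # assumed available; only used to read the process RSS
import torch

def log_memory(tag: str) -> None:
    """Print host RSS and CUDA allocations so per-iteration growth is visible."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 2**30
    cuda_gb = torch.cuda.memory_allocated() / 2**30 if torch.cuda.is_available() else 0.0
    print(f"{tag}: host RSS {rss_gb:.2f} GiB, CUDA allocated {cuda_gb:.2f} GiB")

# Hypothetical usage inside the permutation loop, one call per shuffled feature:
# for k, col in enumerate(feature_names):
#     ...compute the metric with feature `col` shuffled...
#     gc.collect(); torch.cuda.empty_cache()   # release unreferenced buffers
#     log_memory(f"after feature {k}")
```

If the RSS keeps climbing by roughly the size of the validation array on every iteration, the leak is on the host side rather than the GPU, which matches the behavior described above.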
Hi @scottcha,
I tried out the new implementation. Here are a couple of notes:
Hi @scottcha,
Sorry it took me a bit to get back to this. Thanks!
OK, I'm glad to hear that, Scott.
I'll close this issue since the requested fix has already been implemented. Please reopen it if necessary.
Repro steps:
Expected result: feature importance is shown for each feature
Actual result: OOM. Full repro and notebook here: https://github.com/scottcha/TsaiOOMRepro/blob/main/TsaiOOMRepro.ipynb
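For context, the dataset is a zarr-backed array much larger than RAM. A minimal sketch of that kind of setup (hypothetical shapes taken from the MemoryError below; the exact code is in the linked notebook) looks roughly like:

```python
import zarr

# Hypothetical shape matching the error below: materializing it as float32
# would need ~315 GiB, so it only fits on disk, chunked along the sample axis.
n_samples, n_vars, seq_len = 60000, 978, 1441
X = zarr.open('X.zarr', mode='w', shape=(n_samples, n_vars, seq_len),
              chunks=(64, n_vars, seq_len), dtype='float32')
# ...fill X chunk by chunk, build the tsai dataloaders on top of the zarr array,
# train a learner, then call learn.feature_importance() -> MemoryError below.
```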
Environment:
os : Linux-5.4.0-91-generic-x86_64-with-glibc2.17
python : 3.8.11
tsai : 0.2.24
fastai : 2.5.3
fastcore : 1.3.26
zarr : 2.10.0
torch : 1.9.1+cu102
n_cpus : 24
device : cuda (GeForce GTX 1080 Ti)
Stack Trace:
MemoryError Traceback (most recent call last)
/tmp/ipykernel_3968/3713785271.py in <module>
----> 1 learn.feature_importance()
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/learner.py in feature_importance(self, feature_names, key_metric_idx, show_chart, save_df_path, random_state)
337 value = self.get_X_preds(X_valid, y_valid, with_loss=True)[-1].mean().item()
338 else:
--> 339 output = self.get_X_preds(X_valid, y_valid)
340 value = metric(output[0], output[1]).item()
341 print(f"{k:3} feature: {COLS[k]:20} {metric_name}: {value:8.6f}")
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/inference.py in get_X_preds(self, X, y, bs, with_input, with_decoded, with_loss)
16 print("cannot find loss as y=None")
17 with_loss = False
---> 18 dl = self.dls.valid.new_dl(X, y=y)
19 if bs: setattr(dl, "bs", bs)
20 else: assert dl.bs, "you need to pass a bs != 0"
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in new_dl(self, X, y)
486 assert X.ndim == 3, "You must pass an X with 3 dimensions [batch_size x n_vars x seq_len]"
487 if y is not None and not is_array(y) and not is_listy(y): y = [y]
--> 488 new_dloader = self.new(self.dataset.add_dataset(X, y=y))
489 return new_dloader
490
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in add_dataset(self, X, y, inplace)
422 @patch
423 def add_dataset(self:NumpyDatasets, X, y=None, inplace=True):
--> 424 return add_ds(self, X, y=y, inplace=inplace)
425
426 @patch
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in add_ds(dsets, X, y, inplace)
413 tls = dsets.tls if with_labels else dsets.tls[:dsets.n_inp]
414 new_tls = L([tl._new(item, split_idx=1) for tl,item in zip(tls, items)])
--> 415 return type(dsets)(tls=new_tls)
416 elif isinstance(dsets, TfmdLists):
417 new_tl = dsets._new(items, split_idx=1)
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in __init__(self, X, y, items, sel_vars, sel_steps, tfms, tls, n_inp, dl_type, inplace, **kwargs)
378 if len(self.tls) > 0 and len(self.tls[0]) > 0:
379 self.typs = [type(tl[0]) if isinstance(tl[0], torch.Tensor) else self.typs[i] for i,tl in enumerate(self.tls)]
--> 380 self.ptls = L([typ(stack(tl[:]))[...,self.sel_vars, self.sel_steps] if i==0 else typ(stack(tl[:]))
381 for i,(tl,typ) in enumerate(zip(self.tls,self.typs))]) if inplace and len(tls[0]) != 0 else tls
382
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in <listcomp>(.0)
378 if len(self.tls) > 0 and len(self.tls[0]) > 0:
379 self.typs = [type(tl[0]) if isinstance(tl[0], torch.Tensor) else self.typs[i] for i,tl in enumerate(self.tls)]
--> 380 self.ptls = L([typ(stack(tl[:]))[...,self.sel_vars, self.sel_steps] if i==0 else typ(stack(tl[:]))
381 for i,(tl,typ) in enumerate(zip(self.tls,self.typs))]) if inplace and len(tls[0]) != 0 else tls
382
~/miniconda3/envs/tsai/lib/python3.8/site-packages/tsai/data/core.py in __getitem__(self, it)
243 def subset(self, i, **kwargs): return type(self)(self.items, splits=self.splits[i], split_idx=i, do_setup=False, types=self.types, **kwargs)
244 def __getitem__(self, it):
--> 245 if hasattr(self.items, 'oindex'): return self.items.oindex[self._splits[it]]
246 else: return self.items[self._splits[it]]
247 def __len__(self): return len(self._splits)
~/miniconda3/envs/tsai/lib/python3.8/site-packages/zarr/indexing.py in __getitem__(self, selection)
602 selection = ensure_tuple(selection)
603 selection = replace_lists(selection)
--> 604 return self.array.get_orthogonal_selection(selection, fields=fields)
605
606 def __setitem__(self, selection, value):
~/miniconda3/envs/tsai/lib/python3.8/site-packages/zarr/core.py in get_orthogonal_selection(self, selection, out, fields)
939 indexer = OrthogonalIndexer(selection, self)
940
--> 941 return self._get_selection(indexer=indexer, out=out, fields=fields)
942
943 def get_coordinate_selection(self, selection, out=None, fields=None):
~/miniconda3/envs/tsai/lib/python3.8/site-packages/zarr/core.py in _get_selection(self, indexer, out, fields)
1107 # setup output array
1108 if out is None:
-> 1109 out = np.empty(out_shape, dtype=out_dtype, order=self._order)
1110 else:
1111 check_array_shape('out', out, out_shape)
MemoryError: Unable to allocate 315. GiB for an array with shape (60000, 978, 1441) and data type float32
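For reference, the requested allocation corresponds to materializing the entire (60000, 978, 1441) float32 validation array at once, which is what the stack(tl[:]) call in the traceback attempts. The size checks out:

```python
# 60000 samples x 978 variables x 1441 steps of float32 (4 bytes each):
n_samples, n_vars, seq_len = 60000, 978, 1441
print(n_samples * n_vars * seq_len * 4 / 2**30)   # ~315.0 GiB, as reported above
```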