Fix empty data loading in Insights #728
Conversation
@Reubend has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@@ -439,7 +441,22 @@ def _calculate_vis_output(
        return results if results else None

    def _get_outputs(self) -> List[Tuple[List[VisualizationOutput], SampleCache]]:
        batch_data = next(self._dataset_iter)
        # If we run out of new betches, then we need to
betches -> batches ?
LGTM! @bilalsal, @edward-io do you have any comments on this issue?
@@ -199,6 +200,7 @@ class scores.
        self._outputs: List[VisualizationOutput] = []
        self._config = FilterConfig(prediction="all", classes=[], num_examples=4)
        self._dataset_iter = iter(dataset)
Wouldn't your PR be simpler if you constructed this as cycle(iter(dataset)), since it seems you're effectively reimplementing this?
It would be much simpler that way, but my concern is memory usage. Per the docs, cycle "may require significant auxiliary storage (depending on the length of the iterable)", which suggests to me that it stores the entire thing in an array and then loops around. If the dataset is huge, this could be an issue, because every batch would get stored in memory. That's why I made the cache store only a few batches rather than cycling over the entire dataset.
However, if you don't think the memory usage is an issue, I could do it that way instead. It would probably only be a problem if somebody requested a lot of attributions.
Makes sense. Thanks for clarifying. Probably best to assume memory will be an issue.
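For context on the memory concern, the itertools documentation describes cycle with a roughly equivalent pure-Python implementation (paraphrased below, not code from this PR): every element that has been yielded is saved so it can be replayed, so auxiliary storage grows with the length of the iterable.

def cycle(iterable):
    # Roughly equivalent to itertools.cycle per the Python docs:
    # every element seen so far is saved, so memory use grows with
    # the length of the iterable.
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element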
Looks good to me!
Previously, once Captum Insights had shown all existing batches from the dataset, subsequent fetches would stop returning any data. This change makes it recycle some batches, so that the user can keep clicking "Fetch Data" to see the effect that their settings have on the results.
This was reported in #686
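A minimal sketch of the recycling behavior described above (the helper name batches_with_replay and the cache size are illustrative, not the actual code in this PR): the last few batches are kept in a small bounded cache and replayed once the dataset iterator is exhausted, so fetching never runs dry while memory stays bounded.

from collections import deque

def batches_with_replay(dataset, cache_size=4):
    # Hypothetical helper illustrating the approach: yield fresh batches
    # while they last, then keep replaying the last `cache_size` batches
    # instead of stopping.
    cache = deque(maxlen=cache_size)
    for batch in dataset:
        cache.append(batch)
        yield batch
    while cache:
        # Round-robin over the cached batches indefinitely.
        batch = cache.popleft()
        cache.append(batch)
        yield batch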