modify tracin self influence helpers (pytorch#994)
Summary:
Pull Request resolved: pytorch#994

Rename `TracInCP._self_influence_batch_tracincp` and `TracInCP._self_influence_batches_tracincp_fast` to `self_influence`, which is now public and accepts either a DataLoader yielding batches or a single batch, as before. External functions can call the renamed helper to compute self influence.

The helper is also made more efficient by reducing the number of times checkpoints are loaded. Although it can now compute self influence scores for a DataLoader yielding batches, it still loads each checkpoint only once per call. This is because it now has an outer iteration over checkpoints and an inner iteration over batches (the reverse of the previous order). `influence` calls this helper when run in self influence mode.
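The effect of swapping the loop order can be illustrated with a minimal sketch. It is not the code from this commit: `ToyModel`, `load_checkpoint`, `score`, and both loop functions are hypothetical stand-ins, and the per-example score is a toy quantity summed over checkpoints. It only shows that the two orders give the same scores while the checkpoints-outer order loads each checkpoint once.

```python
from typing import List, Sequence

Batch = Sequence[float]


class ToyModel:
    """Hypothetical stand-in for a model; `weight` plays the role of its parameters."""

    def __init__(self) -> None:
        self.weight = 0.0
        self.loads = 0  # number of checkpoint loads so far

    def load_checkpoint(self, checkpoint: float) -> None:
        # A real loader would read a state dict from disk (the expensive step).
        self.weight = checkpoint
        self.loads += 1

    def score(self, batch: Batch) -> List[float]:
        # Toy per-example contribution for the currently loaded checkpoint.
        return [(self.weight * x) ** 2 for x in batch]


def batches_outer(model: ToyModel, checkpoints: Sequence[float],
                  batches: Sequence[Batch]) -> List[float]:
    """Previous order: for every batch, load every checkpoint."""
    scores: List[float] = []
    for batch in batches:
        total = [0.0] * len(batch)
        for ckpt in checkpoints:
            model.load_checkpoint(ckpt)
            total = [t + s for t, s in zip(total, model.score(batch))]
        scores.extend(total)
    return scores


def checkpoints_outer(model: ToyModel, checkpoints: Sequence[float],
                      batches: Sequence[Batch]) -> List[float]:
    """Reversed order: load each checkpoint once, then visit every batch."""
    total = [0.0] * sum(len(b) for b in batches)
    for ckpt in checkpoints:
        model.load_checkpoint(ckpt)
        offset = 0
        for batch in batches:
            for i, s in enumerate(model.score(batch)):
                total[offset + i] += s
            offset += len(batch)
    return total


checkpoints = [0.5, 1.0, 2.0]
batches = [[1.0, 2.0], [3.0], [4.0, 5.0]]
old_model, new_model = ToyModel(), ToyModel()
old_scores = batches_outer(old_model, checkpoints, batches)
new_scores = checkpoints_outer(new_model, checkpoints, batches)
print(old_model.loads, new_model.loads)
```

With three checkpoints and three batches, the batches-outer loop performs one load per (batch, checkpoint) pair, while the checkpoints-outer loop performs one load per checkpoint regardless of how many batches the loader yields.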

We cannot simply increase the batch size to reduce the number of checkpoint loads: for large models (precisely those for which loading checkpoints is expensive), the model itself takes up so much memory that the batch size must stay small.

Minor change: for the `influence_src_dataset` argument of every `__init__`, add a description of the assumptions we make about the batches yielded by the DataLoader.

Reviewed By: NarineK

Differential Revision: D35603078

fbshipit-source-id: 56efa7ca82253a71c3ea143f3e2f1cabbe483b58
99warriors authored and facebook-github-bot committed Jul 22, 2022
1 parent b84980a commit d1d78d2
Showing 5 changed files with 653 additions and 294 deletions.