
[Model Compression] Pruning Scheduler #4089

Merged
merged 83 commits into microsoft:master from compression_v2_scheduler on Sep 10, 2021

Conversation

J-shang
Contributor

@J-shang J-shang commented Aug 20, 2021

Depends on PR #4074.



class TaskResult:
def __init__(self, task_id: int, compact_model: Module, compact_model_masks: Dict[str, Dict[str, Tensor]],
Contributor

It seems compact_model_masks will never actually be used: it is empty when the model has been sped up, and identical to old_structure_masks if it has not. I am curious why not just delete it here?

Contributor Author

You are right, compact_model_masks is redundant right now; in fact it could be replaced by a boolean flag indicating whether the model has been sped up. But some cases cannot be handled by the speedup method, and I think the masks for those cases should go into the (currently empty) compact_model_masks in the future. It is a reserved interface for the residual masks left after speedup; maybe users can use it to apply a customized speedup.
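For context, a minimal sketch of how TaskResult could carry that reserved field (field names follow this discussion and the later rename of old_structure_masks to pruner_generated_masks; the real __init__ has more parameters than shown here):

from typing import Dict
from torch import Tensor
from torch.nn import Module

class TaskResult:
    def __init__(self, task_id: int, compact_model: Module,
                 compact_model_masks: Dict[str, Dict[str, Tensor]],
                 pruner_generated_masks: Dict[str, Dict[str, Tensor]]):
        self.task_id = task_id
        # Model after this pruning task; smaller if speedup has been applied.
        self.compact_model = compact_model
        # Reserved interface: masks left over after speedup (empty when the model
        # has been fully sped up), usable for a customized speedup later.
        self.compact_model_masks = compact_model_masks
        # Masks produced by the pruner on the pre-speedup structure.
        self.pruner_generated_masks = pruner_generated_masks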

@@ -27,8 +27,9 @@ def generate_sparsity(self, metrics: Dict[str, Tensor]) -> Dict[str, Dict[str, T
metric = metrics[name] * self._compress_mask(wrapper.weight_mask)
prune_num = int(sparsity_rate * metric.numel())
if prune_num == 0:
continue
threshold = torch.topk(metric.view(-1), prune_num, largest=False)[0].max()
threshold = metric.min() - 1
Contributor

Why do we handle it this way instead of the previous treatment?

Contributor Author

Because even though the mask is all ones in this case, it is probably better to still save it in masks.
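Roughly, the resulting logic looks like this (a sketch reconstructed from the diff above; the surrounding loop and mask-assembly code are omitted):

import torch

def _threshold_mask(metric: torch.Tensor, sparsity_rate: float) -> torch.Tensor:
    prune_num = int(sparsity_rate * metric.numel())
    if prune_num == 0:
        # Old behaviour: `continue`, i.e. no mask was generated for this layer.
        # New behaviour: pick a threshold below every metric value so an
        # all-ones mask is still produced and saved in `masks`.
        threshold = metric.min() - 1
    else:
        threshold = torch.topk(metric.view(-1), prune_num, largest=False)[0].max()
    return torch.gt(metric, threshold).type_as(metric)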

@@ -125,54 +128,3 @@ def show_pruned_weights(self, dim: int = 0):
sum_idx.remove(dim)
index = torch.nonzero(weight_mask.abs().sum(sum_idx) != 0, as_tuple=False).tolist()
_logger.info(f'simulated prune {wrapper.name} remain/total: {len(index)}/{weight_mask.size(dim)}')

def export_model(self, model_path, mask_path=None, onnx_path=None, input_shape=None, device=None):
Contributor

export_model is removed, why?

Contributor Author

Because I think the old export_model() is no longer useful now that we return the masks directly from compress(); maybe it is better to let users handle the export logic themselves, or we can support a new export_model().

Contributor

After removing the exporting function, what's the new interface between the pruner and speedup?

Contributor Author

Reverted export_model and removed the ONNX export.
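So the hand-off between the pruner and speedup now goes through the masks returned by compress(). A rough sketch, assuming NNI's existing ModelSpeedup API and the pruner's _unwrap_model() are used (the dummy input shape is only an example):

import torch
from nni.compression.pytorch import ModelSpeedup

compact_model, masks = pruner.compress()            # masks are returned directly now
torch.save(masks, 'masks.pth')                      # users persist the masks themselves
torch.save(compact_model.state_dict(), 'model.pth')

pruner._unwrap_model()                              # remove wrappers before speedup
dummy_input = torch.rand(8, 3, 224, 224)            # example shape only
ModelSpeedup(compact_model, dummy_input, 'masks.pth').speedup_model()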

# pruning model
self.pruner.reset(model, config_list)
self.pruner.load_masks(masks)
compact_model, old_structure_masks = self.pruner.compress()
Contributor

old_structure_masks -> pruner_generated_masks

Contributor Author

done

self._intermidiate_result_dir = Path(self._log_dir_root, 'intermidiate_result')
self._intermidiate_result_dir.mkdir(parents=True, exist_ok=True)

# save origin data in {log_dir}/intermidiate_model
Contributor

"intermidiate_model"?

Contributor Author

intermidiate_data?

Contributor Author

done

"""
Compare origin model and compact model, return the sparsity of each group mentioned in config list.
A group means all layer mentioned in one config.
i.e., a linear named 'linear1' and its weight size is [100, 100] in origin model, but in compact model,
Contributor

i.e. -> e.g.

Contributor Author

done
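For instance, the group sparsity is just the relative reduction in parameter count across the layers of one config entry (the numbers below are illustrative, not from the PR):

origin_params = 100 * 100        # 'linear1' weight is [100, 100] in the origin model
compact_params = 50 * 100        # suppose the compact model shrank it to [50, 100]
group_sparsity = 1 - compact_params / origin_params   # -> 0.5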

def compute_sparsity(origin_model: Module, compact_model: Module, compact_model_masks: Dict[str, Dict[str, Tensor]],
config_list: List[Dict]) -> Tuple[List[Dict], List[Dict], List[Dict]]:
"""
The current model means the compact model applied the masks. The compact model is the origin model after speed up.
Contributor

where is "current model"?

Contributor

This function computes how much the origin model has been compressed in the current state. The current state means compact_model + compact_model_masks (i.e., compact_model_masks applied on compact_model).

Contributor Author

done
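In other words, the "current" numbers are computed on the compact model with compact_model_masks applied on top. A small sketch of counting remaining weights in that state (the 'weight' key in the mask dict is an assumption):

import torch

def _remaining_weights(compact_model, compact_model_masks):
    total, remaining = 0, 0
    for name, module in compact_model.named_modules():
        weight = getattr(module, 'weight', None)
        if weight is None or not isinstance(weight, torch.Tensor):
            continue
        total += weight.numel()
        mask = compact_model_masks.get(name, {}).get('weight')
        # No mask on this layer means all of its weights are still active.
        remaining += int(mask.sum().item()) if mask is not None else weight.numel()
    return remaining, total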

# NOTE: assume we only mask output, so the mask and bias have a one-to-one correspondence.
# If we support more kind of masks, this place need refactor.
if wrapper.bias_mask is not None and weight_mask.size() == wrapper.bias_mask.size():
expand_mask['bias_mask'] = weight_mask.clone()
expand_mask['bias'] = weight_mask.clone()
return expand_mask

def _compress_mask(self, mask: Tensor) -> Tensor:
Contributor

add docstring

Contributor Author

done
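The added docstring could read roughly like this (only a sketch of the wording; the body is elided since it is not shown in the diff):

def _compress_mask(self, mask: Tensor) -> Tensor:
    """
    Fold an expanded weight-level mask back down to the granularity of the metric
    (e.g. one value per pruned dimension), so that it can be multiplied
    element-wise with the metric when generating new sparsity.
    """
    ...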

Returns
-------
Tuple[List[Dict], List[Dict], List[Dict]]
(current2origin_sparsity, compact2origin_sparsity, mask2compact_sparsity).
Contributor

add more description

Contributor Author

done
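As a usage sketch, the three lists returned by compute_sparsity can be read as follows (one dict per config entry; the exact dict keys are not shown in this diff and are assumptions):

current2origin, compact2origin, mask2compact = compute_sparsity(
    origin_model, compact_model, compact_model_masks, config_list)

# current2origin: masked compact model vs. origin model   (total compression so far)
# compact2origin: compact model structure vs. origin model (what speedup already removed)
# mask2compact:   masks vs. compact model                  (still only simulated by masks)
for entry in current2origin:
    print(entry)   # e.g. {'total_sparsity': 0.5, ...}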

@QuanluZhang
Contributor

@J-shang, please add tests for the pruning scheduler, maybe in the next PR if you like.

self.state = {}

for ref in self.referenced_paths():
self._reference_counter.setdefault(ref, 0)
Contributor

Is the _reference_counter defined at class level the same object as self._reference_counter here?

Contributor

We directly set the reference count to zero here; what if ref is also being used by other threads?

Contributor Author

Yes, self._reference_counter in all the different instances refers to the same class-level dict _reference_counter.

Good question; we may need some kind of lock if we support multi-threading. For now we run pruning in a single thread, and I will add comments in this part as a reminder that it needs to be refactored if we want to support multi-threading.
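A small sketch of the shared class-level counter and the kind of lock that multi-threading would require (the lock and the increment placement are assumptions, not part of the current code):

import threading
from typing import Dict

class Task:
    # Shared by every Task instance; maps a referenced file path to the number
    # of tasks that still need it. Not thread-safe: pruning currently runs in a
    # single thread, so no lock is taken in the real code.
    _reference_counter: Dict[str, int] = {}
    _lock = threading.Lock()   # illustrative only

    def __init__(self, paths):
        self._paths = list(paths)
        self.state = {}
        for ref in self.referenced_paths():
            with Task._lock:
                # setdefault only initializes missing entries to 0; an existing
                # count from another task is left untouched.
                self._reference_counter.setdefault(ref, 0)
                self._reference_counter[ref] += 1

    def referenced_paths(self):
        return self._paths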

task_id
The unique id of task.
compact_model
The unwrapped compact pytorch model after pruning. If the compact model has speed up process during pruning,
Contributor

has been speeduped during the pruning process

Contributor Author

thx, fixed it

compact_model
The unwrapped compact pytorch model after pruning. If the compact model has speed up process during pruning,
it will have a smaller structure compare with the model before pruning.
If the compact model do not speed up, it will have the same structure with the model before pruning.
Contributor

Same here, do not speed up -> has not been speeduped?

Contributor Author

done

@J-shang J-shang merged commit e98ebcf into microsoft:master Sep 10, 2021
@J-shang J-shang deleted the compression_v2_scheduler branch September 13, 2021 08:28