Separate preconditioned update from gradient update #2003

samdporter · 2024-12-09T16:11:58Z

samdporter
Dec 9, 2024
Collaborator

Hey all,

Would it make more sense to separate the gradient update from the preconditioned update during the update method of an algorithm?

For example in FISTA:

      # gradient step
      self.f.gradient(self.x_old, out=self.gradient_update)
      if self.preconditioner is not None:
          self.preconditioner.apply(
              self, self.gradient_update, out=self.gradient_update)

      try:
          step_size = self.step_size_rule.get_step_size(self)
      except NameError:
          raise NameError(msg='`step_size` must be `None`, a real float or a child class of :meth:`cil.optimisation.utilities.StepSizeRule`')

      self.x_old.sapyb(1., self.gradient_update, -step_size, out=self.x_old)

Could instead be something like

    ### step size choice before the preconditioner
    try:
        step_size = self.step_size_rule.get_step_size(self)
    except NameError:
        raise NameError(msg='`step_size` must be `None`, a real float or a child class of :meth:`cil.optimisation.utilities.StepSizeRule`')

    # preconditioner step - separating preconditioner from gradient update
    if self.preconditioner is not None:
        self.x_old.sapyb(1., self.preconditioner.apply(self, self.gradient_update), -step_size, out=self.x_old)
    else:
        self.x_old.sapyb(1., self.gradient_update, -step_size, out=self.x_old)

This would be helpful for debugging - the gradient update and preconditioned update could both be accessed by a callback - and also for other use cases such as a line search, where both the preconditioned update and gradient update may be required.
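A minimal sketch of the idea, in plain Python rather than CIL: the update keeps the raw gradient and the preconditioned update in separate buffers, so a callback can read both. The `ToyGD` class, the `preconditioned_update` attribute and the diagonal preconditioner are hypothetical names for illustration only, not CIL's API.

```python
class ToyGD:
    """Toy gradient descent on f(x) = 0.5 * sum(a_i * x_i**2) (not CIL code)."""

    def __init__(self, a, x0, step_size, precond=None):
        self.a = list(a)
        self.x = list(x0)
        self.step_size = step_size
        self.precond = precond             # optional diagonal scaling (hypothetical)
        self.gradient_update = None        # raw gradient, never overwritten
        self.preconditioned_update = None  # hypothetical second buffer

    def update(self):
        # gradient step: store the raw gradient
        self.gradient_update = [ai * xi for ai, xi in zip(self.a, self.x)]
        # preconditioner step: write to a separate buffer instead of in place
        if self.precond is not None:
            self.preconditioned_update = [
                pi * gi for pi, gi in zip(self.precond, self.gradient_update)]
        else:
            self.preconditioned_update = list(self.gradient_update)
        # move along the (possibly preconditioned) update
        self.x = [xi - self.step_size * di
                  for xi, di in zip(self.x, self.preconditioned_update)]


def record_both(algo, log):
    """Callback-style helper: both quantities are visible after update()."""
    log.append((list(algo.gradient_update), list(algo.preconditioned_update)))


a = [1.0, 4.0]
algo = ToyGD(a, x0=[2.0, 1.0], step_size=0.5, precond=[1.0 / ai for ai in a])
log = []
algo.update()
record_both(algo, log)
```

The cost is one extra buffer the size of the gradient; the preconditioner is still applied only once per iteration.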

MargaretDuff · 2024-12-10T09:10:07Z

MargaretDuff
Dec 10, 2024
Maintainer

Hi @samdporter - this looks like swapping the order of the step size and preconditioner calculations, so that the step size is calculated from the un-preconditioned gradient (where the gradient is used). The Armijo rule uses the gradient, doesn't it? Do we want the step size there to be calculated on the preconditioned gradient or on the un-preconditioned one?

3 replies

samdporter Dec 10, 2024
Collaborator Author

Hey,

The Armijo rule uses the inner product of the descent direction and the gradient (so you need both the preconditioned gradient and the gradient, I think).
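For reference, a sketch of the Armijo (sufficient decrease) test with a preconditioned direction, in plain Python. With descent direction d = -P g, a step t is accepted when f(x + t d) <= f(x) + c t <g, d>, which is why both g and P g appear. The function name `armijo_step`, the constants and the toy quadratic are assumptions made for illustration; this is not CIL's implementation of the rule.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def armijo_step(f, x, grad, direction, t0=1.0, c=1e-4, shrink=0.5, max_iter=50):
    """Backtrack until f(x + t*d) <= f(x) + c*t*<grad, d>."""
    fx = f(x)
    slope = dot(grad, direction)  # needs the raw gradient AND the direction
    if slope >= 0:
        raise ValueError("direction is not a descent direction")
    t = t0
    for _ in range(max_iter):
        trial = [xi + t * di for xi, di in zip(x, direction)]
        if f(trial) <= fx + c * t * slope:
            return t
        t *= shrink
    raise RuntimeError("no step size satisfied the Armijo condition")


# Toy problem: f(x) = 0.5 * sum(a_i * x_i**2), diagonal preconditioner 1/a_i
a = [1.0, 100.0]


def f(x):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


x = [1.0, 1.0]
g = [ai * xi for ai, xi in zip(a, x)]          # raw gradient
d_plain = [-gi for gi in g]                    # steepest descent direction
d_prec = [-gi / ai for gi, ai in zip(g, a)]    # preconditioned direction

t_plain = armijo_step(f, x, g, d_plain)
t_prec = armijo_step(f, x, g, d_prec)
```

On this badly scaled toy problem the preconditioned direction accepts the full initial step, while the plain gradient direction has to backtrack several times.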

The issue with the suggested code there is that you would have to compute the preconditioned gradient twice, I guess. Maybe the code confused my point a bit. Sorry.

The point that I was trying to make was that it would be nice to be able to access the gradient update before preconditioning. At the moment, you are unable to access the non-preconditioned gradient with a callback for ApproximateGradientSumFunctions where the gradient changes with each call.
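To illustrate why the raw gradient cannot simply be recomputed inside a callback: with a stochastic or approximate gradient, each call selects a new subset, so a second call returns a different vector and also advances the sampler. A toy sketch in plain Python; `ToySubsetGradient` is a hypothetical stand-in and not the CIL class.

```python
class ToySubsetGradient:
    """Gradient of sum_i 0.5*(x - b_i)**2, approximated one term at a time."""

    def __init__(self, b):
        self.b = list(b)
        self.calls = 0

    def gradient(self, x):
        i = self.calls % len(self.b)   # cycle through the terms
        self.calls += 1                # internal state changes on every call
        return len(self.b) * (x - self.b[i])


f = ToySubsetGradient(b=[0.0, 2.0, 4.0])
x = 1.0

used_in_update = f.gradient(x)         # what the algorithm actually used
raw_copy = used_in_update              # keep it before preconditioning
preconditioned = 0.5 * used_in_update  # hypothetical scalar preconditioner

recomputed = f.gradient(x)             # a callback calling gradient() again
```

Here `recomputed` comes from a different term than `raw_copy`, so the only reliable way to inspect the gradient that was used is to store it before it is overwritten.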

MargaretDuff Dec 10, 2024
Maintainer

> The point that I was trying to make was that it would be nice to be able to access the gradient update before preconditioning. At the moment, you are unable to access the non-preconditioned gradient with a callback for ApproximateGradientSumFunctions where the gradient changes with each call.

Ah that makes sense

MargaretDuff Dec 10, 2024
Maintainer

@epapoutsellis - Can we bring you into this conversation?
