Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AL-Pipeline #33

Merged
merged 28 commits into from
Jan 3, 2023
Merged

AL-Pipeline #33

merged 28 commits into from
Jan 3, 2023

Conversation

paulmorio
Copy link
Collaborator

@paulmorio paulmorio commented Dec 4, 2022

Hi all,

This is a PR related to #23 which we have been planning for a while. As #23 describes we propose refactoring the code to decouple the functionality of the active learning pipeline from the active learning strategy.

In principle the proposed GenericPipeline class will succeed the GenericALStrategy in functionality, and serves to help the user decouple the implementation of their strategies from the business logic of running the pipeline.

From a structural perspective a Pipeline will be a composition of the following objects

  • a DataManager
  • a ML model
  • an AL strategy
  • an oracle (implementing the new pyrelational.oracle.generic_oracle.GenericOracle interface)

It retains all of the methods of the previous GenericActiveLearningStrategy it replaces. Of particular note is the active_learning_step method which now calls upon the method of the same name in the strategy, filtering the arguments to match the signature of the concrete method implementations.

This involves the modification of several existing modules, most notably the GenericActiveLearningStrategy and all of the strategy implementations dependent on it. The refactor also includes a new GenericOracle interface to spur development of different concrete oracle classes.

The GenericActiveLearningStrategy is now an abstract class which acts as an interface for strategy implementations, each concrete strategy must implement an active_learning_step(self, *args, **kwargs) function. The GenericPipeline.active_learning_step(self, *args, **kwargs) will filter the **kwargs to match signature of the implemented GenericActiveLearningStrategy.active_learning_step method.

The change also brings forward the implementation of a GenericOracle interface whose concrete implementations must implement the update_dataset(self, data_manager, indices) method.

Updates will be made to this PR to update all of the tests and documentations towards the new API as we discuss any detailed changes that need to be done to the proposed changes.

@paulmorio paulmorio added enhancement New feature or request help wanted Extra attention is needed labels Dec 4, 2022
@paulmorio paulmorio self-assigned this Dec 4, 2022
@paulmorio
Copy link
Collaborator Author

paulmorio commented Dec 4, 2022

Updated the active_learning_step pattern to input the pipeline attributes as default kwargs that get overwritten/updated by any user kwargs.

Also added a working example (with the associated refactoring needed in the dependent modules). A somewhat notable element is the refactoring required on the end of the strategy implementations as it will not have direct access to the @property decorated methods anymore, which are used extensively as shortcuts to data_manager attributes and methods when used by the active_learning_step of the strategy. These would have to make the calls to the passed data_manager directly.

@paulmorio paulmorio marked this pull request as ready for review December 4, 2022 06:36
@paulmorio
Copy link
Collaborator Author

All tests updated to reflect new API changes

a-pouplin
a-pouplin previously approved these changes Dec 4, 2022
Copy link
Collaborator

@a-pouplin a-pouplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything seems good to me !

Copy link
Contributor

@thomasgaudelet thomasgaudelet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on the surface. Looking forward to chat tomorrow

Copy link
Contributor

@thomasgaudelet thomasgaudelet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few more comments, some might be outside scope of this PR though, just raising awareness. The unit tests should probably be refactored as well. But let's do that after merge.

paulmorio and others added 2 commits December 13, 2022 00:32
Co-authored-by: thomasgaudelet <69150449+thomasgaudelet@users.noreply.github.com>
@paulmorio
Copy link
Collaborator Author

Updated and ready for another review

Copy link
Contributor

@thomasgaudelet thomasgaudelet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! small comments, some maybe should be moved to issues instead of being addressed in this PR.

@a-pouplin a-pouplin self-requested a review December 15, 2022 13:59
@paulmorio paulmorio merged commit d80a208 into main Jan 3, 2023
@thomasgaudelet thomasgaudelet deleted the al-pipeline branch January 4, 2023 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants