
Updated examples #54

Merged · 9 commits · Jan 16, 2023

Conversation

paulmorio
Collaborator

Updating the examples under examples/demo/ with the new Pipeline framework

What is the goal of this PR?

The PR updates the examples under examples/demo/* to utilise the new Pipeline API. These examples are starting points for newcomers and experienced users of PyRelational who want self-contained examples of creating custom models and strategies, or of quickly using some of our included modules.

What are the changes implemented in this PR?

@paulmorio paulmorio self-assigned this Jan 6, 2023
@paulmorio
Collaborator Author

paulmorio commented Jan 6, 2023

I would like to note that one of the examples, specifically examples/demo/model_gaussianprocesses.py, runs into an issue: training the GP throws an error when attempting to fit on the labelled subset of the training data in the first iteration of the AL cycle.

File ~/workspace/pyrelational/examples/demo/model_gaussianprocesses.py:55, in PyLWrapper.forward(self, x)
     54 def forward(self, x):
---> 55     return self.gpmodel(x)

File ~/miniconda3/envs/pyrelational/lib/python3.8/site-packages/gpytorch/models/exact_gp.py:257, in ExactGP.__call__(self, *args, **kwargs)
    255 if settings.debug.on():
    256     if not all(torch.equal(train_input, input) for train_input, input in zip(train_inputs, inputs)):
--> 257         raise RuntimeError("You must train on the training inputs!")
    258 res = super().__call__(*inputs, **kwargs)
    259 return res

@a-pouplin, @thomasgaudelet I am not as well versed in GPyTorch; maybe you've seen this before?

@a-pouplin
Collaborator

I would like to note that one of the examples, specifically examples/demo/model_gaussianprocesses.py, runs into an issue [...] maybe you've seen this before?

I don't remember seeing this error before. It's not related to the GPyTorch version (1.4 throws the same error). The training set self.train_inputs is not being updated by the data manager inside the GPyTorch class, which throws the error. They seem to have a solution here, where they reset the training set:

gp_model.train()
gp_model.set_train_data(new_train_x, new_train_y)
output = gp_model(new_train_x)

but we are using different wrappers, which makes it less straightforward to implement.

I am wondering if it wouldn't be easier to write our own GP.
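
For reference, here is a rough sketch of what that fix could look like if we managed the GP and its fitting loop ourselves; the names ExactGPRegressor and fit_gp are hypothetical and not part of the existing wrapper API:

    import torch
    import gpytorch

    class ExactGPRegressor(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super().__init__(train_x, train_y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(
                self.mean_module(x), self.covar_module(x)
            )

    def fit_gp(model, likelihood, train_x, train_y, epochs=50, lr=0.1):
        # Re-register the current labelled subset as the GP's training data;
        # strict=False lets the number of training points change between AL iterations.
        model.set_train_data(inputs=train_x, targets=train_y, strict=False)
        model.train()
        likelihood.train()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = -mll(model(train_x), train_y)
            loss.backward()
            optimizer.step()

Calling set_train_data before each fit avoids ExactGP's "You must train on the training inputs!" check, since the stored train_inputs then match the labelled subset we actually pass in.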

a-pouplin
a-pouplin previously approved these changes Jan 8, 2023
Collaborator

@a-pouplin a-pouplin left a comment

LGTM, thank you Paul :)

I just noticed a few things:

(1) When printing the percentage of labelled data, the function in the data manager, percentage_labelled, returns a float in (0, 1) instead of a percentage. Should we rename the function or multiply the output by 100?

(2) When using a task agnostic strategy, some methods (AffinityPropagation, for example) don't support num_annotate. Would you like to add a warning to inform the user that the strategy will simply ignore this argument? For example, lines 123-127 of task_agnostic could be changed to:

    if isinstance(clustering_method, str) and hasattr(sklust, clustering_method):
        clustering_method = getattr(sklust, clustering_method)
        if "n_clusters" in inspect.getfullargspec(clustering_method).args:
            clustering_kwargs["n_clusters"] = num_annotate
        elif num_annotate is not None:
            # requires `import warnings` at the top of the module
            warnings.warn("Clustering method does not support num_annotate; ignoring it.")
        clustering_cls = clustering_method(**clustering_kwargs)
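
As a quick illustration of the introspection this relies on (KMeans is used here purely as an example of an estimator that does expose n_clusters):

    import inspect
    import sklearn.cluster as sklust

    # KMeans accepts n_clusters, AffinityPropagation does not,
    # so only the former would receive num_annotate.
    print("n_clusters" in inspect.getfullargspec(sklust.KMeans).args)               # True
    print("n_clusters" in inspect.getfullargspec(sklust.AffinityPropagation).args)  # False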

@thomasgaudelet
Contributor

(2) When using a task agnostic strategy, some methods (AffinityPropagation for example) don't support num_annotate. Would you like to add a warning to inform the user that the strategy will just ignore this argument? [...]

I have addressed point (2) in the other PR :)

@paulmorio
Collaborator Author

Hi @a-pouplin,

Great catch on both of those points. I chatted with Thomas and we will do the following:

  1. I will update percentage_labelled to multiply by 100 so it returns an actual percentage value, and update the test accordingly (see the sketch after this list).
  2. Thomas addresses this in Refactor unit tests and add some #55
  3. It's interesting that we didn't run into the GPyTorch issue before; Thomas figured out a fix we can employ from the DataManager side for now.
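
A minimal sketch of what the percentage_labelled change could look like; the attribute names labelled_indices and train_indices are placeholders for illustration, not necessarily the DataManager's actual fields:

    def percentage_labelled(self) -> float:
        # Return the share of labelled training samples as a percentage in [0, 100]
        # rather than a fraction in (0, 1).
        return 100.0 * len(self.labelled_indices) / len(self.train_indices)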

@paulmorio
Collaborator Author

@thomasgaudelet it's been updated to match the merged PR

@paulmorio paulmorio requested a review from a-pouplin January 13, 2023 18:06
Contributor

@thomasgaudelet thomasgaudelet left a comment

LGTM

Collaborator

@a-pouplin a-pouplin left a comment

Super! Sorry for the late reply, LGTM :)

Successfully merging this pull request may close these issues.

[BUG] Pretty printing for Pipeline instance has a bug