
Updated examples #54

Merged · 9 commits · Jan 16, 2023

Conversation

paulmorio
Collaborator

Updating the examples under examples/demo/ with the new Pipeline framework

What is the goal of this PR?

The PR updates the examples under examples/demo/* to utilise the new Pipeline API. These examples are starting points for newcomers and experienced users of PyRelational who want self-contained examples of creating custom models and strategies, or of quickly using some of our included modules.

What are the changes implemented in this PR?

@paulmorio paulmorio self-assigned this Jan 6, 2023
@paulmorio
Collaborator Author

paulmorio commented Jan 6, 2023

I would like to note that one of the examples, specifically examples/demo/model_gaussianprocesses.py, runs into an issue: training the GP throws an error when attempting to fit on the labelled subset of the training data in the first iteration of the AL cycle.

File ~/workspace/pyrelational/examples/demo/model_gaussianprocesses.py:55, in PyLWrapper.forward(self, x)
     54 def forward(self, x):
---> 55     return self.gpmodel(x)

File ~/miniconda3/envs/pyrelational/lib/python3.8/site-packages/gpytorch/models/exact_gp.py:257, in ExactGP.__call__(self, *args, **kwargs)
    255 if settings.debug.on():
    256     if not all(torch.equal(train_input, input) for train_input, input in zip(train_inputs, inputs)):
--> 257         raise RuntimeError("You must train on the training inputs!")
    258 res = super().__call__(*inputs, **kwargs)
    259 return res

@a-pouplin, @thomasgaudelet I am not as well versed in GPyTorch; maybe you've seen this before?

@a-pouplin
Collaborator

I would like to note that one of the examples, specifically examples/demo/model_gaussianprocesses.py, runs into an issue [...] maybe you've seen this before?

I don't remember seeing this error before. It's not related to the GPyTorch version (1.4 throws the same error). The training set self.train_inputs is not being updated by the data manager inside the GPyTorch class, which throws the error. They seem to have a solution here, where they reset the training set:

gp_model.train()
gp_model.set_train_data(new_train_x, new_train_y)
output = gp_model(new_train_x)

but we are using different wrappers, which makes it less straightforward to implement.

I am wondering if it wouldn't be easier to write our own GP.
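
For reference, here is a rough sketch of what that fix could look like if we managed the GP and its fitting loop ourselves; the names ExactGPRegressor and fit_gp are hypothetical and not part of the existing wrapper API:

    import torch
    import gpytorch

    class ExactGPRegressor(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super().__init__(train_x, train_y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(
                self.mean_module(x), self.covar_module(x)
            )

    def fit_gp(model, likelihood, train_x, train_y, epochs=50, lr=0.1):
        # Re-register the current labelled subset as the GP's training data;
        # strict=False lets the number of training points change between AL iterations.
        model.set_train_data(inputs=train_x, targets=train_y, strict=False)
        model.train()
        likelihood.train()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = -mll(model(train_x), train_y)
            loss.backward()
            optimizer.step()

Calling set_train_data before each fit avoids ExactGP's "You must train on the training inputs!" check, since the stored train_inputs then match the labelled subset we actually pass in.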

a-pouplin
a-pouplin previously approved these changes Jan 8, 2023
Collaborator

@a-pouplin a-pouplin left a comment

LGTM, thank you Paul :)

I just noticed a few things:

(1) When printing the percentage of labelled data, the function in the data manager, percentage_labelled, returns a float in (0, 1) instead of a percentage. Should we rename the function or multiply the output by 100?

(2) When using a task agnostic strategy, some methods (AffinityPropagation, for example) don't support num_annotate. Would you like to add a warning to inform the user that the strategy will simply ignore this argument? For example, lines 123-127 of task_agnostic could be changed to:

    if isinstance(clustering_method, str) and hasattr(sklust, clustering_method):
        clustering_method = getattr(sklust, clustering_method)
        if "n_clusters" in inspect.getfullargspec(clustering_method).args:
            clustering_kwargs["n_clusters"] = num_annotate
        elif num_annotate is not None:
            # requires `import warnings` at the top of the module
            warnings.warn("Clustering method does not support num_annotate; ignoring it.")
        clustering_cls = clustering_method(**clustering_kwargs)
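
As a quick illustration of the introspection this relies on (KMeans is used here purely as an example of an estimator that does expose n_clusters):

    import inspect
    import sklearn.cluster as sklust

    # KMeans accepts n_clusters, AffinityPropagation does not,
    # so only the former would receive num_annotate.
    print("n_clusters" in inspect.getfullargspec(sklust.KMeans).args)               # True
    print("n_clusters" in inspect.getfullargspec(sklust.AffinityPropagation).args)  # False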

@thomasgaudelet
Contributor

(2) When using a task agnostic strategy, some methods (AffinityPropagation for example) don't support num_annotate. Would you like to add a warning to inform the user that the strategy will just ignore this argument? [...]

I have addressed point (2) in the other PR :)

@paulmorio
Collaborator Author

Hi @a-pouplin,

Great catch on both of those points. I chatted with Thomas and we will do the following:

  1. I will update percentage_labelled to multiply by 100 so it returns an actual percentage value, and update the test accordingly (see the sketch after this list).
  2. Thomas addresses this in Refactor unit tests and add some #55
  3. It's interesting that we didn't run into the GPyTorch issue before; Thomas figured out a fix we can employ from the DataManager side for now.
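
A minimal sketch of what the percentage_labelled change could look like; the attribute names labelled_indices and train_indices are placeholders for illustration, not necessarily the DataManager's actual fields:

    def percentage_labelled(self) -> float:
        # Return the share of labelled training samples as a percentage in [0, 100]
        # rather than a fraction in (0, 1).
        return 100.0 * len(self.labelled_indices) / len(self.train_indices)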

@paulmorio
Collaborator Author

@thomasgaudelet it's been updated to match the merged PR

@paulmorio paulmorio requested a review from a-pouplin January 13, 2023 18:06
Contributor

@thomasgaudelet thomasgaudelet left a comment

LGTM

Collaborator

@a-pouplin a-pouplin left a comment

Super! Sorry for the late reply, LGTM :)

Successfully merging this pull request may close these issues.

[BUG] Pretty printing for Pipeline instance has a bug