Conversation
src/tabpfn/regressor.py
Outdated
```python
embeddings: list[np.ndarray] = []

for output, config in self.executor_.iter_outputs(
```
Perhaps I am missing something, but I don't see how you are getting the train/test embeddings for X. Shouldn't the `iter_outputs` method be passed a new param that would invoke the model with `only_return_standard_out=False`?
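For context, a minimal sketch of what that suggestion amounts to; the standalone function, its signature, and the dict keys are assumptions for illustration, not the package's actual code:

```python
from typing import Any, Iterator

def iter_outputs(
    models: list[tuple[Any, dict]],
    X: Any,
    *,
    only_return_standard_out: bool = True,
) -> Iterator[tuple[Any, dict]]:
    """Hypothetical stand-in for the executor's iter_outputs.

    When only_return_standard_out is False, each underlying model is assumed
    to return its full output dict, with entries such as 'train_embeddings'
    and 'test_embeddings' next to the standard predictions.
    """
    for model, config in models:
        output = model(X, only_return_standard_out=only_return_standard_out)
        yield output, config
```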
Thanks so much for looking into this! Unfortunately, the change does not actually retrieve the embeddings yet; see the review comments.
Oh shoot! Sorry, I saw the embeddings list being built and traced it back to the regressor and classifier. Would adding the `only_return_standard_out=False` arg be enough?
Looks that way, but you need to indicate this through the `iter_outputs` call so that the model is actually invoked with `only_return_standard_out=False`.
Hi @iivalchev Thanks for all the feedback! The latest commit tries to incorporate all the suggestions mentioned above.
Everything was working as expected on my side; I hope this change is acceptable.
src/tabpfn/inference.py
Outdated
```python
# We'd rather just say unload from GPU, we already have it available on CPU.
model = model.cpu()  # noqa: PLW2901

output = output['test_embeddings'] if isinstance(output, dict) else output
```
This change is a bit hacky for the application; ideally, we would retrieve the entire dict at this step and decide in the next step which entry to select. That would change the return type of this function.
Alternatively, we could pass a parameter `return_field='test_embeddings'` which selects which field to return and which is linked to `only_return_standard_out`. The first option would be cleaner, though.
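Roughly, the two options look like this; the helper names below are illustrative, not the actual inference.py functions:

```python
from typing import Any

# Option 1 (preferred): keep the full dict at this layer and let the next step
# decide which entry to select; the return type becomes dict-or-tensor.
def postprocess_keep_dict(output: Any) -> Any:
    return output  # pass the whole dict (or plain tensor) upward unchanged

# Option 2: select a configurable field here, tied to only_return_standard_out.
def postprocess_select_field(output: Any, return_field: str = "test_embeddings") -> Any:
    return output[return_field] if isinstance(output, dict) else output
```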
Great! I think this should be working already; I just had some comments to make it more maintainable.
Hi @noahho Should I make the changes to rather pass the whole dictionary to the next step and select the entry there?
I would let `iter_outputs` return the whole dictionary when not returning only the standard output. Then let `get_embeddings` select the test or train embeddings, with the user specifying whether they would like the test or the train embedding. Do you think that makes sense?
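Sketched out, that flow might look roughly like this; the `data_source` argument name, the dict keys, and the exact `iter_outputs` signature here are assumptions based on this thread, not the final code:

```python
import numpy as np

def get_embeddings(estimator, X, data_source: str = "test") -> np.ndarray:
    """Select the train or test embeddings from the full output dicts (sketch)."""
    key = "train_embeddings" if data_source == "train" else "test_embeddings"
    embeddings: list[np.ndarray] = []
    # iter_outputs is assumed to yield one full output dict per ensemble
    # member once it no longer returns only the standard output.
    for output, _config in estimator.executor_.iter_outputs(
        X, only_return_standard_out=False
    ):
        embeddings.append(np.asarray(output[key]))
    return np.stack(embeddings)
```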
Hi @noahho Awesome, agreed! I have added the additional arg so the user can choose between the train and the test embeddings.
Great! I just fixed the merge conflict and now CI should run.
Hi @noahho Seems like it is failing for Python 3.9 but not 3.12.
Dear authors, thanks for this great work! The performance in the paper is amazing and I have heard many good things about the work, so I am eager to try it myself. I'm wondering, do you have an expected timeline for when this feature (obtaining embeddings) will be available? Thanks!
PR looks great, thank you for the contribution @KulikDM!
@KulikDM Hi. Thanks for the great work. I'm still confused about how to get embeddings from a fitted model, such as a fitted `TabPFNClassifier` or `TabPFNRegressor`.
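In case it helps, usage after this PR should look roughly like the sketch below; the `data_source` argument name and the shape of the returned array are assumptions based on this thread, so please double-check the released API:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()
clf.fit(X_train, y_train)

# Embeddings for the test rows; data_source="train" would select the training rows.
test_embeddings = clf.get_embeddings(X_test, data_source="test")
print(test_embeddings.shape)
```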
Embeddings V2
Hi,
This commit addresses #111
I have made two PRs where the implementation differs slightly, but they both do essentially the same thing.
Please pick whichever one you think is best for the package.
V2 adds a `get_embeddings` method to each estimator. The benefit is an easier-to-find API; the downside is more duplicated code.
Also added the appropriate tests.
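For illustration, a shape check along these lines could serve as such a test (a hypothetical sketch, not the actual test file):

```python
import numpy as np
from sklearn.datasets import load_iris
from tabpfn import TabPFNClassifier

def test_get_embeddings_returns_one_vector_per_sample():
    X, y = load_iris(return_X_y=True)
    clf = TabPFNClassifier()
    clf.fit(X, y)
    # The embeddings are expected to contain one vector per sample in X.
    emb = np.asarray(clf.get_embeddings(X, data_source="test"))
    assert X.shape[0] in emb.shape
```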
Hope this helps!