
feat: enable self supervised pretraining #220

Merged
3 commits merged into develop from feature/add-self-supervision on Dec 7, 2020

Conversation

Optimox
Collaborator

@Optimox Optimox commented Oct 27, 2020

What kind of change does this PR introduce?
Implements self-supervised pretraining
Does this PR introduce a breaking change?

I would say so
What needs to be documented once your changes are merged?
Some documentation

Closing issues

Closes #187 #232
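
For reference, a rough end-to-end sketch of the pretraining workflow this PR introduces (argument names follow the updated README on this branch, but treat the exact signatures as illustrative rather than authoritative):

```python
import numpy as np
from pytorch_tabnet.pretraining import TabNetPretrainer
from pytorch_tabnet.tab_model import TabNetClassifier

# Toy data just to make the sketch self-contained.
rng = np.random.default_rng(0)
X_train, X_valid = rng.normal(size=(256, 10)), rng.normal(size=(64, 10))
y_train, y_valid = rng.integers(0, 2, 256), rng.integers(0, 2, 64)

# Step 1: self-supervised pretraining -- embeddings + encoder learn to
# reconstruct randomly masked inputs.
unsupervised_model = TabNetPretrainer()
unsupervised_model.fit(
    X_train=X_train,
    eval_set=[X_valid],
    pretraining_ratio=0.8,  # fraction of features masked per sample
)

# Step 2: supervised fine-tuning, warm-started from the pretrained model.
clf = TabNetClassifier()
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    from_unsupervised=unsupervised_model,
)
```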

@athewsey
Contributor

Hi again!

Big fan of:

  • Separating TabNetEncoder from the decisioning/decoder logic
  • Nice clean-diff-oriented/JS-like indenting, trailing commas, and double quotes in the updated code style... even if it makes merging a bit of a pain 😂
  • Thanks for starting on this, as I'd been thinking about it myself recently too and am keen to see it implemented!

I'd suggest:

  • Bringing pre-training closer together with the supervised workflow, even if that means breaking API changes & a v3 milestone
  • Killing the TabNetNoEmbeddings abstraction layer: we have EmbeddingGenerator, TabNetEncoder, and TabNetDecoder, and each supervised task output just appears to be an FC layer, which may not be worthy of its own class.
  • Thinking about whether there's an opportunity here to make every model share the multi-task base
  • Directly using NaN/None as the pre-embedding mask value for all fields, and supporting missing values in input data by the same mechanism

For me the ideal workflow here is that saved models will have a trained embedding layer + encoder; may have a decoder; and may have one or more output/decision modules, depending on whether they've been pre-trained & what supervised task(s) they've been trained on.

For example, we could use a ModuleDict to store multiple named decision modules, and explicitly set up the API so that when you .fit() on one or more supervised task(s) you give the task name(s). Users could explicitly fine-tune existing supervised task(s) (or pre-training) with new data, cross-train the encoder & embeddings to a new task, and so on.
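
A minimal sketch of that idea (the MultiTaskTabNet wrapper and its methods below are hypothetical; it only assumes some embedder/encoder modules in the spirit of the EmbeddingGenerator/TabNetEncoder mentioned above):

```python
import torch
import torch.nn as nn


class MultiTaskTabNet(nn.Module):
    """Hypothetical sketch: one shared embedder + encoder, many named task heads."""

    def __init__(self, embedder: nn.Module, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.embedder = embedder      # e.g. an EmbeddingGenerator-style module
        self.encoder = encoder        # e.g. a TabNetEncoder-style module
        self.hidden_dim = hidden_dim
        self.heads = nn.ModuleDict()  # task name -> decision module (plain FC layer)
        self.decoder = None           # optional, only needed for pretraining

    def add_task(self, name: str, output_dim: int) -> None:
        # Each supervised output is just a final FC layer on top of the encoder.
        self.heads[name] = nn.Linear(self.hidden_dim, output_dim)

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        hidden = self.encoder(self.embedder(x))
        return self.heads[task](hidden)
```

Fine-tuning an existing task, cross-training the shared encoder to a new task, or resuming pretraining then reduces to choosing which head (or the decoder) is attached during .fit().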

I would like to see the training procedure (e.g. the *Model classes) use None/NaN as the mask value so it looks the same as actual missing data, and have missing-value featurization done (configurably) in EmbeddingGenerator, as I think this could help simplify use of the network for missing-value imputation. For example, I've been playing with a rough draft on athewsey/feat/tra in which EmbeddingGenerator takes a nonfinite_treatment parameter which, either globally or per-feature, can specify one of the following treatments (sketched after the list):

  • Ignore non-finite values
  • Mask them with a particular value (e.g. 0 per the original paper, -1 or something out of range for that field, or whatever)
  • ...Or add an extra 0/1 is_missing dimension to that feature's embedding, and mask other column(s) to 0 when missing
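
For concreteness, an illustrative per-feature version of those three treatments (this is my own sketch, not the code on the draft branch):

```python
import torch
import torch.nn as nn


class NonFiniteFeaturizer(nn.Module):
    """Illustrative per-feature handling of NaN/inf values (sketch only).

    mode = "ignore":    pass values through unchanged
    mode = "fill":      replace non-finite entries with fill_value (0, -1, out-of-range, ...)
    mode = "indicator": zero-fill and append a 0/1 is_missing column for the feature
    """

    def __init__(self, mode: str = "indicator", fill_value: float = 0.0):
        super().__init__()
        self.mode = mode
        self.fill_value = fill_value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1) column for a single continuous feature
        missing = ~torch.isfinite(x)
        if self.mode == "ignore":
            return x
        if self.mode == "fill":
            return torch.where(missing, torch.full_like(x, self.fill_value), x)
        # "indicator": mask the value to 0 when missing and add an extra dimension,
        # which grows post_embed_dim by one for each feature treated this way.
        return torch.cat(
            [torch.where(missing, torch.zeros_like(x), x), missing.float()], dim=1
        )
```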

In a small preliminary test (on Forest Cover Type again 😂, just masking some values at random and doing a supervised classification task), I actually found that this extra-column featurization delivered the best results despite significantly increasing post_embed_dim... which I took to mean that (A) TabNet is already good at learning to attend to columns, and (B) since I was using embedding-aware attention, this treatment didn't actually increase the dimensionality of the mask, just the FeatureTransformer inputs. Zero-filling performed worst, and out-of-range value filling was in between.

Thanks again & LMK your thoughts!

@Optimox
Collaborator Author

Optimox commented Oct 29, 2020

@athewsey Interesting. That's a lot of information at once, but definitely some nice feature proposals.

Maybe you could open a separate issue for each feature (NaNs, loading only the decoder, etc.) to ease the discussion.

@Optimox Optimox force-pushed the feature/add-self-supervision branch 3 times, most recently from 2f725ba to 4f4ffff on November 14, 2020 at 22:24
chore: linting, fix variables and format

wip: pretraining notebook

WIP: pretraining almost working

feat: add self supervision
@Optimox Optimox force-pushed the feature/add-self-supervision branch from 4f4ffff to e88dc38 on November 14, 2020 at 23:57
@Optimox Optimox mentioned this pull request Nov 24, 2020
chore: fix lint

chore: update README

feat: add explain to unsupervised training

feat: update network parameters

When the network is already defined, we still need to update some parameters fed through the fit function, such as the virtual batch size and, in the case of unsupervised pretraining, the pretraining_ratio.
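
A minimal sketch of what that commit amounts to (hypothetical helper; the attribute names mirror the commit message, not necessarily the library's internals):

```python
def update_network_params(network, virtual_batch_size, pretraining_ratio=None):
    """Push hyper-parameters passed to fit() onto an already-built network (sketch)."""
    network.virtual_batch_size = virtual_batch_size
    if pretraining_ratio is not None:  # only meaningful for unsupervised pretraining
        network.pretraining_ratio = pretraining_ratio
```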
@Optimox Optimox changed the title from "WIP feat: enable self supervised pretraining" to "feat: enable self supervised pretraining" on Dec 7, 2020
@Optimox Optimox force-pushed the feature/add-self-supervision branch from 09ce9d1 to 6a2826c on December 7, 2020 at 14:19
@Optimox Optimox merged commit ebdb9ff into develop Dec 7, 2020
@Optimox Optimox deleted the feature/add-self-supervision branch December 7, 2020 14:33