
feat: enable self supervised pretraining #220

Merged
3 commits merged into develop from feature/add-self-supervision on Dec 7, 2020

Conversation

Optimox
Collaborator

@Optimox Optimox commented Oct 27, 2020

What kind of change does this PR introduce?
Implements self-supervised pretraining
Does this PR introduce a breaking change?

I would say so
What needs to be documented once your changes are merged?
Some documentation

Closing issues

Closes #187 #232
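
For reference, a rough end-to-end sketch of the pretraining workflow this PR introduces (argument names follow the updated README on this branch, but treat the exact signatures as illustrative rather than authoritative):

```python
import numpy as np
from pytorch_tabnet.pretraining import TabNetPretrainer
from pytorch_tabnet.tab_model import TabNetClassifier

# Toy data just to make the sketch self-contained.
rng = np.random.default_rng(0)
X_train, X_valid = rng.normal(size=(256, 10)), rng.normal(size=(64, 10))
y_train, y_valid = rng.integers(0, 2, 256), rng.integers(0, 2, 64)

# Step 1: self-supervised pretraining -- embeddings + encoder learn to
# reconstruct randomly masked inputs.
unsupervised_model = TabNetPretrainer()
unsupervised_model.fit(
    X_train=X_train,
    eval_set=[X_valid],
    pretraining_ratio=0.8,  # fraction of features masked per sample
)

# Step 2: supervised fine-tuning, warm-started from the pretrained model.
clf = TabNetClassifier()
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    from_unsupervised=unsupervised_model,
)
```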

@athewsey
Contributor

Hi again!

Big fan of:

  • Separating TabNetEncoder from the decisioning/decoder logic
  • Nice clean-diff-oriented/JS-like indenting, trailing commas, and double quotes in the updated code style... even if it makes merging a bit of a pain 😂
  • Thanks for starting on this, as I'd been thinking about it myself recently too and am keen to see it implemented!

I'd suggest:

  • Bringing pre-training closer together with the supervised workflow, even if that means breaking API changes & a v3 milestone
  • Killing the TabNetNoEmbeddings abstraction layer: we have EmbeddingGenerator, TabNetEncoder, and TabNetDecoder, and each supervised task output just appears to be an FC layer, which may not be worthy of its own class.
  • Thinking about whether there's an opportunity here to make every model share the multi-task base
  • Directly using NaN/None as the pre-embedding mask value for all fields, and supporting missing values in input data by the same mechanism

For me the ideal workflow here is that saved models will have a trained embedding layer + encoder; may have a decoder; and may have one or more output/decision modules, depending on whether they've been pre-trained & what supervised task(s) they've been trained on.

For example, we could use a ModuleDict to store multiple named decision modules, and explicitly set up the API so that when you .fit() on one or more supervised task(s) you give the task name(s). Users could explicitly fine-tune existing supervised task(s) (or pre-training) with new data, cross-train the encoder & embeddings to a new task, and so on.
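
A minimal sketch of that idea (the MultiTaskTabNet wrapper and its methods below are hypothetical; it only assumes some embedder/encoder modules in the spirit of the EmbeddingGenerator/TabNetEncoder mentioned above):

```python
import torch
import torch.nn as nn


class MultiTaskTabNet(nn.Module):
    """Hypothetical sketch: one shared embedder + encoder, many named task heads."""

    def __init__(self, embedder: nn.Module, encoder: nn.Module, hidden_dim: int):
        super().__init__()
        self.embedder = embedder      # e.g. an EmbeddingGenerator-style module
        self.encoder = encoder        # e.g. a TabNetEncoder-style module
        self.hidden_dim = hidden_dim
        self.heads = nn.ModuleDict()  # task name -> decision module (plain FC layer)
        self.decoder = None           # optional, only needed for pretraining

    def add_task(self, name: str, output_dim: int) -> None:
        # Each supervised output is just a final FC layer on top of the encoder.
        self.heads[name] = nn.Linear(self.hidden_dim, output_dim)

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        hidden = self.encoder(self.embedder(x))
        return self.heads[task](hidden)
```

Fine-tuning an existing task, cross-training the shared encoder to a new task, or resuming pretraining then reduces to choosing which head (or the decoder) is attached during .fit().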

I would like to see the training procedure (e.g. the *Model classes) use None/NaN as the mask value so it looks the same as actual missing data, and have missing-value featurization done (configurably) in EmbeddingGenerator, as I think this could help simplify use of the network for missing-value imputation. For example, I've been playing with a rough draft on athewsey/feat/tra in which EmbeddingGenerator takes a nonfinite_treatment parameter which, either globally or per-feature, can specify one of the following treatments (sketched after the list):

  • Ignore non-finite values
  • Mask them with a particular value (e.g. 0 per the original paper, -1 or something out of range for that field, or whatever)
  • ...Or add an extra 0/1 is_missing dimension to that feature's embedding, and mask other column(s) to 0 when missing
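
For concreteness, an illustrative per-feature version of those three treatments (this is my own sketch, not the code on the draft branch):

```python
import torch
import torch.nn as nn


class NonFiniteFeaturizer(nn.Module):
    """Illustrative per-feature handling of NaN/inf values (sketch only).

    mode = "ignore":    pass values through unchanged
    mode = "fill":      replace non-finite entries with fill_value (0, -1, out-of-range, ...)
    mode = "indicator": zero-fill and append a 0/1 is_missing column for the feature
    """

    def __init__(self, mode: str = "indicator", fill_value: float = 0.0):
        super().__init__()
        self.mode = mode
        self.fill_value = fill_value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1) column for a single continuous feature
        missing = ~torch.isfinite(x)
        if self.mode == "ignore":
            return x
        if self.mode == "fill":
            return torch.where(missing, torch.full_like(x, self.fill_value), x)
        # "indicator": mask the value to 0 when missing and add an extra dimension,
        # which grows post_embed_dim by one for each feature treated this way.
        return torch.cat(
            [torch.where(missing, torch.zeros_like(x), x), missing.float()], dim=1
        )
```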

In a small preliminary test (on Forest Cover Type again 😂, just masking some values at random and doing a supervised classification task), I actually found that this extra-column featurization delivered the best results despite significantly increasing post_embed_dim... which I took to mean that (A) TabNet is already good at learning to attend to columns, and (B) since I was using embedding-aware attention, this treatment didn't actually increase the dimensionality of the mask, just the FeatureTransformer inputs. Zero-filling performed worst, and out-of-range value filling was in between.

Thanks again & LMK your thoughts!

@Optimox
Collaborator Author

Optimox commented Oct 29, 2020

@athewsey Interesting. That's a lot of information at once, but definitely some nice feature proposals.

Maybe you could open a separate issue for each feature (NaNs, loading only the decoder, etc.) to ease the discussion.

@Optimox Optimox force-pushed the feature/add-self-supervision branch 3 times, most recently from 2f725ba to 4f4ffff on November 14, 2020 at 22:24
chore: linting, fix variables and format

wip: pretraining notebook

WIP: pretraining almost working

feat: add self supervision
@Optimox Optimox force-pushed the feature/add-self-supervision branch from 4f4ffff to e88dc38 on November 14, 2020 at 23:57
@Optimox Optimox mentioned this pull request Nov 24, 2020
chore: fix lint

chore: update README

feat: add explain to unsupervised training

feat: update network parameters

When the network is already defined, we still need to update some parameters fed through the fit function, such as the virtual batch size and, in the case of unsupervised pretraining, the pretraining_ratio.
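
A minimal sketch of what that commit amounts to (hypothetical helper; the attribute names mirror the commit message, not necessarily the library's internals):

```python
def update_network_params(network, virtual_batch_size, pretraining_ratio=None):
    """Push hyper-parameters passed to fit() onto an already-built network (sketch)."""
    network.virtual_batch_size = virtual_batch_size
    if pretraining_ratio is not None:  # only meaningful for unsupervised pretraining
        network.pretraining_ratio = pretraining_ratio
```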
@Optimox Optimox changed the title from "WIP feat: enable self supervised pretraining" to "feat: enable self supervised pretraining" on Dec 7, 2020
@Optimox Optimox force-pushed the feature/add-self-supervision branch from 09ce9d1 to 6a2826c on December 7, 2020 at 14:19
@Optimox Optimox merged commit ebdb9ff into develop Dec 7, 2020
@Optimox Optimox deleted the feature/add-self-supervision branch December 7, 2020 14:33