
Tried to visualize the CLS token embeddings after fine-tuning on SST-2 using t-SNE, but no clear clusters of positive and negative sentences! #625

Closed
rsc90 opened this issue May 20, 2019 · 6 comments

@rsc90

rsc90 commented May 20, 2019

I used run_classifier.py to fine-tune the model on SST-2, then used the fine-tuned model in extract_features.py to extract embeddings for a set of sentences (the input file contained only the sentences). From the resulting .jsonl file I took the layer -2 vectors corresponding to the CLS token and visualized them with t-SNE, expecting to see a clear separation between positive and negative sentences. But I could not get any clear clusters.
So, my questions are:
que.1: Does the CLS token after fine-tuning represent the entire sentence, so that one can use it for downstream tasks?

que.2: What is the best way to verify that the CLS token after fine-tuning carries the sentence representation? (For example, I tried to visualize it using t-SNE.)

que.3: I also used those CLS token vectors in a scikit-learn (Naive Bayes) model, but I got an accuracy of around 50%, while BERT uses the same vectors in its evaluation and achieves 93% accuracy. How is that possible? Is my approach to checking the CLS token vectors wrong?

The following figure shows the t-SNE visualization of the CLS vectors along with the corresponding sentence labels (vectors from layer -2 are used for the plot).

It would be great if @thomwolf could have a look at this issue too.

Looking forward to suggestions from everyone here!
best,
Tinya

[figure: t-SNE plot of CLS-token vectors (layer -2), colored by sentiment label]
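For reference, a minimal sketch of the extraction and plotting step described above, assuming the per-line JSON layout produced by extract_features.py (a "features" list per example, each token with a "layers" list) and placeholder file names (output.jsonl, labels.txt):

```python
# Minimal sketch: pull the [CLS] vector (layer -2) from the extract_features.py
# output and project it with t-SNE. File names are placeholders; labels.txt is
# assumed to hold one 0/1 sentiment label per input sentence.
import json
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

vectors, labels = [], []
with open("output.jsonl") as feats, open("labels.txt") as labs:
    for line, lab in zip(feats, labs):
        record = json.loads(line)
        cls_token = record["features"][0]  # first token is [CLS]
        layer = next(l for l in cls_token["layers"] if l["index"] == -2)
        vectors.append(layer["values"])
        labels.append(int(lab.strip()))

embedded = TSNE(n_components=2, random_state=0).fit_transform(np.array(vectors))
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="coolwarm", s=8)
plt.title("t-SNE of [CLS] vectors (layer -2)")
plt.show()
```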

@thomwolf
Member

Hi @rsc90,
The BertForSequenceClassification model uses a linear layer on top of BERT's pooled_output, which is a small feed-forward layer with a tanh activation.

I would rather use BERT's pooled_output or the last hidden layer for what you are doing. Why do you use layer -2?
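For context, a rough sketch (not the library source verbatim) of the structure described above: pooled_output is the last layer's [CLS] hidden state passed through a linear layer and a tanh, and the classification head is a linear layer on top of it:

```python
# Rough illustration of the pooling + classification head described above
# (illustrative only, not the library source).
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 2
pooler_dense = nn.Linear(hidden_size, hidden_size)
classifier = nn.Linear(hidden_size, num_labels)

def classify(last_hidden_states):              # (batch, seq_len, hidden_size)
    cls_hidden = last_hidden_states[:, 0]      # hidden state at the [CLS] position
    pooled_output = torch.tanh(pooler_dense(cls_hidden))
    return classifier(pooled_output)           # logits for the two sentiment classes
```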

@rsc90
Author

rsc90 commented May 21, 2019

Hello @thomwolf,
Thank you. I tried the last layer as well, but the clusters were not clear either, as shown in the figure below.
I had read that the last layer can sometimes be biased, which is why I avoided it at first, but I experimented with it too.

[figure: t-SNE plot of CLS-token vectors from the last layer, colored by sentiment label]

OK. Could you let me know how to collect this pooled output as the sentence representation after fine-tuning?

best,
tinya

@thomwolf
Member

You can try to initialize a BertModel from your fine-tuned BertForSequenceClassification model (I hope you fine-tuned a model; otherwise it is normal that the representations are not adapted to your task).

Just do model = BertModel.from_pretrained('path-to-your-fine-tuned-model').
And then use the pooled output of your BertModel.
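A minimal sketch of that, assuming the pytorch-pretrained-bert API of the time and that the checkpoint directory contains the saved weights and config (the path and the example sentence are placeholders):

```python
# Minimal sketch, assuming pytorch-pretrained-bert; the path is a placeholder.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("path-to-your-fine-tuned-model")
model.eval()

tokens = ["[CLS]"] + tokenizer.tokenize("a touching and funny film") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # BertModel returns (encoded_layers, pooled_output)
    _, pooled_output = model(input_ids, output_all_encoded_layers=False)

print(pooled_output.shape)  # (1, hidden_size) -- the sentence-level vector
```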

Still not sure what you are trying to do here in general by the way.

@rsc90
Author

rsc90 commented May 21, 2019

Yes, I did exactly that:

  1. I used run_classifier.py on the SST-2 dataset and saved the model (fine_tuned_model).
  2. Used fine_tuned_model in extract_features.py and collected the output.jsonl (as you said here).
  3. From the .jsonl file, I plotted the vectors corresponding to the CLS embeddings using t-SNE.

The intention of the experiment is: if the CLS token carries the sentence representation for the downstream task, I was expecting something like the clusters we get when we plot MNIST data with t-SNE. (I just want to verify whether the CLS token carries the complete sentence representation after fine-tuning on a downstream task, and if so, why I am not getting separate clusters.)

Please correct me if I am missing something, or suggest other experiments to verify this.
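For completeness, a minimal sketch of the scikit-learn check mentioned in que.3, with placeholder file names; a linear probe (LogisticRegression here, a swapped-in choice) is arguably a closer analogue to BERT's linear classification head than Naive Bayes:

```python
# Minimal sketch of a classifier probe on the extracted sentence vectors
# (placeholder file names; labels.txt assumed to hold one 0/1 label per line).
import json
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = [], []
with open("output.jsonl") as feats, open("labels.txt") as labs:
    for line, lab in zip(feats, labs):
        cls_layers = json.loads(line)["features"][0]["layers"]  # [CLS] token
        X.append(next(l for l in cls_layers if l["index"] == -1)["values"])  # last layer
        y.append(int(lab.strip()))

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```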

Many thanks,
tinya

@stale

stale bot commented Jul 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 20, 2019
@stale stale bot closed this as completed Jul 27, 2019
@gcunhase

Any updates on this issue?

PiercarloSlavazza pushed a commit to PiercarloSlavazza/transformers that referenced this issue Feb 22, 2022
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024