
Tried to visualize the CLS token embeddings after fine-tuning on SST-2 using t-SNE, but no clear clusters of positive and negative sentences! #625

Closed
rsc90 opened this issue May 20, 2019 · 6 comments

@rsc90

rsc90 commented May 20, 2019

I used run_classifier.py to fine-tune the model on SST-2, then used the fine-tuned model in extract_features.py to extract embeddings for a set of sentences (the input file contained only the sentences). From the resulting .jsonl file I took the layer -2 vectors corresponding to the CLS token and visualized them with t-SNE, expecting to see a clear separation between positive and negative sentences. But I could not get any clear clusters.
So, my questions are:
que.1: Does the CLS token after fine-tuning represent the entire sentence, so that one can use it for downstream tasks?

que.2: What is the best way to verify that the CLS token after fine-tuning carries the sentence representation? (For example, I tried to visualize it using t-SNE.)

que.3: I also used those CLS token vectors in a scikit-learn (Naive Bayes) model, but I got an accuracy of around 50%, while BERT uses the same vectors in its evaluation and achieves 93% accuracy. How is that possible? Is my approach to checking the CLS token vectors wrong?

The following figure shows the t-SNE visualization of the CLS vectors along with the corresponding sentence labels (vectors from layer -2 are used for the plot).

It would be great if @thomwolf could have a look at this issue too.

Looking forward to suggestions from everyone here!
best,
Tinya

[figure: t-SNE plot of CLS-token vectors (layer -2), colored by sentiment label]
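For reference, a minimal sketch of the extraction and plotting step described above, assuming the per-line JSON layout produced by extract_features.py (a "features" list per example, each token with a "layers" list) and placeholder file names (output.jsonl, labels.txt):

```python
# Minimal sketch: pull the [CLS] vector (layer -2) from the extract_features.py
# output and project it with t-SNE. File names are placeholders; labels.txt is
# assumed to hold one 0/1 sentiment label per input sentence.
import json
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

vectors, labels = [], []
with open("output.jsonl") as feats, open("labels.txt") as labs:
    for line, lab in zip(feats, labs):
        record = json.loads(line)
        cls_token = record["features"][0]  # first token is [CLS]
        layer = next(l for l in cls_token["layers"] if l["index"] == -2)
        vectors.append(layer["values"])
        labels.append(int(lab.strip()))

embedded = TSNE(n_components=2, random_state=0).fit_transform(np.array(vectors))
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="coolwarm", s=8)
plt.title("t-SNE of [CLS] vectors (layer -2)")
plt.show()
```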

@thomwolf
Member

Hi @rsc90,
The BertForSequenceClassification model uses a linear layer on top of BERT's pooled_output, which is a small feed-forward layer with a tanh activation.

I would rather use BERT's pooled_output or the last hidden layer for what you are doing. Why do you use layer -2?
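For context, a rough sketch (not the library source verbatim) of the structure described above: pooled_output is the last layer's [CLS] hidden state passed through a linear layer and a tanh, and the classification head is a linear layer on top of it:

```python
# Rough illustration of the pooling + classification head described above
# (illustrative only, not the library source).
import torch
import torch.nn as nn

hidden_size, num_labels = 768, 2
pooler_dense = nn.Linear(hidden_size, hidden_size)
classifier = nn.Linear(hidden_size, num_labels)

def classify(last_hidden_states):              # (batch, seq_len, hidden_size)
    cls_hidden = last_hidden_states[:, 0]      # hidden state at the [CLS] position
    pooled_output = torch.tanh(pooler_dense(cls_hidden))
    return classifier(pooled_output)           # logits for the two sentiment classes
```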

@rsc90
Author

rsc90 commented May 21, 2019

Hello @thomwolf,
Thank you. I tried the last layer as well, but the clusters were not clear either, as shown in the figure below.
I had read that the last layer can sometimes be biased, which is why I avoided it at first, but I experimented with it too.

[figure: t-SNE plot of CLS-token vectors from the last layer, colored by sentiment label]

OK. Could you let me know how to collect this pooled output as the sentence representation after fine-tuning?

best,
tinya

@thomwolf
Member

You can try to initialize a BertModel from your fine-tuned BertForSequenceClassification model (I hope you fine-tuned a model; otherwise it is normal that the representations are not adapted to your task).

Just do model = BertModel.from_pretrained('path-to-your-fine-tuned-model').
And then use the pooled output of your BertModel.
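A minimal sketch of that, assuming the pytorch-pretrained-bert API of the time and that the checkpoint directory contains the saved weights and config (the path and the example sentence are placeholders):

```python
# Minimal sketch, assuming pytorch-pretrained-bert; the path is a placeholder.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("path-to-your-fine-tuned-model")
model.eval()

tokens = ["[CLS]"] + tokenizer.tokenize("a touching and funny film") + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # BertModel returns (encoded_layers, pooled_output)
    _, pooled_output = model(input_ids, output_all_encoded_layers=False)

print(pooled_output.shape)  # (1, hidden_size) -- the sentence-level vector
```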

Still not sure what you are trying to do here in general by the way.

@rsc90
Author

rsc90 commented May 21, 2019

Yes, I did exactly that:

  1. I used run_classifier.py on the SST-2 dataset and saved the model (fine_tuned_model).
  2. Used fine_tuned_model in extract_features.py and collected the output.jsonl (as you said here).
  3. From the .jsonl file, I plotted the vectors corresponding to the CLS embeddings using t-SNE.

The intention of the experiment is: if the CLS token carries the sentence representation for the downstream task, I was expecting something like the clusters we get when we plot MNIST data with t-SNE. (I just want to verify whether the CLS token carries the complete sentence representation after fine-tuning on a downstream task, and if so, why I am not getting separate clusters.)

Please correct me if I am missing something, or suggest other experiments to verify this.
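For completeness, a minimal sketch of the scikit-learn check mentioned in que.3, with placeholder file names; a linear probe (LogisticRegression here, a swapped-in choice) is arguably a closer analogue to BERT's linear classification head than Naive Bayes:

```python
# Minimal sketch of a classifier probe on the extracted sentence vectors
# (placeholder file names; labels.txt assumed to hold one 0/1 label per line).
import json
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = [], []
with open("output.jsonl") as feats, open("labels.txt") as labs:
    for line, lab in zip(feats, labs):
        cls_layers = json.loads(line)["features"][0]["layers"]  # [CLS] token
        X.append(next(l for l in cls_layers if l["index"] == -1)["values"])  # last layer
        y.append(int(lab.strip()))

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```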

Many thanks,
tinya

@stale

stale bot commented Jul 20, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 20, 2019
@stale stale bot closed this as completed Jul 27, 2019
@gcunhase

Any updates on this issue?

PiercarloSlavazza pushed a commit to PiercarloSlavazza/transformers that referenced this issue Feb 22, 2022
ZYC-ModelCloud pushed a commit to ZYC-ModelCloud/transformers that referenced this issue Nov 14, 2024