Tried to visualize the [CLS] token embeddings after fine-tuning on SST-2 using t-SNE, but got no clear clusters of positive and negative sentences #625
Comments
Hi @rsc90, I would rather use the output of Bert's pooler (the pooled output) for sentence representations.
Hello @thomwolf, OK. Could you let me know how to collect this pooled output for sentence representations after fine-tuning?

Best,
You can try to initialize a `BertModel` from your fine-tuned checkpoint. Just do a forward pass and take the pooled output (the second value the model returns).

Still not sure what you are trying to do here in general, by the way.
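A minimal sketch of that suggestion, assuming the `pytorch_pretrained_bert` API of this era, in which `BertModel`'s forward returns `(encoded_layers, pooled_output)`; the checkpoint path and example sentence are placeholders:

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Path to the fine-tuned SST-2 checkpoint is a placeholder.
model = BertModel.from_pretrained('/path/to/finetuned-sst2-checkpoint')
model.eval()

# Special tokens are added manually in this API.
tokens = ['[CLS]'] + tokenizer.tokenize('the movie was wonderful') + ['[SEP]']
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # pooled_output is a tanh-activated linear projection of the final [CLS] state
    encoded_layers, pooled_output = model(input_ids)

print(pooled_output.shape)  # (1, 768) for bert-base
```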
Yeah, I did the same.

The intention of the experiment is: if the [CLS] token carries a representation of the sentence on downstream tasks, then I was expecting something like the representations we get when plotting MNIST data with t-SNE. (I just want to make sure whether [CLS] carries the complete sentence representation after fine-tuning on a downstream task; if so, why am I not getting separate clusters?) Please correct me if I am missing something, or suggest other experiments to verify my thoughts.

Many thanks,
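For reference, the t-SNE plot described here is only a few lines with scikit-learn; `embeddings` and `labels` below are placeholders for the collected vectors and their sentiment labels:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# embeddings: (n_sentences, 768) array of pooled/[CLS] vectors (placeholder)
# labels: array of 0/1 sentiment labels aligned with the sentences (placeholder)
emb_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(emb_2d[:, 0], emb_2d[:, 1], c=labels, cmap='coolwarm', s=8)
plt.title('t-SNE of [CLS] embeddings (SST-2)')
plt.show()
```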
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Any updates on this issue?
Original issue description:

I have used run_classifier.py to fine-tune the model on SST-2 data, then used this model in extract_features.py to extract embeddings for some sentences (fed only the sentences from input.txt). From the resulting .jsonl file I took the layer -2 vectors corresponding to the [CLS] token and tried to visualize them with t-SNE, expecting clear separation between the positive and negative sentences. But I could not get any clear clusters.
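A sketch of that extraction step, assuming the jsonl layout written by extract_features.py (one JSON object per sentence, with a "features" list whose first entry is the [CLS] token and whose "layers" carry "index"/"values" pairs); the file name is a placeholder:

```python
import json
import numpy as np

cls_vecs = []
with open('features.jsonl') as f:  # output file of extract_features.py (placeholder name)
    for line in f:
        record = json.loads(line)
        cls_feature = record['features'][0]  # first token should be [CLS]
        # keep the layer with index -2 (second-to-last encoder layer)
        layer = next(l for l in cls_feature['layers'] if l['index'] == -2)
        cls_vecs.append(layer['values'])

embeddings = np.array(cls_vecs)  # shape: (n_sentences, 768)
```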
So, my questions are:
Que. 1: Does the [CLS] token after fine-tuning represent the entire sentence, so that one can use it on downstream tasks?
Que. 2: What is the best way to verify that the [CLS] token after fine-tuning carries the sentence representation? (For example, I tried to visualize it using t-SNE.)
Que. 3: I also used those [CLS] vectors in scikit-learn (Naive Bayes) models, but got only around 50% accuracy, while BERT uses the same vectors in evaluation and achieves 93% accuracy. How is that possible? Is my approach to checking the [CLS] vectors wrong? (A quick check is sketched below.)
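One way to sanity-check Que. 3 is to fit simple probes on the same vectors: if even logistic regression stays near 50%, the vectors themselves carry little label information. A sketch, with `embeddings` and `labels` as the placeholders from above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# embeddings: (n_sentences, 768) [CLS]/pooled vectors (placeholder)
# labels: 0/1 sentiments, in the same order as the sentences in input.txt (placeholder)
X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, test_size=0.2, random_state=0)

for clf in (GaussianNB(), LogisticRegression(max_iter=1000)):
    clf.fit(X_tr, y_tr)
    print(type(clf).__name__, clf.score(X_te, y_te))
```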
The following figure shows the t-SNE visualization of the [CLS] vectors along with the corresponding sentence labels (layer -2 vectors are used for the plot).
It would be great if @thomwolf could have a look at this issue too.
Looking forward to suggestions from all the folks around here!
best,
Tinya