About TransformerWordEmbeddings class #2713

Closed
matirojasg opened this issue Apr 7, 2022 · 5 comments
Labels
question Further information is requested wontfix This will not be worked on

Comments

@matirojasg

Hi, I am using the following line of code to create contextualized embeddings from a biomedical RoBERTa model:

TransformerWordEmbeddings("PlanTL-GOB-ES/roberta-base-biomedical-clinical-es")

However, when executing the code, I get the following error:

File "/home/x/work/clinical-nested-ner-mlc/venv/lib/python3.8/site-packages/flair/embeddings/base.py", line 652, in _extract_token_embeddings
   assert subword_start_idx < subword_end_idx <= sentence_hidden_state.size()[1]
AssertionError

This does not occur with other models from the Hugging Face hub. What could be wrong with this model? Here is the link to the repository:

https://huggingface.co/PlanTL-GOB-ES/roberta-base-biomedical-clinical-es

Thanks
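
For readers following the traceback: the failing assertion requires every token to map to a non-empty range of subword positions that fits inside the model's hidden-state sequence. The sketch below is plain Python, not Flair's code. The helper names `subword_spans` and `check_spans`, the per-token subword counts, and the single leading special token are all hypothetical, and the thread does not confirm that an empty span is what happened here. It only shows one way the invariant `start < end <= sequence_length` can be violated.

```python
from typing import List, Tuple


def subword_spans(subword_counts: List[int], offset: int = 1) -> List[Tuple[int, int]]:
    """Map each token to a half-open [start, end) range of subword positions.

    `offset` stands in for a leading special token such as <s> (an assumption).
    """
    spans = []
    start = offset
    for count in subword_counts:
        end = start + count
        spans.append((start, end))
        start = end
    return spans


def check_spans(spans: List[Tuple[int, int]], sequence_length: int) -> List[int]:
    """Return the indices of tokens whose span breaks start < end <= sequence_length."""
    return [i for i, (start, end) in enumerate(spans) if not (start < end <= sequence_length)]


# Hypothetical sentence of four tokens; the third one produced no subwords at all.
counts = [1, 2, 0, 1]
spans = subword_spans(counts)
# One leading and one trailing special token around four subwords (assumed layout).
sequence_length = 1 + sum(counts) + 1
print(spans)
print(check_spans(spans, sequence_length))
```

If a tokenizer yields zero subwords for some token, that token's start and end coincide and a check of this shape fails, which is one reason an offset calculation may need to be made more tolerant.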

@matirojasg matirojasg added the question Further information is requested label Apr 7, 2022
@helpmefindaname
Collaborator

Hey @matirojasg, thank you for reporting. @alanakbik, as this might be due to my refactoring, I'll look into it soon.

@alanakbik
Collaborator

Awesome, thanks @helpmefindaname!

helpmefindaname added a commit to helpmefindaname/flair that referenced this issue Apr 8, 2022
alanakbik added a commit that referenced this issue Apr 10, 2022
…ulation

GH-2713: make transformer offset calculation more robust
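
A rough way to check whether an installed Flair is new enough to include this change is sketched below. The squashed commit list further down this thread shows the GH-2713 fix immediately before a version bump to 0.11.1. Treating 0.11.1 as the first release with the fix is an assumption drawn from that ordering, not something the thread states. The names `ASSUMED_FIX_VERSION`, `parse_version`, `installed_flair_version`, and `likely_has_fix` are hypothetical helpers, and the version parsing is deliberately simple (it ignores pre-release suffixes).

```python
from importlib import metadata
from typing import Optional, Tuple

# Assumption: inferred from the commit ordering in this thread, not confirmed.
ASSUMED_FIX_VERSION = (0, 11, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.11.1' into (0, 11, 1), stopping at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_flair_version() -> Optional[str]:
    """Return the installed flair version string, or None if flair is absent."""
    try:
        return metadata.version("flair")
    except metadata.PackageNotFoundError:
        return None


def likely_has_fix(version_text: str) -> bool:
    """Compare a version string against the assumed first fixed release."""
    return parse_version(version_text) >= ASSUMED_FIX_VERSION


version = installed_flair_version()
if version is None:
    print("flair is not installed in this environment")
else:
    print(version, "likely includes the fix" if likely_has_fix(version) else "may predate the fix")
```

If the check reports an older version, upgrading Flair and re-running the original line is the cheapest thing to try before digging into the model itself.
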
@matirojasg
Author

Thank you guys!

patrickjae added a commit to showheroes/flair that referenced this issue May 18, 2022
* flairNLPGH-2632: Revert "Removes hyperparameter features"

This reverts commit 9aff426.

* flairNLPGH-2632: Updating the param selection docs for the v0.10 syntax

* flairNLPGH-2632: Adding hyperopt back to requirements.txt

* flairNLPGH-2632: Fixing paramselection code to work with changes in Flair v0.10

* flairNLPGH-2632: Fixing bug where embeddings got added twice on multiple training runs

* flairNLPGH-2632: Enabling and fixing tests for param selection

* flairNLPGH-2632: Fixing flake, mypy and isort issues

* Dropout for all

* fix first_last

* Fix printouts for SequenceTagger

* 🐛 Fix .pre-commit-config.yaml

While trying to set up pre-commit, I got an indentation error.
Moreover, pycqa/isort does not have a stable rev. I set it to the most recent release tag.

* feat: ✨ initial implementation of JsonlCorpora and Datasets

* flairNLPGH-2654: Fixed printing and logging inconsistencies.

* Adding TransformerDocumentEmbeddings support to TextClassifierParamSelector and applying PR suggestions

* Fixing flake tests

* Using a small transformer in tests to reduce the CI agent memory usage

* Fix find_learning_rate

* Updating korean docs

* removing warning from step()

* fix: patch the missing `document_delmiter` for `lm.__get_state__()`

* updated broken link

* flairNLPGH-2654: Added review comments made by @Weyaaron

* flairNLPGH-2654: Fix breaking gzip import

* Making fune_tune a normal (non-tunable) parameter and defaulting it to True

* refactor: pin pytest in pipfile

* refactor: ♻️ make label_type configurable for Jsonl corpora

* fix: pin isort correctly to major release 5

* refactor: pin isort in pipfile to major release 5

* Fix relation extractor

* datasets: add support for HIPE 2022

* datasets: register NER_HIPE_2022

* tests: add extensive test cases for all sub-datasets for HIPE 2022

* Set default dropouts to 0 for equality to previous approaches

* datasets: fix flake8 errors for HIPE 2022 integration

* Update flair/models/language_model.py

Co-authored-by: Tadej Magajna <tmagajna@gmail.com>

* Formatting

* datasets: add support for v2 of HIPE-2022 dataset

* tests: update cases for v2 of HIPE-2022 dataset

* tests: minor flake fix for datasets

* tests: adjust latest HIPE v2.0 data changes for SONAR and NewsEye dataset

* datasets: switch to main as default branch name for HIPE-2022 data repo

* datasets: introduce some bug fixes for HIPE-2022 (tab as delimiter, ignore empty tokens)

* test: include label checking tests for HIPE-2022

* datasets: beautify empty token fix for HIPE-2022 dataset reader

* tests: fix mypy error (hopefully)

* datasets: fix mypy error (hopefully)

* flairNLPGH-2689: bump version numbers

* different way to exclude labels

* remove comment

* Change printouts for all DataPoints and Labels

* Black formatting

* Update printouts

* Update printouts to round confidence scores

* Add Arabic NER models back in

* Update readmes for new label logic and printouts

* Make DataPoint hashable and add test

* Do not add O tags

* Remap relation labels

* minor formatting

* Changed the documentation on OneHotEmbeddings to reflect changes in the master version: OneHotEmbeddings.from_corpus() instead of OneHotEmbeddings().

* Nicer printouts for make_label_dictionary

* Update documentation

* Black formatting

* small fixes

* Global arrow symbol

* Global arrow symbol

* Update relation model

* Fix unit test

* Fix unit tests

* datasets import

* Update documentation

* Update TUTORIAL_7_TRAINING_A_MODEL.md

* datasets: add possibility to use custom preprocessing function for HIPE-2022

* datasets: fix mypy error for HIPE-2022 preprocessing function

* datasets: revert self from HIPE-2022 preprocessing fn

* datasets: fix preprocessing function handling in HIPE-2022

* Minor fixes for tutorials

* Fix the SciSpacyTokenizer.tokenize() bug.

Makes sure the words are added to the correct list variable and that strings, not SpaCy Token objects, are returned.

* Fixing Hunflair docs that depended on SciSpacyTokenizer

* flairNLPGH-2713: make transformer offset calculation more robust

* flairNLPGH-2717: add option to ignore labels to ColumnCorpus

* flairNLPGH-2717: formatting

* flairNLPGH-2689: bump version numbers to 0.11.1

* flairNLPGH-2720: handle consecutive whitespaces

* add exclude labels parameter to trainer.train and minor change in AIDA corpus

* minor formatting

* minor formatting

* Remove unnecessary more-itertools pin

The dependency and the pin were added in https://github.com/flairNLP/flair/pull/2312/files. more-itertools is a pretty stable library.

* fix wrong initialisations of label (where data_type was missing) and reintroduce working version of "return_probabilities_for_all_classes" for sequence tagger

* datasets: add support for version 2.1 of HIPE-2022

* added missing property decorator

* add encoding=utf-8 to file handles in NER_HIPE_2022 corpus

* minor formatting

* flairNLPGH-2728: add option to force token-level predictions

* Move files to fix unit tests

* Adapt dataset name depending on whether use_ids_and_check_existence is set

* Fix unit tests for GERMEVAL dataset rename

* Ignore deviation in signature in mypy

* Black formatting

* Extend span detection logic

* flairNLPGH-2722: make span detection more robust

* Add missing data

* cache models used in testing to speed up tests

* create cache folder if it doesn't exist

* set cache to local folder

* don't create redundant cache prefix

* fix mypy error

* dummy commit to see how fast tests run with caching

* don't force creation of cache folder (as it should be created whenever needed anyways)

* flairNLPGH-2754: bump version numbers

* Update gdown requirement

Advance gdown to latest release.

* flairNLPGH-2763: remove legacy TransformerXLEmbeddings class

* flairNLPGH-2765: Test with Python 3.7

* fix unit tests

* flairNLPGH-2770: bump version numbers

Co-authored-by: Tadej Magajna <tmagajna@gmail.com>
Co-authored-by: Alan Akbik <alan.akbik@gmail.com>
Co-authored-by: AnotherStranger <AnotherStranger@users.noreply.github.com>
Co-authored-by: Xabier Lahuerta Vázquez <xlahuerta@protonmail.com>
Co-authored-by: Mike Tian-Jian Jiang <tmjiang@gmail.com>
Co-authored-by: Rishivant Singh <rishivant.singh@knoldus.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marcel <marcelmilch@gmx.de>
Co-authored-by: j <9658618+stw2@users.noreply.github.com>
Co-authored-by: Benedikt Fuchs <e1526472@student.tuwien.ac.at>
Co-authored-by: mauryaland <amaury@fouret.org>
Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: susannaruecker <susanna.ruecker@hu-berlin.de>
Co-authored-by: upgradvisor-bot <92053865+upgradvisor-bot@users.noreply.github.com>
@stale

stale bot commented Aug 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Aug 13, 2022
@stale stale bot closed this as completed Sep 9, 2022
3 participants