About TransformerWordEmbeddings class #2713

matirojasg · 2022-04-07T19:55:54Z

Hi, I am using the following line of code to create contextualized embeddings from a biomedical RoBERTa model.

from flair.embeddings import TransformerWordEmbeddings
embeddings = TransformerWordEmbeddings("PlanTL-GOB-ES/roberta-base-biomedical-clinical-es")

However, when executing the code, I get the following error:

File "/home/x/work/clinical-nested-ner-mlc/venv/lib/python3.8/site-packages/flair/embeddings/base.py", line 652, in _extract_token_embeddings
   assert subword_start_idx < subword_end_idx <= sentence_hidden_state.size()[1]
AssertionError
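
For readers unfamiliar with the check, the assertion above requires each token's subword span to be non-empty and to end within the hidden-state sequence. Below is a minimal sketch of that condition in plain Python. The helper `span_is_valid` and the example numbers are hypothetical and are not flair's code, and the thread does not say which half of the condition this model violated.

```python
def span_is_valid(subword_start_idx: int, subword_end_idx: int, seq_len: int) -> bool:
    """Mirror the condition from the traceback (hypothetical helper, not flair's API)."""
    # The span must contain at least one subword (start < end)
    # and must not run past the end of the hidden-state sequence (end <= seq_len).
    return subword_start_idx < subword_end_idx <= seq_len


# Illustrative cases for a hidden state with 10 subword positions.
seq_len = 10
results = {
    "normal span": span_is_valid(2, 4, seq_len),      # two subwords, inside the sequence
    "empty span": span_is_valid(3, 3, seq_len),       # token mapped to zero subwords
    "past the end": span_is_valid(8, 11, seq_len),    # span overruns the sequence
}

for name, ok in results.items():
    print(f"{name}: {'passes' if ok else 'would raise AssertionError'}")
```

Either failing case would trigger the same bare `AssertionError`, so the traceback alone does not distinguish between them.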

This does not occur with other Hugging Face models. What could be wrong with this model? Here is the link to the repository:

https://huggingface.co/PlanTL-GOB-ES/roberta-base-biomedical-clinical-es

Thanks

matirojasg · 2022-04-07T21:55:11Z

https://colab.research.google.com/drive/1Wl4eh26L00MJtSqOfgRbKiqkG3E5X1Oh?usp=sharing

helpmefindaname · 2022-04-07T22:47:33Z

Hey @matirojasg, thank you for reporting. @alanakbik, as this might be due to my refactoring, I'll look into it soon.

alanakbik · 2022-04-08T07:52:45Z

Awesome, thanks @helpmefindaname!

matirojasg · 2022-04-14T15:20:21Z

Thank you guys!

stale · 2022-08-13T04:35:32Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

matirojasg added the question Further information is requested label Apr 7, 2022

helpmefindaname added a commit to helpmefindaname/flair that referenced this issue Apr 8, 2022

flairNLPGH-2713: make transformer offset calculation more robust

667c2cb

helpmefindaname mentioned this issue Apr 8, 2022

GH-2713: make transformer offset calculation more robust #2714

Merged

alanakbik added a commit that referenced this issue Apr 10, 2022

Merge pull request #2714 from helpmefindaname/GH-2713/fix_offset_calc…

fe3fe1f

…ulation GH-2713: make transformer offset calculation more robust

stale bot added the wontfix This will not be worked on label Aug 13, 2022

stale bot closed this as completed Sep 9, 2022
