Add Diacritization dataset/task/asset #128
Merged
Use undiacritized tokens as fall back for None results.
Closed
fdalvi added a commit that referenced this pull request Jul 23, 2023
* Add diacritizaton module
* Update ArabicDiacritization.py
  Use undiacritized tokens as fall back for None results.
* Format code
* Add comments and minor fixes
* More fixes to dataloader

---------

Co-authored-by: Ahmed Abdelali <ahmed.abdelali@gmail.com>
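The fallback named in the commit message ("Use undiacritized tokens as fall back for None results") can be sketched as below. This is an illustration only, not the code from `ArabicDiacritization.py`: the function name `fill_missing_predictions` and its list-based signature are hypothetical, and the sketch assumes a failed model call shows up as `None` in the prediction list.

```python
from typing import List, Optional


def fill_missing_predictions(
    inputs: List[str], predictions: List[Optional[str]]
) -> List[str]:
    """Return predictions with every None replaced by the matching input.

    Hypothetical helper: `inputs` holds the undiacritized tokens and
    `predictions` holds the model output for each one, or None when the
    model produced no usable result.
    """
    if len(inputs) != len(predictions):
        raise ValueError("inputs and predictions must have the same length")
    # Keep the model output where it exists; otherwise fall back to the
    # original undiacritized token so the sequence length is preserved.
    return [
        pred if pred is not None else src
        for src, pred in zip(inputs, predictions)
    ]


if __name__ == "__main__":
    tokens = ["كتب", "الولد"]
    model_output = ["كَتَبَ", None]  # second prediction failed
    print(fill_missing_predictions(tokens, model_output))
```

Falling back to the input keeps the hypothesis and reference aligned token by token, so a failed prediction is scored as an undiacritized token rather than breaking the evaluation.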
fdalvi added a commit that referenced this pull request Aug 7, 2023
* updated prompt and postprocessing
* updated factuality_disinformation_harmful_content/Adult
* Improve Segmentation evaluation and add GPT4 asset (#39)
  * Add ArabicSequenceTagging
  * push Segmentation task
  * Add segmentation changes
  * push segmentation and merge changes
  * Fix evaluation
  * change None data to unsegmented words
  * fix none segmentation, re-format code
  * fix segmentation except
  * fix changes with upstream
  * Format code
  * Update segmentation task and assets from feat/POS branch
  * Format code
  * Fix evaluation and GPT4 asset
  * Add latest assets from Ahmed
  * Fix test for multi-config
  * Add code to remove extra spaces in assets
  ---------
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Add Diacritization dataset/task/asset (#128)
  * Add diacritizaton module
  * Update ArabicDiacritization.py
    Use undiacritized tokens as fall back for None results.
  * Format code
  * Add comments and minor fixes
  * More fixes to dataloader
  ---------
  Co-authored-by: Ahmed Abdelali <ahmed.abdelali@gmail.com>
* Add fewshot assets for QA tasks (#118)
  * Added few shot learning script for Arcd dataset
  * Added few shot asset for ARCD data
  * Added few shot script for MLQA
  * added few shot script for TydiQA
  * added few shot script for XQUAD
  * Format code
  * Fix zeroshot assets to not mask prediction failure
  * Remove hardcoded engine names
  * Fix MLQA paths
  ---------
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Add GPT4 zeroshot assets QA tasks (#129)
  * Added scripts for QA ZS
  * Format code
  * Save input along with fewshot samples
  * Fix MLQA path
  ---------
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Fix NER data loaders (#131)
  Reset sentence/labels tokens correctly.
* Reorganize assets, add Hatespeech fewshot and STS BLOOM assets (#117)
  * Added BLOOOMZ implementation for STS Track 1
  * changed the post processing sts_bloomz
  * changed the prompt.
  * edited the prompt
  * changed the rating scale from 0-5 to 0-10 and divided the output by 2
  * code formatting
  * Remove dead code
  * Added FS implementation for Hate Speech, to replace previous implementation
  * Update data paths in STSTrack1 asset for BLOOM
  * Format code
  * Update data path in HS asset
  * Update Adult assets hierarchy and data paths
  ---------
  Co-authored-by: maramhasanain <maramhasanain@gmail.com>
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Improve BLOOM asset for Offensive task (#120)
  Co-authored-by: sabdaljalil <sabdaljalil@hbku.edu.qa>
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Add multiconfig assets for AraBench (#132)
  * Add MT GPT4 with multi-configs
  * Make GPT3 asset multi-config
  * Format code
  * Fix prompt function content to match ChatCompletion API
  ---------
  Co-authored-by: Ahmed Abdelali <ahmed.abdelali@gmail.com>
  Co-authored-by: Ahmed Abdelali <aabdelali@hbku.edu.qa>
* Add DialectADI dataset and assets (#122)
  * DialectADI task added, updated init file
  * Reorganize older DialectID assets
  * Revert task to DialectID
  ---------
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* Update citation for Adult task (#123)
  Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
* updated Adult_BLOOMZ

---------

Co-authored-by: Ahmed Abdelali <ahmed.abdelali@gmail.com>
Co-authored-by: Fahim Imaduddin Dalvi <faimaduddin@hbku.edu.qa>
Co-authored-by: Basel Mousi <59998313+baselmousi@users.noreply.github.com>
Co-authored-by: Sabri Boughorbel <bsabri@yahoo.com>
Co-authored-by: maramhasanain <maramhasanain@gmail.com>
Co-authored-by: sabdaljalil2000 <112805770+sabdaljalil2000@users.noreply.github.com>
Co-authored-by: sabdaljalil <sabdaljalil@hbku.edu.qa>
Co-authored-by: Ahmed Abdelali <aabdelali@hbku.edu.qa>
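One item in the commit above, "changed the rating scale from 0-5 to 0-10 and divided the output by 2", describes a small post-processing step. A minimal sketch of that arithmetic follows. Only the division by 2 comes from the commit message; the function name `rescale_rating`, the regex-based parsing, and the choice to return `None` for missing or out-of-range scores are assumptions made for illustration.

```python
import re
from typing import Optional


def rescale_rating(response: str) -> Optional[float]:
    """Map a 0-10 rating found in a model response back onto a 0-5 scale.

    Hypothetical helper: takes the first number in the response text and
    halves it. Returns None when no number is found or when the number
    lies outside 0-10 (both behaviours are assumptions, not from the PR).
    """
    match = re.search(r"\d+(?:\.\d+)?", response)
    if match is None:
        return None
    score = float(match.group())
    if not 0.0 <= score <= 10.0:
        return None
    # The model is prompted on a 0-10 scale; halving restores 0-5.
    return score / 2.0


if __name__ == "__main__":
    for text in ["8", "Score: 7.5", "n/a"]:
        print(repr(text), "->", rescale_rating(text))
```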
No description provided.