FAQ and Troubleshooting for PyABSA [Usage and Frequently Asked Questions] #189
Comments
Hi there! I am analyzing tweets about immigration and trying to find the sentiment associated with the tweet, whether positive or negative. However, I was running into some issues with the word "immigrants" being used in a sentence. For example, I plugged in the sentence "I think that illegal immigrants are detrimental to U.S. society" to the online aspect-based sentiment analysis (https://huggingface.co/spaces/Gradio-Blocks/Multilingual-Aspect-Based-Sentiment-Analysis), and did not get any results for a sentiment associated with "immigrants", when it should be negative. Here is a screenshot of what it looks like on my end. However, when changing the sentence to "I think that illegal ice cream parlors are detrimental to U.S. society", there is a negative sentiment associated with "ice cream" with a 0.9999 confidence level. Here is a screenshot of this test. I was wondering why this was happening and if there is a way to make it such that the word "immigrants" can be associated with a sentiment. Thanks! |
The result is highly dependent on the training data. Although our dataset contains up to 60K ABSA training examples, which is much more than other repos, immigration-related text is not included. This means you need to collect and annotate some data (2K+ examples are necessary) and train our models; you can find the training script in the demo folder. You can annotate your dataset with this tool: https://github.com/yangheng95/ABSADatasets/tree/v1.2/DPT |
Which task do you need to do: APC, ATEPC, or ASTE? See the demo https://huggingface.co/spaces/yangheng/PyABSA for details. |
I want to do ATEPC. I understand that with this task I can extract 1 or multiple aspect terms and determine their sentiments. Thanks for your help! |
***PS: the column "aspect categories" of my train dataset are not a must. I just added them because I might use them later on. |
First, convert your data to APC format as follows (you need to write the code yourself). Then, convert it to an ATEPC dataset: convert_apc_ |
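A minimal sketch of the first step, assuming the APC format uses three lines per example: the sentence with the aspect replaced by `$T$`, then the aspect term, then the polarity label. The record layout, the label strings, and the helper name `to_apc_lines` are hypothetical and only for illustration; check the files in ABSADatasets for the exact conventions before relying on this.

```python
def to_apc_lines(sentence, aspect, polarity):
    """Return three APC-style lines for one (sentence, aspect, polarity) record.

    Assumption: the aspect occurs verbatim in the sentence. Only the first
    occurrence is replaced by the $T$ placeholder.
    """
    if aspect not in sentence:
        raise ValueError(f"aspect {aspect!r} not found in sentence")
    return [sentence.replace(aspect, "$T$", 1), aspect, polarity]


# Hypothetical annotated records: (sentence, aspect term, polarity label)
records = [
    ("The battery life is great but the screen is dim .", "battery life", "Positive"),
    ("The battery life is great but the screen is dim .", "screen", "Negative"),
]

apc_lines = []
for sentence, aspect, polarity in records:
    apc_lines.extend(to_apc_lines(sentence, aspect, polarity))

print("\n".join(apc_lines))
```

A sentence with several aspects produces one three-line example per aspect, which is why the same sentence appears twice in the output.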
Okay I see, thank you for those hints! I will try that, but I was wondering about two points: 1. Is this a common way of doing it, or at least possible without major obstacles? 2. Is the second step, converting to an ATEPC dataset, basically just a call to that function of yours? |
Yes. I think it is simple to code. The second step only calls an API. |
Okay Im gonna try that, thanks for your quick help! :) |
Try |
Thank you! That worked now. I only saw that 1.8k tuples raised the IgnoreError as their aspects were "NULL". Any hints how to proceed? Im not sure how to do the train/test/valid split on that ATEPC Object now. And the step Register your dataset in PyABSA afterwards is also necessary I assume? |
Please show me some examples of your annotated data so I can find what is wrong. Registering your dataset is optional; do it if you would like to share your dataset with the community. |
The NULL label is not supported in the APC and ATEPC subtasks. However, you can try writing your own code to adapt your data to the ASTE or ACOS subtasks. |
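One way to handle this before conversion is to set the NULL-aspect records aside, so that only explicit aspects go through the APC/ATEPC pipeline. This is a rough sketch: the tuple layout and the function name `split_null_aspects` are hypothetical, and it assumes the implicit aspects are marked with the literal string "NULL".

```python
def split_null_aspects(records, null_label="NULL"):
    """Partition records into those with explicit aspects and those without.

    Each record is assumed to be a (sentence, aspect, polarity) tuple.
    """
    explicit, implicit = [], []
    for record in records:
        aspect = record[1]
        if aspect is None or aspect.strip().upper() == null_label:
            implicit.append(record)
        else:
            explicit.append(record)
    return explicit, implicit


# Hypothetical records; the second one has no explicit aspect term.
records = [
    ("The pizza was delicious .", "pizza", "Positive"),
    ("Would not come back .", "NULL", "Negative"),
    ("Friendly staff .", "staff", "Positive"),
]

explicit, implicit = split_null_aspects(records)
print(len(explicit), "explicit,", len(implicit), "NULL")
```

The `implicit` list can then be kept for a subtask that supports implicit aspects instead of being discarded.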
Please refer to the https://github.com/yangheng95/ABSADatasets/tree/v2.0/datasets for annotation |
Okay, and is there any function for train test splits by pyabsa? What have you used to make your train test splits? Not sure if train test split by sklearn works for that and whether I can split the 1 Atepc object I have |
I don't really understand what you mean by "ATEPC object". But generally you can split using sklearn. |
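As a rough illustration of splitting at the file level, the sketch below shuffles whole examples rather than individual lines. It uses `random` from the standard library as a stand-in for sklearn, and it assumes APC-style data with three lines per example; the function name `split_apc_examples` and the sample data are hypothetical.

```python
import random


def split_apc_examples(lines, test_ratio=0.2, seed=42):
    """Split APC-style lines into train and test lists of 3-line examples.

    Assumption: every example occupies exactly three consecutive lines
    (sentence with $T$, aspect, polarity).
    """
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of three lines")
    examples = [lines[i:i + 3] for i in range(0, len(lines), 3)]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(examples)
    n_test = max(1, round(len(examples) * test_ratio))
    return examples[n_test:], examples[:n_test]


# Hypothetical data: ten examples, three lines each.
lines = []
for i in range(10):
    lines += [f"Sentence {i} mentions $T$ .", f"aspect{i}", "Positive"]

train, test = split_apc_examples(lines, test_ratio=0.2, seed=0)
print(len(train), "train examples,", len(test), "test examples")
```

Note that this treats each aspect as an independent example, so two aspects from the same sentence can land on different sides of the split; group by sentence first if that matters for your evaluation.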
Hello, I have a question about combining the tasks ATE and APC: I have one model_1 that performs well in APC but badly in ATE. Is there a way to combine those 2 models to enhance performance of both tasks? Many thanks for any hints to a newbie in ML... |
Hi there, I'm trying to do ABSA using the triplets code (ASTE), but the code is giving me the following results. Can you tell me what the problem is? Code: !pip install pyabsa -U Results: Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/ [New Feature] Aspect Sentiment Triplet Extraction since v2.1.0 (https://github.com/yangheng95/PyABSA/tree/v2/examples-v2/aspect_sentiment_triplet_extration) I am using Colab with GPU. I tried versions 2.3.1, 2.0.27, 1.16.27. None of them worked. |
I am having an issue running the ATEPCTrainer. `warnings.filterwarnings("ignore") config = ( config.model = ATEPC.ATEPCModelList.FAST_LCF_ATEPC # FAST_LCF_ATEPC improved version of LCF_ATEPC, base version BERT_BASE_ATEPC config.get_atepc_config_english trainer = ATEPC.ATEPCTrainer( Thanks in advance for looking! |
same question... |
Please try 1.16.28 or the latest v2. Otherwise, you can clone the source code, search for load_state_dict, and add strict=False to that call. |
Here are the most asked questions and advice for troubleshooting:
About ABSADataset
We strongly encourage you to share your dataset in ABSADatasets, which helps the community provide better checkpoints and develop better models. The datasets are released under their authors' licenses and are for research only.
Thanks to the contributors, we have collected many ABSADatasets that are enough to train the universal checkpoints which are available now at HuggingFace Space.
We provide a data processing tool for annotating your own dataset: download it and open the page in a browser to annotate.
Meanwhile, PyABSA provides tutorials for generating an inference set for aspect-based sentiment classification and for converting APC datasets to ATEPC datasets.
Put your dataset in the same location as 'integrated_datasets' (run any training script to download this folder), and PyABSA automatically detects your training set, test set, and validation (dev) set (if any).
You can pass the path or a keyword as the dataset parameter to locate your dataset; refer to ABSADatasets for how to use your dataset in PyABSA. If you run into any problems, please report them promptly. Make sure your dataset is encoded in UTF-8.
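A quick way to check the UTF-8 requirement before training is to try decoding each dataset file. This is only a sketch using the standard library; the helper name `find_non_utf8_files` and the file names are hypothetical, and the temporary files stand in for your own dataset paths.

```python
import tempfile
from pathlib import Path


def find_non_utf8_files(paths):
    """Return the paths whose contents cannot be decoded as UTF-8."""
    bad = []
    for path in paths:
        try:
            Path(path).read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(str(path))
    return bad


# Demo with two temporary files: one valid UTF-8, one Latin-1 encoded.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "utf8.txt"
    legacy = Path(tmp) / "latin1.txt"
    good.write_text("The café was lovely .", encoding="utf-8")
    legacy.write_bytes("The café was lovely .".encode("latin-1"))
    bad_files = find_non_utf8_files([good, legacy])

print("Files that need re-encoding:", bad_files)
```

Any file reported here should be re-saved as UTF-8 before it is placed next to the other datasets.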
About Checkpoint
This is a personal project with no hosting server support, so I have to use public services to distribute checkpoints, e.g., Google Drive, Baidu Netdisk, Huggingface Hub.
Generally, available_checkpoints() shows you the checkpoints available for your version, and the checkpoints will be downloaded from Google Drive automatically. However, if the checkpoints are downloaded too frequently, Google will disable automatic downloading; in that case, you can download them manually via a browser.
If you have no access to Google Drive, please check Baidu Netdisk for available checkpoints and download manually.
About Config
The config implementations of aspect-based sentiment classification (ABSC/ASC), aspect term extraction & sentiment classification, and sentence-level text classification (TC) are similar. Here is an example of a config setting:
About Tutorial
This repo is mainly developed and maintained by myself, and it is not my main project, so I do not have enough time to prepare documentation.
As an alternative, this repo provides many tutorials in the demos folder to help you discover as many features of PyABSA as possible. If there is anything you can't figure out, please open an issue.
About Model
Some users may want to use the best model; however, that depends on the dataset. We provide a simple performance table of our models on public datasets, which you can compare with other repos/tools before deciding which one to use. Generally speaking, Fast-LCF is a good choice for all scenarios.
About Task
Currently, we only support aspect-based sentiment classification, aspect term extraction & sentiment classification, and sentence-level text classification. You can develop your own model based on PyABSA and share it with us, or introduce new tasks into PyABSA, even if they are not tightly integrated.
About Documentation
There is no plan to write documentation yet; if someone would like to do it, we can do it together.