Unexpected sentiment classification on idioms #2003

Closed
cpuyyp opened this issue Dec 3, 2020 · 5 comments
Labels
question Further information is requested

Comments

@cpuyyp

cpuyyp commented Dec 3, 2020

Hi,

I applied the pre-trained Flair sentiment classifier to idioms such as 'Couldn't agree more', and it predicts negative with probability 0.8243. I believe the sentence should be positive. Is there a reason for this?

By the way, is it possible to produce a sentiment score (in the range -1 to 1) using Flair?

Many thanks!

@cpuyyp cpuyyp added the question Further information is requested label Dec 3, 2020
@whoisjones
Member

Interesting indeed. In my case the sentiment changes depending on punctuation, so it might be due to tokenization:
Sentence: "couldnt agree more" [− Tokens: 3 − Sentence-Labels: {'label': [NEGATIVE (0.8545)]}]
Sentence: "couldnt agree more ." [− Tokens: 4 − Sentence-Labels: {'label': [NEGATIVE (0.6966)]}]
Sentence: "could n't agree more" [− Tokens: 4 − Sentence-Labels: {'label': [NEGATIVE (0.8243)]}]
Sentence: "could n't agree more ." [− Tokens: 5 − Sentence-Labels: {'label': [POSITIVE (0.5057)]}]

Here's also a visualization, made with the help of the solution from issue GH-1504, which shows that "more" contributes significantly to the negative result:
image

Regarding your question about producing a sentiment score in the range -1 to 1: what about writing a wrapper function that assigns a negative sign to negative predictions?
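
A minimal sketch of such a wrapper, assuming the classifier's top label is available as a string ("POSITIVE" or "NEGATIVE", as in the output above) together with its confidence. The name `signed_score` is hypothetical and not part of Flair. The result is a signed confidence, not a measure of how intense the sentiment is.

```python
def signed_score(label: str, confidence: float) -> float:
    """Map a (label, confidence) pair to a value in [-1, 1].

    Assumption: with Flair, the inputs would typically be read from
    sentence.labels[0].value and sentence.labels[0].score after calling
    classifier.predict(sentence).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        # Flip the sign so negative predictions land in [-1, 0].
        return -confidence
    raise ValueError(f"unexpected label: {label!r}")


if __name__ == "__main__":
    # Values taken from the four predictions quoted above.
    for label, conf in [("NEGATIVE", 0.8545), ("NEGATIVE", 0.6966),
                        ("NEGATIVE", 0.8243), ("POSITIVE", 0.5057)]:
        print(label, conf, signed_score(label, conf))
```

With this mapping, a barely positive prediction such as POSITIVE (0.5057) and a barely negative one would sit far apart (near +0.5 and -0.5), which is one reason it should not be read as an intensity scale.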

@cpuyyp
Author

cpuyyp commented Dec 5, 2020

Thanks for the feedback!

It seems to me that the only positive sentence, "could n't agree more .", has probability 0.5057, which means it is negative with probability 0.4943. So the model doesn't work well in any of the four cases (none gives positive with high confidence). My guess at the reason is either

  • there is no such sentence in the training set, since it's trained on product reviews
  • or this sentence is hard even for a non-native speaker, and I am asking too much of the model.
    Which one do you think is more likely?

As to the score, I'd like to get a sentiment intensity score. For example, a sentence with score -0.9 is more negative than one with score -0.3. However, the number in the output label [POSITIVE (0.5057)] is the confidence level, not the intensity.

@whoisjones
Member

I agree with your first point: there might not be enough data for the standalone sentence "couldn't agree more" as a positive review. However, I expect that the context of that phrase might be missing (or indeed negative) in the training data. For instance, "couldn't agree more that this movie was bad" is obviously negative, and the phrase "couldn't agree more" only emphasizes the true opinion of the review.

Regarding your scoring function: we currently don't support such a model. You might want to try training your own model with our dataset SENTEVAL_SST_GRANULAR, which supports at least 5 classes of sentiment. If you have a good dataset in mind that provides the intensity of sentiments, feel free to open another issue for that.

@alanakbik
Collaborator

Another option would be to train a TextRegressor model, which predicts a continuous value rather than a category. The 5-class sentiment datasets could be converted to a scale of -1 to 1, and the result would approximate what you are looking for. It's supported in Flair (you train it the same way as a TextClassifier).
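
The conversion of 5-class labels to a -1 to 1 scale could be sketched as below. This assumes the classes are ordered integers from 0 (most negative) to 4 (most positive) and spaced evenly; the actual label names in a given dataset may differ and would need to be mapped to such indices first. The name `class_to_score` is hypothetical.

```python
def class_to_score(class_index: int, n_classes: int = 5) -> float:
    """Linearly rescale an ordinal class index to the range [-1, 1].

    Assumption: class 0 is the most negative class and class
    n_classes - 1 is the most positive, with even spacing in between.
    """
    if n_classes < 2:
        raise ValueError("need at least two classes")
    if not 0 <= class_index < n_classes:
        raise ValueError(f"class_index must be in [0, {n_classes - 1}]")
    return 2 * class_index / (n_classes - 1) - 1


if __name__ == "__main__":
    # Regression targets for each of the five sentiment classes.
    for k in range(5):
        print(k, class_to_score(k))
```

The resulting values would serve as regression targets when preparing the training data; even spacing between classes is a modeling choice, not something the dataset guarantees.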

@cpuyyp
Author

cpuyyp commented Dec 8, 2020

Thanks for the suggestions! I'll try them out

@cpuyyp cpuyyp closed this as completed Dec 8, 2020