Hello,
I am trying to perform an NER experiment on a custom dataset containing a lot of food items.
I have labels for certain unigrams and bigrams for my training data.
My label corpus contains "green chilli" = "vegetable", but it has no label for "chilli" on its own.
I am using this label list in order to annotate sentences for NER.
For example:
A sentence might contain a bigram such as "green chilli", with its associated label = "vegetable".
Currently, while generating the features, I am marking both "green" and "chilli" as "vegetable".
My annotation pipeline is as follows:
1. Split the sentence into unigrams.
2. Check if the unigram exists in the label list -> if it does, mark the unigram with that label.
3. Build a bigram from token + sentence[idx+1] or token + sentence[idx-1].
4. Check if the bigram exists in the label corpus -> if it does, mark both the token and sentence[idx+1] (or sentence[idx-1]) with that label.
As a result of point number 4, both "green" and "chilli" get marked as "vegetable".
So when I train my model and run inference on a test sentence containing "green chilli", I get "vegetable" twice, once for each token.
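For reference, here is a minimal sketch of the annotation pipeline described above. It is an illustration, not my exact code: the `annotate` function, the `LABELS` dictionary, whitespace tokenization, and the "O" tag for unlabeled tokens are all assumptions made for this example. It only looks forward (idx+1), since that already covers every adjacent pair.

```python
# Hypothetical label list; only the "green chilli" entry comes from the post.
LABELS = {"green chilli": "vegetable"}


def annotate(sentence, labels=LABELS, default="O"):
    """Tag each token using unigram and bigram lookups in the label list."""
    tokens = sentence.split()               # step 1: split into unigrams
    tags = [default] * len(tokens)          # "O" for unlabeled tokens (assumption)
    for idx, token in enumerate(tokens):
        if token in labels:                 # step 2: unigram lookup
            tags[idx] = labels[token]
        if idx + 1 < len(tokens):           # step 3: build bigram with next token
            bigram = token + " " + tokens[idx + 1]
            if bigram in labels:            # step 4: mark both tokens
                tags[idx] = labels[bigram]
                tags[idx + 1] = labels[bigram]
    return list(zip(tokens, tags))


print(annotate("add green chilli now"))
```

With this sketch, both "green" and "chilli" receive the "vegetable" tag and the surrounding tokens receive "O", which reproduces the behaviour I am asking about.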
What would be the best way to annotate this using word2features?