Handling unigram and bigram features at the same time in word2features #137

AbhishekBose · 2021-12-24T08:52:38Z

Hello,
I am trying to perform an NER experiment on a custom dataset containing a lot of food items.
I have labels for certain unigrams and bigrams for my training data.

My label corpus contains "green chilli" = "vegetable", but it has no label for "chilli" on its own.
I am using this label list to annotate sentences for NER.

For example:

A sentence might contain a bigram such as "green chilli" with its associated label = "vegetable".

Currently while generating the features, I am marking both "green" and "chilli" as "vegetable".
My annotation pipeline is as follows:

1. Split the sentence into unigrams.
2. Check if the unigram exists in the label list. If it does, mark the unigram with that label.
3. Get a bigram by considering token + sentence[idx+1] or token + sentence[idx-1].
4. Check if the bigram exists in the label corpus. If it does, mark both token and sentence[idx+1] (or sentence[idx-1]) with that label.

As a result of step 4, both "green" and "chilli" get marked as "vegetable".
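
The annotation steps described above can be sketched roughly as follows. This is only a minimal illustration, not the actual pipeline code: the `LABELS` dict, the `annotate` function, the "O" tag for unlabeled tokens, and the "salt" entry are all hypothetical names and data made up for the example. Looking up the bigram that starts at each token also covers the token + sentence[idx-1] case, since that pair is the forward bigram of the previous token.

```python
# Hypothetical label list: maps a unigram or a space-joined bigram to its label.
LABELS = {"green chilli": "vegetable", "salt": "spice"}


def annotate(tokens, labels=LABELS, outside="O"):
    """Return one tag per token, following the unigram/bigram lookup steps."""
    tags = [outside] * len(tokens)

    # Unigram pass: label every token that appears in the label list.
    for idx, token in enumerate(tokens):
        if token in labels:
            tags[idx] = labels[token]

    # Bigram pass: if token + next token is in the label list,
    # mark BOTH tokens with the bigram's label.
    for idx in range(len(tokens) - 1):
        bigram = f"{tokens[idx]} {tokens[idx + 1]}"
        if bigram in labels:
            tags[idx] = tags[idx + 1] = labels[bigram]

    return tags


tokens = "add one green chilli and salt".split()
for token, tag in zip(tokens, annotate(tokens)):
    print(token, tag)
```

With this sketch, "green" and "chilli" each receive their own "vegetable" tag, and nothing in the tags indicates that the two tokens form a single entity.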

So when I train my model and run inference on a test sentence containing "green chilli", I get "vegetable" twice, once for each token, instead of once for the whole phrase.

What would be the best way to annotate this using word2features?
