Extracting entities inside an entity #32

gihanpanapitiya · 2020-08-29T01:18:13Z

Does anyone know how to write a custom parser to extract a named entity nested inside another entity?

For example, from the following sentence I want to extract 'boiling' as its own entity, which will sit inside the prefix entity.

d = Sentence('Synthesis of 2,4,6-trinitrotoluene (3a).The procedure was followed to yield a pale yellow solid (boiling point 240 °C)')

This is my attempt to write the parser:

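import re

from chemdataextractor.doc.text import Sentence
from chemdataextractor.model import BaseModel, Compound, StringType, ListType, ModelType
from chemdataextractor.parse.actions import join, merge
from chemdataextractor.parse.base import BaseParser
from chemdataextractor.parse.elements import R, I, W, Optional
from chemdataextractor.utils import first


class BoilingPoint(BaseModel):
    value = StringType()
    units = StringType()
    prefix = StringType()
    name = StringType()

# Attach the new property to the existing Compound model
Compound.boiling_points = ListType(ModelType(BoilingPoint))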


prefix = (R(u'^b\.?p\.?$', re.I) | I(u'boiling')(u'name') + I(u'point')).add_action(join)(u'prefix')
units = (W(u'°') + Optional(R(u'^[CFK]\.?$')))(u'units').add_action(merge)
value = R(u'^\d+(\.\d+)?$')(u'value')
bp = (prefix + value + units)(u'bp')


class BpParser(BaseParser):
    root = bp

    def interpret(self, result, start, end):
        compound = Compound(
            boiling_points=[
                BoilingPoint(
                    value=first(result.xpath('./value/text()')),
                    units=first(result.xpath('./units/text()')),
                    prefix=first(result.xpath('./prefix/text()')),
                    name=first(result.xpath('./name/text()')),
                )
            ]
        )
        yield compound

Sentence.parsers = [BpParser()]

However, what d.records.serialize() produces has no 'name' field:

[{'boiling_points': [{'value': '240',
'units': '°C',
'prefix': 'boiling point'}]}]


maddenfederico · 2020-09-02T20:12:40Z

All you have to do is tweak the xpath you use to access the result from the name element. Element results are returned as a tree with whatever you assign to root as the root and all the elements that form a part of root as child nodes, and so on.

So you would write name = first(result.xpath('./prefix/name/text()')), since name is a child of prefix
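To illustrate the nesting point, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for the result tree (ChemDataExtractor itself hands interpret() an lxml element, so the real call is result.xpath(...)). The tree layout, the point tag, and the first_text helper are assumptions made up for this example, not output captured from the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical result tree: 'bp' is the root and 'name' is nested inside 'prefix'.
result = ET.fromstring(
    "<bp>"
    "<prefix><name>boiling</name><point>point</point></prefix>"
    "<value>240</value>"
    "<units>°C</units>"
    "</bp>"
)


def first_text(element, path):
    """Return the text of the first element matching path, or None if nothing matches."""
    found = element.find(path)
    return None if found is None else found.text


# './name' only looks at direct children of the root, so it finds nothing.
top_level_name = first_text(result, "./name")

# './prefix/name' walks through the parent element first, so it matches.
nested_name = first_text(result, "./prefix/name")

print(top_level_name, nested_name)
```

The path has to mirror the nesting of the grammar: an element named inside another named element is only reachable through its parent.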

gihanpanapitiya · 2020-09-02T23:18:46Z

> All you have to do is tweak the xpath you use to access the result from the name element. Element results are returned as a tree with whatever you assign to root as the root and all the elements that form a part of root as child nodes, and so on.
>
> So you would write name = first(result.xpath('./prefix/name/text()')), since name is a child of prefix

I tried that, but I am still getting the same output as before.

maddenfederico · 2020-09-02T23:57:30Z

It might be the .add_action(join), then. It seems to merge all of the tokens and put them in the same node. It may not be the best solution, but the first thing that comes to mind is to capture boiling and point as separate elements and then join them within interpret(). I'm actually curious, so I'm about to run my own tests.
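A rough sketch of that suspicion and of the workaround, again using xml.etree.ElementTree as a stand-in for the lxml result tree. flatten_join is a hypothetical helper that mimics the behaviour described above (merge all token text into one node); it is not the library's own join, and the point tag is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def flatten_join(element):
    """Stand-in for a join-style action: keep the tag, merge all descendant text, drop children."""
    texts = [node.text for node in element.iter() if node.text]
    flat = ET.Element(element.tag)
    flat.text = " ".join(texts)
    return flat


# Hypothetical 'prefix' element before any joining happens.
prefix = ET.fromstring("<prefix><name>boiling</name><point>point</point></prefix>")

# After flattening, the text survives but the nested 'name' element is gone.
joined = flatten_join(prefix)
joined_text = joined.text
name_after_join = joined.find("./name")

# Workaround: leave the children in place and build the joined string
# at interpretation time instead of in a parse action.
name = prefix.findtext("./name")
prefix_text = " ".join(child.text for child in prefix)

print(joined_text, name_after_join, name, prefix_text)
```

If the real action behaves this way, dropping .add_action(join) from the prefix rule and joining the pieces inside interpret() would keep both the full prefix string and the nested name available.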

gihanpanapitiya · 2020-09-03T00:12:09Z

Thanks for the suggestion! I haven't worked with interpret(). I am going to start experimenting with it.
