Add Paragraph and Section Contexts #76

lukehsiao · 2018-07-18T10:20:50Z

No description provided.

lukehsiao · 2018-07-18T10:23:01Z

fonduer/parser/parser.py

@@ -269,13 +283,117 @@ def _parse_figure_node(self, node, state):

        return state

-    def _parse_sentence(self, node, state):
+    def _parse_sentence(self, paragraph, node, text, field, state):


I'm not a big fan of how many arguments this function has. Any suggestions on simplifying this would be welcome. I don't want to have to alter the state for the parent, so I pass the Paragraph directly, and unfortunately parsing sentences also uses the node and the field.

Can we put text and field into state? Then when we are done with _parse_sentence we can remove those two.

Yes. Good idea.
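
The suggestion above could look roughly like the sketch below. This is an illustration only, not the code from this PR: the `state["sentence"]["text"]` and `state["sentence"]["field"]` keys, the dict-based `node`, and the function bodies are all assumptions made for the example.

```python
def parse_sentence(paragraph, node, state):
    """Hypothetical stand-in for _parse_sentence: reads text/field from state."""
    text = state["sentence"]["text"]
    field = state["sentence"]["field"]
    # A real parser would split `text` into sentences; here we just record it.
    return [(paragraph, field, text)]


def parse_paragraph(node, state):
    """Stash per-call values in state, call the parser, then clean up."""
    results = []
    for field in ("text", "tail"):
        text = node.get(field)
        if not text:
            continue
        state["sentence"]["text"] = text
        state["sentence"]["field"] = field
        try:
            results.extend(parse_sentence("paragraph-0", node, state))
        finally:
            # Remove the temporary keys so they do not leak to later nodes.
            del state["sentence"]["text"]
            del state["sentence"]["field"]
    return results


# Toy inputs: a dict stands in for an lxml node (assumption for this sketch).
state = {"sentence": {"abs_offset": 0}}
node = {"text": "First part.", "tail": "Trailing part."}
parsed = parse_paragraph(node, state)
```

The trade-off is a shorter signature in exchange for temporary keys in `state`, so the cleanup step matters.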

lukehsiao · 2018-07-18T10:23:29Z

tests/data/html_simple/md_para.html

+
+<body>
+    <h1 id="sample-markdown">Sample Markdown</h1>
+    <p>This is some basic, sample markdown. Unlike the other markdown document, however, this document actually contains paragraphs of text. That is, larger amounts of text that are all present in a single HTML node like the one you are currently reading.</p>


Just added this document so that we have a real paragraph to test.

lukehsiao · 2018-07-18T10:23:54Z

fonduer/parser/models/table.py

-        return self.__repr__() > other.__repr__()
-
-
-class Row(Context):


Dropping Row/Col

lukehsiao · 2018-07-18T10:24:15Z

Makefile

@@ -10,13 +10,15 @@ test: dev check

 check:
 	isort -rc -c fonduer/


Even if we don't enforce flake8, add style checks for the tests.

senwu · 2018-07-18T16:32:47Z

Makefile

@@ -10,13 +10,15 @@ test: dev check

 check:
 	isort -rc -c fonduer/


senwu · 2018-07-18T16:35:51Z

fonduer/parser/models/sentence.py

+                self.document.name,
+                self.section.position,
+                self.paragraph.position,
+                self.sentence_num,


Can you use position for all of them, instead of some position and some num?

senwu · 2018-07-18T16:38:47Z

fonduer/parser/parser.py

+                document=state["document"],
+                # TODO: This just takes the one and only Section in a document
+                # and assigns it as the Table's parent.
+                section=state["document"].sections[0],


Why sections[0]?

Right now there is only one section in a document. This just grabs that one Section directly, rather than dealing with the logic of traversing up a node's parents.

senwu · 2018-07-18T16:44:36Z

fonduer/parser/parser.py

+            parts["stable_id"] = stable_id
+            parts["document"] = state["document"]
+            parts["position"] = state["paragraph"]["idx"]
+            parts["section"] = state["document"].sections[0]


Same here. Why sections[0] instead of the current section from the parents?

To expand on this a little: a Paragraph can be in a Section, Table, or Cell, so we would need to add several lines of logic to navigate up to the Section.
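
As a rough sketch of what "navigating up to the Section" might involve, the example below walks a parent chain until it reaches a Section. The `Context`, `Section`, `Table`, and `Cell` classes here are hypothetical stand-ins with a single `parent` link, not Fonduer's actual models, and `find_section` is a name invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical minimal context with a link to its parent."""

    name: str
    parent: Optional["Context"] = None


class Section(Context):
    pass


class Table(Context):
    pass


class Cell(Context):
    pass


def find_section(context):
    """Walk up the parent chain until a Section is found (or return None)."""
    current = context
    while current is not None and not isinstance(current, Section):
        current = current.parent
    return current


# A Cell inside a Table inside a Section, as described in the comment above.
section = Section("section-0")
table = Table("table-0", parent=section)
cell = Cell("cell-0", parent=table)
```

With a helper like this, a Paragraph's parent (Section, Table, or Cell) could be resolved to its enclosing Section instead of indexing `sections[0]`.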

fonduer/parser/parser.py

-                    state["sentence"]["abs_offset"],
-                    abs_sentence_offset_end,
+            # Process the Paragraph
+            stable_id = "{}::{}:{}".format(


senwu · 2018-07-18T16:46:47Z

fonduer/parser/parser.py

+            )
+            state["sentence"]["abs_offset"] = abs_sentence_offset_end
+            if self.structural:
+                context_node = node.getparent() if field == "tail" else node


I know this is old code. Just make sure the logic here is correct. Will the tail node be the sibling of the text node or not?
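
For reference, the sketch below shows how `text` and `tail` are attached to elements. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (lxml follows the same text/tail model); because ElementTree has no `getparent()`, a parent map is built by hand. The sample markup and the `context_node` helper are assumptions for the illustration, and the thread does not state whether this matches the intended behavior in the PR.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<p>Hello <b>bold</b> world</p>")
bold = root.find("b")

# ElementTree has no getparent(), so build a child -> parent map instead.
parent_of = {child: parent for parent in root.iter() for child in parent}


def context_node(node, field):
    """Mirror the diff's rule: tail text is attributed to the node's parent."""
    return parent_of[node] if field == "tail" else node
```

Here `bold.tail` holds the text that follows `</b>`, which sits inside `<p>`, so attributing tail text to the parent is consistent with where that text appears in the document.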

senwu · 2018-07-18T16:50:04Z

fonduer/parser/parser.py

@@ -269,13 +283,117 @@ def _parse_figure_node(self, node, state):

        return state

-    def _parse_sentence(self, node, state):
+    def _parse_sentence(self, paragraph, node, text, field, state):


Can we put text and field into state? Then when we are done with _parse_sentence we can remove those two.

senwu · 2018-07-18T16:50:19Z

fonduer/parser/models/table.py

-        return self.__repr__() > other.__repr__()
-
-
-class Row(Context):


senwu · 2018-07-18T16:51:45Z

tests/parser/test_parser.py

@@ -103,6 +105,74 @@ def test_parse_md_details(caplog):
    assert header.dep_labels == ["compound", "ROOT"]


+def test_parse_md_paragraphs(caplog):


Add some checks for the hierarchy logic, e.g. that a given sentence's parent is the expected paragraph, section, etc.

Sentence can only have a Paragraph parent, which is tested.

Yes, I just want to check that the parser returns the correct answer (paragraph and section index).

senwu · 2018-07-18T20:36:57Z

LGTM.

lukehsiao added 2 commits July 18, 2018 01:47

Add Section context

c68e5fc

Add Paragraph Context

86958fc

lukehsiao added the enhancement New feature or request label Jul 18, 2018

lukehsiao added this to the v0.2.0 milestone Jul 18, 2018

lukehsiao self-assigned this Jul 18, 2018

lukehsiao requested a review from senwu July 18, 2018 10:20

lukehsiao commented Jul 18, 2018

senwu approved these changes Jul 18, 2018

lukehsiao added 4 commits July 18, 2018 10:46

Reduce arguments to parse_sentence

ea8f239

Change sentence_num to position for consistency

4b6118b

Navigate to parent section

4f75a1f

Add section test on sentence

428fae4

senwu closed this Jul 18, 2018

senwu reopened this Jul 18, 2018

senwu merged commit 2b51778 into master Jul 18, 2018

senwu deleted the data-model branch July 18, 2018 20:39
