✨ Added functionality for taking screenshot of original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page. #95

seanmcguire12 · 2024-06-26T02:32:15Z

No description provided.


asim-shrestha · 2024-06-26T07:38:40Z

tarsier/core.py

@@ -77,3 +98,15 @@ async def _remove_tags(adapter: BrowserAdapter) -> None:
        script = "return window.removeTags();"

        await adapter.run_js(script)
+
+    @staticmethod
+    async def _hide_non_tag_elem(adapter: BrowserAdapter) -> None:


We should probably use the same name as the TS version here (and do the same for all similar methods).

Suggested change

-    async def _hide_non_tag_elem(adapter: BrowserAdapter) -> None:
+    async def _hide_non_tag_elements(adapter: BrowserAdapter) -> None:

asim-shrestha · 2024-06-26T07:40:12Z

tarsier/core.py

+    ) -> Tuple[bytes, bytes, Dict[int, str]]:
        adapter = adapter_factory(driver)
+        initial_screenshot = await self._take_screenshot(adapter)
        tag_to_xpath = (
            await self._tag_page(adapter, tag_text_elements) if not tagless else {}
        )
-        screenshot = await self._take_screenshot(adapter)
+        await self._hide_non_tag_elem(adapter)
+        tagged_screenshot = await self._take_screenshot(adapter)
+        await self._revert_visibilities(adapter)
        if not tagless:
            await self._remove_tags(adapter)
-        return screenshot, tag_to_xpath if not tagless else {}
+        return (
+            initial_screenshot,
+            tagged_screenshot,
+            tag_to_xpath if not tagless else {},
+        )


Maybe we keep this method as it is and just move the new logic over to page_to_text.

Anyone calling page_to_image today expects a single image that shows all of the tagged elements alongside the page text. Keeping the new logic in page_to_text also avoids changing the function signature.
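
To make the suggestion concrete, here is a rough sketch of how the split could look. It is not the actual implementation: `FakeAdapter`, `TarsierSketch`, the action names, and the placeholder XPath are all hypothetical stand-ins, and the OCR step is reduced to a string so the example runs on its own. The only point is that page_to_image keeps its two-value return while page_to_text owns the two-phase capture.

```python
import asyncio
from typing import Dict, List, Tuple


class FakeAdapter:
    """Hypothetical stand-in for BrowserAdapter; it only records call order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def do(self, action: str) -> bytes:
        self.calls.append(action)
        return action.encode()


class TarsierSketch:
    async def page_to_image(
        self, adapter: FakeAdapter, tagless: bool = False
    ) -> Tuple[bytes, Dict[int, str]]:
        # Same contract as before the PR: one screenshot of the fully visible,
        # tagged page plus the tag-to-XPath mapping.
        tag_to_xpath: Dict[int, str] = {}
        if not tagless:
            await adapter.do("tag_page")
            tag_to_xpath = {0: "//placeholder"}  # made-up mapping for the demo
        screenshot = await adapter.do("screenshot")
        if not tagless:
            await adapter.do("remove_tags")
        return screenshot, tag_to_xpath

    async def page_to_text(self, adapter: FakeAdapter) -> Tuple[str, Dict[int, str]]:
        # Phase 1: capture the raw page before any tags are drawn.
        raw = await adapter.do("screenshot")
        # Phase 2: tag, hide everything except the tags, capture, then restore.
        await adapter.do("tag_page")
        await adapter.do("hide_non_tag_elements")
        tagged = await adapter.do("screenshot")
        await adapter.do("revert_visibilities")
        await adapter.do("remove_tags")
        # Placeholder for running OCR on both images and merging the results.
        page_text = f"ocr({raw.decode()}) + ocr({tagged.decode()})"
        return page_text, {0: "//placeholder"}


adapter = FakeAdapter()
image, mapping = asyncio.run(TarsierSketch().page_to_image(adapter))
print(adapter.calls)  # page_to_image still takes exactly one screenshot
```

With this shape, existing callers of page_to_image see no change, and only page_to_text pays for the extra screenshot and the hide/revert round trip.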

asim-shrestha · 2024-06-26T07:42:33Z

tarsier/core.py

+        combined_annotations: ImageAnnotatorResponse = {
+            "words": untagged_ocr_annotations["words"] + tagged_ocr_annotations["words"]
+        }
+        combined_annotations["words"] = list(
+            sorted(
+                combined_annotations["words"],
+                key=lambda x: (
+                    x["midpoint_normalized"][1],
+                    x["midpoint_normalized"][0],
+                ),
+            )
+        )


We should probably add a combine_annotations() method that handles this re-sorting, so that this function doesn't have to care whether the words are sorted.
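
Something along these lines, as a minimal sketch. The `Word` and `Annotations` types are simplified, hypothetical versions of the real annotation type, and I am assuming `midpoint_normalized` is an `(x, y)` pair, which is what the sort key in the diff implies. The sample words and tag text are made up for illustration.

```python
from typing import List, Tuple, TypedDict


class Word(TypedDict):
    # Hypothetical minimal shape; the real annotation type may carry more fields.
    text: str
    midpoint_normalized: Tuple[float, float]  # assumed to be (x, y)


class Annotations(TypedDict):
    words: List[Word]


def combine_annotations(untagged: Annotations, tagged: Annotations) -> Annotations:
    """Merge two OCR results and return the words in reading order."""
    words = untagged["words"] + tagged["words"]
    # Sort top-to-bottom (y) first, then left-to-right (x), as in the diff.
    words = sorted(
        words,
        key=lambda w: (w["midpoint_normalized"][1], w["midpoint_normalized"][0]),
    )
    return {"words": words}


untagged: Annotations = {
    "words": [
        {"text": "Email", "midpoint_normalized": (0.2, 0.5)},
        {"text": "Login", "midpoint_normalized": (0.5, 0.1)},
    ]
}
tagged: Annotations = {"words": [{"text": "[#1]", "midpoint_normalized": (0.1, 0.5)}]}

combined = combine_annotations(untagged, tagged)
print([w["text"] for w in combined["words"]])  # ['Login', '[#1]', 'Email']
```

The helper builds a new list rather than sorting in place, so neither input is modified and page_to_text can simply call it without knowing about the ordering.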


seanmcguire12 added 3 commits June 25, 2024 17:24

✨ Added functionality for taking screenshot of original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page.

9ad4726

✨ Reformatted core.py using ruff. Updated signature of page_to_image to be compatible with supertype ITarsier.

a9ee07f

🔨 fix: specified correct type for the combined OCR annotations.

8ce819c

asim-shrestha changed the base branch from main to API-33 June 26, 2024 07:37

asim-shrestha reviewed Jun 26, 2024


♻️ Updated naming for _hide_non_tag_element(), moved logic for two phase tagging from page_to_image() to page_to_text(), added method combine_annotations() to decouple the sorting logic from page_to_text().

81229ab

asim-shrestha approved these changes Jun 26, 2024


seanmcguire12 merged commit b1611b3 into API-33 Jun 26, 2024
