✨ Added functionality for taking a screenshot of the original/raw page prior to tagging. Added functionality for combining the OCR annotations of the original/raw page and the tagged page. #95

Merged: 4 commits into API-33 on Jun 26, 2024


seanmcguire12
Contributor

No description provided.

asim-shrestha changed the base branch from main to API-33 on June 26, 2024 07:37
tarsier/core.py Outdated
@@ -77,3 +98,15 @@ async def _remove_tags(adapter: BrowserAdapter) -> None:
         script = "return window.removeTags();"
 
         await adapter.run_js(script)
+
+    @staticmethod
+    async def _hide_non_tag_elem(adapter: BrowserAdapter) -> None:
Contributor


Should maybe just use the same name as the TypeScript function (this should be done for all similar methods).

Suggested change:
-    async def _hide_non_tag_elem(adapter: BrowserAdapter) -> None:
+    async def _hide_non_tag_elements(adapter: BrowserAdapter) -> None:
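The reviewer's naming convention can be sketched as follows. This assumes the injected TypeScript exposes a function named `window.hideNonTagElements()` (an assumption; the TS name is not shown in this PR) and that `run_js` accepts a script string, as in `_remove_tags` above. `Tagger` and `FakeAdapter` are hypothetical stand-ins, not tarsier's real classes.

```python
import asyncio
from typing import List, Protocol


class BrowserAdapter(Protocol):
    # Assumed interface, modeled on the adapter.run_js(script) call in the diff.
    async def run_js(self, js: str) -> object: ...


class FakeAdapter:
    """Hypothetical test double that records every script it is asked to run."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    async def run_js(self, js: str) -> object:
        self.scripts.append(js)
        return None


class Tagger:
    """Hypothetical class; only the naming pattern is the point here."""

    @staticmethod
    async def _remove_tags(adapter: BrowserAdapter) -> None:
        # Python name mirrors the TS function window.removeTags().
        await adapter.run_js("return window.removeTags();")

    @staticmethod
    async def _hide_non_tag_elements(adapter: BrowserAdapter) -> None:
        # Renamed from _hide_non_tag_elem so it mirrors the (assumed) TS name
        # window.hideNonTagElements(); each Python/TS pair is then easy to grep.
        await adapter.run_js("return window.hideNonTagElements();")


adapter = FakeAdapter()
asyncio.run(Tagger._hide_non_tag_elements(adapter))
```

Keeping one name per operation across both languages means a search for `hideNonTagElements` or `hide_non_tag_elements` finds both halves of the feature.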

tarsier/core.py Outdated
Comment on lines 32 to 47
+    ) -> Tuple[bytes, bytes, Dict[int, str]]:
         adapter = adapter_factory(driver)
+        initial_screenshot = await self._take_screenshot(adapter)
         tag_to_xpath = (
             await self._tag_page(adapter, tag_text_elements) if not tagless else {}
         )
-        screenshot = await self._take_screenshot(adapter)
+        await self._hide_non_tag_elem(adapter)
+        tagged_screenshot = await self._take_screenshot(adapter)
+        await self._revert_visibilities(adapter)
         if not tagless:
             await self._remove_tags(adapter)
-        return screenshot, tag_to_xpath if not tagless else {}
+        return (
+            initial_screenshot,
+            tagged_screenshot,
+            tag_to_xpath if not tagless else {},
+        )
Contributor

asim-shrestha, Jun 26, 2024


Let's maybe keep this method as-is and just move the new logic over to page_to_text.

This is because anyone using page_to_image currently expects a single image with all of the tagged elements (alongside the page text). Keeping it as-is also avoids changing the function signature.
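A minimal sketch of the reviewer's proposal, assuming a hypothetical `FakeAdapter` that only logs steps and a hypothetical `fake_ocr` callable in place of a real OCR service. The step order inside `page_to_text` follows the diff above; the names, return types, and placeholder tag mapping are illustrative only and are not tarsier's actual API.

```python
import asyncio
from typing import Callable, Dict, List, Tuple


class FakeAdapter:
    """Hypothetical stand-in for a browser adapter; it only logs what happens."""

    def __init__(self) -> None:
        self.log: List[str] = []

    async def run_js(self, step: str) -> None:
        self.log.append(step)

    async def take_screenshot(self) -> bytes:
        self.log.append("screenshot")
        return b"png-%d" % self.log.count("screenshot")


TAGS: Dict[int, str] = {0: "//a[1]"}  # placeholder tag-to-XPath mapping


async def page_to_image(adapter: FakeAdapter) -> Tuple[bytes, Dict[int, str]]:
    # Signature unchanged: one screenshot showing the tags over the full page.
    await adapter.run_js("tag page")
    screenshot = await adapter.take_screenshot()
    await adapter.run_js("remove tags")
    return screenshot, TAGS


async def page_to_text(
    adapter: FakeAdapter, ocr: Callable[[bytes], List[str]]
) -> Tuple[str, Dict[int, str]]:
    # The two-screenshot flow from this PR lives here instead.
    untagged = await adapter.take_screenshot()  # raw page, before tagging
    await adapter.run_js("tag page")
    await adapter.run_js("hide non-tag elements")
    tagged = await adapter.take_screenshot()  # only tagged elements visible
    await adapter.run_js("revert visibilities")
    await adapter.run_js("remove tags")
    words = ocr(untagged) + ocr(tagged)  # naive concatenation for illustration
    return " ".join(words), TAGS


def fake_ocr(image: bytes) -> List[str]:
    # Hypothetical OCR: pretend each image contains a single word.
    return [image.decode()]


image, tags = asyncio.run(page_to_image(FakeAdapter()))
text_adapter = FakeAdapter()
text, _ = asyncio.run(page_to_text(text_adapter, fake_ocr))
```

Callers of page_to_image still receive a single (bytes, dict) pair, so nothing downstream breaks; the extra screenshot becomes an internal detail of page_to_text.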

tarsier/core.py Outdated
Comment on lines 58 to 69
        combined_annotations: ImageAnnotatorResponse = {
            "words": untagged_ocr_annotations["words"] + tagged_ocr_annotations["words"]
        }
        combined_annotations["words"] = list(
            sorted(
                combined_annotations["words"],
                key=lambda x: (
                    x["midpoint_normalized"][1],
                    x["midpoint_normalized"][0],
                ),
            )
        )
Contributor


Should maybe make a combine_annotations() method that handles this re-sorting (to decouple this function from having to care that the result is sorted).
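A minimal sketch of the suggested combine_annotations() helper. It assumes an ImageAnnotatorResponse is a dict with a "words" list whose entries carry a "midpoint_normalized" (x, y) pair, as the diff above implies; the "text" field and the TypedDict definitions are hypothetical additions for illustration.

```python
from typing import List, Tuple, TypedDict


class Word(TypedDict):
    # Hypothetical shape: only the fields this sketch needs.
    text: str
    midpoint_normalized: Tuple[float, float]  # (x, y), each in [0, 1]


class ImageAnnotatorResponse(TypedDict):
    words: List[Word]


def combine_annotations(*annotations: ImageAnnotatorResponse) -> ImageAnnotatorResponse:
    """Merge several OCR responses into one, sorted in reading order.

    Sorting by (y, x) of the normalized midpoint orders words top-to-bottom,
    then left-to-right, using the same key as the diff above.
    """
    words = [word for response in annotations for word in response["words"]]
    words.sort(key=lambda w: (w["midpoint_normalized"][1], w["midpoint_normalized"][0]))
    return {"words": words}


untagged: ImageAnnotatorResponse = {
    "words": [
        {"text": "Login", "midpoint_normalized": (0.8, 0.1)},
        {"text": "Welcome", "midpoint_normalized": (0.2, 0.3)},
    ]
}
tagged: ImageAnnotatorResponse = {
    "words": [{"text": "[0]", "midpoint_normalized": (0.1, 0.1)}]
}
combined = combine_annotations(untagged, tagged)
print([w["text"] for w in combined["words"]])  # → ['[0]', 'Login', 'Welcome']
```

Because the merged list is built fresh, sorting it in place leaves both input responses untouched, and page_to_text no longer needs to know the ordering rule.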

…ase tagging from page_to_image() to page_to_text(), added method combine_annotations() to decouple the sorting logic from page_to_text().
seanmcguire12 merged commit b1611b3 into API-33 on Jun 26, 2024

2 participants