Changing the target language for OCR'ing #12
Comments
Hi @anhhaibkhn, see Hyper-Table-OCR/table/__init__.py, line 193 in 1432b1f.
Feel free to contact me if the code troubles you.
Hi @MrZilinXiao, is it possible to reproduce your model and adapt it to non-English languages (such as Japanese)? Thanks in advance.
In fact, the structure reconstruction model (a UNet in this project) is independent of the language of your input, since it only produces the structure of the given table. If you'd like to reproduce the demo in our GIF, all you need to do is replace the OCR module with one adapted to your target language. An OCRHandler class is here: https://github.com/MrZilinXiao/Hyper-Table-OCR/blob/main/ocr/__init__.py. You may just subclass it, naming it Talking back to the code, here's the meaning of the default parameter list:
The output is BTW, be aware of this issue: #2 (use Google Translate if the Chinese troubles you).
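As a rough illustration of the "replace the OCR module" suggestion above, the sketch below plugs a language-specific recognizer into a handler class. `BaseOCRHandler`, `recognize`, `JapaneseOCRHandler`, and `fake_engine` are hypothetical stand-ins, not the actual `OCRHandler` interface in `ocr/__init__.py`; check that file for the real method names and return format. The engine is injected as a plain callable so that any Japanese-capable OCR backend could be swapped in.

```python
from typing import Callable, List, Tuple

# (xyxyxyxy coordinates, recognized text, confidence), mirroring the OCRBlock fields
OCRResult = Tuple[List[float], str, float]


class BaseOCRHandler:
    """Hypothetical stand-in for the project's OCRHandler base class."""

    def recognize(self, image) -> List[OCRResult]:
        raise NotImplementedError


class JapaneseOCRHandler(BaseOCRHandler):
    """Wraps any engine exposed as a callable: image -> list of OCRResult."""

    def __init__(self, engine: Callable[[object], List[OCRResult]],
                 min_conf: float = 0.0):
        self.engine = engine
        self.min_conf = min_conf

    def recognize(self, image) -> List[OCRResult]:
        results = self.engine(image)
        # Keep only well-formed 8-value boxes at or above the confidence floor.
        return [r for r in results
                if len(r[0]) == 8 and r[2] >= self.min_conf]


def fake_engine(image):
    # Placeholder engine with fixed output; a real backend would go here.
    return [
        ([0, 0, 10, 0, 10, 5, 0, 5], "日本語", 0.9),
        ([0, 6, 10, 6, 10, 11, 0, 11], "ノイズ", 0.2),
    ]


handler = JapaneseOCRHandler(fake_engine, min_conf=0.5)
blocks = handler.recognize(image=None)
```

Because the structure model only looks at table lines, nothing else in the pipeline should need to know which language the handler targets.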
Hi @MrZilinXiao! From what I understand (after running the linked issue through Google Translate), this project currently only supports cell-merge detection in the row direction. I'm sorry, but I haven't fully understood how the reconstruction algorithm works, for example: I will also try to adapt the OCR settings module to Japanese and let you know how it goes.
Yes, this project currently only supports cell-merge detection in the row direction, so it has difficulty with the example image in your issue description, which has merged cells in both the row and column directions. The relevant classes:

```python
from typing import List, Union

import numpy as np
from shapely.geometry import Polygon  # assumption: Polygon comes from shapely


class OCRBlock(object):
    def __init__(self, coord, content, conf=-1.0):
        self.coord: np.ndarray = coord  # xyxyxyxy
        self.conf = conf
        assert len(coord) == 8, "xyxyxyxy not fit for OCRBlock!"
        self.shape = Polygon([coord[0:2], coord[2:4], coord[4:6], coord[6:]])
        self.ocr_content: Union[List[str], str] = content


class TableCell(OCRBlock):
    def __init__(self, coord):
        super(TableCell, self).__init__(coord, [])
        self.matched = False
        self.row_range = [-1, -1]  # [first row, last row], set during reconstruction
        self.col_range = [-1, -1]  # [first column, last column]

    @property
    def upper_y(self):
        # mean y of the first two corners (the top edge)
        return self.coord[[1, 3]].mean()

    @property
    def left_x(self):
        # mean x of the first and last corners (the left edge)
        return self.coord[[0, 6]].mean()

    @property
    def right_x(self):
        # mean x of the second and third corners (the right edge)
        return self.coord[[2, 4]].mean()
```
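To make the `xyxyxyxy` layout concrete, here is a small pure-Python sketch of the three properties. It assumes the four corners are ordered top-left, top-right, bottom-right, bottom-left, which is what the index choices in `upper_y`, `left_x`, and `right_x` imply. The helper `edge_means` is illustrative only and not part of the project.

```python
def edge_means(coord):
    """Return (upper_y, left_x, right_x) for an 8-value xyxyxyxy box."""
    assert len(coord) == 8, "expected x0,y0,x1,y1,x2,y2,x3,y3"
    upper_y = (coord[1] + coord[3]) / 2  # y of top-left and top-right
    left_x = (coord[0] + coord[6]) / 2   # x of top-left and bottom-left
    right_x = (coord[2] + coord[4]) / 2  # x of top-right and bottom-right
    return upper_y, left_x, right_x


# A slightly skewed quadrilateral, as a detector might return it.
box = [10, 20, 110, 22, 112, 60, 12, 58]
upper_y, left_x, right_x = edge_means(box)
```

Averaging two corners per edge keeps the values stable when the detected cell is slightly rotated or skewed.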
row_range and col_range might be difficult to understand, so I drew a picture by hand (all indexes are 0-based): Consider a table containing only row-merged cells. The row_range of cells A, B, and C is [0, 0], [0, 0], and [2, 2] respectively, and the col_range of cells A, B, and C is [0, 0], [1, 2], and [1, 2]. The dotted line is only there for clarity; it does not exist in the table. Hope this solves your problem. Sorry that I don't have a running example with debug info, since I have already shifted my research interest away from Table OCR.
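One way to arrive at such ranges, sketched here under assumptions rather than taken from the project's code: group cells into rows by `upper_y`, collect the distinct vertical grid lines from every cell's `left_x` and `right_x`, and express each cell's `col_range` as the span of grid columns it covers. The `Cell` dataclass, the `tol` pixel tolerance, and the functions `cluster`, `nearest`, and `assign_ranges` are all hypothetical names. Only merges along a row are handled, in line with the limitation described in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    name: str
    upper_y: float
    left_x: float
    right_x: float
    row_range: List[int] = field(default_factory=lambda: [-1, -1])
    col_range: List[int] = field(default_factory=lambda: [-1, -1])


def cluster(values, tol):
    """Collapse values within tol of the last kept value into one grid line."""
    centers = []
    for v in sorted(values):
        if centers and v - centers[-1] <= tol:
            continue  # same grid line as the previous center
        centers.append(v)
    return centers


def nearest(centers, v):
    """Index of the grid line closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def assign_ranges(cells, tol=5.0):
    rows = cluster([c.upper_y for c in cells], tol)
    xs = cluster([x for c in cells for x in (c.left_x, c.right_x)], tol)
    for c in cells:
        r = nearest(rows, c.upper_y)
        c.row_range = [r, r]  # vertical merges are not handled
        # A cell between grid lines i and j covers columns i .. j-1.
        c.col_range = [nearest(xs, c.left_x), nearest(xs, c.right_x) - 1]
    return cells


# Three-column table; B and C each span columns 1 and 2.
cells = assign_ranges([
    Cell("A", 0, 0, 100), Cell("B", 2, 101, 300),
    Cell("D", 40, 0, 100), Cell("E", 40, 100, 200), Cell("F", 41, 200, 300),
    Cell("G", 80, 0, 100), Cell("C", 80, 100, 299),
])
by_name = {c.name: c for c in cells}
```

With this input, A, B, and C receive the same `row_range` and `col_range` values as in the hand-drawn example. The unmerged middle row matters: it supplies the grid line at x = 200 that lets the merged cells register as two columns wide.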
Hi @MrZilinXiao,
Thank you for sharing the project. Is it possible to reproduce your model and adapt it to other languages?
I could extract the cell coordinates from the table, but I am having difficulty reconstructing the table, especially for tables with merged cells. For example:
[Image: numerous_merged_cells_table]
Could you explain further the idea of how to reconstruct the table?
Thanks for your time.