
Question about annotating the training set #100

Closed
brealisty opened this issue May 27, 2020 · 1 comment

Comments

@brealisty

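I have a question about the text detection stage. When annotating images, some text regions are far apart and also belong to two separate parts semantically. Others are laid out far apart but belong to the same part semantically. How should these be annotated? Should text that belongs together semantically always be boxed together regardless of distance, or should distance be the only criterion?
If annotators apply human judgment about semantics, that may introduce noise, making the model hard to converge or its predictions unstable.
If distance is the only criterion, downstream tasks will have difficulty merging two or more parts that share the same meaning. And if the merging is done purely with rules, the detection and recognition stages (and possibly an error-correction stage) need to be highly accurate.
Are there any good approaches to this?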

@LDOUBLEV
Collaborator

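We recommend annotating regions that are far apart separately. Semantics play no role at detection time, and labeling distant regions sometimes separately and sometimes together is inconsistent. After recognition you already have each text's position and content, so that is the stage at which to do semantic analysis.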
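As a rough illustration of the post-recognition step, here is a minimal Python sketch of one possible rule-based grouping pass. It assumes each recognized region is an axis-aligned box with its text. The names `Region` and `merge_regions` and the `gap_ratio`/`min_overlap` thresholds are hypothetical, not part of any library or of the maintainer's reply. It groups by geometry only (same line, small horizontal gap); any semantic check would sit on top of its output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """A detected and recognized text region: axis-aligned box plus its text."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    text: str


def _same_line(a: Region, b: Region, min_overlap: float) -> bool:
    # Treat two boxes as one line if their vertical overlap covers at least
    # `min_overlap` of the shorter box's height.
    overlap = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    shorter = min(a.y_max - a.y_min, b.y_max - b.y_min)
    return shorter > 0 and overlap / shorter >= min_overlap


def merge_regions(regions: List[Region], gap_ratio: float = 1.0,
                  min_overlap: float = 0.5, sep: str = " ") -> List[Region]:
    """Hypothetical post-recognition pass: merge regions on the same line whose
    horizontal gap is at most `gap_ratio` times the incoming box's height.
    Use sep="" for scripts such as Chinese that are written without spaces."""
    groups: List[Region] = []
    # Visit regions left to right so each one is appended after its left neighbour.
    for r in sorted(regions, key=lambda reg: reg.x_min):
        max_gap = gap_ratio * (r.y_max - r.y_min)
        for g in groups:
            if _same_line(g, r, min_overlap) and r.x_min - g.x_max <= max_gap:
                # Grow the group's box and append the text.
                g.x_max = max(g.x_max, r.x_max)
                g.y_min = min(g.y_min, r.y_min)
                g.y_max = max(g.y_max, r.y_max)
                g.text += sep + r.text
                break
        else:
            # No compatible group: start a new one (copy so inputs stay unchanged).
            groups.append(Region(r.x_min, r.y_min, r.x_max, r.y_max, r.text))
    # Return groups in reading order: top to bottom, then left to right.
    return sorted(groups, key=lambda g: (g.y_min, g.x_min))
```

For example, boxes for "Total" and "Amount" 10 px apart on one 20 px-high line are merged, while a value 280 px to the right stays a separate group. Pairing such distant but related pieces is left to a later, semantics-aware step, which keeps the detection labels purely distance-based as suggested.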
