Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PAGE API: add __hash__ to generated code #443

Merged
merged 4 commits into from
Apr 29, 2020
Merged

PAGE API: add __hash__ to generated code #443

merged 4 commits into from
Apr 29, 2020

Conversation

kba
Copy link
Member

@kba kba commented Feb 19, 2020

... so instances of generateDS-generated class can be used as keys in dict etc.

@kba kba requested a review from bertsky February 19, 2020 16:34
@codecov-io
Copy link

codecov-io commented Feb 19, 2020

Codecov Report

Merging #443 into master will decrease coverage by 1.78%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #443      +/-   ##
==========================================
- Coverage   81.91%   80.13%   -1.79%     
==========================================
  Files          39       40       +1     
  Lines        2328     2376      +48     
  Branches      429      433       +4     
==========================================
- Hits         1907     1904       -3     
- Misses        348      399      +51     
  Partials       73       73              
Impacted Files Coverage Δ
ocrd_models/ocrd_page_user_methods.py 0.00% <0.00%> (ø)
ocrd/ocrd/resolver.py 93.10% <0.00%> (-3.45%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 664df4b...b416b1f. Read the comment docs.

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! This works fine, and paves the way for the other additions as requested in #240...

@kba
Copy link
Member Author

kba commented Apr 21, 2020

Cannot recreate the PAGE XML with generateDS 2.30.11. With 2.35.18 it works but now the namespaces are missing again :/ @bertsky Can you please check whether this works as expected besides the missing PAGE nsprefix?

@bertsky
Copy link
Collaborator

bertsky commented Apr 23, 2020

Cannot recreate the PAGE XML with generateDS 2.30.11. With 2.35.18 it works but now the namespaces are missing again :/ @bertsky Can you please check whether this works as expected besides the missing PAGE nsprefix?

This immediately falls into the same trap as #451.

@kba
Copy link
Member Author

kba commented Apr 23, 2020

This immediately falls into the same trap as #451.

I think the issue is that generateDS tries to use enum.Enum for simpleType enumerations which is ia standard lib since 3.4. Without a type mixin, enum constants only inherit from Enum, hence the issues when serializing these enum constants.

IMHO this needs to be fixed up-stream. In the meantime, I found that a sed one-liner after schema generation can fix the enum types.

Test case to verify:

    def test_simple_types(self):
        regions = self.pcgts.get_Page().get_TextRegion()
        reg = regions[0]
        self.assertTrue(isinstance(reg.get_type(), str))
        self.assertEqual(reg.get_type(), TextTypeSimpleType.CREDIT)
        self.assertTrue(isinstance(TextTypeSimpleType.CREDIT, str))
        self.assertEqual(reg.get_type(), 'credit')
        self.assertTrue(isinstance(TextTypeSimpleType.CREDIT, str))
        reg.set_type(TextTypeSimpleType.PAGENUMBER)
        self.assertEqual(reg.get_type(), 'page-number')
        self.assertTrue(isinstance(reg.get_type(), str))

@bertsky
Copy link
Collaborator

bertsky commented Apr 23, 2020

I think the issue is that generateDS tries to use enum.Enum for simpleType enumerations which is ia standard lib since 3.4. Without a type mixin, enum constants only inherit from Enum, hence the issues when serializing these enum constants.

IMHO this needs to be fixed up-stream. In the meantime, I found that a sed one-liner after schema generation can fix the enum types.

Great! So the generateDS problem is fairly limited and we can hope to live with it.

Just of curiosity, because the diff of the generated source does not show it: What happens to non-string simpleType with your sed expression? (AFAIK we only have ConfSimpleType in PAGE as an example, but an important one no less.)

@kba
Copy link
Member Author

kba commented Apr 23, 2020

Just of curiosity, because the diff of the generated source does not show it: What happens to non-string simpleType with your sed expression? (AFAIK we only have ConfSimpleType in PAGE as an example, but an important one no less.)

ConfSimpleType is not implemented as an Enum. Here's all the classes affected by this change:

grep -r 'str, Enum' ocrd_models
ocrd_models/ocrd_models/ocrd_page_generateds.py:class AlignSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class ChartTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class ColourDepthSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class ColourSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class GraphicsTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class GroupTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class PageTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class ProductionSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class ReadingDirectionSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class TextDataTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class TextLineOrderSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class TextTypeSimpleType(str, Enum):
ocrd_models/ocrd_models/ocrd_page_generateds.py:class underlineStyleType(str, Enum):

@kba kba merged commit b22b830 into OCR-D:master Apr 29, 2020
@kba kba deleted the gds-hash-id branch April 29, 2020 15:53
@kba
Copy link
Member Author

kba commented Apr 30, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants