Handle multiple K codes in single data row #44

Open
josteinl opened this issue Mar 1, 2024 · 8 comments

@josteinl
Collaborator

josteinl commented Mar 1, 2024

I have run into a problem parsing SGF files where a single data row contains several K codes.

Are you open to a PR that will change the result type from int to list[int] for code K?

The same applies to T codes: a single line may contain several of them.

@redhog
Member

redhog commented Mar 1, 2024

Hm. Just inserting a list in that column would probably break a bunch of code, including the stop-code inference. But it is a good question what should be done when a row contains multiple entries with the same code...

In the examples you've found in the wild, how is this used / why isn't it just multiple rows?

@josteinl
Collaborator Author

josteinl commented Mar 4, 2024

The file format allows only one row per depth, so several comment codes may occur at the same depth. In this example, both flushing and hammering start:

D=1.31,C=4,B=50.000,A=0.353,P=8.100,AZ=18.600,R=57.000,I=0.000,J=0.000,AP=1,V=0.000
D=1.32,C=0,B=50.000,A=0.353,P=8.100,AZ=18.600,R=57.000,I=0.000,J=0.000,AP=1,V=0.000
D=1.33,C=4,B=2.664,A=0.008,P=3.898,AZ=1.507,R=29.150,I=0.278,J=0.000,AP=1,V=0.000,K=72,T=Spyling begynner,K=74,T=Slag starter
D=1.34,C=493,B=2.000,A=0.000,P=5.800,AZ=22.000,R=55.000,I=0.700,J=0.000,AP=1,V=0.000
D=1.35,C=5,B=64.000,A=0.000,P=5.800,AZ=21.600,R=58.000,I=0.700,J=0.000,AP=1,V=0.000
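
For illustration, a minimal sketch of how a row like the one at D=1.33 could be split without losing repeated keys. `parse_row` is a hypothetical helper, not part of libsgfparser, and it assumes that values never contain a comma or an equals sign.

```python
from collections import defaultdict


def parse_row(line: str) -> dict[str, list[str]]:
    """Split 'KEY=value,KEY=value,...' into a dict of lists, keeping repeats in file order."""
    fields: dict[str, list[str]] = defaultdict(list)
    for item in line.strip().split(","):
        # Assumption: every item has the form KEY=value.
        key, _, value = item.partition("=")
        fields[key].append(value)
    return dict(fields)


row = parse_row("D=1.33,C=4,K=72,T=Spyling begynner,K=74,T=Slag starter")
print(row["K"])  # ['72', '74']
print(row["T"])  # ['Spyling begynner', 'Slag starter']
```

Every key maps to a list here, so a caller that expects a scalar for K or T would still need to decide how to reduce it.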

@redhog
Member

redhog commented Mar 4, 2024

Ah, you are indeed right: "information collected at predetermined intervals, indicated by a sync-parameter". But is this even-sampling requirement actually fulfilled in practice in the wild?

@josteinl
Collaborator Author

josteinl commented Mar 5, 2024

Yes, the readings are at set depth intervals, and the data files adhere to this. The example above is from a customer complaining that we do not register both flushing and hammering starting at depth 1.33.

@redhog
Member

redhog commented Mar 18, 2024

Conceptually I still think multiple rows with the same depth would be better/easier. But maybe we should have a call about this and brainstorm different solutions (I'm back from full-time parental leave as of this week, working part-time from now on)?
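
A rough sketch of what the "multiple rows with the same depth" idea could look like. `explode_row` is a hypothetical name, and pairing the K codes with the T texts by position is an assumption; the thread does not say how they should be matched.

```python
from itertools import zip_longest


def explode_row(base: dict, codes: list[int], texts: list[str]) -> list[dict]:
    """Emit one row per (K, T) pair, each repeating the measurement fields in base."""
    if not codes and not texts:
        return [dict(base)]
    rows = []
    # Assumption: the n-th K code belongs to the n-th T text.
    for code, text in zip_longest(codes, texts):
        row = dict(base)
        if code is not None:
            row["K"] = code
        if text is not None:
            row["T"] = text
        rows.append(row)
    return rows


for r in explode_row({"D": 1.33}, [72, 74], ["Spyling begynner", "Slag starter"]):
    print(r)
```

Each output row keeps a scalar K, so code that expects an int in that column would not need to change, at the cost of repeated depths.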

@josteinl
Collaborator Author

I have started building a new SGF parser to handle this problem with repeated comment codes and texts. The new parser will be tailored to our specific needs. Today we use libsgfparser for a first pass, and then run a second pass with our internal parser. I hope the new package will do most of our parsing in one go (though it will probably still need two passes).

It will be open-sourced, but geared towards our needs. I'm also wary of making changes in libsgfparser that would affect your usage.

My plan is:

  • If there is more than one comment code (K=), I only keep one, and move the rest to the text field
  • If there is more than one text field (T=), I merge them

So

...,K=72,T=Spyling begynner,K=74,T=Slag starter

becomes

...,K=72,T=74, Spyling begynner, Slag starter

so we don't miss any codes.

Which comment code to keep is decided by priority: codes in the 90s range must be kept (stop codes), then codes in the 40s range (for detecting bedrock), then other codes.
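
The plan above could be sketched roughly as follows. The function names are hypothetical, "90s" and "40s" are assumed to mean 90-99 and 40-49, and ties are assumed to be broken by file order.

```python
def priority(code: int) -> int:
    """Lower value means higher priority."""
    if 90 <= code <= 99:  # assumed range for stop codes
        return 0
    if 40 <= code <= 49:  # assumed range for bedrock detection
        return 1
    return 2


def merge_comments(codes: list[int], texts: list[str]) -> tuple[int, str]:
    """Keep one K code; move the other codes, followed by all texts, into one T value.

    Assumes codes contains at least one entry.
    """
    ordered = sorted(codes, key=priority)  # sorted() is stable, so ties keep file order
    kept, rest = ordered[0], ordered[1:]
    text = ", ".join([str(c) for c in rest] + texts)
    return kept, text


print(merge_comments([72, 74], ["Spyling begynner", "Slag starter"]))
# (72, '74, Spyling begynner, Slag starter')
```

Note that the merged text contains commas, so a reader that splits rows on commas would need to treat the T value specially.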

If you want the same behaviour in libsgfparser, I will be happy to submit a PR for this.

@redhog
Member

redhog commented Mar 26, 2024

Moving to another field isn't a bad idea.
What if we invent a new field whose value is a list of comment codes, and copy all the comment codes there, while the K field keeps a single value (chosen by your prioritization above)? That field could have a list type (rather than text)...
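
A minimal sketch of that idea, for discussion only. The field name `K_all` and the helper `with_code_list` are made up here, and the priority ranges (90-99, then 40-49, then the rest) are an assumed reading of the prioritization described earlier in the thread.

```python
def priority(code: int) -> int:
    # Assumed ranges: 90-99 stop codes, 40-49 bedrock, everything else last.
    if 90 <= code <= 99:
        return 0
    if 40 <= code <= 49:
        return 1
    return 2


def with_code_list(row: dict, codes: list[int]) -> dict:
    """Return a copy of row with K as a single int and K_all as the full list of codes."""
    out = dict(row)
    if codes:
        out["K"] = min(codes, key=priority)  # first of the highest-priority codes
        out["K_all"] = list(codes)           # hypothetical list-typed field
    return out


print(with_code_list({"D": 1.33}, [72, 41, 94]))
# {'D': 1.33, 'K': 94, 'K_all': [72, 41, 94]}
```

Existing consumers of K would keep seeing one int per row, and only code that opts in would read the list field.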

@josteinl
Collaborator Author

josteinl commented Apr 3, 2024

Just published https://pypi.org/project/sgf-parser version 0.0.1a1
