Handle multiple K codes in single data row #44

josteinl · 2024-03-01T08:41:17Z

I have run into a problem parsing SGF files where a single data row contains several K codes.

Are you open to a PR that will change the result type from int to list[int] for code K?

The same applies to T codes (a single row may contain several).


redhog · 2024-03-01T16:15:48Z

Hm. Just inserting a list in that column would probably break a bunch of code, including the stop-code-inferring code. But it is a good question what should be done when a row contains multiple entries with the same code...

In the examples you've found in the wild, how is this used / why isn't it just multiple rows?

josteinl · 2024-03-04T07:42:41Z

The file format allows only one row per depth, so you may have several comment codes at a given depth. In this example both flushing and hammering start at the same depth (1.33):

D=1.31,C=4,B=50.000,A=0.353,P=8.100,AZ=18.600,R=57.000,I=0.000,J=0.000,AP=1,V=0.000
D=1.32,C=0,B=50.000,A=0.353,P=8.100,AZ=18.600,R=57.000,I=0.000,J=0.000,AP=1,V=0.000
D=1.33,C=4,B=2.664,A=0.008,P=3.898,AZ=1.507,R=29.150,I=0.278,J=0.000,AP=1,V=0.000,K=72,T=Spyling begynner,K=74,T=Slag starter
D=1.34,C=493,B=2.000,A=0.000,P=5.800,AZ=22.000,R=55.000,I=0.700,J=0.000,AP=1,V=0.000
D=1.35,C=5,B=64.000,A=0.000,P=5.800,AZ=21.600,R=58.000,I=0.700,J=0.000,AP=1,V=0.000
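
To illustrate why the row at D=1.33 is a problem, here is a minimal sketch (not libsgfparser's actual code; `parse_row_pairs` is a hypothetical helper, and it assumes text values contain no commas). Keeping the fields as an ordered list of pairs preserves both K codes, whereas collapsing them into a one-value-per-code mapping silently keeps only the last one:

```python
def parse_row_pairs(row: str) -> list[tuple[str, str]]:
    """Split an SGF data row into (code, value) pairs, keeping duplicates.

    Assumption for this sketch: values never contain a comma.
    """
    pairs = []
    for field in row.split(","):
        code, sep, value = field.partition("=")
        if sep:  # ignore fragments that have no '='
            pairs.append((code.strip(), value.strip()))
    return pairs


# Shortened version of the D=1.33 row from the example above.
row = "D=1.33,C=4,K=72,T=Spyling begynner,K=74,T=Slag starter"
pairs = parse_row_pairs(row)

# A plain dict keeps one value per code: later duplicates overwrite earlier ones.
as_dict = dict(pairs)

# Collecting by code instead keeps every K value on the row.
k_codes = [int(value) for code, value in pairs if code == "K"]
t_texts = [value for code, value in pairs if code == "T"]
```

With this row, `as_dict["K"]` ends up holding only the second code, while `k_codes` and `t_texts` retain both entries in file order.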

redhog · 2024-03-04T17:40:26Z

Ah, you are indeed right: "information collected at predetermined intervals, indicated by a sync-parameter". But is this even-sampling requirement actually fulfilled in practice in the wild?

josteinl · 2024-03-05T08:03:12Z

Yes, the readings are at set depth intervals, and the data files adhere to this. The example above is from a customer complaining that we do not register both flushing and hammering starting at depth 1.33.

redhog · 2024-03-18T15:39:49Z

Conceptually I still think multiple rows with the same depth would be better/easier. But maybe we should have a call about this and brainstorm different solutions (I'm back from full-time parental leave as of this week, working part-time from now on)?

josteinl · 2024-03-19T10:44:28Z

I have started building a new SGF parser to handle this problem with repeated comment codes and texts. The new parser will be made to handle our specific needs. Today we use libsgfparser for a first pass and then run a second pass with our internal parser. I hope the new package will do most of our parsing in one go (though it will probably still need two passes).

It will be open-sourced, but tailored to our needs. I'm also wary of making changes in libsgfparser that would affect your usage.

My plan is:

If there is more than one comment code (K=), I only keep one, and move the rest to the text field
If there is more than one text field (T=), I merge them

So

...,K=72,T=Spyling begynner,K=74,T=Slag starter

becomes

...,K=72,T=74, Spyling begynner, Slag starter

so we don't miss any codes.

Which comment code to keep is decided by priority: codes in the 90s range (stop codes) must be kept, then codes in the 40s range (for detecting bedrock), then other codes.
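
A rough sketch of that plan in Python, under a few assumptions that are not stated above: "90s range" is taken to mean 90 to 99 and "40s range" 40 to 49, ties within the same priority go to the code that appears first on the row, and the names `priority` and `merge_comments` are made up for illustration (this is not the published parser's API):

```python
from typing import Optional


def priority(code: int) -> int:
    """Lower number = higher priority (assumed ranges: 90-99, then 40-49, then the rest)."""
    if 90 <= code <= 99:  # stop codes
        return 0
    if 40 <= code <= 49:  # bedrock detection
        return 1
    return 2


def merge_comments(codes: list[int], texts: list[str]) -> tuple[Optional[int], str]:
    """Keep one K code; move the remaining codes in front of the merged T texts."""
    if not codes:
        return None, ", ".join(texts)
    kept = min(codes, key=priority)  # min() returns the first item on ties
    rest = list(codes)
    rest.remove(kept)  # drop only the kept occurrence
    merged_text = ", ".join([str(c) for c in rest] + texts)
    return kept, merged_text


kept, text = merge_comments([72, 74], ["Spyling begynner", "Slag starter"])
```

For the example row this keeps K=72 and produces the text `74, Spyling begynner, Slag starter`, matching the transformation shown above. If one of the codes were a stop code, it would be kept instead and the other code moved into the text.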

If you want the same behaviour in libsgfparser, I will be happy to submit a PR for this.

redhog · 2024-03-26T06:50:31Z

Moving to another field isn't a bad idea.
What if we invent a new field, whose value is a list of comment codes, and copy all the comment codes there, making the K field be a single value (as per your prioritization above)? That field could have a list type (rather than text)...
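
Something like the following sketch, maybe. The field name `K_all` and the function `split_comment_codes` are purely hypothetical (nothing like them exists in libsgfparser today), and the priority ranges are my reading of "90s" and "40s" as 90 to 99 and 40 to 49:

```python
def split_comment_codes(codes: list[int]) -> dict:
    """Return a single prioritized K value plus a list field with every code."""

    def priority(code: int) -> int:
        # Assumed ranges: stop codes 90-99 first, bedrock codes 40-49 next.
        if 90 <= code <= 99:
            return 0
        if 40 <= code <= 49:
            return 1
        return 2

    primary = min(codes, key=priority) if codes else None
    # "K" stays a scalar so existing code keeps working; "K_all" is the new list field.
    return {"K": primary, "K_all": list(codes)}


fields = split_comment_codes([72, 74])
```

That way the K column keeps its current scalar type, so the stop-code-inferring code would not need to change, while callers who care about every code on the row can read the list field.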

josteinl · 2024-04-03T13:08:43Z

Just published https://pypi.org/project/sgf-parser version 0.0.1a1
