Types, update README/documentation #143

Open
ghost opened this issue Oct 24, 2021 · 6 comments

Comments

@ghost

ghost commented Oct 24, 2021

Could the following types be added to the documentation?

'u'/117/GPMF_TYPE_STRING_UTF8 and '#'/35/GPMF_TYPE_COMPRESSED (I've only found '#' in GoPro Max so far?)

For those of us trying to implement their own parser, should we assume "internal types" 0xfe and 0xff can be ignored or will that break parsing?

@dnewman-gpsw
Collaborator

The camera currently doesn't implement any UTF-8 datatypes. These variable-length datatypes are potentially problematic in GPMF, as a type-size-repeat field of 'u',1,10 may not contain 10 characters, since the data size of each entry is content dependent. As size x repeat must be the storage size, for UTF-8, repeat will not automatically be the number of entries. It works, just less pretty.
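To make the size-versus-characters point concrete, here is a minimal Python sketch, assuming a 'u' payload whose structure size is 1, so that repeat counts bytes rather than characters. The string "m/s²" takes 5 bytes in UTF-8, so it would be stored as 'u',1,5 yet decodes to only 4 characters. `decode_utf8_payload` is a hypothetical helper, not part of any GoPro library.

```python
def decode_utf8_payload(size: int, repeat: int, payload: bytes) -> str:
    """Decode a hypothetical 'u' (UTF-8) payload.

    size * repeat is the storage size in bytes; it is NOT the character count.
    """
    storage = size * repeat
    raw = payload[:storage].rstrip(b"\x00")  # drop any NUL padding
    return raw.decode("utf-8")


encoded = "m/s\u00b2".encode("utf-8")  # b'm/s\xc2\xb2': 5 bytes
text = decode_utf8_payload(1, len(encoded), encoded)
print(len(encoded), len(text))  # 5 4
```

A parser that assumed repeat equals the number of characters would expect 5 characters here and find only 4.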
Another less pretty example is compression. Compression typically reduces the data size by using variable-length encoding; this is what the '#' datatype signifies. The compression is delta-Huffman-run-length encoding, a little like JPEG entropy encoding. It is not documented. You can view the encoding side within https://github.com/gopro/gpmf-write/blob/master/GPMF_writer.c
You should not see either 0xfe or 0xff types in a stored GPMF bitstream.
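For anyone writing their own parser, here is a minimal Python sketch of walking GPMF key-length-value items. It assumes the commonly documented layout: a four-character key, a one-byte type, a one-byte structure size and a two-byte big-endian repeat count, followed by size x repeat payload bytes padded to a 32-bit boundary, with type 0 marking nested items. Because the walker advances using only size x repeat, it can step over '#' (compressed) payloads without decoding them; it does not special-case the internal 0xfe/0xff types, which per the comment above should not appear in stored data. `iter_klv`, `KLV` and the keys `STR1`/`CMP1` are hypothetical names for illustration.

```python
import struct
from typing import Iterator, NamedTuple, Optional


class KLV(NamedTuple):
    key: str        # four-character key
    type_char: str  # type as a character; "\x00" marks nested items
    size: int       # structure size in bytes
    repeat: int     # number of structures
    payload: bytes  # exactly size * repeat bytes, alignment padding excluded


def iter_klv(buf: bytes, offset: int = 0, end: Optional[int] = None) -> Iterator[KLV]:
    """Walk the key-length-value items at one nesting level of a GPMF buffer."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        # 8-byte header: 4-char key, 1-byte type, 1-byte size, 2-byte big-endian repeat
        key, type_byte, size, repeat = struct.unpack_from(">4sBBH", buf, offset)
        length = size * repeat
        start = offset + 8
        yield KLV(key.decode("latin-1"), chr(type_byte), size, repeat,
                  buf[start:start + length])
        # Skip the payload without interpreting it, so unknown or compressed ('#')
        # types do not break the walk; payloads are padded to a multiple of 4 bytes.
        offset = start + ((length + 3) & ~3)


# Hypothetical buffer: a 'c' item holding "m/s" (3 bytes, padded to 4), then a
# '#' item whose 5 payload bytes (padded to 8) are stepped over, not decoded.
buf = (b"STR1" + bytes([ord("c"), 3]) + (1).to_bytes(2, "big") + b"m/s\x00"
       + b"CMP1" + bytes([ord("#"), 1]) + (5).to_bytes(2, "big")
       + b"\x01\x02\x03\x04\x05\x00\x00\x00")

for item in iter_klv(buf):
    print(item.key, item.type_char, item.size, item.repeat, len(item.payload))
# STR1 c 3 1 3
# CMP1 # 1 5 5
```

To descend into a nested item (where `item.type_char == "\x00"`), call `iter_klv(item.payload)` on its payload.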

@ghost
Author

ghost commented Oct 26, 2021

> The camera currently doesn't implement any UTF-8 datatypes. These variable-length datatypes are potentially problematic in GPMF, as a type-size-repeat field of 'u',1,10 may not contain 10 characters, since the data size of each entry is content dependent. As size x repeat must be the storage size, for UTF-8, repeat will not automatically be the number of entries. It works, just less pretty.

Right, difficult with varying code point length in UTF-8 of course. Interpretation of UTF-8 data would have to be a special case, I guess, i.e. not as clean as for the rest of the types.

I did, however, run into odd ASCII strings where some values have size 1, repeat X and others size X, repeat 1. I think this was on a Hero7 Black. So for the header, I interpret both "size" and "repeat" as UInt16 and swap them whenever I encounter type "c"/99. All c/ASCII values are then treated as size X, repeat 1 (this works better in my case, but will probably mean future bugs...).

A follow-up question on ASCII and encoding: these values go outside the standard range, all the way up to 255 if I read correctly. Is there a specific extended ASCII code page I should go by?

> Another less pretty example is compression. Compression typically reduces the data size by using variable-length encoding; this is what the '#' datatype signifies. The compression is delta-Huffman-run-length encoding, a little like JPEG entropy encoding. It is not documented. You can view the encoding side within https://github.com/gopro/gpmf-write/blob/master/GPMF_writer.c

Thanks. I pondered whether to try decoding it, but I'll take a look regardless. Thanks for the link.

> You should not see either 0xfe or 0xff types in a stored GPMF bitstream.

Noted, thanks!

@dnewman-gpsw
Collaborator

All 'c' type character arrays should now be in the format size X, repeat 1, but you are correct that this wasn't historically consistent, as some developers thought repeat = strlen(input). As repeat can have a temporal meaning in some contexts, we worked to have this cleared up. Fortunately, no time-varying stream has yet used type 'c', so all existing string array sizes can be taken as size x repeat. Only single-byte ASCII codes are supported within type 'c': 0 to 255.
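Taking the size x repeat rule literally, a parser can accept both the current size X, repeat 1 layout and the older size 1, repeat X layout without swapping header fields. A minimal Python sketch follows; `read_c_strings` is a hypothetical helper. Reading size 1 as one string of `repeat` bytes, and size > 1 with repeat > 1 as `repeat` fixed-width strings, are assumptions (the first would misread a genuine array of single characters). It decodes as ISO 8859-1, per the maintainer's later comment in this thread.

```python
from typing import List


def read_c_strings(size: int, repeat: int, payload: bytes) -> List[str]:
    """Split a type 'c' payload into strings (hypothetical helper).

    Assumption: size == 1 is read as the historical "repeat = strlen" layout,
    i.e. a single string of `repeat` bytes. Otherwise the payload is taken as
    `repeat` fixed-width strings of `size` bytes each (size X, repeat 1 gives one).
    """
    data = payload[:size * repeat]  # storage size is always size x repeat
    if size == 1:
        chunks = [data]
    else:
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Strip NUL padding; decode as ISO 8859-1 ("latin-1" in Python).
    return [chunk.rstrip(b"\x00").decode("latin-1") for chunk in chunks]


print(read_c_strings(5, 1, b"hello"))              # size X, repeat 1 -> ['hello']
print(read_c_strings(1, 5, b"hello\x00\x00\x00"))  # size 1, repeat X -> ['hello']
print(read_c_strings(4, 2, b"m/s\x00deg\x00"))     # two 4-byte strings -> ['m/s', 'deg']
```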

@ghost
Author

ghost commented Oct 26, 2021

> All 'c' type character arrays should now be in the format size X, repeat 1, but you are correct that this wasn't historically consistent, as some developers thought repeat = strlen(input). As repeat can have a temporal meaning in some contexts, we worked to have this cleared up. Fortunately, no time-varying stream has yet used type 'c', so all existing string array sizes can be taken as size x repeat.

Aha, so there was a reason for this. Good to know what to expect from now on, thanks.

> Only single-byte ASCII codes are supported within type 'c': 0 to 255.

Right, but standard ASCII is 7-bit, so it only goes up to 127? I may well be out of the loop here, but 128-255 was historically used quite flexibly to extend the standard character set for various languages. Since that range doesn't follow a single standard, it's often hard to predict what ASCII above 127 means. Is GoPro perhaps using something like Windows-1252 or ISO-8859-1? Or am I misunderstanding?

@dnewman-gpsw
Collaborator

Yes, we haven't specified it, although we are currently using ISO 8859-1. Within ACCL we use "m/s²", with superscript two (code 0xB2) for squared.

@ghost
Author

ghost commented Oct 26, 2021

Great, thanks! Yes, I noticed the superscript which was actually what had me wondering in the first place, since they were decoded correctly for my UTF-8 strings. This makes sense since I think UTF-8 is a superset of ISO8859-1, making it an excellent choice for single byte encoding. I'll just decode 'c' as UTF-8 and it should work fine. :)
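One caveat on decoding 'c' as UTF-8: Unicode's first 256 code points do match ISO 8859-1, but UTF-8 stores code points 0x80 to 0xFF as two bytes (superscript two becomes 0xC2 0xB2), so a lone 0xB2 byte is not valid UTF-8. A strict UTF-8 decoder rejects such a 'c' string and a lenient one substitutes U+FFFD, whereas decoding as ISO 8859-1 (Python's "latin-1" codec) recovers the text. A minimal Python sketch using a hypothetical payload:

```python
raw_c = b"m/s\xb2"  # hypothetical type 'c' payload; 0xB2 is superscript two in ISO 8859-1

as_latin1 = raw_c.decode("latin-1")                        # 'm/s²'
as_utf8_lenient = raw_c.decode("utf-8", errors="replace")  # 'm/s\ufffd'

try:
    raw_c.decode("utf-8")  # strict UTF-8 decoding
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False      # a lone 0xB2 is a UTF-8 continuation byte, not a character

utf8_bytes = "m/s\u00b2".encode("utf-8")  # how a UTF-8 ('u') string would store it

# ascii() keeps the output printable on any console
print(ascii(as_latin1), ascii(as_utf8_lenient), strict_ok, utf8_bytes)
# 'm/s\xb2' 'm/s\ufffd' False b'm/s\xc2\xb2'
```

This also explains why the superscript decoded correctly in UTF-8 ('u') strings: there it is already stored as the two-byte UTF-8 sequence.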
