msgfmt segfaults building git 2.19.1 #39
Comments
@rofl0r I can give it a try, but i will let you know if i cannot complete it on this broken system. I have just come out of a busy and hard time as a freshman, and i'm currently busy fixing my broken toolchain and broken distribution. |
@rofl0r here's why:
Here we have a sysdep msgid but an empty msgstr, resulting in a mismatched number of msgid/msgstr( We really need good logic in This actually reminds me that the
We should just parse the new lines, check whether the block is terminated, and leave all the work to the callback function once a new block is parsed. |
what does gnu gettext do with this one ? |
Drop it. An empty msgstr is ignored by default. -C will trigger an error, since the X/Open format strictly does not allow an empty msgstr. But to be honest, an empty msgstr makes translation more convenient for translators. |
ok. do you have time to work on your suggested poparser modifications ? i suspect it should not be too much work... |
I have. No more than 1k loc, i guess. One week should be OK, since i have other things going on in the meantime. |
1000 lines ? i hope not :) |
as stated in #39, the old parser is not good enough to handle all po files. similar issues occurred over and over again because of the dirty hacks in the project. so i propose and implement this new parser, which:

1. needs to parse a po file twice. the first pass acquires the maximum width of every entry. the second pass copies the well-prepared contents into struct po_message and passes it to the callback function.
2. puts all the information of one translation into every struct po_message: msgid, msgctxt, msgid_plural, and msgstrs. comments may be added later.

the logic of the code is quite simple, nothing special needs explaining. the special points are:

1. on the first pass, the new parser gives no information about what the string looks like. the new parser will not give the exact (sysdep-expanded) size, nor can you calculate the exact size on your own. only xxx_len, strlen and sysdep in po_message_t are available. xxx_len is the length of the corresponding entry, strlen is almost the same.
2. sysdep indicates how many cases the string could be expanded to. since you know the length of the original string, and the original string is always longer than the converted one, you can get a safe buffer size to work with in the second stage.
3. poparser_sysdep() is a function like unescape(), with a bit flag as the third argument. that is, three bits correspond to st_priu32, st_priu64, st_priumax. since there are only up to two cases for every kind of sysdep, you can count from 0 to msg->sysdep-1, and poparser_sysdep will eventually iterate over every possible case.
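to make the bit-flag idea in point 3 concrete, here is a minimal sketch. it is not the actual poparser_sysdep(): the function name `expand_sysdep`, the `kinds` table and the candidate specifiers in it are assumptions chosen for illustration only. bit i of `flags` picks one of the two candidate replacements for kind i, and since every marker is longer than its replacement, a buffer of strlen(in)+1 bytes is enough, as point 2 argues.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kind table: bit i of the flag argument belongs to kinds[i]. */
struct sysdep_kind {
    const char *marker;   /* marker text following '%' in the .po string */
    const char *repl[2];  /* two candidate specifiers (illustrative values) */
};

static const struct sysdep_kind kinds[] = {
    { "<PRIu32>",  { "u",  "lu"  } },
    { "<PRIu64>",  { "lu", "llu" } },
    { "<PRIuMAX>", { "ju", "llu" } },
};

/* Copy `in` to `out`, replacing every sysdep marker with the candidate
 * selected by `flags`. `out` must hold at least strlen(in) + 1 bytes.
 * Returns the length of the expanded string. */
static size_t expand_sysdep(const char *in, char *out, unsigned flags)
{
    const size_t nkinds = sizeof kinds / sizeof kinds[0];
    char *o = out;

    while (*in) {
        size_t i = nkinds;              /* nkinds means "no marker here" */
        if (*in == '%') {
            for (i = 0; i < nkinds; i++)
                if (strncmp(in + 1, kinds[i].marker,
                            strlen(kinds[i].marker)) == 0)
                    break;
        }
        if (i < nkinds) {
            const char *r = kinds[i].repl[(flags >> i) & 1u];
            size_t rl = strlen(r);
            *o++ = '%';
            memcpy(o, r, rl);
            o += rl;
            in += 1 + strlen(kinds[i].marker);
        } else {
            *o++ = *in++;               /* ordinary character, copy as is */
        }
    }
    *o = '\0';
    return (size_t)(o - out);
}
```

calling this once per flag value visits every combination of candidates, which is the "count and iterate every possible case" scheme described above.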
@rofl0r i've opened a new branch. actually the new poparser managed to reduce the amount of code... msgfmt was reduced by 243 while poparser.c increased by 152. let's go back to the main topic. i only completed the work on poparser and msgfmt this weekend. msgmerge, though quite simple to rewrite, has to wait until next weekend (or earlier). you could also try to rewrite it, since msgmerge is just doing a copy. you can test the new msgfmt freely. bug reports are welcome. besides, the new parser is completely capable of parsing the po header and flags in comments. i only parse |
great job. i left a number of comments on your commits. i think you oversimplified the sysdep replacement code, and the version you provide is unfit for anything but the most simple cases. imo it should be possible to fit in the previous code, which i tested rigorously back when i wrote it. anyway, i'm waiting for your comments... |
for the record, this was the commit adding the sysdep expansion: 438a47c |
first, our problem: replace let but attention, we don't need to do multiplication for duplicated second, my solution: i want my according to the above analysis, i don't need to replace identical sysdep strings with different specifiers. so for every case, and the trick is that i use the number of the case to pick the right specifier. there are only up to four cases: third, how to fix the bug: we could replace
|
details are described at #39 (comment). in short, we cannot tell whether `msg->sysdep = 2` stands for `case(%<PRIu32>)` or `case(%<PRIu64>)`. we need to record the appearances of every sysdep kind separately.
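a rough sketch of what "recording every kind separately" could look like, instead of a single combined number. the names `kind_markers` and `count_sysdep_kinds` are hypothetical and not taken from the branch.

```c
#include <string.h>

/* Hypothetical marker list; index 0, 1, 2 identify the sysdep kind. */
static const char *const kind_markers[] = {
    "%<PRIu32>", "%<PRIu64>", "%<PRIuMAX>"
};
enum { KIND_COUNT = sizeof kind_markers / sizeof kind_markers[0] };

/* Count how often each sysdep kind appears in `s`.
 * counts[i] receives the number of occurrences of kind_markers[i]. */
static void count_sysdep_kinds(const char *s, unsigned counts[KIND_COUNT])
{
    memset(counts, 0, KIND_COUNT * sizeof counts[0]);
    for (const char *p = strchr(s, '%'); p; p = strchr(p + 1, '%')) {
        for (int i = 0; i < KIND_COUNT; i++) {
            if (strncmp(p, kind_markers[i], strlen(kind_markers[i])) == 0) {
                counts[i]++;
                break;
            }
        }
    }
}
```

with per-kind counts, a string containing only `%<PRIu32>` and one containing only `%<PRIu64>` no longer collapse into the same value.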
our produced .mo files need to be portable across systems. |
@rofl0r the new branch, whatever, passed my test. i'd like to hear a reply from you. |
great, thank you! i've asked @selkfoster to test your branch, maybe @awilfox could have a look too ? as you know sabotage itself does not use translations, so it's not really suited for testing (almost all packages are using --disable-nls). |
I will test the new parser branch. This takes a lot of time, because it means rebuilding all the packages. I will probably be able to test this weekend, and I will let you guys know ... |
i started testing the sysdep expansion with the following input:
which results in buggy output: the first |
follow-up to #39 (comment). it's obvious that strstr will search for `%<PRIu32>` first; if there is one, we jump there and skip all other sysdep strings before the first `%<PRIu32>`. but what we want is to find the first sysdep string of any kind. so our new strategy is to search for `%` instead, such that we always match the first sysdep string.
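the strategy can be sketched like this. it is only an illustration of scanning for `%` and then checking which marker starts there; `sysdep_markers` and `find_first_sysdep` are made-up names, not the code from the branch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker list; the index is reported back as the kind. */
static const char *const sysdep_markers[] = {
    "%<PRIu32>", "%<PRIu64>", "%<PRIuMAX>"
};
enum { SYSDEP_KINDS = sizeof sysdep_markers / sizeof sysdep_markers[0] };

/* Return a pointer to the first sysdep marker of any kind in `s`, or NULL
 * if there is none. On success *kind is set to the marker's index. */
static const char *find_first_sysdep(const char *s, int *kind)
{
    /* Walk from one '%' to the next instead of calling strstr() for one
     * specific marker, so an earlier marker of another kind is not skipped. */
    for (const char *p = strchr(s, '%'); p; p = strchr(p + 1, '%')) {
        for (int i = 0; i < SYSDEP_KINDS; i++) {
            if (strncmp(p, sysdep_markers[i], strlen(sysdep_markers[i])) == 0) {
                *kind = i;
                return p;
            }
        }
    }
    return NULL;
}
```

for an input where `%<PRIu64>` comes before `%<PRIu32>`, this returns the `%<PRIu64>` position, whereas a strstr for `%<PRIu32>` would jump past it.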
True. This build (git) now works fine against the newpoparser. :-) |
#40 (comment), closed. |
to reproduce:
resulting in:
the trans pointer of the second item is NULL. interestingly the 2nd item seems identical to the first apart from the trans pointer. i guess it's from the empty
msgid ""
string right at the start of the file. @xhebox do you have time to look into this ?