Memory initializer in string literal #3326

Merged
merged 5 commits into from
Jun 16, 2015

gagern
Contributor

@gagern gagern commented Apr 9, 2015

Having the memory initializer as a separate file can become a huge problem in some environments. It would be nice if we could offer more alternatives. I see how big array literals would be kind of bulky. But how about encoding the memory initializer into one or more strings? Here is some code to do that, based on a base 88 encoding scheme. If you like the idea, it would be nice if you could introduce some setting to switch between file and string.

@gagern
Contributor Author

gagern commented Apr 9, 2015

I just found out about #2188 and the comparisons done there, and I think there are better approaches than the one I used here.

@gagern
Contributor Author

gagern commented Apr 9, 2015

Since base 88 is not as memory-efficient as hoped, and in particular compresses really poorly, here is a forced push using a lookup table instead. This follows findings reported on #2188. I've by now included a setting as well, and only emit mem file or initializer string, not both. So in my opinion this is now ready for review and merge.

@kripken
Member

kripken commented Apr 9, 2015

The first issues here are as discussed in that other pull - we need to make sure this works with all other features. In particular we had problems there with minification breaking the string, I don't remember the details though. If the full test suite passes, then we are probably ok.

Aside from that, I saw some numbers in the other pull, but can you please summarize how this works, the measured effect on code size, and the measured effect on startup time?

@gagern
Contributor Author

gagern commented Apr 9, 2015

Do you mean the string minification in this code, or the one applied by e.g. closure afterwards? The former was discussed and addressed in #2188 (comment) if that is what you have in mind.

Using the tool from https://github.com/gagern/Web-Benchmarks/tree/tinylut/meminit I get the following figures for memory efficiency, sorted by gzipped size, and augmented by some load time experiments:

| Representation | Uncompressed size | Gzipped size | Firefox | Chrome | Safari |
|---|---|---|---|---|---|
| binary | 167.2 KB | 42.2 KB | | | |
| tinylut <== | 187.4 KB | 46.2 KB | 5.4 ms | 7.7 ms | 73.1 ms |
| minstr ^ 32 | 217.9 KB | 46.9 KB | | | |
| minstr | 230.8 KB | 47.1 KB | 4.6 ms | 8.1 ms | 89.6 ms |
| str | 312.1 KB | 48.6 KB | | | |
| int8 | 536.5 KB | 55.7 KB | 30.4 ms | 64.7 ms | 54.4 ms |
| base64 | 223.0 KB | 59.6 KB | | | |
| base88 | 186.7 KB | 61.3 KB | | | |
| int32 | 364.3 KB | 79.7 KB | | | |

The current mem init file based approach depends not only on parse time but on network latency, so it's hard to include that in the table. The load times above were the median of 16 runs each.

I'm currently running the test suite on the plain incoming branch, starting not with the sanity component but the default component. Once that succeeds I'll run tests for my pull requests.

@gagern
Contributor Author

gagern commented Apr 9, 2015

As for how this works: I compute how often each byte occurs, and then assign printable values to the common bytes, leaving the values which require escape sequences for the less common values. Small characters are encoded in octal notation if possible, since that's shorter. To reconstruct, I provide a translation table with 256 elements. Each char code from the string is looked up in that table to translate it into a byte which is then stored in some typed array.
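The lookup-table idea described above can be sketched roughly as follows. This is only an illustration written for this thread, not the code from the pull request: the names `build_lut_encoding` and `decode_with_lut` are hypothetical, the choice of "cheap" characters is an assumption, and the octal-escape shortening is left out.

```python
from collections import Counter

def build_lut_encoding(data: bytes):
    """Return (encoded_text, lut) where lut[char_code] is the original byte."""
    # Char codes that need no escaping inside a double-quoted JS string.
    cheap = [c for c in range(0x20, 0x7F) if chr(c) not in '"\\']
    cheap_set = set(cheap)
    # Every other char code would need an escape sequence in the literal.
    expensive = [c for c in range(256) if c not in cheap_set]
    # Most frequent byte values first; bytes that never occur sort last.
    counts = Counter(data)
    by_freq = sorted(range(256), key=lambda b: (-counts[b], b))
    # Frequent bytes get cheap char codes, rare bytes get the expensive ones.
    byte_to_code = dict(zip(by_freq, cheap + expensive))
    lut = [0] * 256
    for byte, code in byte_to_code.items():
        lut[code] = byte
    encoded = ''.join(chr(byte_to_code[b]) for b in data)
    return encoded, lut

def decode_with_lut(encoded: str, lut) -> bytes:
    # Mirrors the runtime step: look up each char code in the 256-entry table.
    return bytes(lut[ord(ch)] for ch in encoded)
```

The sketch returns the string contents rather than an escaped literal; a real emitter would still have to escape the remaining awkward characters when writing the JavaScript source.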

@kripken
Member

kripken commented Apr 10, 2015

What does "lut" mean in "tinylut"?

@@ -708,7 +708,7 @@ try:
 if js_opts is None: js_opts = opt_level >= 2
 if llvm_opts is None: llvm_opts = LLVM_OPT_LEVEL[opt_level]
 if opt_level == 0: debug_level = max(3, debug_level)
-if memory_init_file is None: memory_init_file = opt_level >= 2
+if memory_init_file is None: memory_init_file = 2 if opt_level >= 2 else 0
Member

wouldn't this change make this new mode the default in -O2+? But it's less optimized than a binary mem init?

Contributor Author

Yes, it would make this the default. I would have assumed that for most applications, the overhead for a second request would be bigger than the performance cost of parsing that initializer. But I guess you know your customers better than I do, so if you say that for most of their setups mem files are preferable, then I won't argue with that. I know that I personally will try to use the string initializer simply because I prefer to have a single artifact containing the whole program, but that's just a matter of taste to me.

Member

It is possible to do the requests in parallel. But I didn't really consider until now that if this is efficient enough, it could be better than even that. I wonder how to measure it though.

Regardless, until we have such measurements, I don't think we should change the default - we shouldn't confuse users with changes unless they have proven benefits.

@gagern
Contributor Author

gagern commented Apr 10, 2015

“LUT” = LookUpTable, the one which translates characters back to byte values.

@@ -1385,7 +1386,7 @@ try:

if memory_init_file:
if shared.Settings.USE_TYPED_ARRAYS != 2:
if type(memory_init_file) == int: logging.warning('memory init file requires typed arrays mode 2')
Contributor Author

Dropping this type check means that I'll now have a greater chance of hitting this message. I guess I should introduce some other variable to distinguish a user-requested setting from an automatically chosen one. Or should I use extra values for that single variable?

Member

we have actually deprecated non typed arrays mode 2 code, so that should never be hit

@chadaustin
Collaborator

Hi @gagern. I am very happy that you are looking into this. We have wanted similar behavior for a long time.

However, I would very much prefer that the string be an as-is JavaScript string, such that the decoding loop is literally:

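for (var i = 0; i < memInitStr.length; ++i) {
  // charCodeAt yields the numeric byte value; memInitStr[i] would be a one-character string
  HEAP8[base + i] = memInitStr.charCodeAt(i);
}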

Large regions of contiguous zeroes could be elided.

This way the data will be trivially inspectable with human eyes, and my understanding (though prove me wrong) is, the code size hit after gzip is rather small.
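As a rough model of that proposal, here is a Python sketch in which each character code of the string is the byte value itself. The helper names `to_initializer` and `apply_initializer` are made up for illustration, and only trailing zeros are elided here (relying on the assumption that the heap starts out zero-filled); eliding zero runs in the middle would need extra bookkeeping.

```python
def to_initializer(data: bytes) -> str:
    # Trailing zeros need not be stored if the heap is zero-filled already.
    # latin-1 maps every byte value 0..255 to the code point of the same value.
    return data.rstrip(b'\x00').decode('latin-1')

def apply_initializer(heap: bytearray, base: int, init: str) -> None:
    # Stand-in for the JavaScript loop: the char code is the byte to store.
    for i, ch in enumerate(init):
        heap[base + i] = ord(ch)
```

For example, `apply_initializer(bytearray(16), 4, to_initializer(b'\x01\x02\x00\x00'))` stores only two bytes and leaves the rest of the heap untouched.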

@gagern
Contributor Author

gagern commented Apr 29, 2015

@kripken: The default for -O2 is now back to --memory-init-file 1 so this thing can be evaluated first, and may become the default later on if it proves useful. One drawback is that now the test suite won't test this at all, right? Can you tell me how much test coverage you'd want for this? Is one case working with the new code enough, or do you want several tests compiled with --memory-init-file 2 as well? Can you give me some pointers about where to start adding such tests?

@chadaustin: The string is now literal, with no table lookup step, so you can read the memory content. (By the way, I notice that libc++ content contains a lot of full paths to source files, thus exposing the developer's directory structure. I wonder what to do about that.)

So far my things are untested, because I have trouble getting a recent fastcomp installed. Apart from emscripten-core/emscripten-fastcomp#85 it's trying to link against libedit, which I don't have installed. Working on this, may file a separate ticket if I can't resolve it.

I had to resolve quite a number of conflicts to keep up with recent development. Should I rebase my changes?

@waywardmonkeys
Contributor

Between @kripken and me, it looks like you've probably had a lot of changes to keep up with.

It looks like this change would be a lot simpler if you:

  • Stored the data using the same variable name (memoryInitializer instead of memInitString)
  • If you want to have an option to flip between encodings, just add a new setting: MEMORY_INITIALIZER_ENCODING which can default to 1 (array) but have an option for 2 (string) or other things in the future. This avoids the confusing overloading of mem_init_file which ends up bloating the changes a good bit.

@kripken
Member

kripken commented Apr 29, 2015

+1 to @waywardmonkeys suggestions

Also: yes, need to rebase. For tests, we should probably use this on a whole test mode, like asm2f (see end of test_core.py).

Now that this is a string without a table lookup, where is it in the numbers listed before (which line is it there)?

@kripken
Member

kripken commented Apr 29, 2015

See memory init testing in runner.py (grep for memory-init-file), we could add something similar for when the string is used.

Btw, we have a few settings that disable the memory init file (like shared libraries). We might want to default them to this new mode, perhaps.

@gagern
Contributor Author

gagern commented May 28, 2015

I rebased my changes, and hope this makes sense the way I did things. Running the tests will take some time yet, since I need to update all my toolchain first, where I'm encountering a bunch of problems, and it's pretty late in the day here.

I noticed that at some later point, we probably should attempt to exempt the string literal from the influence of the closure compiler, since it will re-encode the string to something which is longer again. The semantics stay the same, though, so this is no crucial change. And I haven't yet figured out how exactly the asm.js output gets protected from closure, to see whether we could duplicate that.

@gagern
Contributor Author

gagern commented May 29, 2015

I've got some failing tests in a core asm2m run: test_python, test_the_bullet, test_files, test_fnmatch, test_strings and test_webidl either error or fail. I'm investigating the first of these.

Something appears to be post-processing the string literal, turning byte escapes into unicode escapes but also turning octal escapes into their binary equivalent. I haven't figured out yet who's doing this. But it caused me to read section 7.8.4 of the ECMA 262 spec and find out that indeed almost any unicode character may be contained in a string literal, while on the other hand support for octal escapes is optional and even forbidden in strict mode. Does that mean that we should avoid escaping anything except for newlines and quotes? Even a two-byte UTF-8 sequence for U+0080 through U+00FF is shorter than the four-byte \x?? sequence. Can we rely on the input being UTF-8?

In any case, the reason the test fails is that the hex-to-octal rewrite would also affect things like \\x01, i.e. where the backslash itself is part of the string content. This completely breaks the memory alignment. That will need fixing. But if we have to drop all support for octal escapes in any case, then that replacement will no longer be a problem either.
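To make the alignment problem concrete, here is a small Python sketch. It is not the code from the pull request; the regexes and the function names `naive_hex_to_octal`, `careful_hex_to_octal` and `decoded_length` are assumptions made for illustration. A rewrite that ignores escaped backslashes changes how many characters the literal denotes, which shifts every byte after it.

```python
import re

HEX_ESCAPE = re.compile(r'\\x0([0-7])')

def naive_hex_to_octal(literal_src: str) -> str:
    # Rewrites \x0N to the shorter \N without checking whether the
    # backslash is itself the second half of an escaped backslash.
    return HEX_ESCAPE.sub(lambda m: '\\' + m.group(1), literal_src)

# Escaped backslashes are matched first and consumed as a unit, so they
# can never start a hex escape. The lookahead keeps \x01 followed by a
# digit untouched, since \1 plus a digit could read as a longer octal escape.
TOKEN = re.compile(r'\\\\|\\x0([0-7])(?![0-9])')

def careful_hex_to_octal(literal_src: str) -> str:
    def repl(m):
        return m.group(0) if m.group(1) is None else '\\' + m.group(1)
    return TOKEN.sub(repl, literal_src)

def decoded_length(literal_src: str) -> int:
    # Number of characters the literal denotes: each escape counts as one.
    return len(re.sub(r'\\(\\|x[0-9a-fA-F]{2}|[0-7]{1,3})', '?', literal_src))
```

With the source text `a\\x01b` (an escaped backslash followed by the plain characters `x01`), the literal denotes six characters; after the naive rewrite it denotes only four, while the careful variant leaves it unchanged.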

I think I'll push a change where I rely on the source being parsed as UTF-8. Let me know whether you think such an assumption acceptable. With that change applied, some of the tests that used to fail now pass, but some still fail. test_the_bullet complains about no input files; I've seen this before in #3328. test_files assumes that we should have a mem file, will need modification. Likewise test_fnmatch and test_strings assume something about the presence of a memory initializer, will need modification. test_webidl says No such file or directory: 'WebIDLGrammar.pkl', not sure yet why.

@waywardmonkeys
Contributor

While you were away, I had tried implementing this as well. My implementation ended up going into a different part of the code, but it kept running into problems as well. There's something about this that is hard to crack!

@gagern
Contributor Author

gagern commented May 29, 2015

@waywardmonkeys: Do you have your approach in some branch here on github? What kinds of problems did you encounter?

@gagern
Contributor Author

gagern commented Jun 8, 2015

OK, I think I fixed the previously failing test cases. I dropped some test functionality in c333383, please have a look whether that's acceptable. I also had to merge 1.33.1 to avoid conflicts with current incoming due to #3266 (very interesting development there, by the way). Now I'm updating my emsdk to current incoming, in order to run tests once more. Again.

One thing I've been thinking about, regarding the fact that the string literal now depends on the character encoding used for the source file: perhaps we should include some kind of checksum with the string literal, in order to at least detect any corruption due to wrong encoding there. Should I add some CRC32 or Adler32 or similar? Should the verification be made optional inside an #if ASSERTIONS block?

@kripken
Member

kripken commented Jun 8, 2015

A checksum sounds like a very good idea, and yes, in an ASSERTIONS block.

When you say UTF8 above, is the requirement that the entire JS file - including --pre-js, JS libraries etc. that the user provided - must be UTF8, and not another encoding? If so, I'm not sure offhand how serious a problem it is, but it doesn't sound like something obviously ok to do.

@gagern
Contributor Author

gagern commented Jun 8, 2015

What I mean is that the browser must decode it as UTF-8, probably because the HTTP server delivered it as such. If that doesn't sound good, we should go with \x?? escapes. If things get minified, the result will for some reason have unicode escapes in any case. I just managed to understand CRC32 well enough that I don't have to ship a table but can generate that on the fly. Working on this.
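For readers unfamiliar with the trick of generating the CRC32 table at startup instead of shipping it, here is a minimal Python sketch. It uses the standard reflected CRC-32 polynomial 0xEDB88320; the names `make_crc_table` and `crc32_generated` are hypothetical and this is not the code that ended up in the pull request.

```python
def make_crc_table():
    # Build the 256-entry table from the reflected polynomial 0xEDB88320.
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set.
            c = (c >> 1) ^ (0xEDB88320 if c & 1 else 0)
        table.append(c)
    return table

CRC_TABLE = make_crc_table()

def crc32_generated(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

A decoder could compute this over the bytes it reconstructed and compare against a value stored next to the literal, which is the kind of check proposed for the ASSERTIONS block.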

@gagern
Contributor Author

gagern commented Jun 8, 2015

tests/runner.py asm2m only complains about test_the_bullet which it always does on my system. Looks good to me. What do you think?

@gagern
Contributor Author

gagern commented Jun 9, 2015

I finally found the uglify call which explicitly passes ascii_only: true as an option to uglify-js. So the hex escapes I'm now using above are pretty much in line with that. If at some point we choose to handle either of these situations differently, allowing binary data in UTF-8 encoding or whatever, then it would make sense to handle both situations (with and without uglify) the same.

Since mishoo/UglifyJS@81f5efe uglify knows how to produce \x?? escapes instead of \u????, so we might want to update our dependency there, to keep post-uglify string literals reasonably short.

@gagern gagern mentioned this pull request Jun 9, 2015
memfile = target + '.mem'
shared.try_delete(memfile)
def repl(m):
  # handle chunking of the memory initializer
  s = m.groups(0)[0]
  if len(s) == 0 and not shared.Settings.EMTERPRETIFY: return m.group(0) # emterpreter must have a mem init file; otherwise, don't emit 0-size ones
  open(memfile, 'wb').write(''.join(map(lambda x: chr(int(x or '0')), s.split(','))))
  membytes = [int(x or '0') for x in s.split(',')]
  if not shared.Settings.EMTERPRETIFY:
Member

was this needed because the emterpreter used to append its data to the normal memory init file? that is no longer true, so this should be removable.

Contributor Author

This was mostly because if the memory is all null bytes, then I do a return '', which I considered semantically equivalent to return m.group(0). Come to think of it, if it indeed is, then perhaps I should make the two lines agree. Can I use return '' in both cases? Anyway, the return m.group(0) is not used for EMTERPRETIFY, presumably because we need a mem file there, even if it's empty. If it is OK for the emterpreter to have trailing zeros stripped, then I can move that condition to the if not membytes case.

Member

actually, all the emterpreter stuff here (and around it) is all no longer needed (there is no interaction between the emterpreter and mem init file anymore). removing any special-casing for it should be fine, if that makes your changes simpler.

gagern added 2 commits June 15, 2015 22:46
This makes the resulting literals more independent from the character
encoding the environment assumes for the resulting file.
It requires slightly more memory, but large bytes are far less common than
small bytes (zero in particular), so the cost should not be too much.
If we want to, we can still make this optional later on.
* MEM_INIT_METHOD != 1 with --with-memory-file 1 now triggers an assertion
* Consistently return '' instead of m.group(0) if there is no initializer
* Strip trailing zeros for emterpreter as well
* Include crc32 in literal only if it gets verified
* Enable assertions for the asm2m test run in general
* Disable assertions for one test case, fnmatch, to cover that as well
* Include the asm2m run name in two lists of run modes
* Add browser test to verify all pairs of bytes get encoded correctly
* Add browser test to verify that a >32M initializer works without chunking
* Omit duplicate var declaration for the memoryInitializer variable
* Minor comments and syntax improvements
* Capture the memory_init_file setting by its MEM_INIT_METHOD value.
* Drop special handling for emterpreter, which shouldn't be needed any more.
@gagern
Contributor Author

gagern commented Jun 15, 2015

I've squashed some commits, but kept others apart where I considered the extra history valuable. And I verified that my final result is equivalent to the 4db4ee6 you reviewed. I'd say everything should be ready for merge now.

@chadaustin
Collaborator

I am so so so happy this is happening. Thanks for all the hard work.

@kripken kripken merged commit f5bc422 into emscripten-core:incoming Jun 16, 2015
@kripken
Member

kripken commented Jun 16, 2015

Merged. Thanks!

I also pushed a commit to refactor the crc code into tools/shared.py.

@gagern gagern deleted the strMemInit branch June 16, 2015 05:08
@gagern
Contributor Author

gagern commented Jun 16, 2015

Thanks for the cooperation, I'm looking forward to using emscripten with this new improvement. Do you have any idea how to gather feedback on whether this should become the default for some settings?

@kripken
Member

kripken commented Jun 16, 2015

Opening a thread on the mailing list might be a good start, asking people to test it out.

Meanwhile I ran some fuzzing on this last night, and found no issues.

@kripken
Member

kripken commented Jul 2, 2015

It looks like there is windows-only breakage on the bots due to this. For example asm2.test_align_moar emits this: https://dl.dropboxusercontent.com/u/40949268/emcc/bugs/tmp_test_align_moar.js . See the end, it looks like the mem init is not escaped properly perhaps.

@juj
Collaborator

juj commented Jul 2, 2015

On OS X, I see that when looking at the output of asm2m.test_align_moar in the emscripten_temp/ directory (run with EM_SAVE_DIR=1 EM_BUILD_VERBOSE=3 python tests/runner.py asm2m.test_align_moar), that the generated .js file contains some null bytes in the middle of the file. I think we shouldn't put any null bytes in the file, and also I wonder how bytes 1-31 work.

Although I think the issue of the test crashing optimizer.exe on Windows is not in the null bytes, but that somehow python is computing the file write bad, and it's outputting/truncating the rest of the file somehow, since nothing gets printed to the file after the memory initializer is outputted.

Was the design of the string memory initializer to depend on outputting bytes in the range 0-31 raw to the generated .js file? Or should those have gotten escaped somehow?

@gagern
Contributor Author

gagern commented Jul 2, 2015

> Was the design of the string memory initializer to depend on outputting bytes in the range 0-31 raw to the generated .js file? Or should those have gotten escaped somehow?

Originally I had intended to escape control bytes, then I noticed that our version of uglify would unescape those, read in the spec that almost any code point is valid in a string literal, including nulls, and decided to go with that myself.

Should I write a pull request to always escape control bytes, and we investigate relaxing that later on? Or should we investigate now until we understand the problem, then fix just that and hope all other platforms are fine?

> Although I think the issue of the test crashing optimizer.exe on Windows is not in the null bytes, but that somehow python is computing the file write bad, and it's outputting/truncating the rest of the file somehow, since nothing gets printed to the file after the memory initializer is outputted.

Perhaps Python should open the file in binary mode, i.e. 'wb' instead of 'w' as an argument? I thought that would only affect line terminator translation, but I seem to recall that windows has an EOF control byte, and perhaps that's what's interfering here. It might be worthwhile to just try writing all possible bytes in python, see whether that triggers the problem.

open('foo.bin', 'w').write(''.join(map(chr,range(256))))

Unfortunately I don't have a Windows at hand just now, so can someone on Windows please give the above a try and report back? From reading Wikipedia I'd guess 26 or 4 to be the most likely bytes to cause problems here.

@juj
Collaborator

juj commented Jul 3, 2015

I'm now seeing one crossplatform difference, that causes the breakage. If one runs the following python program

f = open('foo.txt', 'w')
f.write('hello text file!!!\n')
f.write(''.join(map(chr,range(256))))
f.write('good bye text file!!!\n')
f.close()

contents = open('foo.txt', 'r').read()
print contents

on Windows, the read will be truncated at what looks like the byte 0x19 (end of medium) or 0x1A (substitute), and the application prints

hello text file!!!
 ☺☻♥♦
♫☼►◄↕‼¶§▬↨↑↓

but on OS X, the read will complete past each byte, and the application prints

hello text file!!!
 ☺☻♥♦
♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]
^_`abcdefghijklmnopqrstuvwxyz{|}~⌂ÇüéâäàåçêëèïîìÄÅÉæÆôöòûùÿÖÜ¢£¥₧ƒáíóúñѪº¿⌐¬½¼¡
«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²
■ good bye text file!!!

The effect is that writing the memory string initializer succeeds correctly, but afterwards reading it back with open(jsfile).read() or open(jsfile, 'r').read() truncates the read data at the first seen null byte. On Windows, open(jsfile, 'rb').read() does properly read all the bytes.
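A minimal sketch of the binary-mode round trip, written in Python 3 rather than the Python 2 used in this thread: opening with 'wb' and 'rb' bypasses any newline or end-of-file translation, so all 256 byte values should survive on every platform. The function name `roundtrip_all_bytes` and the temporary-file handling are illustrative choices, not part of emscripten.

```python
import os
import tempfile

def roundtrip_all_bytes() -> bytes:
    payload = b'hello\n' + bytes(range(256)) + b'good bye\n'
    fd, path = tempfile.mkstemp(suffix='.bin')
    os.close(fd)
    try:
        # Binary mode: bytes are written and read back verbatim.
        with open(path, 'wb') as f:
            f.write(payload)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)
```

Comparing the returned bytes against the payload on the affected Windows setup would show whether the truncation is confined to text-mode handling.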

However even if I go in places and replace 'r' with 'rb', I am still getting corrupted output in the final executable. I am not sure if I found all the relevant places to transform, or if I missed a place. The output does change from optimizer.exe crashing, to another kind of corruption where the end of the memory initializer reads \u00c0�=(function(global,env,buffer) {, i.e. it is missing the terminating " character for the string, and the subsequent code is joined directly into it, generating an invalid .js file. Unfortunately I'm running out of time here to debug this, but I believe the root cause of these issues is with crossplatform inconsistencies in ascii text file handling.

@kripken
Member

kripken commented Jul 3, 2015

I don't think there is a rush to fix this, in that the breakage is only on the string initializer mode, which is off by default. The only thing this blocks is us moving to that mode by default. But since that would be good to do, this is important to fix, I'm just saying I think we can wait for a proper fix, no need for an interim workaround.

It does sound like binary/text issues are in play here. Possibly 'rb'/'wb' in the right places would fix it, as you two suggest. I don't have a windows machine either, but perhaps someone else reading this does. We could also ask on irc or the mailing list. I'd also be ok to merge a pull that has those fixes, and we'll see what happens on the windows bot.

Side note, I've been wondering why the new test_align_moar triggered this issue, when it wasn't seen earlier. I realized that there is something special about it - the memory init file begins with a bunch of zeros, around 8, because it has to add them as padding to fix alignment. It's possible not many other tests have such a run of zeros, although that is surprising.

@gagern
Contributor Author

gagern commented Jul 3, 2015

the application prints

hello text file!!!
 ☺☻♥♦
♫☼►◄↕‼¶§▬↨↑↓

OK, what we are seeing here is apparently some raw bytes decoded according to code page 437 or something very similar. We have bytes 00 through 04 in the first row. Not sure where the rest of that line went, but the next line starts with 0E, so I'd say 0D got interpreted as a carriage return without line feed, so stuff between the beginning of the file and that position got overwritten. Then we have bytes 0E through 19 in the second row, but no 1A. This is consistent with 1A denoting end of file in text mode. So my hunch there was correct.

> The output does change from optimizer.exe crashing, to another kind of corruption where the end of the memory initializer reads \u00c0�=(function(global,env,buffer) {, i.e. it is missing the terminating " character for the string, and the subsequent code is joined directly into it, generating an invalid .js file.

I have no idea yet where that may come from.

> I don't have a windows machine either, but perhaps someone else reading this does.

I do have a VM with Windows. It's painfully slow, and I have nearly no useful tools installed there, but if everything else fails, I guess I can set up an emsdk there.

> Side note, I've been wondering why the new test_align_moar triggered this issue, when it wasn't seen earlier.

That is strange indeed. As I said, I guess it's not the null bytes, but the 1A bytes instead. But test_meminit_pairs at the very least should have had that as well. Could it be that that test case does not cover as much as we expect it to cover? One more reason to investigate the issue properly, instead of just working around the symptoms.

@kripken
Member

kripken commented Jul 16, 2015

I'm going to disable the asm2m tests for now, until we resolve this. It breaks the windows bot, and might be hiding other breakage as development continues on incoming.

@gagern
Contributor Author

gagern commented Jul 16, 2015

If you want to, you can try the following change instead:

-s = re.sub('[\x80-\xff]', escape, s)
+s = re.sub('\x00(?![0-9])', '\\0', s)
+s = re.sub('[\x00-\x1f\x7f-\xff]', escape, s)

Can't write a full pull request just now. The idea is to turn any null byte which doesn't precede a digit into \0 and escape control characters along with the non-7bit-ASCII bytes.
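The idea in that change can be sketched as a single pass with a replacement function. This is an illustration under assumptions, not the patch itself: the name `escape_meminit` is hypothetical, the `\x%02x` escape format is assumed, and quotes and backslashes are handled here as well even though the snippet above does not show where the real code deals with them.

```python
import re

# Control characters, DEL, the non-ASCII range, quotes and backslashes.
NEEDS_ESCAPE = re.compile(r'[\x00-\x1f\x7f-\xff"\\]')

def escape_meminit(s: str) -> str:
    def repl(m):
        ch = m.group(0)
        if ch in '"\\':
            return '\\' + ch
        # Character following the match, or '' at the end of the string.
        nxt = m.string[m.end():m.end() + 1]
        if ch == '\x00' and not ('0' <= nxt <= '9'):
            # \0 is the shortest form, but only safe when no digit follows.
            return '\\0'
        return '\\x%02x' % ord(ch)
    return NEEDS_ESCAPE.sub(repl, s)
```

A replacement function is used on purpose: a string template such as '\\0' passed to re.sub is itself parsed for escapes, and as far as I can tell \0 in a template is read as an octal escape that yields a NUL character again rather than a backslash followed by a zero.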

@kripken
Member

kripken commented Jul 16, 2015

Thanks, I don't have time either currently, though (no windows machine, and busy with the syscalls landing).

kripken added a commit that referenced this pull request Jul 17, 2015
@aidanhs
Contributor

aidanhs commented Jul 21, 2015

> However even if I go in places and replace 'r' with 'rb', I am still getting corrupted output in the final executable. I am not sure if I found all the relevant places to transform, or if I missed a place. The output does change from optimizer.exe crashing, to another kind of corruption where the end of the memory initializer reads \u00c0�=(function(global,env,buffer) {, i.e. it is missing the terminating " character for the string, and the subsequent code is joined directly into it, generating an invalid .js file. Unfortunately I'm running out of time here to debug this, but I believe the root cause of these issues is with crossplatform inconsistencies in ascii text file handling.

Emscripten really should use 'rb' everywhere anyway.

@waywardmonkeys
Contributor

> Emscripten really should use 'rb' everywhere anyway.

👍

@kripken
Member

kripken commented Jul 21, 2015

If we can confirm that fixes this issue, and has no downsides (I don't know windows enough to tell...), sounds good to me.

@juj juj mentioned this pull request Oct 15, 2015
gagern added a commit to gagern/emscripten that referenced this pull request Oct 13, 2016
…e#3326"

This reverts commit b256af6.
Since emscripten-core#3854 started escaping \x1a, this should no longer be needed.
Conflicts due to code reformatting had to be resolved manually.