Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dangling pointer bug in uzlib_deflate.c causes lua.cross to sometimes crash #2632

Closed
ziggurat29 opened this issue Jan 29, 2019 · 9 comments
Closed

Comments

@ziggurat29
Copy link
Contributor

Expected behavior

not crash, generate a flash image

Actual behavior

crashes, leaving a 0-length flash image file

Test code

lua.cross.exe -f -m 0x20000 -o lfs.img _init.lua dummy_strings.lua inspect.lua kosbo_lfs.lua

I have already debugged the code and have a resolution posted below in the Details section

NodeMCU version

master
HEAD: 1159295
Merge pull request #2582 from nodemcu/dev
Arnim Läuger
Friday, December 07, 2018 3:47:49 PM

Hardware

I am using a NodeMCU-dev board, rev 0.9, but that is not relevant for lua.cross

Details

app\uzlib\uzlib_deflate.c:571, a buffer is freed:
FREE(dynamicTables);

However, there are pointers into this buffer that are left dangling. Specifically, oBuf, which is used at line 575.

You can see the memory layout as it was allocated at line 221, set into dynamicTables at line 217, and where oBuf was set to point within that allocated block at line 224.

To resolve this, I would move the FREE at line 571 to after the control block at 574-582. (i.e., just before returning from the function.)

Since the Uzlib seems to be third party code, it might be useful to see if it was already fixed upstream.

This bug may be unnoticed because the freed memory block is being accessed quickly after freeing, and it's contents have likely not been damaged yet. However it was crashing reliably for me in debug builds, because the debug malloc on MSVC blitzes freed blocks with a pattern. In release builds, it usually ran OK, but occasionally did not.

@TerryE
Copy link
Collaborator

TerryE commented Feb 1, 2019

@ziggurat29 thanks for this but i am bit confused by your explanation. The struct dynTables *dynamicTables is used to contain the deflate huffman decode tables and constants. The malloc at L215 allocates space for it and the dt->hashTable and dt->hashChain vectors and the storage is as you say freed at L571, but none of these dynamic tables are used between L572 and the return at L584; specifically oBuf->buffer is a separate output byte array, so the order of these frees (or re-ordering them) should not change the execution.

If you are getting crashes as you suggest then IMO a more probable underlying issue is that I am walking of the end of tables and invalidly stomping out of scope RAM. Though I also note that you say that you are using MSVC which isn't supported for firmware builds, so are you using uzlib for your own application? We got no issues with this as the code is released under an open-source licence, but this does also add other possible failure modes sich as the out-of-bounds issue being in your application code.

Incidentally pieces of the code are taken from or based on the attributed sources, but a large part is a fresh implementation to work within ESP8266 Ram constraints

@HHHartmann
Copy link
Member

@TerryE Terry from what I can see @ziggurat29 is has ported lua.cross to native windows without mingw.
I would very much like to get a copy of that if not the project file to be able to build my own version.

I looked at the code myself and found that
in line 216 the memory for oBuf and dynTables is allocated together with dynTables hashChain and hashTable.

in line 572 it is freed altogether again.
oBuf is the accessed below as stated by @ziggurat29 .

@ziggurat29 does it work if you move the line as you suggested?
Maybe create a pull request if it works. If you have problems doing so I will assist you.

@TerryE
Copy link
Collaborator

TerryE commented Feb 2, 2019

@TerryE Terry from what I can see @ziggurat29 is has ported lua.cross to native windows without mingw.

Using Cygwin was trivial. Supporting mingw would involve a bit of work but is doable and would produce a native windows executable. Using native MSVC is quite a lot more work because of the intrinsic incompatibilities resulting from the lack of decent POSIX support and needing to use the native Windows APIs. Not for the faint-hearted, IMO.

@ziggurat29
Copy link
Contributor Author

ziggurat29 commented Feb 10, 2019

I apologize for not having visited this issue for the past week, and for responding so late. My recollection of the problem has since been paged out, but I think I can do a more detailed walk-though.

Looking at the code, you can see that at about line 224:

  oBuf          = (struct outputBuf *)(dt+1);

so, oBuf is pointing /into/ the dt chunk. Also note, dt is an alias of dynamicTables.

Later, around line 571 we do:

  FREE(dynamicTables);

This obviously releases the entire malloc'd block that was assigned to dynamicTables, but recall that single block includes several contiguous objects, especially oBuf which is pointing into that block -- it's not a separately allocated object. So when dynamicTables is freed, oBuf is implicitly freed as well. So now it is now pointing into a free'ed memory block. If no one touches oBuf anymore it's not a problem, but we do touch it several times right away. E.g. a few lines later:

    uchar *trimBuf = realloc(oBuf->buffer, oBuf->len);

(and other places).
If the free'ed block has not been altered, then those things will still work (even though oBuf refers to freed memory), but if the block /has/ been altered, then there will be problems because things like

  oBuf->buffer and oBuf->len

will contain junk.

On MSVC, if you do a debug build, then all the mallocs and frees cause the memory to be filled with a pattern. This is to make these sorts of issues more obvious (it really is helpful). In the case of MSVC, a free() will cause the memory to first be overwritten with 0xdd before doing the actual free(). In the case of the code mentioend, that will cause, say,

    uchar *trimBuf = realloc(oBuf->buffer, oBuf->len);

to resolve to:

  uchar *trimBuf = realloc(0xdddddddd, 0xdddddddd);

and so crashing ensues.

If you need it, I can revert my change and stimulate a crash and give you the stack trace, but I think it is already apparent that all the cases of oBuf being used after the FREE(dynamicTables) are instances of accessing memory after it has been free'ed, and that moving the FREE(dynamicTables) until /after/ those statements have executed will resolve the issue.

@HHHartmann , yes, I built it with MSVC. You are welcome to the modified project if you want it. Aside from setting up the .vcxproj file, I did have to modify some source for all the compiler-specific stuff; e.g. changing __attribute() to functionally equivalent __declspec() stuff, and a little linker legerdemain to produce a working IN_RODATA_AREA(). It really wasn't too bad, but it did involve maybe 5 source files. Presently, these changes are not wrapped in #ifdefs as I normally would if I were to submit the changes for inclusion in the main codebase, but I think those sort of improvements could be done with ease.

At any rate, the changes needed for MSVC support of luac.cross wound up being 'simple' largely because luac.cross itself happens to be a simple application. If it were a more complex application then it might have been more of a mess, but MSVC can handle things like printf, fopen, and longjmp just fine. Accommodating the linker for placement of the rotables was maybe the most tricky, but that's intrinsically a toolchain-specific concern and doesn't have anything to do with libc or OS's.

As @TerryE mentioned, there is now a Cygwin build option. I was unaware of that option until after I had gone through the effort of building on MSVC (lol, it pays to read more, but isn't that a halting problem?). However, this bug as filed is a fundamental one regardless of toolchain, runtime libs, or OS. I believe it's effect has been masked so far simply due to peculiarities of the current toolchain's libc implementation, hence why it became visible when I used a different one (from MSVC). The bug is deterministically fixed by moving the 'free' as I described in the original post.

@TerryE
Copy link
Collaborator

TerryE commented Feb 11, 2019

@ziggurat29, If you do an issues search (is:issue luac in:title), then you will see that I am very much aware of this gap, and there is an open issue on the stufff that you are offering: #2315.

Most of the active developers here do so in a Posix environment. In my case I used to be actively involved with MS tools and have been to Redmond a few times, but I stopped using MS stuff including WinX when I retired early from HP. I have an old laptop which still dual-boots into Win7, but I am not willing to pay for MSVC out of my own pocket, when I regard the free GNU toolchain as superior for my needs.

This all underlines to me that you are bringing what seems to be rare skills to the project -- being an active MS developer who is willing to work on and contribute to the project, so I welcome and will support your contribution.

What I ask is that you fork our repo on github and commit your version to your fork, so that I can review and integrate your suggested changes. Whilst it is too problematic adding Windows build support for the entire project, having this for the limited host-based tools such as luac and spiffsimg is highly desirable, IMO.

We can talk about this further by email (my mailbox using my userid at apache.org still works) or on our gitter thread.

@TerryE
Copy link
Collaborator

TerryE commented Feb 11, 2019

PS. I've been laid up for this last week with a horrible respiratory virus which makes me feel sh*t. My head hasn't been clear enough to work on the project, but I am on the mend now and will try to catch up.

@ziggurat29
Copy link
Contributor Author

ziggurat29 commented Feb 11, 2019

Okeydoke. I'll do the following:

  • I'll make a fork

  • set up one branch that is for this specific bug only. It's the proverbial 'one-line fix'. I think this has merit for inclusion on it's own because it is a fundamental bug that affects all toolchains. I'll generate a PR from that branch, and that PR should resolve this issue dangling pointer bug in uzlib_deflate.c causes lua.cross to sometimes crash #2632

  • I'll set up a separate branch for the MSVC stuff. I will improve beyond my current work with appropriate #ifdef around MSVC-specific stuff. I just did a diff, and it looks like there is minor surgery in nine files. I hadn't built spiffsimg yet, since it turned out that I didn't need to, but I'll go for that as well. That one looks even easier, anyway. I'll do a separate PR for that stuff.

That way we can get the bugfix for this issue in, and then separately if y'all think the MSVC support is too much, then you can just leave that in a branch or reject it or whatever.

My motivation for doing the msvc build is simply that the only other tool I needed was the luac for the LFS stuff. All the other stuff I can do through the online build system. Since I had msvc on-hand, and because I did not yet know about the mingw alternative, I decided to give it a go. It might be useful to someone else. Incidentally, for the curious: the free MS Visual Studio Community Edition is fully capable.

@ziggurat29 ziggurat29 mentioned this issue Feb 11, 2019
4 tasks
@ziggurat29
Copy link
Contributor Author

OK, PR #2661 is in to resolve this problem.

Also, I submitted PR #2665 that has all the MSVC stuff for luac.cross. I didn't know if it was appropriate for me to file an enhancement issue to associate with the PR, so I didn't. It is dependent on this fix being in-place, though.

spiffsimg turns out to be more work than expected, so I will submit that PR separately when it is working. I'm not sure why one needs spiffsimg as a user of nodemcu (luac.cross is critical for LFS usage), but I don't mind adapting it, too.

@TerryE
Copy link
Collaborator

TerryE commented Feb 12, 2019

I'm not sure why one needs spiffsimg as a user of nodemcu

If you want to "factory install" NodeMCU + Lua app, then using SPIFFSimg makes this simpler, as no tty interactive session is needed.

I'll take a look at the MS stuff tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants