Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer_control: Fix a use-after-free bug. #1237

Merged

Conversation

ZmnSCPxj
Copy link
Contributor

Possibly related to #1235 and #1236, but not confirmed to fix that.

@rustyrussell
Copy link
Contributor

I don't see where we free channel here?

@ZmnSCPxj
Copy link
Contributor Author

We are freeing error at the end of the function. In one path, error is loaded from channel->error. Since the function frees error, the channel->error becomes a pointer to a freed memory.

@cdecker
Copy link
Member

cdecker commented Mar 18, 2018

Excellent find, we are indeed leaving a dangling pointer in channel->error. Wondering however if a cleaner fix would be to allocate the error in here off of tmpctx and rely on the auto-free of that context:

error = peer_accept_channel(ld, id, addr, cs, gossip_index,
gfeatures, lfeatures,
peer_fd, gossip_fd, channel_id,
in_msg);

That would avoid duplicating the error.

@ZmnSCPxj ZmnSCPxj force-pushed the peer-nongossip-use-after-free branch from 72d4382 to 8d69f94 Compare March 18, 2018 22:52
@ZmnSCPxj
Copy link
Contributor Author

Allocated off in_msg the "Weird request" case instead, and removed the tal_free.

@ZmnSCPxj ZmnSCPxj force-pushed the peer-nongossip-use-after-free branch from 8d69f94 to 72d4382 Compare March 19, 2018 00:38
@ZmnSCPxj
Copy link
Contributor Author

ZmnSCPxj commented Mar 19, 2018

Allocating off in_msg fails CI https://travis-ci.org/ElementsProject/lightning/jobs/355149369, unfortunately I cannot get the details since the rerun breaks completely and I cannot extract the failure cause. Reverted to the previous "make a copy of channel->error" instead. The "allocate off in_msg" version is in my branch https://github.com/ZmnSCPxj/lightning/tree/other-peer-nongossip-use-after-free if anyone wants to take a look at it.

@rustyrussell
Copy link
Contributor

Here's the clue from that failure:
'result': {'leaks': [{'value': '0x9d95960', 'parents': ['wire/gen_peer_wire.c:608:u8[]', 'lightningd/lightningd.c:48:struct lightningd'], 'backtrace': ['ccan/ccan/tal/tal.c:467 (tal_alloc_)', 'ccan/ccan/tal/tal.c:496 (tal_alloc_arr_)', 'ccan/ccan/tal/str/str.c:90 (tal_vfmt)', 'ccan/ccan/tal/str/str.c:45 (tal_fmt)', 'common/wire_error.c:80 (sanitize_error)', 'lightningd/peer_control.c:469 (peer_sent_nongossip)', 'lightningd/gossip_control.c:59 (peer_nongossip)', 'lightningd/gossip_control.c:168 (gossip_msg)', 'lightningd/subd.c:504 (sd_msg_read)', 'lightningd/subd.c:330 (read_fds)', 'ccan/ccan/io/io.c:59 (next_plan)', 'ccan/ccan/io/io.c:387 (do_plan)', 'ccan/ccan/io/io.c:397 (io_ready)', 'ccan/ccan/io/poll.c:310 (io_loop)', 'lightningd/lightningd.c:400 (main)'], 'label': 'ccan/ccan/tal/str/str.c:90:char[]'}, {'value': '0x9d76378', 'parents': ['wire/gen_peer_wire.c:608:u8[]', 'lightningd/lightningd.c:48:struct lightningd'], 'backtrace': ['ccan/ccan/tal/tal.c:467 (tal_alloc_)', 'ccan/ccan/tal/tal.c:496 (tal_alloc_arr_)', 'wire/gen_peer_wire.c:78 (fromwire_error)', 'common/wire_error.c:60 (sanitize_error)', 'lightningd/peer_control.c:469 (peer_sent_nongossip)', 'lightningd/gossip_control.c:59 (peer_nongossip)', 'lightningd/gossip_control.c:168 (gossip_msg)', 'lightningd/subd.c:504 (sd_msg_read)', 'lightningd/subd.c:330 (read_fds)', 'ccan/ccan/io/io.c:59 (next_plan)', 'ccan/ccan/io/io.c:387 (do_plan)', 'ccan/ccan/io/io.c:397 (io_ready)', 'ccan/ccan/io/poll.c:310 (io_loop)', 'lightningd/lightningd.c:400 (main)'], 'label': 'wire/gen_peer_wire.c:78:u8[]'}, {'value': '0x9d95700', 'parents': ['wire/gen_peer_wire.c:608:u8[]', 'lightningd/lightningd.c:48:struct lightningd'], 'backtrace': ['ccan/ccan/tal/tal.c:467 (tal_alloc_)', 'ccan/ccan/tal/tal.c:496 (tal_alloc_arr_)', 'common/utils.c:11 (tal_hexstr)', 'wire/fromwire.c:229 (fmt_channel_id_)', 'common/type_to_string.c:35 (type_to_string_)', 'common/wire_error.c:80 (sanitize_error)', 'lightningd/peer_control.c:469 (peer_sent_nongossip)', 'lightningd/gossip_control.c:59 (peer_nongossip)', 'lightningd/gossip_control.c:168 (gossip_msg)', 'lightningd/subd.c:504 (sd_msg_read)', 'lightningd/subd.c:330 (read_fds)', 'ccan/ccan/io/io.c:59 (next_plan)', 'ccan/ccan/io/io.c:387 (do_plan)', 'ccan/ccan/io/io.c:397 (io_ready)', 'ccan/ccan/io/poll.c:310 (io_loop)', 'lightningd/lightningd.c:400 (main)'], 'label': 'common/utils.c:11:char[]'}]}}

Basically, we call sanitize_error() and allocate that off error which is actually not freed any more.

Christian's advice of using tmpctx here is correct: one subtle advantage of tal() is that you can have two paths, one of which uses a temporary and one which doesn't, and simply free the parent and not worry about distinguishing the two. In this case, tmpctx is the correct parent.

This bug is a classic case of being lazy:
1. peer_accept_channel() allocated its return off the input message,
   rather than taking an explicit allocation context.  This concealed the
   lifetime nature of the return.
2. The context for sanitize_error was the error itself, rather than the
   more obvious tmpctx (connect_failed does not take).

The global tmpctx removes the "efficiency" excuse for grabbing a random
object to use as context, and is also nice and explicit.
@rustyrussell
Copy link
Contributor

How's that, @ZmnSCPxj ?

@ZmnSCPxj
Copy link
Contributor Author

Looks OK.

@practicalswift
Copy link
Contributor

Oh, very nice find!

@rustyrussell rustyrussell merged commit 044705a into ElementsProject:master Mar 19, 2018
@ZmnSCPxj ZmnSCPxj deleted the peer-nongossip-use-after-free branch March 19, 2018 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants