Skip to content
This repository has been archived by the owner on Jun 30, 2023. It is now read-only.

[RFC] corrupted msgpack streams are printed to screen and/or saved #38

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

phodge
Copy link

@phodge phodge commented Jun 25, 2018

Forgive this novice Lua code ... this is my attempt at getting the lua client to print corrupted msgpack streams to screen. The entire blob is also saved to disk to aid local debugging. Related to neovim issue neovim/neovim#8102, and neovim PR neovim/neovim#8638

@phodge
Copy link
Author

phodge commented Jun 25, 2018

It seems the pcall ... unpack() logic isn't quite right. I'm getting an error in the nvim unit tests because unpack(rv) somehow returns all nil values.

local a, b, c, d, e = self._session:receive(data, pos)
return {a, b, c, d, e}
end
local status, rv = pcall(helper)
Copy link
Member

@justinmk justinmk Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

status, type, id_or_cb, method_or_error, args_or_result, pos = 
  pcall(self._session:receive, data, pos)

id_or_cb, method_or_error, args_or_result, pos will all be nil if the call fails. Otherwise they'll have the return values.

type will be the error message on failure, otherwise it will have the type return value.

badfile:write(data)
badfile:close()

-- build a printable representation of the bad part of the string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/helpers.lua has a hexdump() function. Might be good enough to paste it into lua-client and use it here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll copy-paste hexdump into the file. BTW is there a more reasonable place for me to save the full data contents (instead of ./msgpack-fail?), and is there an easy way I can turn that behaviour off in CI where we'll never be able to get at the saved data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a more reasonable place for me to save the full data contents (instead of ./msgpack-fail?), a

What's wrong with that? This situation should almost never happen. So it doesn't matter for CI or elsewhere.

@@ -1,8 +1,8 @@
package = 'nvim-client'
version = '0.1.0-1'
source = {
url = 'https://github.com/neovim/lua-client/archive/' .. version .. '.tar.gz',
dir = 'lua-client-' .. version,
url = 'git://github.com/phodge/lua-client.git',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this should not be the .tar.gz form? I don't know much about luarocks , I'm just basing this on the old code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this form on one of the luarocks wiki pages, but it'll have to get reverted before I merge my branch anyway.

@phodge
Copy link
Author

phodge commented Jun 28, 2018

@justinmk: frustratingly my mac is passing all the tests again today so I've had to split my focus onto some of the other issues while I wait for Travis to rerun the OSX tests for me. However the good news is - I've added some additional debug output to the lua-client and it now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification. (See https://api.travis-ci.org/v3/job/397526419/log.txt - search for "invalid msgpack" and scroll up a little).

I'd like to get your opinion on the debug output and implementation in this PR - if there's anything you'd like changed before this gets merged in.

@justinmk
Copy link
Member

It now looks like the OSX bug is caused by some extra data coming in on the end of a "redraw" RPC notification

Are you sure it's extra data? Or just malformed somehow? I can't tell from looking at the bytes:

Error deserialising msgpack data stream at pos 1:
0000: 91 a1 72 91 a1 65 92 ab ..r..e..
0008: 63 75 72 73 6f 72 5f 67 cursor_g
0010: 6f 74 6f 92 03 45 93 a3 oto..E..
0018: 70 75 74 91 a1 65 91 a1 put..e..
0020: 75 92 ab 63 75 72 73 6f u..curso
0028: 72 5f 67 6f 74 6f 92 03 r_goto..
0030: 48 97 a3 70 75 74 91 a1 H..put..
0038: 66 91 a1 75 91 a1 67 91 f..u..g.
0040: a1 69 91 a1 61 91 a1 74 .i..a..t
0048: 92 ab 63 75 72 73 6f 72 ..cursor
0050: 5f                      _

Now we need a way to make that human-readable :) (is that what your old code did...?)

@phodge
Copy link
Author

phodge commented Jun 28, 2018

I'm 55% certain it's extra data, because if you look at the lua mpack C code the error gets raised when the session receive function realises that the value it just unpacked isn't a notification, a request, or a response.

As for human-readable, I guess that depends what you prefer. I liked the previous solution of using \xXX hex codes just for the unprintable characters (like what python __repr_() does), would you like me to put that code back?

@justinmk
Copy link
Member

\xXX hex codes just for the unprintable characters (like what python _repr() does), would you like me to put that code back?

Well, the table above has hex codes... but feel free to add the other logic, then keep and show both representations. Don't need to be tidy now.

@jamessan
Copy link
Member

jamessan commented Jun 30, 2018

If data is the raw RPC stream, then the hexdump looks fairly normal to me. A single Msgpack RPC message should have one of three forms:

  • Request
    • [type, msgid, method, params]94 (4-element array) 00 (fixint 0) ...
  • Reply
    • [type, msgid, error, result]94 (4-element array) 01 (fixint 1) ...
  • Notification
    • [type, method, params]93 (3-element array) 02 (fixint 2) ...

None of those patterns show up in the dump, so we're in the middle of a single message.

91 a1 72[ 'r' ]
91 a1 64[ 'e' ]
92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 45[ 'cursor_goto', [ 3, 69 ] ]
93 a3 70 75 74 91 a1 65 91 a1 75[ 'put', ['e'], ['u'] ]
92 ab 63 75 72 73 6f 72 5f 67 6f 74 6f 92 03 48[ 'cursor_goto', [ 3, 72 ] ]
97 a3 70 75 74 91 a1 66 91 a1 75 91 a1 67 91 a1 69 91 a1 61 91 a1 74[ 'put', [ 'f' ], [ 'u' ], [ 'g' ], [ 'i' ], [ 'a' ], [ 't' ] ]
92 ab 63 75 72 73 6f 72 5f[ 'cursor_

As noted, the method from the previous _session:receive call is a "redraw".

The first two arrays are the tail end of a previous "put" event, and then we end with a partial "cursor_goto" event, all part of the same "redraw" that was in the previous chunk. (Edit: I see that we're only dumping the first 80 bytes of what was received in this call, so the final "cursor_goto" event may not be cut off.)

Taking a look at spell_spec.lua, we can see that one of the lines contains dolore eu fugiat. Everything there looks correct.

The question then is why the "redraw" notification is being broken up instead of handled as a single message.

@jamessan
Copy link
Member

jamessan commented Jun 30, 2018

Hmm, MsgpackRpcStream:read_start calls self._stream:read_start(function(data)...). That callback processes all of the data it is given, but data may be an incomplete message. Nothing handles the fact that the callback may get called again to feed more data.

It seems like we probably need to do two things.

  • Truncate data when a message has been processed (and adjust len accordingly)
  • When we exit the while pos <= len loop, store any remaining data on self for the next time read_start calls our callback. Otherwise, we won't be able to continue processing from where we left off

@phodge
Copy link
Author

phodge commented Jul 3, 2018

@jamessan: Thanks for the extra guidance, I've managed to reproduce the exact error on linux now. It happens when read_start() gets the data split into at least two parts, where the first part is exactly 8192 bytes.

FYI the mpack.Session instance is responsible for storing the incomplete data internally, so there's no reason to handle it inside msgpack_rpc_stream.lua. But for some reason, when mpack.Session.receive() gets the first 8192 bytes of the redraw notification, it returns an incomplete object rather than recognising it only has part of the data and waiting for more to arrive.

@phodge phodge force-pushed the always-dump-corrupted-data branch from 38b0fb7 to e343477 Compare July 9, 2018 06:22
@phodge
Copy link
Author

phodge commented Jul 9, 2018

@justinmk: I've rebased this branch to tidy up the commits and remove the rockspec hackery. It should be good to merge.

@phodge phodge force-pushed the always-dump-corrupted-data branch from 0a472e6 to 3dc2c48 Compare July 9, 2018 06:41
@tarruda tarruda self-requested a review August 27, 2018 23:42
Copy link
Member

@tarruda tarruda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the mpack.Session instance is responsible for storing the incomplete data internally, so there's no reason to handle it inside msgpack_rpc_stream.lua. But for some reason, when mpack.Session.receive() gets the first 8192 bytes of the redraw notification, it returns an incomplete object rather than recognising it only has part of the data and waiting for more to arrive.

@phodge if that's the case, then it is probably a bug in libmpack and if so must be fixed there. Can you save the blob causing this error so I can write a test in libmpack?

@phodge
Copy link
Author

phodge commented Aug 27, 2018

Hi @tarruda: I can't rerun the test right now, but you should still be able to apply the patch in neovim/neovim#8102 (comment) to reproduce.

@phodge
Copy link
Author

phodge commented Aug 27, 2018

Also, is there anything else I need to fix on this PR before it can be merged?

@tarruda
Copy link
Member

tarruda commented Aug 28, 2018

I can't rerun the test right now, but you should still be able to apply the patch in neovim/neovim#8102 (comment) to reproduce.

So all I have to do is apply that patch and run neovim tests locally? Is there a specific test which is more likely to trigger?

Also, is there anything else I need to fix on this PR before it can be merged?

No, LGTM

@phodge
Copy link
Author

phodge commented Aug 28, 2018

@tarruda: IIRC it was test/functional/ui/spell_spec.lua that reproduces the bug, but if it's not then you should still encounter it by just running all the tests.

@jamessan
Copy link
Member

Also, is there anything else I need to fix on this PR before it can be merged?

Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.

@tarruda
Copy link
Member

tarruda commented Aug 28, 2018

Was this intended to be merged as-is? I thought it was just being used for debugging. I'm not sure we should be printing the debug messages and writing a file to disk when we get an incomplete message.

Agreed, I thought this was going to be merged in order to obtain the blobs from the tests, but I guess we can simply change neovim build system to download from @phodge 's branch

@justinmk
Copy link
Member

We do want to merge this in some form, so that it works for every PR, automatically. Writing a hint file maybe is the only way.

@phodge
Copy link
Author

phodge commented Aug 28, 2018 via email

@tarruda
Copy link
Member

tarruda commented Aug 28, 2018

@justinmk applying your patch actually results in an invalid msgpack-rpc payload being sent

After applying @phodge patch and running while true ; do TEST_FILE=test/functional/ui/spell_spec.lua make functionaltest || break ; done for more than 1 hour, I still couldn't reproduce on master.

@phodge can you reproduce the error easily in your computer? If so and you manage to save it in a file, please post it somewhere so I can debug libmpack locally.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants