Segfault when parsing with `TOML_EXCEPTIONS=0` #65

sneves · 2020-10-09T19:44:01Z

A very simple case, once again: parsing "#\xf1\x63" when disabling exceptions. The crash occurs in parser::parse_document when trying to continue parsing after reader.read_next() returns nullptr.

I'm not entirely sure why it happens; it looks like consume_comment should return false on advance_and_return_if_error({}), but somehow it does return true instead.

The text was updated successfully, but these errors were encountered:

marzer · 2020-10-09T19:55:22Z

Interesting. The test suite does run builds with TOML_EXCEPTIONS=0, so I need more information. Environment, compiler flags, etc., if you have them handy. (the bug report template does ask for these things)

I'll investigate on my machine all the same, but every bit helps.

marzer · 2020-10-09T20:00:16Z

Oh, you know what, scratch that! I see what the issue is immediately. The core parse loop doesn't account for the possibility of a comment or malformed UTF-8 right at EOF. Should be easy to fix, thanks.

sneves · 2020-10-09T21:00:28Z

Another input resulting in a null pointer dereference: "1= 0x6cA#+\xf1". This one fails when setting the error string, in to_sv(*cp) with a null cp.

I should mentioned that this happens after applying the 6255dd7 fix.

marzer · 2020-10-09T21:01:07Z

Fixed, thanks @sneves!

Since you seem to enjoy digging a little bit, I'll explain what was actually happening here.

The string contained a malformed utf8 byte sequence so the stream reader's read_next() captured an error and returned a null codepoint
consume_comment() was detecting the error state and returning false (which is the intended behaviour; an error while consuming something is a failure to consume it)
parse_document() was assuming the only false results here meant "nothing to consume", not taking into account the possibility for an error at this level.

Adding a single return_if_error_or_eof() to parse_document() was sufficient to handle the error propagation.

Worth noting that all this complexity evaporates if the library is built with exceptions, so if you can use them I highly recommend it (especially since all the explicit error-handling code dissapearing means the parser is a decent bit faster).

There's now a unit test to catch this, too:

tomlplusplus/tests/user_feedback.cpp

Lines 91 to 102 in 6255dd7

    
           SECTION("feedback - github/issues/65") 
        
           { 
        
           	// see: https://github.com/marzer/tomlplusplus/issues/65 
        
           	// this tests two things: 
        
           	// - a comment at EOF 
        
           	// - a malformed UTF-8 sequence 
        
           	//  
        
           	// it should fail to parse, but correctly issue an error (not crash!) 
        
           	parsing_should_fail(FILE_LINE_ARGS, "#\xf1\x63"); 
        
           }

marzer · 2020-10-09T21:01:40Z

Another input resulting in a null pointer dereference: "1= 0x6cA#+\xf1". This one fails when setting the error string, in to_sv(*cp) with a null cp.

Yup, that will be the same issue. \xf1 is in-and-of-itself not a complete utf8 byte sequence.

marzer · 2020-10-09T21:03:43Z

I should mentioned that this happens after applying the 6255dd7 fix.

Oh 😓

marzer · 2020-10-09T21:06:44Z

These are pretty weird inputs @sneves; out of curiosity, where are you getting them from? Are you just making them up to stress-test the library (which I appreciate!), or is there some example garbage TOML you're plucking it from?

marzer · 2020-10-09T21:33:09Z

Alright, should be good now. I also audited the rest of the to_sv(*cp) callsites for the possibility of cp being nullptr in the event of a stream error, and added a null-safe version instead.

This is good stuff, keep it coming!

sneves · 2020-10-09T21:39:37Z

These are inputs from fuzzing; they're not "natural" inputs.

marzer · 2020-10-09T21:47:03Z

Ah, makes sense. Fuzzing has been on my to-do list for a while; next time I get a batch of more serious toml++ time (beyond basic bugfixes) I'd like to add a fuzzer to the CI.

sneves · 2020-10-09T22:36:33Z

Against the latest master, the following input crashes in consume_comment: "p=06:06:06#\x0b\xff".

I managed to fix it by adding return_if_error({}); after if (consume_line_break()) return true; but no idea if that's a proper fix.

marzer · 2020-10-09T22:42:43Z

Ah, thanks. I'm heading out for the night but will have a look into this tomorrow (that and any other broken inputs you find).

sneves · 2020-10-10T00:49:57Z

A couple more cases:

"''''d' 't' '+o\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\r\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0cop1\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c\x0c' 'ml'\n\n%\x87" causes an infinite loop in parse_literal_string. This is actually caught with a debug assert, so I fixed it as

-                                       assert_not_error();
+                                       return_if_error({});

"bwit = [ 9,2, 3, "four", 5.0 ]\n t =[ 9, 2, 1,"r", 9999999999999999999999999999999999999999999999999999999999999995.0 ]" results in a stack overflow in parse_float, with chars[64] getting written to. I fixed this by moving the length check to immediately before the writing:

@@ -9911,10 +9912,6 @@ TOML_IMPL_NAMESPACE_START
                                                else if (!is_match(*prev, U'e', U'E'))
                                                        set_error_and_return_default("expected exponent digit, saw '"sv, to_sv(*cp), "'"sv);
                                        }
-                                       else if (length == sizeof(chars))
-                                               set_error_and_return_default(
-                                                       "exceeds maximum length of "sv, static_cast<uint64_t>(sizeof(chars)), " characters"sv
-                                               );
                                        else if (is_decimal_digit(*cp))
                                        {
                                                if (!seen_decimal)
@@ -9928,6 +9925,11 @@ TOML_IMPL_NAMESPACE_START
                                        else
                                                set_error_and_return_default("expected decimal digit, saw '"sv, to_sv(*cp), "'"sv);
 
+                                       if (length == sizeof(chars))
+                                               set_error_and_return_default(
+                                                       "exceeds maximum length of "sv, static_cast<uint64_t>(sizeof(chars)), " characters"sv
+                                               );
+
                                        chars[length++] = static_cast<char>(cp->bytes[0]);
                                        prev = cp;
                                        advance_and_return_if_error({});

Other than this, nothing's popped up for quite a while, so I think the low-hanging fruit could be over..

PS: As before, I can't swear by the correctness of these fixes, only that they made the immediate problem go away.

also a few other edge cases (see #65)

marzer · 2020-10-10T08:46:14Z

Fantastic, thanks @sneves! I've applied fixes for these in master.

As before, I can't swear by the correctness of these fixes, only that they made the immediate problem go away.

Your fixes are sound. The parser is deliberately procedural and "C-like" so there's not a lot of hidden voodoo that goes into fixes and modifications.

sneves added the bug Something isn't working label Oct 9, 2020

sneves assigned marzer Oct 9, 2020

marzer closed this as completed in 6255dd7 Oct 9, 2020

marzer added a commit that referenced this issue Oct 10, 2020

fixed parser not handling overlong float literals correctly

30b756f

also a few other edge cases (see #65)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Segfault when parsing with `TOML_EXCEPTIONS=0` #65

Segfault when parsing with `TOML_EXCEPTIONS=0` #65

sneves commented Oct 9, 2020

marzer commented Oct 9, 2020 •

edited

Loading

marzer commented Oct 9, 2020 •

edited

Loading

sneves commented Oct 9, 2020 •

edited

Loading

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

sneves commented Oct 9, 2020

marzer commented Oct 9, 2020

sneves commented Oct 9, 2020 •

edited

Loading

marzer commented Oct 9, 2020

sneves commented Oct 10, 2020 •

edited

Loading

marzer commented Oct 10, 2020

Segfault when parsing with TOML_EXCEPTIONS=0 #65

Segfault when parsing with TOML_EXCEPTIONS=0 #65

Comments

sneves commented Oct 9, 2020

marzer commented Oct 9, 2020 • edited Loading

marzer commented Oct 9, 2020 • edited Loading

sneves commented Oct 9, 2020 • edited Loading

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

marzer commented Oct 9, 2020

sneves commented Oct 9, 2020

marzer commented Oct 9, 2020

sneves commented Oct 9, 2020 • edited Loading

marzer commented Oct 9, 2020

sneves commented Oct 10, 2020 • edited Loading

marzer commented Oct 10, 2020

Segfault when parsing with `TOML_EXCEPTIONS=0` #65

Segfault when parsing with `TOML_EXCEPTIONS=0` #65

marzer commented Oct 9, 2020 •

edited

Loading

marzer commented Oct 9, 2020 •

edited

Loading

sneves commented Oct 9, 2020 •

edited

Loading

sneves commented Oct 9, 2020 •

edited

Loading

sneves commented Oct 10, 2020 •

edited

Loading