Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3 panics in some cases when marshalling unicode char in comments #538

Closed
mjambert opened this issue Nov 5, 2019 · 10 comments
Closed

v3 panics in some cases when marshalling unicode char in comments #538

mjambert opened this issue Nov 5, 2019 · 10 comments

Comments

@mjambert
Copy link

mjambert commented Nov 5, 2019

With yaml.v3, try this in go playground => get panic.


import (
	"gopkg.in/yaml.v3"
)

func main() {
	var rootNode yaml.Node
	data := []byte(
		`## Some nasty character like this crashes yaml marshalling -> š
dummyStuff:`)
	_ = yaml.Unmarshal(data, &rootNode)

	yaml.Marshal(&rootNode)
}
@mikefarah
Copy link

Another example:

value: newValue #проверить

@niemeyer
Copy link
Contributor

niemeyer commented May 4, 2020

It wasn't the writing that was broken, but the reading.

This should be fixed with 5308cda. Please drop me a note if you see related issues.

@mikefarah
Copy link

Updated, unfortunately I still get the same error when reading

value: newValue #проверить
$ ./yq r examples/sample.yaml
panic: runtime error: index out of range [10] with length 10 [recovered]
        panic: runtime error: index out of range [10] with length 10

goroutine 1 [running]:
gopkg.in/yaml%2ev3.handleErr(0xc0001078e0)
        C:/Users/mikefarah/go/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20200504163728-5308cda29e3d/yaml.go:276 +0xa1
panic(0x670160, 0xc000012e40)
        c:/go/src/runtime/panic.go:967 +0x174
gopkg.in/yaml%2ev3.write(0xc000128700, 0xc00000cb30, 0xa, 0x10, 0xc000106250, 0x53b401)
        C:/Users/mikefarah/go/pkg/mod/gopkg.in/yaml.v3@v3.0.0-20200504163728-5308cda29e3d/emitterc.go:93 +0x210
gopkg.in/yaml%2ev3.yaml_emitter_write_comment(0xc000128700, 0xc00000cb30, 0xa, 0x10, 0x1)

@niemeyer
Copy link
Contributor

niemeyer commented May 5, 2020

Works fine here and in play.golang.org:

https://play.golang.org/p/-Frq5ryF1Lb

@mikefarah
Copy link

Ah but it doesn't work with Decoder/Encoder:

https://play.golang.org/p/NYU0kEpK71M

@mikefarah
Copy link

@niemeyer

@lingsamuel
Copy link

lingsamuel commented May 28, 2020

It doesn't work with marshal/unmarshal if you specify a type:
https://play.golang.org/p/xFD8Q0W7GHb
Please reopen this issue while it's not fixed yet.

@lingsamuel
Copy link

Okay I got the point:
In scannerc.go function yaml_parser_scan_line_comment (line 2830) :

// ...
if parser.mark.index >= seen {
	if len(text) == 0 {
		start_mark = parser.mark
	}
	text = append(text, parser.buffer[parser.buffer_pos]) // ERROR!
}
skip(parser)
// ...

In function skip, it add buffer_pos according to character width:

	parser.buffer_pos += width(parser.buffer[parser.buffer_pos])

width:

// Determine the width of the character.
func width(b byte) int {
	// Don't replace these by a switch without first
	// confirming that it is being inlined.
	if b&0x80 == 0x00 {
		return 1
	}
	if b&0xE0 == 0xC0 {
		return 2
	}
	if b&0xF0 == 0xE0 {
		return 3
	}
	if b&0xF8 == 0xF0 {
		return 4
	}
	return 0
}

The solution:

width := width(parser.buffer[parser.buffer_pos])
text = append(text, parser.buffer[parser.buffer_pos:parser.buffer_pos+width]...)

@niemeyer
Copy link
Contributor

Thanks for investigating it. I've fixed a similar issue in 5308cda, and will use the same approach here.

@niemeyer
Copy link
Contributor

niemeyer commented Jun 1, 2020

Okay, that same logic was applied to inline comments on 913338d.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants