Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dotenv octal escape unexpected behavior #650

Open
panzi opened this issue Jun 30, 2024 · 2 comments
Open

dotenv octal escape unexpected behavior #650

panzi opened this issue Jun 30, 2024 · 2 comments

Comments

@panzi
Copy link

panzi commented Jun 30, 2024

The regular expression for octal escape sequences matches too much (0\d{0,3} instead of 0[0-7]{0,3}, although that is still too much), replaces a \0 prefix with just \, and then if the unquoting of the escape sequence fails it inserts the manipulated match.

Meaning this value: "\079"
Gives this string: "\\79"
While it should give: "\x079" (bytes: [ 0x07, 0x39 ])

I.e. this are two bugs. Using the manipulated match when unquoting fails and matching too much and thus failing valid octal escape sequences.

Edit: Also through trail and error I found out that strconv.UnquoteChar() wants octal escape sequences to be exactly 3 octal numbers long, meaning the regular expression should actually be 0[0-7]{3}, or the match needs to be 0-padded to 3 characters long.

Further the regular expression also matches \c, which I can't find in the Go spec.

escapeSeqRegex = regexp.MustCompile(`(\\(?:[abcfnrtv$"\\]|0\d{0,3}))`)

match = strings.Replace(match, `\0`, `\`, 1)

@ndeloof
Copy link
Collaborator

ndeloof commented Jul 8, 2024

I'm not confident with regex and the way those are used in this context, so hardly can tell what would be the adequate expression
Would you like to help us fix this by writing a pull-request with a test case covering those various cases?

@panzi
Copy link
Author

panzi commented Jul 8, 2024

I've never used Go before looking at this. I just was bored and looked at the differences of various dotenv implementations and tried to replicate them in Rust, just for fun. I noticed these odd behaviors when my solution didn't produce the same results in some cases (I now emulate the behavior 1:1). I wrote down everything I noticed here: https://github.com/panzi/punktum#composego-dialect I also mention bugs in the variable substitution there, but didn't describe it well enough and now I can't remember the details. I think there's a problem in multi-line strings when the default value part spans multiple lines.

Also I just noticed yet another bug. If the inheritance syntax is used in the last line of a .env file and the file isn't newline terminated, then the parser will think the parsed key is the empty string and the value is the variable name. I.e. this .env file (given that there is no newline at the end):

FOO

Is equivalent to this JSON:

{ "": "FOO" }

Anyway, I wouldn't use regular expressions at all in implementing this, and neither would I use a library function for the escape sequences. Also I wouldn't do anything like escape sequence handling or variable substitution in separate passes. I find it simpler to think of when it is done in one pass. Because then the output of one pass doesn't need to prepare the string so that is correctly handled in the next (see $ handling). And about variable substitution: Yet another thing I wouldn't do with regular expressions, but in fact with a proper recursive parser, since it is a recursive syntax.

So because I never used Go before, am not a fan of what I've seen of Go, and am very opinionated on how such a parser should be written, I'm probably not the right person to contribute code here. XD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants