base64 compatibility #49

Open · mattrobenolt opened this issue Aug 19, 2022 · 6 comments

@mattrobenolt (Member) commented Aug 19, 2022

We currently rely on btoa and atob because they work in browsers and in the versions of Node we tested against, but that's not as universal as we had hoped.

We should instead provide a compatibility shim on top of the Buffer API, which is more likely to exist. We might also need a fallback for when neither Buffer nor btoa and friends are available.

Either way, for now it'd be preferable to use the Buffer API over btoa whenever it's available.

I propose something like this, though I don't know the best way to structure it in JavaScript/TypeScript these days:

let b64encode;
let b64decode;
if (typeof Buffer !== "undefined") {
  // Node.js and friends: latin1 maps bytes 1:1 to code points, matching btoa/atob semantics.
  b64encode = (x) => Buffer.from(x, "latin1").toString("base64");
  b64decode = (x) => Buffer.from(x, "base64").toString("latin1");
} else if (typeof btoa !== "undefined") {
  // Browsers and other runtimes that expose btoa/atob natively.
  b64encode = btoa;
  b64decode = atob;
} else {
  // Neither exists natively; we'd need to ship a pure-JS fallback here.
  throw new Error("no native base64 implementation available");
}

console.log(b64encode("foo"));  // "Zm9v"
console.log(b64decode("Zm9v")); // "foo"
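
A pure-JS fallback for that last branch is doable in a few lines. Here's a rough sketch (the helper names are hypothetical, assuming only standard ECMAScript and binary strings whose code points are all <= 0xFF):

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a binary string three bytes (24 bits) at a time into four base64 chars.
function b64encodeFallback(s) {
  let out = "";
  for (let i = 0; i < s.length; i += 3) {
    const b1 = s.charCodeAt(i);
    const b2 = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
    const b3 = i + 2 < s.length ? s.charCodeAt(i + 2) : -1;
    out += B64[b1 >> 2];
    out += B64[((b1 & 3) << 4) | (b2 < 0 ? 0 : b2 >> 4)];
    out += b2 < 0 ? "=" : B64[((b2 & 15) << 2) | (b3 < 0 ? 0 : b3 >> 6)];
    out += b3 < 0 ? "=" : B64[b3 & 63];
  }
  return out;
}

// Decode by accumulating 6 bits per character and emitting each full byte.
function b64decodeFallback(s) {
  let out = "";
  let acc = 0;
  let bits = 0;
  for (const ch of s.replace(/=+$/, "")) {
    acc = ((acc << 6) | B64.indexOf(ch)) & 0xffffff; // mask keeps acc within 24 bits
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += String.fromCharCode((acc >> bits) & 0xff);
    }
  }
  return out;
}

console.log(b64encodeFallback("foo"));  // "Zm9v"
console.log(b64decodeFallback("Zm9v")); // "foo"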

I think this is still reasonably applicable, since we don't use it only for the Authorization header: while that's the only use of btoa, atob is used to decode binary row data, which we do for every row in a QueryResult.

Refs: #47

@vinerz commented Jun 14, 2024

Chiming in: beyond being outdated and not recommended for new projects, atob and btoa severely compromise the decoding of UTF-8 data.

By using these methods, the package naively treats values as ASCII-encoded, destroying any characters outside that encoding. The following TEXT value would be butchered: “Ô meu amigo, nós precisamos reunir o pessoal para o aniversário de José”

I have confirmed the decoding issue locally by forcing btoa to comply with the newer standards, which returns the data correctly decoded. Once that is fixed, another issue arises: the decoder no longer knows the string length (it was previously miscalculated), and the column values leak throughout the entire row, rendering the result useless.
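
To make the corruption concrete, here's a minimal sketch (assuming a runtime that has both Buffer and atob, e.g. recent Node):

const text = "José";                                            // contains a non-ASCII character
const b64 = Buffer.from(text, "utf8").toString("base64");       // "Sm9zw6k="
console.log(atob(b64));                                         // "JosÃ©" -- the raw UTF-8 bytes shown as latin1
console.log(Buffer.from(atob(b64), "latin1").toString("utf8")); // "José" -- bytes re-decoded as UTF-8

atob alone gives back the bytes, not the text; without the second UTF-8 decode the characters look destroyed, and the length no longer matches the original string.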

@mattrobenolt (Member, Author) commented Jun 14, 2024 via email

@vinerz commented Jun 14, 2024

Hey @mattrobenolt! You are right to assume the data received is indeed a correctly encoded binary UTF-8 string. The thing with btoa is that it literally means binary-to-ASCII conversion (and vice versa for atob), so any characters outside that range will be improperly decoded.

Here's a simple example using my test above:

{
  "headers": [
    ":vtg1 /* VARCHAR */"
  ],
  "types": {
    ":vtg1 /* VARCHAR */": "VARCHAR"
  },
  "fields": [
    {
      "name": ":vtg1 /* VARCHAR */",
      "type": "VARCHAR",
      "charset": 255
    }
  ],
  "rows": [
    {
      ":vtg1 /* VARCHAR */": "� meu amigo, nós precisamos reunir o pessoal para o aniversário de José"
    }
  ],
  "rowsAffected": 0,
  "insertId": "0",
  "size": 1,
  "statement": "SELECT \"Ô meu amigo, nós precisamos reunir o pessoal para o aniversário de José\"",
  "time": 1.217693
}

Now, if I monkeypatch atob like this:

globalThis.atob = (str: string) => Buffer.from(str, 'base64').toString('utf-8')

The response is correctly parsed:

{
  "headers": [
    ":vtg1 /* VARCHAR */"
  ],
  "types": {
    ":vtg1 /* VARCHAR */": "VARCHAR"
  },
  "fields": [
    {
      "name": ":vtg1 /* VARCHAR */",
      "type": "VARCHAR",
      "charset": 255
    }
  ],
  "rows": [
    {
      ":vtg1 /* VARCHAR */": "Ô meu amigo, nós precisamos reunir o pessoal para o aniversário de José"
    }
  ],
  "rowsAffected": 0,
  "insertId": "0",
  "size": 1,
  "statement": "SELECT \"Ô meu amigo, nós precisamos reunir o pessoal para o aniversário de José\"",
  "time": 2.099751
}

@mattrobenolt (Member, Author) commented

I'll spend a little time with this later, but I think this is something different. I'll explain if I'm correct.

@vinerz commented Jun 14, 2024

@mattrobenolt it is indeed something different. There was a misunderstanding on my part about the cast property in the connection constructor. I had set it as a function to transform DATETIME fields into real JavaScript Date instances.

In my thought process, I thought cast was a post-processor of the results output, not a fully fledged parser function itself. I see that the package exports the default parser, so I will use that as the default case. Thank you for your time! =)
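
For reference, a sketch of that approach (assuming @planetscale/database's cast(field, value) option and its exported default cast parser; the DATETIME transform itself is hypothetical):

import { connect, cast } from "@planetscale/database";

const conn = connect({
  url: process.env.DATABASE_URL, // illustrative configuration
  cast(field, value) {
    // Transform DATETIME columns into real JavaScript Date instances...
    if (field.type === "DATETIME" && value !== null) {
      return new Date(value.replace(" ", "T") + "Z"); // assumes values are stored as UTC
    }
    // ...and delegate everything else to the package's default parser.
    return cast(field, value);
  },
});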

@mattrobenolt (Member, Author) commented

Ah, yeah, that checks out. It's definitely expected for the raw data to be binary-encoded, so it's entirely safe to run it through atob. Inside cast is where we decode the binary data as UTF-8, if applicable.
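
Put differently, the two-step decode looks roughly like this (a sketch, assuming a runtime with atob and TextDecoder):

// atob recovers the raw bytes as a binary string; TextDecoder then
// interprets those bytes as UTF-8.
const bytes = Uint8Array.from(atob("Sm9zw6k="), (c) => c.charCodeAt(0));
console.log(new TextDecoder().decode(bytes)); // "José"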
