Restore: fix utf-8 encoding returning buffers #51

vweevers · 2019-06-21T22:39:28Z

History:

The utf8 encoding is supposed to return a string. That's always been the case AFAIK and makes perfect sense. Stores like leveldown, however, may return data as a buffer, and we've had to account for this. Let's call this the maybe-string problem. The primary solution has been to pass leveldown a boolean *asBuffer option. If false, leveldown returns data as a string.
When we separated the encoding logic from levelup into encoding-down, we didn't take the *asBuffer options into account (we didn't realize that at the time). Some stores would return a buffer instead of a string. To deal with that, coercion to string was added to level-codec in Fix/utf8 decoding #12 (7.0.0).
Coercion to string was removed in Revert "fix utf-8 encoding returning buffers" #23 (8.0.0) because the *asBuffer logic was restored in asBuffer fix encoding-down#19; we thought coercion was no longer necessary.
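
To make the maybe-string problem concrete, here is a minimal sketch of a byte-oriented store with an asBuffer-style option. `ByteStore` and its methods are hypothetical names for illustration only, not leveldown's actual API, and the default of returning a buffer is an assumption of this sketch.

```javascript
// Hypothetical byte-oriented store (illustration only, not leveldown's API).
class ByteStore {
  constructor () {
    this.data = new Map()
  }

  put (key, value) {
    // Keys and values are both reduced to bytes, so the original type is lost.
    this.data.set(Buffer.from(key).toString('hex'), Buffer.from(value))
  }

  get (key, options = {}) {
    const value = this.data.get(Buffer.from(key).toString('hex'))
    if (value === undefined) return undefined

    // Assumption: a buffer is returned unless asBuffer is explicitly false.
    return options.asBuffer === false ? value.toString('utf8') : value
  }
}

const store = new ByteStore()
store.put('name', 'level')

const asBuffer = store.get('name') // a Buffer holding the bytes of 'level'
const asString = store.get('name', { asBuffer: false }) // the string 'level'
```

If the layer calling `get` forgets to pass the option, a utf8 consumer receives a buffer, which is the situation the history above describes.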

This PR restores the coercion, to work around an ecosystem quirk: leveldown and memdown handle strings and buffers differently. While leveldown stores both types as a byte array (meaning you can put a buffer and get back a string if so desired, and vice versa), memdown stores them as-is (meaning if you put a buffer, you'll get back a buffer; if you put a string, you'll get back a string - simplified). This leads to unexpected behavior.
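
As a rough sketch of why the coercion helps, assume a store that keeps values as-is, the way memdown is described above. `AsIsStore` and `decodeUtf8` are hypothetical names for this example and are not copied from memdown or level-codec.

```javascript
// Hypothetical as-is store (illustration only, not memdown's implementation).
// Keys are simplified to strings here; only value handling matters for this sketch.
class AsIsStore {
  constructor () {
    this.data = new Map()
  }

  put (key, value) {
    this.data.set(String(key), value) // the value is stored untouched
  }

  get (key) {
    return this.data.get(String(key)) // the value comes back untouched
  }
}

// A utf8 decoder that coerces anything that is not already a string.
function decodeUtf8 (data) {
  return typeof data === 'string' ? data : String(data)
}

const mem = new AsIsStore()
mem.put('a', Buffer.from('hello'))

const raw = mem.get('a') // still a Buffer, because it was put as one
const decoded = decodeUtf8(raw) // the string 'hello'
```

With the coercion in place, the utf8 encoding yields a string regardless of which kind of store sits underneath.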

Another issue (which won't be fixed by this PR but is very relevant) is that memdown isn't able to compare a string key to a buffer key (or any other type for that matter); you can only safely use one key type in your db. Possible solutions are discussed in Level/memdown#186. Let's call this the mixed-type problem. It is relevant because:

One proposed solution will make memdown behave like leveldown and thus remove the need for this level-codec PR. Before you say "that sounds like the simplest solution", wait...
Another proposed solution will make memdown behave like level-js, which doesn't have the maybe-string problem either, albeit for a different reason. It treats strings and buffers as distinct keys and values, even if their bytes are the same. Arguably - especially when viewed outside of the historical context of Level - this is the least-surprising behavior because you get back what you store. Working with binary data is a distinctly different use case from working with utf8 strings. You'll only occasionally need to process utf8 data as binary, which you can still do.
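
The "distinct keys" behavior of the second proposal can be sketched as follows. `TypedStore` and its type-tagging scheme are assumptions made up for this example; this is not how level-js or memdown is implemented.

```javascript
// Hypothetical store that treats a string key and a Buffer key as different
// entries, even when their bytes are identical (illustration only).
class TypedStore {
  constructor () {
    this.data = new Map()
  }

  // Prefix the key with its type so 'a' and Buffer.from('a') never collide.
  static tag (key) {
    return Buffer.isBuffer(key)
      ? 'buffer:' + key.toString('hex')
      : 'string:' + String(key)
  }

  put (key, value) {
    this.data.set(TypedStore.tag(key), value)
  }

  get (key) {
    return this.data.get(TypedStore.tag(key))
  }
}

const typed = new TypedStore()
typed.put('a', 'text value')
typed.put(Buffer.from('a'), Buffer.from([1, 2, 3]))

const fromStringKey = typed.get('a') // the string that was stored
const fromBufferKey = typed.get(Buffer.from('a')) // the Buffer that was stored
```

In this model you get back exactly what you stored, so a utf8 decoder never sees a buffer unless the caller put one there.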

So, fixing the mixed-type problem might also fix the maybe-string problem, but we could still choose to merge this PR as a short- to medium-term solution.

vweevers · 2019-06-22T11:53:23Z

Updated the description with context.

achingbrain · 2019-06-25T08:00:22Z

I just hit this too while trying to upgrade some really old modules to the latest versions of level*. It would be great if this could be merged. Robustness principle, etc.

vweevers · 2019-06-25T08:30:57Z

@achingbrain With memdown, or another store? (Would be good to know.)

This PR is not my preferred solution; it's a workaround. If it were semver-patch we could release it right away, but it's semver-major (which bubbles up to dependents), so I don't want to rush this. Feedback on Level/memdown#186 is most welcome.

achingbrain · 2019-06-25T08:40:42Z

Yes - with level-mem specifically.

vweevers · 2019-06-29T09:30:56Z

Hm, let's reconsider the assertion that this would be semver-major, because I only based that on the fact that the previous back-and-forth changes were semver-major.

If we can all agree that the utf8 encoding is supposed to return a string, and that this has always been the case, then this PR can be considered a bug fix. If anyone somehow relied on the bug (which I doubt) or made a workaround for it, they should have done so with a typeof x !== 'string' check. Even if they didn't, they most likely did something like x = String(x) which will still work.
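
For illustration, the two kinds of workaround mentioned above might look like this. The function names are hypothetical. Both keep working once the encoding returns a string, because each is a no-op for string input.

```javascript
// Workaround guarded by a type check: only converts non-strings.
function withCheck (x) {
  if (typeof x !== 'string') x = x.toString()
  return x
}

// Unconditional workaround: String() leaves a string unchanged.
function unconditional (x) {
  return String(x)
}

const beforeFix = Buffer.from('abc') // what the buggy decode could return
const afterFix = 'abc' // what a coercing decode returns

const results = [
  withCheck(beforeFix),
  withCheck(afterFix),
  unconditional(beforeFix),
  unconditional(afterFix)
]
```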

vweevers · 2019-08-14T08:51:12Z

Closing in favor of Level/memdown#191.

vweevers · 2019-09-06T20:17:44Z

@achingbrain I just released level-mem@5.0.0 which upgraded to memdown@5 which removes the need for this level-codec workaround. Let us know if it doesn't resolve your issues.

restore: fix utf-8 encoding returning buffers

9f6686a

vweevers added the semver-major label (Changes that break backward compatibility) Jun 21, 2019

vweevers self-assigned this Jun 21, 2019

vweevers mentioned this pull request Jun 21, 2019

bytewise key-encoding on sub-level causes not found on other level Level/subleveldown#64

Closed

vweevers removed their assignment Jun 22, 2019

achingbrain mentioned this pull request Jun 25, 2019

update all deps, fix standard linting Level/multileveldown#14

Closed

vweevers mentioned this pull request Aug 14, 2019

Drop support of key types other than string and Buffer Level/memdown#191

Merged

3 tasks

vweevers closed this Aug 14, 2019
