Skip to content
This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

Restore: fix utf-8 encoding returning buffers #51

Closed
wants to merge 1 commit into from
Closed

Restore: fix utf-8 encoding returning buffers #51

wants to merge 1 commit into from

Conversation

vweevers
Copy link
Member

@vweevers vweevers commented Jun 21, 2019

History:

  • The utf8 encoding is supposed to return a string. That's always been the case AFAIK and makes perfect sense. Stores like leveldown however, may return data as a buffer and we've had to account for this. Let's call this the maybe-string problem. The primary solution has been to pass leveldown a boolean *asBuffer option. If false, leveldown returns data as a string.
  • When we separated the encoding logic from levelup into encoding-down, we didn't take the *asBuffer options into account (we didn't realize that at the time). Some stores would return a buffer instead of a string. To deal with that, coercion to string was added to level-codec in Fix/utf8 decoding #12 (7.0.0).
  • Coercion to string was removed in Revert "fix utf-8 encoding returning buffers" #23 (8.0.0) because the *asBuffer logic was restored in asBuffer fix encoding-down#19; we thought coercion was no longer necessary.

This PR restores the coercion, to work around an ecosystem quirk: leveldown and memdown handle strings and buffers differently. While leveldown stores both types as a byte array (meaning you can put a buffer and get back a string if so desired, and vice versa), memdown stores them as-is (meaning if you put a buffer, you'll get back a buffer; if you put a string, you'll get back a string - simplified). This leads to unexpected behavior.

Another issue (which won't be fixed by this PR but is very relevant) is that memdown isn't able to compare a string key to a buffer key (or any other type for that matter); you can only safely use one key type in your db. Possible solutions are discussed in Level/memdown#186. Let's call this the mixed-type problem. It is relevant because:

  • One proposed solution will make memdown behave like leveldown and thus remove the need for this level-codec PR. Before you say "that sounds like the simplest solution", wait...
  • Another proposed solution will make memdown behave like level-js which doesn't have the maybe-string problem either, albeit for a different reason. It treats strings and buffers as distinct keys and values, even if their bytes are the same. Arguably - especially when viewed outside of the historical context of Level - this is the least-surprising behavior because you get back what you store. Working with binary data is a distinctly different use case from working with utf8 strings. You'll only sometimes have the need to process utf8 data as binary, which you can still do.

So, fixing the mixed-type problem might also fix the maybe-string problem, but we could still choose to merge this PR as a short- to medium-term solution.

@vweevers vweevers added the semver-major Changes that break backward compatibility label Jun 21, 2019
@vweevers vweevers self-assigned this Jun 21, 2019
@vweevers vweevers removed their assignment Jun 22, 2019
@vweevers
Copy link
Member Author

Updated the description with context.

@achingbrain
Copy link
Contributor

I just hit this too while trying to upgrade some really old modules to the latest versions of level*, it would be great if this could be merged. Robustness principle, etc.

@vweevers
Copy link
Member Author

@achingbrain With memdown, or other (would be good to know)?

This PR is not my preferred solution, it's a workaround. If it were semver-patch we could release it right away but it's semver-major (that bubbles up to dependents) so I don't want to rush this. Feedback on Level/memdown#186 is most welcome.

@achingbrain
Copy link
Contributor

Yes - with level-mem specifically.

@vweevers
Copy link
Member Author

Hm, let's reconsider the assertion that this would be semver-major, because I only based that on the fact that the previous back-and-forth changes were semver-major.

If we can all agree that the utf8 encoding is supposed to return a string, and that this has always been the case, then this PR can be considered a bug fix. If anyone somehow relied on the bug (which I doubt) or made a workaround for it, they should have done so with a typeof x !== 'string' check. Even if they didn't, they most likely did something like x = String(x) which will still work.

@vweevers
Copy link
Member Author

Closing in favor of Level/memdown#191.

@vweevers vweevers closed this Aug 14, 2019
@vweevers
Copy link
Member Author

vweevers commented Sep 6, 2019

@achingbrain I just released level-mem@5.0.0 which upgraded to memdown@5 which removes the need for this level-codec workaround. Let us know if it doesn't resolve your issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
semver-major Changes that break backward compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants