Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use uint8array #308

Merged
merged 29 commits into from
Dec 13, 2022
Merged

Use uint8array #308

merged 29 commits into from
Dec 13, 2022

Conversation

ailisp
Copy link
Member

@ailisp ailisp commented Nov 22, 2022

This PR will fix the issues mentioned in #117 and #195.

  • Implement Add UTF-8 encoding / decoding and native conversion between Uint8Array and string #314
  • Breaking changes: Redesign all relevant APIs to take Uint8Array instead of Bytes. Drop Bytes.
  • Redesign all collections, take string as keys. string can now be any UTF-8 string not only latin1 strings. And all collection APIs remain backward-compatible
  • Low level API tests, bytes tests and collection tests work
  • Fix all tests
  • Add tests to cover all cases conversion/encoding between string and Uint8Array
- any (utf8 seq, non utf8 seq) uint8array to string latin1
- any latin1 string (utf8 seq, non utf8 seq) to uint8array
- any (utf8 seq, non utf8 seq) uint8array to string utf8
- any utf8 string (utf8 chars, latin1 chars that happens to be utf8 seq, latin1 chars) to uint8array
- collections ( JSON.stringify state ), JSON.stringify utf8 input
  • Fix all examples

@ailisp
Copy link
Member Author

ailisp commented Dec 8, 2022

run ci tasks locally pass, mark it ready for review. @aesilevich to review changes in C code

@ailisp ailisp marked this pull request as ready for review December 8, 2022 10:03
@ailisp ailisp requested a review from volovyks as a code owner December 8, 2022 10:03
Copy link
Collaborator

@volovyks volovyks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ailisp ! Huge effort.
My main suggestion is to move all the conversions to the API level. I left a few comments about that. But we can discuss it separately, feel free to merge this.

storage_remove(key: Bytes, register: Register): bigint;
storage_read(key: Uint8Array, register: Register): bigint;
storage_has_key(key: Uint8Array): bigint;
storage_write(key: Uint8Array, value: Uint8Array, register: Register): bigint;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about it, but again, what defines if we have string or Uint8Array here? And is it possible to do the conversion in api.ts so users can use strings? I guess they expect "key" to be a string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these functions also doing the high level task: decode / encode string into Uint8Array, then user do not have an API to deal with raw bytes. At current form they can compose high level functionality: write string to state by combining storage_write + bytes/encode

what defines if we have string or Uint8Array here

For apis like current_account_id(), we are sure it returned is a string. For apis like storage_read, it takes and returns array with any bytes, which is more fit Uint8Array.

And is it possible to do the conversion in api.ts so users can use strings?

Yes. Discussed further in your next comment

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually user has it, because they can use env.function() without API wrapper function. But it's less convinient, I agree.

Maybe we can separate this into 2 issues. Function params and return types. Function params can accept string | Uint8Array and return types can be Uint8Array if really necessary. functionRaw and 'function' is also an option.

It looks like a purely DevEx problem, let's get input from DevRel team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the function and functionRaw distinction makes sense. At the same time, haven't seen devs using these functions or asking about them often, so implementing two functions without them being popular feels contradictory. In any case, having a function that accepts a single param type is closer to single resp principle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supporting points:

... it's less convinient, I agree.
functionRaw and 'function' is also an option.
function and functionRaw distinction makes sense
having a function that accepts a single param type is closer to single resp principle.

about alternative:

Actually user has it, because they can use env.function() without API wrapper function

Right now env is not exported, and we need more discussion on whether to export it (it doubles host function APIs)

Function params can accept string | Uint8Array and return types can be Uint8Array if really necessary

I thinked about it, will still be breaking changes for function(string | Uint8Array) -> Uint8Array because return type was string, and if return string | Uint8Array based on param type, it's kind of magic.

At the same time, haven't seen devs using these functions or asking about them often,

Yeah it is not often, I've seen one ask: #195 (comment)

Combining your thoughts, I'll go ahead with function + functionRaw (I assume Serhii is okay with both functionRaw and functionBytes and @idea404 prefer functionRaw, and naming of less popular low level API can be changed or alias later)

const storageKey = this.keyPrefix + key;
return near.storageHasKey(storageKey);
return near.storageHasKey(encode(storageKey));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about places like this. Can we move this conversion to the API layer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of this: have two APIs, storageHasKey and storageHasKeyRaw. where storageHasKey(k) = storageHasKeyRaw(encode(k)). The benefit is backward compatible and in most case the string-version it's what user want. And they can still achieve raw bytes manipulation with raw-version.

The disadvantage is this API might hide the fact that state is in the form of binary. When attempt to use the Low level API, it is expected that they already know nomicon. And to me it's unexpected that near.storageHasKey is not env.storage_has_key but near.storageHasKeyRaw is. This makes api naming inconsistent. For example, input still returns Uint8Array, should it be inputRaw and input?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... maybe not raw, maybe inputBytes() ?
(more comments above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputBytes sounds good to me

@@ -120,7 +120,7 @@ export class HighlevelPromiseContract {
return {
...callingData(),
promiseResults: arrayN(near.promiseResultsCount()).map((i) =>
near.promiseResult(i)
str(near.promiseResult(i))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any use cases we want to represent promiseResult as something else but string?
(this one is about moving conversion to the API level again)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example would be promise return one's public key. And callback function take that public key to do something. Public keys that host function returns and takes are binary format.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if they both have string in their API, it will work? But we will do the conversion 2 times (gas issue)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. conversion 2 times adds unnecessary gas consumption

@ailisp
Copy link
Member Author

ailisp commented Dec 13, 2022

Based on above discussion, I make storage / valueReturn / input / promise API with two versions. One is string version. It works with any string and utf-8 decode the string to fit NEAR's binary state / arguments interface. The other one is raw version that works directly on Uint8Array and developer can handle customized encoding.

Note that even though string version has the same name, same interface as before, it is not fully backward compatible with original APIs. Take storageWrite as example, the comparison is:

  • If string is fully ascii, the behavior is the same.
  • If string contains charCode > 255 unicode char, original API doesn't work, current string API works as expected
  • If string contains 127 < charCode <= 255, original API stores exactly one byte per char, current string API treat charCode as a subset of unicode char and encode it in utf-8, resulting use two bytes to store one char. It is not as storage efficient as the raw version (storageWriteRaw) or the original version, but result is guaranteed to be correct when read back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants