Skip to content
This repository has been archived by the owner on May 4, 2024. It is now read-only.

[byecode utils] make struct layout generation work with serdegen #652

Merged
merged 1 commit into from
Nov 17, 2022

Conversation

sblackshear
Copy link
Member

@sblackshear sblackshear commented Nov 7, 2022

One of the promises of generating layout YAML's for Move structs is that we can leverage serdegen to autogenerate wrapper types for Move structs in all the languages serdegen handles. This PR moves us closer to that.

  • Add separator option that allows replacing the Move source syntax separators ("::", "<", ">", ",") in type names with a user
    -defined string. This is helpful because a name like 0x1::M::T<u64> is very unlikely to be a valid identifier in any of the langua
    ges serdegen handles. Setting (e.g.) separator = __ allows a fully qualified Move type name to be a valid identifier.
  • Add an omit_addresses option that omits the addresses from a fully qualified Move type name. This is useful in the (common)
    case where you are generating bindings for a package that does not have any name conflicts and don't want to bother with the addresses. The generator looks for name conflicts and panics if it finds one.
  • Add an ignore_phantom_types option that does not include phantom type parameters in struct names, since they do not influence layout.

After these changes, the following PoC for serdegen works:
move sandbox generate struct-layouts --module storage/0x00000000000000000000000000000001/modules/M1.mv --struct G --omit-addresses --separator "__" > test.yaml serdegen --language python3 test.yaml > test.py python test.py

, where test.py is

from dataclasses import dataclass
import typing
import serde_types as st
    
@dataclass(frozen=True)
class M1__G:
    x: st.uint64
    s: "M1__S__bool__"
    
@dataclass(frozen=True)
class M1__S__bool__:
    t: bool

@sblackshear sblackshear requested review from amnn and tnowacki November 7, 2022 16:09
@sblackshear sblackshear force-pushed the fix_layout_gen branch 4 times, most recently from eab5485 to bc91059 Compare November 8, 2022 21:56
@tnowacki
Copy link
Member

tnowacki commented Nov 9, 2022

For separator, if it is not a valid Move Identifier character, e.g. %, is it possible for there to be collisions? I feel like the answer is no, but I have trouble convincing myself it is true. (I think it is only true because all separators get replaced this way).

Do we care about the case where there are collisions if you use something like ___ (though I think it would require a bad actor even in that case)?

Or maybe you catch all of these collisions in the same code path that does it for omit_addresses? (still reading, so maybe this will become obvious later)

let declaring_module = self
.module_resolver
.get_module_by_id(module_id)?
.expect("Failed to resolve module");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the function returns a Result. Should these panics and asserts be errors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is that the assert failures happen when you get bad input (incomplete module environment, wrong # of type params, etc.) and the Result failures are intermittent issues due to (e.g.) disk I/O in the module environment.

let generics: Vec<String> = type_arguments
.iter()
.zip(normalized_struct.type_parameters.iter())
.filter_map(|(type_arg, type_param)| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, I like filter_map, but I feel like this would be more readable with distinct filter and map calls, since neither the filter function nor the mapping function inherently return an Option

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the conditional logic harder to write/understand by using the filter/map, so left it as is.

let generics: Vec<String> = type_arguments.iter().map(print_format_type).collect();
format!("{}::{}<{}>", module_id, name, generics.join(","))
format!(
"{}{}{}{}{}{}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this whole pattern section would be easier to read with write!?

e.g.

let mut struct_key = String::new();
if !self.config.omit_addresses {
  write!(f, "{}{}", module_id.address(), self.config.separator.as_deref().unwrap_or("::"));
}
write!(f, "{}{}{}", module_id.name(), self.config.separator.as_deref().unwrap_or("::"), name);
if !generics.is_empty() {
   ... 
}

This way you would get a lot more code sharing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, that is much more concise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think string can be used with write unfortunately. I think += might work though?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works with string.

Comment on lines +248 to +253
// check for conflicts (e.g., 0x1::M::T and 0x2::M::T that both get stripped to M::T because
// omit_addresses is on)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this is what I was talking about in my top level comment. Should we report conflicts regardless?

@@ -5,16 +5,6 @@ Command `sandbox generate struct-layouts --module storage/0x00000000000000000000
STRUCT:
- t: U64
- b: BOOL
AccountAddress:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain the test cases changes? I don't think I understand the code enough to understand the change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch--this was a mistake (used an empty map instead of default_registry() in the new constructor. Now fixed.

One of the promises of generating layout YAML's for Move structs is that we can leverage `serdegen` to autogenerate wrapper types for Move structs in all the languages `serdegen` handles. This PR moves us closer to that.

- Add `separator` option that allows replacing the Move source syntax separators ("::", "<", ">", ",") in type names with a user-defined string. This is helpful because a name like `0x1::M::T<u64>` is very unlikely to be a valid identifier in any of the languages `serdegen` handles. Setting (e.g.) `separator = __` allows a fully qualified Move type name to be a valid identifier.
- Add an `omit_addresses` option that omits the addresses from a fully qualified Move type name. This is useful in the (common) case where you are generating bindings for a package that does not have any name conflicts and don't want to bother with the addresses. The generator looks for name conflicts and panics if it finds one.
-Add an `ignore_phantom_types` option that does not include phantom type parameters in struct names, since they do not influence layout.

After these changes, the following PoC for serdegen works:

```
move sandbox generate struct-layouts --module storage/0x00000000000000000000000000000001/modules/M1.mv --struct G --omit-addresses --separator "__" > test.yaml
serdegen --language python3 test.yaml > test.py
python test.py
```

, where `test.py` is

```
from dataclasses import dataclass
import typing
import serde_types as st

@DataClass(frozen=True)
class M1__G:
    x: st.uint64
    s: "M1__S__bool__"

@DataClass(frozen=True)
class M1__S__bool__:
    t: bool
```
@sblackshear
Copy link
Member Author

For separator, if it is not a valid Move Identifier character, e.g. %, is it possible for there to be collisions? I feel like the answer is no, but I have trouble convincing myself it is true. (I think it is only true because all separators get replaced this way).

Do we care about the case where there are collisions if you use something like ___ (though I think it would require a bad actor even in that case)?

Or maybe you catch all of these collisions in the same code path that does it for omit_addresses? (still reading, so maybe this will become obvious later)

Good call--added collision checks whenever separator is not None

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants