-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spec compatibility #1
Comments
Thanks for the notes! My inclination is to come to a consensus on these things, then take some to the starlark spec channel, some as issues in this repo, and some as fixes to the readme.
There is no recursion check, so recursion is always enabled. I think that's fine as per the spec (we just hid a specific control option).
Floats and byte strings are definitely on the todo list.
In Rust, a String must be a UTF8 encoded string - if it's not, interoperability with Rust isn't easy. We want to be as compatible with Starlark as we can, but I think Rust Starlark needs to stick to valid-only UTF8 strings, or we're trading away a huge amount of convenience and performance. We're currently a long way away from the spec, and performance sucks, so we want to do more here - but that pretty much has to be the ultimate underlying representation in Rust, regardless of the Starlark standard.
Yep, it's a bug, but one that has performance implications. I know how to fix it, and plan to do so, but need to benchmark some specific tasks to ensure we don't regress at the same time.
I don't get why the spec disallows comparison on None. It makes no sense to me. Could you explain?
Yep, we plan to have a bigint union at some point. It's not pressing for us (Protobuf compatibility isn't that important to us), but it's definitely on the todo list. We also want encoded short strings, and that is a priority, so the plan was strings first, then reuse the techniques to do bigint.
Yep, I'll raise them. As a trivial example, As for promises of compatibility - don't worry - we're miles off that yet! This is a dynamically evolving project, and we're very happy to find a suitable middle ground on almost all of them. |
FWIW, we eliminated one incompatibility today (leading zeros meaning octal, 7be8744), and I'm working on the call argument order today. |
Fair enough. It's essentially an axiom of the design of Starlark that strings should interoperate with those of the host language---hence the UTF-k parameterization compromise between Go and Java. It's interesting that string data types vary so much between languages. I count five different approaches, those of {C,C++,Go}, Java, Python2, {Python3,Haskell}, and Rust. Both Pythons and Java made serious errors in the design of their strings. I have not written substantial Rust, but at least it takes advantage of all the good properties of UTF-8.
Are you implying that the proper semantics for list comprehensions in Starlark are not those prescribed by the spec of Starlark, but that of Haskell?
Python2 allowed ordered comparison of arbitrary pairs of values, even of different data types, essentially by defining a total (and totally arbitrary) order over data types. Although it had its uses, too often this served to obscure unintended type errors. Python3 removed this feature, and made ordered comparison much more restricted: only types for which ordering makes sense support ordered comparisons. The disallowed operations include
What do you mean by that?
I never liked that invention of Starlark, but it is prescribed by the spec: https://github.com/bazelbuild/starlark/blob/master/spec.md#dictionary-expressions
Good catch. Do file a bug for this any similar omissions.
Glad to hear it. Best of luck, and may the two existing implementations help the third. |
The fix for call argument ordering landed in 71dec72, so that's one compatibility down.
Great, sounds like we're on the same page.
I suggest I file a ticket at the Starlark spec, outlining what I've done, why I think it's better, and where it differs from Python. It's a somewhat minor detail.
Being able to compare values of different types is indeed dodgy. Being able to compare dictionaries has its own problems. But there's absolutely no reason that None doesn't have a well formed comparison, in the same sense that the empty tuple has a well formed comparison. This seems like Python making a mistake.
For short int (32bit) we pack the int into the pointer, and use a tag bit, so ints don't end up on the heap, and are super fast. We want to do the same with <= 7bit strings, so those short strings also end up packed into the tag. We then want to have short strings packed in the pointer, and long strings on the heap. Once we have that in place, doing the same with short ints in the pointer and long ints on the heap is nice and easy.
Should we follow Python here? Is this something reasonable to adjust in the spec? We do have a linter that checks for statically repeated keys in a dictionary, on the basis that's probably not intentional. I'll go through and file bugs for everything at some point in the near future. |
What's the benefit of moving from an error to a linter warning here? When I added this restriction, I've found many errors in the existing code at Google. Many code examples I saw were super confusing for a reader. I've fixed dozens of instances, none of them were intentional. Note that it's consistent with the check that rejects |
I think you've convinced me. Before I go as far as code, I guess there are other questions. Is |
Sorry for the confusion earlier, I have just checked our implementation: the error is dynamic only. Your examples show the difficulty of catching the problem statically (it would work only for trivial expressions). |
Cool - I've just submitted a patch to make repeated dictionary keys in dictionary literals a runtime error. I haven't made anything a compile-time error, but there is a lint-time error, so hopefully it's not too bad. The patch should show up in this repo in the next day or two. |
Well, comparison on tuples is useful. For consistency, you want to have it on all tuples, even if they're empty. Is there any code in your codebase that relies on None comparison? Could you review the code and check if it's legit? |
I'm unaware of anyone who does comparison on None, and I doubt it's useful in practice. But looking at other languages, Haskell has comparison on |
8fc3663 now fixes the |
FYI: bazelbuild/starlark@7f53743 adds support for byte strings to the spec, and begins to clarify the platform-dependent nature of text strings: UTF-8, valid by convention, in Go; UTF-16, valid by convention, in Java; and UTF-8, valid always, in Rust (though more needs to be said here). It defines the literals and the data type, but another PR is required to specify all the operators and conversions. |
We now disallow comparison between None and have changed the lexer to match the escaping rules. I think that leaves us with:
The first 4 are unimplemented features, so I'll mark them as separate bugs in this repo. The 5th is just a flat out bug, so it can have its own bug ticket, but it's more about figuring out quite what/why it does what it does, so there's a lot more work there. For the 6th I'll propose it as a Starlark implementation adjustment, and we can discuss it in detail there with full context. Anything else missing? |
I created a tag at https://github.com/facebookexperimental/starlark-rust/labels/spec and fresh tickets as #3, #4, #5, #6 to track the first of those 4 spec violations separately. For crazy strings, I think I'll leave that silent for now - it's more a warning than a specific issue to be addressed. I'll go through and do it at some point. For 6, I've pushed in a patch which should flow out to this repo in the next day or so. I've also modified the README to list those things with links to their tickets, which again, should flow to the repo in the next day or so. |
All these have now landed. The readme lists the spec compatibility issues. The list comps follow Starlark spec. I think we're all done here. Please reopen or comment if there's anything you think I've missed. |
Hi, Starlark spec author here. I'm very happy to see the Rust implementation improving, and that it now has a garbage collector.
In the
Compatibility
section of your main README, you list a number of places where the language differs from the spec. It would be unfortunate if the Rust implementation did not eventually converge on the spec, given that it was developed after the spec, and that two existing and largely conformant implementations are available for study.Let's go through the list:
Lambda (along with nested def) is now a non-optional feature, so you need no longer mention it here.
The Go and Java implementations permit the dynamic check on recursive function calls to be disabled. Does the Rust implementation also allow it to be controlled? If so, you need not mention it here.
We don't have the common extensions of floats, bit operations, byte strings or sets.
Floats and bit operations are not extensions, they are core features. Fortunately they are easy to add. Consider this a request to do so.
Byte strings will be part of the spec fairly soon; they are somewhat tedious to add because they duplicate a lot of the logic of strings, but they don't pose any fundamental difficulty. The Rust implementation in particular sounds like it needs them because its string type is text, not binary. (In Go, text and binary are both represented using byte strings. The story in the Java implementation is more complicated because of entrenched bugs in Bazel.)
Sets are currently a Go-only optional extension.
Our strings are not compliant in several ways, often returning code points instead of singleton strings, and have poor performance.
I plan to change the spec for strings (which is currently true only for the Go impl, from which the spec was derived) so that that strings are defined as sequences of numeric UTF-K codes where K is implementation defined: 8 (byte) for Go, 16 (char) for Java, with no guarantee that the encoding is valid. Conversions to byte strings (encoding) and back (decoding) would be explicit operations.
What value of K would Rust use? I can imagine 8 might be the natural choice, because Rust strings are UTF-8, but they also require valid encodings, which would make the substring operation fail to cut a code point in half, contravening the spec. An alternative would be UTF-32 (in other words, a sequence of int32 code points), but that would require either an inefficient 4-byte representation, or indexing to take linear time. Another possibility is to represent strings as arbitrary byte arrays and to make the Starlark-Rust API do a validity check and conversion when accessing a Starlark string, either failing, or mending the encoding at that point.
Function arguments are sometimes evaluated in the wrong order, which is visible if their arguments have side effects.
This seems like a simple bug to fix. The Go and Java implementations can be studied as a model, and they contain additional comments that rationalize the spec on this point.
We follow proper lexical binding for list comprehensions, so each for clause introduces a fresh variable (deliberately).
I'm not sure what you mean here. How is this different from what the spec requires and the Go and Java implementations do, which is that the comprehension defines a new block, and a for clause is a binding use of its variables? The hacks in the Java implementation were removed a while back.
We allow comparison between None values (deliberately).
Do you mean ordered comparison? Why deviate from the spec?
In some cases creating circular data structures may lead to stack overflows.
Yeah, this is a problem in all the implementations, and in every imperative language.
We use 32bit fixed size integers. Constructing larger values will result in Starlark failing with an overflow error.
int32 is not enough to represent all the data types that appear in a protocol buffer, for which one needs at least uint64 and int64. Without an int65 data type, you could use int128, but it's easier to use bigint. It really is a nicer language for it, and it needn't be expensive.
I recently changed the Java implementation to use bigint, without loss of efficiency: internally int is a union of three classes similar to java.lang.{Integer,Long,BigInteger}. It was an API change, but not a very disruptive one. Also, I changed the Go implementations to use an mmap trick so that int32 values can be represented as pointers to a special segment, allowing a single pointer to represent a small int or a big int without ambiguity, and without allocation in the int32 case. You might be able to follow the same approach in Rust.
There are a number of minor incompatibilities or places where the spec is unclear, many of which are included in our tests.
Please regard each of these as a bug, either in your implementation, or in our spec, in which case report them to us and we will endeavor to fix them. Starlark users would best served by not having to remember which of three similar dialects they are using in each application, and it's easier to achieve convergence before you have made promises of stability.
cheers
alan
The text was updated successfully, but these errors were encountered: