Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Rust Stable channel #7

Closed
Mythra opened this issue Mar 12, 2021 · 26 comments
Closed

Support Rust Stable channel #7

Mythra opened this issue Mar 12, 2021 · 26 comments

Comments

@Mythra
Copy link

Mythra commented Mar 12, 2021

Opening a tracking issue here, as a carry-over from: google/starlark-rust#283 😄 , as it seems there were a couple options discussed (RUSTC_BOOTSTRAP in this crate vs others, potentially maintaining a stable only interface, etc.) I'm just opening this to track that. As I'd like to switch over to this crate at some point rather than maintaining my own fork 😅

@ndmitchell
Copy link
Contributor

Very keen to have you switch over @Mythra! Keen to know if using RUSTC_BOOTSTRAP is feasible for you or not. Also happy to chat on video conference if you want, so we can talk more openly about what you're after, the kind of changes you've made and need supporting. While I doubt there's anything that we won't support, if there was, it would be good to know sooner rather than later!

@ndmitchell ndmitchell changed the title Stable Support Support Rust Stable channel Mar 12, 2021
@Mythra
Copy link
Author

Mythra commented Mar 12, 2021

I haven't really messed with RUSTC_BOOTSTRAP before so to be honest I don't know, but I'm definitely willing to give it a shot although it seems intimidating.

As for my code I admit the biggest thing I'm worried about is porting over the error experience. I've put quite a bit into my solution with the old crate, and although it's far from ideal in a technical sense I'd definitely like to keep a similar user experience. I also haven't looked into the new #[starlark_module] proc macro, but I assume it's mostly the same meaning:

  • I can have an arg that is positional or by name.
  • I can have an argument only by name
  • I can accept variable types on an argument.

I'll document the current error strategies I'm doing below just as sort of an overview, but if a call would helpful or ideas that aren't best discussed in text I'd be happy to hop on a call or do anything of that sort.


  • All starlark errors rendered by starlark itself (e.g. a lexer error, etc.) all I've been doing is taking the span location of the error, and putting a "nice polish" on it:
if labels.is_empty() {
	return unknown_state_starlark_error(
		"CE00",
		"Reserved keyword `break` used in a place it can't be used",
		diag,
	);
}
let loc = chop_off_newline_if_present(&labels.get(0).unwrap().0);

let mut err = eyre!(
	"{}: break used outside of a loop",
	PyraStarlarkErrors::BadLoopControl,
)
.with_section(|| {
	highlight_code(
		&loc,
		"the break keyword is reserved, and can ony be used in a loop",
	)
});

if bad_indent(&loc) {
	err = err.with_suggestion(|| {
		"The indentation is less than the line before, perhaps missing an indent?"
	});
}

err

This example is one of the most complex for this face lift because it actually looks at the string representation of the location, and looks at the lines before to see if maybe someone just didn't indent correctly. Then again most of this is just outputing the error information, and highlighting in a way consistent with the rest of the tool. Nothing too fancy.


For my own errors for my types I've been abusing the fields in starlark_error! to signify not only the error code, the exact type, field, and any extra data that's needed to parse out the error. Effectively my errors from starlark get returned like so:

starlark_err!(err_code, "<type_name>".to_owned(), format!("{}#{}", field_name_in_type, extra_data));

My error handler then takes these, and does the following:

  • Get a string representation of the starlark code that caused the error (which is just the function being called i've defined, and maybe some whitespace. Nothing else).
  • Parse the typename, and call a particular function for that type called: TypeName::zero_in_on_argument. Zero in on argument looks up the field name that caused the error which may look something like: options.1 (signifying the options array's first item, is the field that caused the error), and parses out a new range to highlight which will cover just that argument.
  • Create an error message for that particular error code , which highlights an exact field that threw an error, and parse any extra data that is needed.

This leads me to be able to highlight rather than the whole function, a particular field (even nested):

   0: pyra::starlark::eval::invalid_task_name: Task name conflicts with a built in command, or is empty

  --> /home/cynthia/projects/personal/pyra/.github/pyra.star:8:5
   |
 5 | oneof_task = pyra_oneof(
 6 |   "build",
 7 |   options=[
 8 |     "help"
   |     ^^^^^ This task name is invalid
 9 |   ]
10 | )
   |

Finally the last thing I haven't actually implemented yet, but was just about to look into was storing a display representation of a stack-trace. This is for cases where a piece of configuration may be valid on it's own, but once I've parsed all of the starlark configuration and done validation on it there may be conflicting configurations.

In this case of my tool I have some things called "tasks", that have names that need to be globally unique. So at construction time they may be totally valid, but as soon as I look at the thing in whole I realize there are two tasks with the same name. Which is an error in my case, and I want to be able to show the user "hey here is where the two resources are defined that conflict", as they may be in two very different files, or hidden away through helper functions, or something of the sort.

@ndmitchell
Copy link
Contributor

RUSTC_BOOTSTRAP has an intimidating name, to intimidate people. But it's really just set an environment variable and nightly features work on a stable compiler.

For error, I've rewritten it entirely. There are no more error codes. Everything is based on anyhow errors. Everything is defined as an enum using thiserror. We have a custom Diagnostic type (see https://github.com/facebookexperimental/starlark-rust/blob/master/starlark/src/errors/mod.rs#L37-L49) which is basically code location, anyhow error, call stack, and then we wrap it back up into an anyhow. I also rewrote the parser to allow things like break in more places, and then detect it after we have an AST, which gives much better error messages. It may be that our work there mostly overlaps, but I'm certain you'll have added great messages we missed, and adding new ones is definitely a good thing.

Because we moved to anyhow, you no longer need to abuse types to define your own errors. You can define them with full structure, and it's an open universe, so your error types are just as good as the internal ones.

For argument errors, you are doing better than we are. We have UnpackValue which basically says whether a type matches or doesn't match an argument. That lets us do all the validation on each argument, but only at the level of the argument not a custom sub-expression (bad), we don't allow a custom error message (bad) and positionally we report it at the level of the call (bad). But the plan was to change UnpackValue with a real Value parser, which would give us the first two. And the third is probably possible today - although there are challenges around mapping things like kwargs where multiple fields may combine or the nesting may be difficult to disentangle. If your techniques are generic enough to be broadly applicable, and not a performance drain in the non-error case, I'm sure we can figure something out.

You can capture the call stack pretty easily, and we can have Diagnostic which stores multiple call stacks. None of that sounds too hard, and addresses your globally unique thing. There is a perf cost to capturing call stacks. If it's an error path, no big deal, but if you want to capture it on every identifier there may be considerations. But, it's going to be as fast as it can be (we can capture lazy traces which don't resolve or allocate more than just a Vec).

@Mythra
Copy link
Author

Mythra commented Mar 12, 2021

For error, I've rewritten it entirely. There are no more error codes. Everything is based on anyhow errors. Everything is defined as an enum using thiserror. We have a custom Diagnostic type (see https://github.com/facebookexperimental/starlark-rust/blob/master/starlark/src/errors/mod.rs#L37-L49) which is basically code location, anyhow error, call stack, and then we wrap it back up into an anyhow. I also rewrote the parser to allow things like break in more places, and then detect it after we have an AST, which gives much better error messages. It may be that our work there mostly overlaps, but I'm certain you'll have added great messages we missed, and adding new ones is definitely a good thing.

This sounds great but one thing I've noticed is the errors seem to be public to the crate

pub(crate) enum Dubious {

Which makes formatting them to fit the same style as the rest of the errors quite difficult. Would you be open to making them public but with a #[non_exhaustive] to indicate errors/fields on errors can be added without calling a breaking change? I'm also happy to look through and start porting over my error messages to this crate if they're not already reporting similar information (or better).


For argument errors, you are doing better than we are. We have UnpackValue which basically says whether a type matches or doesn't match an argument. That lets us do all the validation on each argument, but only at the level of the argument not a custom sub-expression (bad), we don't allow a custom error message (bad) and positionally we report it at the level of the call (bad). But the plan was to change UnpackValue with a real Value parser, which would give us the first two. And the third is probably possible today - although there are challenges around mapping things like kwargs where multiple fields may combine or the nesting may be difficult to disentangle. If your techniques are generic enough to be broadly applicable, and not a performance drain in the non-error case, I'm sure we can figure something out.

I'd be happy to upstream my work if you'd accept it, but right now admittedly it isn't the prettiest. And although it adds no cost to the non-error case (or should add miniscule overhead - like the cost of one field indicating where it is), it does add significant cost to the error case. Turning a SpanLoc -> String, and then basically "pseudo-parsing" that string to find the field is not exactly cheap, and also not the cleanest code to look at. The actual extraction looks something like this: https://gist.github.com/Mythra/d318980dd62190ff17e6f5d04e146576. Note: This also fails in cases like **kwargs, instead reporting the whole callsite.


But, it's going to be as fast as it can be (we can capture lazy traces which don't resolve or allocate more than just a Vec).

Yeah lazy traces would definitely be helpful here. Admittedly I'm starting from the case of "best UX first", then planning on doing some work before I release the tool to make sure in the case of small, to "large" (10000's of items) it meets my speed requirements.

@ndmitchell
Copy link
Contributor

This sounds great but one thing I've noticed is the errors seem to be public to the crate

Yes, deliberately so. The only errors that aren't yet pub(crate) like ValueError are slated to become so in future. The reason is to keep encapsulation. When you start down the road of anyhow, you lose static checks on errors - which has been a massive advantage in terms of flexibility. Once you export the error types, people then have the option to build that on top dynamically checked - which often goes wrong.

Which makes formatting them to fit the same style as the rest of the errors quite difficult

What's your style? Is it better than the existing style? If so, why not make it the only style? Rather than provide choice and customisation, I'd rather just make the library as good as it can be, so everyone wins. Would be great to see a before/after comparison. If we need more structure, Diagnostic would be the place to put it, but I'm keen to make defining errors cheap, since its so common. But also good, since users often have to fight them.

I'd be happy to upstream my work if you'd accept it, but right now admittedly it isn't the prettiest.

As it stands, not really - the pseudo-parsing freaks me out a lot. But the idea and concept, yeah, I can get on board with that, and it would be great to have. Once you are on the error path, as long as its not outrageous, some cost is fine. I'd go for the approach of grab the string, reparse, reorientate with spans on the AST, then find the right subexpression using traversals. We need the better conversion of values to types (value parsing) to get workable substrings though.

@Mythra
Copy link
Author

Mythra commented Mar 12, 2021

What's your style? Is it better than the existing style? If so, why not make it the only style? Rather than provide choice and customisation, I'd rather just make the library as good as it can be, so everyone wins. Would be great to see a before/after comparison. If we need more structure, Diagnostic would be the place to put it, but I'm keen to make defining errors cheap, since its so common. But also good, since users often have to fight them.

That makes sense. My format isn't too different. The big thing is having a unique error code that a user can then lookup using another command to get a more indepth explaination (similar to rustc's explain). But the basic idea is:

${error_code}: ${error title}

  ${code section with highlighting}

As it stands, not really

I don't blame you 😅 . This was very much a "I need a solution", but knowingly not a good solution. I'd be happy to integrate it with the solution you mentioned but admittedly not sure where to start there, but something we can look into.

@ndmitchell
Copy link
Contributor

I'm unconvinced that a project the size of Rust Starlark, with the resources available to it, can support error codes with good informative explanations. If you look at projects like Python or Haskell they haven't managed to do it, and they have tried. Errors codes take a lot of time, love and maintenance. And in most cases a good error message is probably better, and for the ones that do require nuanced explanation, a StackOverflow question people can Google for.

For detailed errors on arguments, that's definitely something we can do. If you're interested in helping with it, we should probably video chat, as its easier to brainstorm the approaches that way.

@ndmitchell
Copy link
Contributor

Maybe my remarks on the error codes are a bit harsh. If someone writes a web page, listing all the errors, and describing them in useful detail, and commits to keeping it up to date, we could easily have some kind of code prefix added. My hunch is that's too much work, and the benefits are insufficient for the time it would take. But if I'm wrong, and someone writes the web page, we can add the codes. What I don't want to do is add the codes (which has maintenance and development cost) assuming the web page will come later, when my suspicion is that it won't.

@Mythra
Copy link
Author

Mythra commented Mar 14, 2021

I'd be happy to certainly maintain error codes, as well as well written error messages for those codes. I agree error codes can be tough in some places, and should never be used in place of a good solid error. That being said having a distinct code can be helpful to both parties, and is something I've already been doing for my whole codebase (including starlark), and would be happy to contribute back up, and maintain.

Also happy to set up a call some time in the coming weeks to discuss actual implementation of some of these ideas, and get working on them.

@ndmitchell
Copy link
Contributor

Great, sounds like a call is the next step on most of these. I'm ndmitchell gmail.com. Email me and let's find a time that works.

@ndmitchell
Copy link
Contributor

Issues #10, #11 and #12 were spun out from discussions on the call. We'll leave this ticket to deal with supporting the Rust stable channel, but agreed there is no need yet, so consider it an issue that may become important in future, but isn't required now.

@indygreg
Copy link
Contributor

I use starlark-rust 0.3 in PyOxidizer and the lack of stable channel support is meaningful to me because PyOxidizer is at a stability and popularity point where people like Linux distros want to package it downstream and often have policies about the use of stable Rust features. I know of and have abused RUSTC_BOOTSTRAP before. But having it just works on stable avoids a lot of headaches, especially for software that is installed by end users who don't necessarily know Rust.

While I'm here: I'm glad to see this crate being actively developed! I ❤️ Starlark and hope to see it grow in popularity [in the Rust ecosystem].

@ndmitchell
Copy link
Contributor

Thanks for your feedback @indygreg. I can see why stable makes things easier for you and distributions. I guess the first thing would be for you to assess whether stable is your only blocker in using the newer version - we changed a lot of the API surface area. While I think it's now more powerful, there might be something you were relying on that we removed, and that would be good to check (I'm sure that we can put it back if there is). It would also be good to know that you are happy with it, and happy to switch.

I think if you port PyOxidizer to it, we'll promise to switch to supporting stable within a week or so of that happening, so it doesn't delay you (or once you are far enough along that you're sure it's the way you want to go). Does that seem reasonable?

@indygreg
Copy link
Contributor

Your proposal sounds reasonable!

Let me have a go at porting to 0.4. I have quite a bit of Starlark code and am doing some... questionable things. The last time I updated the crate it took several hours of work. I anticipate the same here given the surface area of the API changes. I'll file new issues for any issues I encounter with 0.4. It may take a few days before I have any meaningful feedback, however.

@ndmitchell
Copy link
Contributor

Sounds great. If you have any questions about where things have moved to, feel free to raise a new issue and I can answer them there - and definitely for anything that seems to have gone entirely.

@indygreg
Copy link
Contributor

As I started to look at refactoring my code, I found myself wanting to shave some yaks related to the horrible state of the Starlark code in PyOxidizer. So it will likely be a while longer before I have meaningful feedback.

The progress I have made so far is promising though. And it looks like changes in 0.4 introduce proper support for variable access, which means I won't need to shoehorn state into TypeValues, which was very ugly. I'm very eager to port to 0.4. But it will have to wait until some other work occurs on my end. Sorry for the delay!

@ndmitchell
Copy link
Contributor

No rush on our side at all!

@nemith
Copy link

nemith commented Oct 31, 2022

Is this still a goal? It would be really nice to use starlark in a project but not require nightly rust builds and have it work out of the box with stable rust.

As nice as the bootstrap process it, I shouldn't have to modify my project to fit the libraries?

@stepancheg
Copy link
Contributor

stepancheg commented Oct 31, 2022

I think currently we rely heavily on certain rust internals (like compile time type id) which are very hard to avoid. So practically it is not possible to make it stable at the moment.

However, we should reduce the number of unnecessary unstable features we use to make it more clear what we really need and what are small convenient bits.

Because from the top of my head, I cannot tell you exactly where we rely on unstable.

@stepancheg
Copy link
Contributor

FYI, current version of starlark-rust works on stable. Need to release it to crates.

@Mythra Mythra closed this as completed Jan 14, 2023
@nemith
Copy link

nemith commented Jan 20, 2023

This does not seem to be the case due to dependencies (also from Facebook)

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/gazebo-0.6.0/src/lib.rs:10:49
   |
10 | #![cfg_attr(feature = "str_pattern_extensions", feature(pattern))]
   |                                                 ^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/gazebo-0.6.0/src/lib.rs:11:49
   |
11 | #![cfg_attr(feature = "str_pattern_extensions", feature(associated_type_bounds))]
   |                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/gazebo-0.6.0/src/lib.rs:10:57
   |
10 | #![cfg_attr(feature = "str_pattern_extensions", feature(pattern))]
   |                                                         ^^^^^^^

@nemith
Copy link

nemith commented Jan 20, 2023

It doesn't even seem to compile on nightly

$ cargo +nightly build
   Compiling starlark v0.8.0
error[E0407]: method `backtrace` is not a member of trait `Error`
   --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/starlark-0.8.0/src/errors/mod.rs:126:5
    |
126 | /     fn backtrace(&self) -> Option<&std::backtrace::Backtrace> {
127 | |         Some(self.message.backtrace())
128 | |     }
    | |_____^ not a member of trait `Error`

error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
  --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/starlark-0.8.0/src/syntax/ast.rs:84:1
   |
84 | assert_eq_size!(AstStmt, [usize; 12]);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: source type: `Spanned<StmtP<AstNoPayload>>` (704 bits)
   = note: target type: `[usize; 12]` (768 bits)
   = note: this error originates in the macro `assert_eq_size` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
  --> /Users/brandonbennett/.cargo/registry/src/github.com-1ecc6299db9ec823/starlark-0.8.0/src/syntax/ast.rs:85:1
   |
85 | assert_eq_size!(AstExpr, [usize; 9]);
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: source type: `Spanned<ExprP<AstNoPayload>>` (448 bits)
   = note: target type: `[usize; 9]` (576 bits)
   = note: this error originates in the macro `assert_eq_size` (in Nightly builds, run with -Z macro-backtrace for more info)

Some errors have detailed explanations: E0407, E0512.
For more information about an error, try `rustc --explain E0407`.
error: could not compile `starlark` due to 3 previous errors

@Mythra Mythra reopened this Jan 20, 2023
@stepancheg
Copy link
Contributor

It is compiles in-repo.

To compile it outside of repo, path to gazebo library also needs to overridden to github master version.

I'm working on removal of gazebo dependencies now (this commit for example c43b8a6).

@nemith
Copy link

nemith commented Jan 20, 2023

Confirmed that building from main branch does compile.

starlark = { git = "https://github.com/facebookexperimental/starlark-rust" }

So maybe a new release at some point would be all that is needed. Cheers.

@ndmitchell
Copy link
Contributor

I'll make a new release shortly.

@ndmitchell
Copy link
Contributor

ndmitchell commented Jan 18, 2024

Actually, there was a release last year, so already done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants