Skip to content

Support for hooks (viewing only) #95

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jan 13, 2019
Merged

Support for hooks (viewing only) #95

merged 23 commits into from
Jan 13, 2019

Conversation

lemarcuspoilus
Copy link
Contributor

This PR adds three new functions and a few other types to safely wrap the lua_sethook function, as suggested by the issue #81.

This is my best attempt to make it possible to consult information for debugging purposes and to control script execution.

These functions have been written:

  • set_hook and set_mut_hook: Sets a function to be called when the hook is raised. It gets the new Debug struct containing information related to the code and function (depending on the hook mask)
  • remove_hook: Removes the hook previously set

A few new types such as Debug and HookOptions, written inside a new module, are used as parameters to those functions above. That module also contains a procedure given to lua_sethook used to call the Rust callback safely.

I regard this change as safe since it does not expose functions that changes the Lua stack or anything else (other than using the Lua structure itself). Moreover, since the hook can return a Result, it can terminate the execution of a script, as shown in a test, by returning the error back to the caller of exec and such. This satisfies the desire of the author as noted in the README.

Anyone is welcome to comment or suggest changes to make on my PR, since I did it without knowing too much about Lua's internal workings. I have not changed the README or version; I leave this to rlua's author discretion.

Thank you.

Copy link
Contributor

@jonas-schievink jonas-schievink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! I left some comments.

pub is_vararg: bool,
pub is_tailcall: bool,
pub short_src: Option<Cow<'a, str>>
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a private field here to allow rlua to add more fields without breaking users

src/hook.rs Outdated
pub after_counts: bool,
/// Specify how many instructions to execute before calling the hook. Only effective when
/// `after_counts` is set to true.
pub count: u32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to turn after_counts into an Option<u32> instead of ignoring this field when after_counts is false

/// });
/// let _ = lua.exec::<_, ()>(code, None);
/// # }
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice docs!

/// # extern crate rlua;
/// # use rlua::{Lua, HookOptions, Debug};
/// # fn main() {
/// let code = r#"local x = 2 + 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultra-nitpick: try putting a newline after the r#" and indent the Lua code to make it look a bit nicer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not since Lua would report it being on the 2nd line (2-3-4 would be outputted) and that looked confusing to me. Might change it to something else.

src/lua.rs Outdated
/// function. The behavior is the same otherwise.
///
/// [`set_hook`]: #method.set_hook
pub fn set_mut_hook<F>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method is redundant, since every closure that implements Fn also implements FnMut, so just having one set_hook that takes an FnMut should be enough.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 8, 2018

I applied the changes you've recommended to me with some extras of my own. I have no idea why some of them didn't come to mind as I wrote the code!

While tweaking the hook code, I realized I could write the execution time limit code right within the Lua struct and the hook_proc function. What do you think of that? As long as the hook triggers are set to call the hook on a regular basis, this could work.

EDIT: It would look like this:

lua.set_exec_time_limit(Some(Duration::from_secs(10)));

src/hook.rs Outdated

let cb = extra.hook_callback
.as_mut()
.expect("no hooks previously set; this is a bug");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should probably be an rlua_expect macro inside the library that prints consistent messages to match rlua_assert and rlua_abort and that rlua_expect should be used here and also everywhere that currently .unwrap() is used. I also like that you say "this is a bug", and I think that all of the error macros in macros.rs should also say that.

@kyren
Copy link
Contributor

kyren commented Oct 8, 2018

Oh wow, this PR is great, thank you so much!

I agree with all the feedback @jonas-schievink gave so thank you for taking the time to review this before I had a chance to 👍

The only feedback I have to add atm is a super nit about using expect that I added some comments for.

@kyren
Copy link
Contributor

kyren commented Oct 8, 2018

You're free to either add that rlua_expect macro that I mentioned and change the error messages, or since that's more of a style concern you could also just leave that to me to change, either way.

Other than that I don't really see any reason right now why I shouldn't merge this, this is really good work!

/// After a certain amount of instructions. When set to `Some(count)`, `count` is the number of
/// instructions to execute before calling the hook.
pub after_counts: Option<u32>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super bikeshedding, but would it be possible to discuss these names a bit? I'd prefer to drop the plurality on all of them (which I'm aware makes returns into a keyword), but to avoid that what about something like: on_call, on_return, line or every_line or each_line or something, and after_count? I think after_counts particularly sounds strange.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 9, 2018

Hello! If you don't mind, I'd like this PR to stay focused on the hook code; by that I mean I wouldn't add the rlua_expect! just yet so I don't start making changes everywhere ;) I think it would be a better idea to make a LuaExpect trait or something like that which adds the rlua_expect method to Result. It would result in a very similar, clean API like expect.

As of the names, I did feel that after_counts was slightly odd. I originally wanted them to sound like trigger on returns, trigger on lines (hmmm maybe not that one) and so on. Since the hook is called repeatedly, I think returns suggests it is for many returns and not just for a specific return somewhere. I'm still not sure about how I could replace after_counts with something more explanatory, yet short. What about every_nth_instructions? (not sure about the s)

I appreciate the positive feedback about the quality of the PR <3 at the same time I feel it's an expected, implicit standard.

@kyren
Copy link
Contributor

kyren commented Oct 9, 2018

Hello! If you don't mind, I'd like this PR to stay focused on the hook code; by that I mean I wouldn't add the rlua_expect! just yet so I don't start making changes everywhere ;) I think it would be a better idea to make a LuaExpect trait or something like that which adds the rlua_expect method to Result. It would result in a very similar, clean API like expect.

Yeah that makes complete sense, you don't have to do it, it was mostly that just seeing it made me think about the fact that it needs to be done.

As of the names, I did feel that after_counts was slightly odd. I originally wanted them to sound like trigger on returns, trigger on lines (hmmm maybe not that one) and so on. Since the hook is called repeatedly, I think returns suggests it is for many returns and not just for a specific return somewhere. I'm still not sure about how I could replace after_counts with something more explanatory, yet short. What about every_nth_instructions? (not sure about the s)

Okay, yeah I can see what you mean about implying that the hook is called repeatedly, on_returns meaning "on every return, which there will be multiple of". How about on_calls on_returns, every_line, every_nth_instruction?

I appreciate the positive feedback about the quality of the PR <3 at the same time I feel it's an expected, implicit standard.

Just to make sure I understand, you mean it's an expected implicit standard that the PRs be good haha? I suppose so, but I'd still like to acknowledge that you met it!

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 9, 2018

Just mixed the plurality currently being used with your suggestion. I think they all match pretty well!
There are a few things on top of my head that could be interesting to add before merging:

  • I just remembered there is a Scope structure there. Hooks look like a good fit since, as least me, I'd use them only for some specific code in most cases
  • As I already suggested, would adding time limitation right inside the crate add too much overhead/weight to the crate? I don't think many users would match both normal debugging and time limitation together, but that would imply adding a more user-friendly TimedOut error instead of using ExternalError or RuntimeError.

As of your reply:

Yeah that makes complete sense, you don't have to do it, it was mostly that just seeing it made me think about the fact that it needs to be done.

I'm interested in making a new separated PR for this purpose when this one is completed!

Okay, yeah I can see what you mean about implying that the hook is called repeatedly, on_returns meaning "on every return, which there will be multiple of". How about on_calls on_returns, every_line, every_nth_instruction?

every_line looks better than each_line for sure. I'd like to hear from @jonas-schievink about those names before making a final commit.

Just to make sure I understand, you mean it's an expected implicit standard that the PRs be good haha? I suppose so, but I'd still like to acknowledge that you met it!

That's very kind, and you got it :)

let _ = lua.exec::<_, ()>(code, None).expect_err("timeout didn't occur");
assert!(start.elapsed() < Duration::from_millis(750));
//println!("{}", lua.globals().get::<_, i64>("x").unwrap());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a system pause of more than 250ms, that could cause this test to fail, but I don't know how big of a deal that really is though. Could it instead be a test based on passing an instruction count limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid that's rather realistic to happen at least on Travis CI

@kyren
Copy link
Contributor

kyren commented Oct 9, 2018

I just remembered there is a Scope structure there. Hooks look like a good fit since, as least me, I'd use them only for some specific code in most cases

Scoped hooks would allow you to expire the hook in a natural way, and would allow for hooks that aren't 'static, but I think adding it would be a bit weird without support for multiple hook functions, since adding a scoped callback would remove any global one. I think it might be very tricky to support multiple hook functions as well, and I'm not sure how much of a use non-'static hook functions would have? Maybe I'm not imagining the use cases very well!

As I already suggested, would adding time limitation right inside the crate add too much overhead/weight to the crate? I don't think many users would match both normal debugging and time limitation together, but that would imply adding a more user-friendly TimedOut error instead of using ExternalError or RuntimeError.

I don't like super like adding timeouts on wall clock time or doing anything that gives the user the impression that there's more safety than there really is. There is an obvious implementation of such a function with respect to Lua::set_hook, but the obvious implementation has sort of fiendishly subtle real world behavior. If you set an execution timeout on X seconds there's no guarantee that you will trigger the error within X seconds because Lua is allowed to do so much. I also don't like that calling Lua::set_exec_time_limit would have to clear any existing hook. I think if you let the user write the simple obvious implementation then they're responsible for the subtle behavior that it actually has :)

@kyren
Copy link
Contributor

kyren commented Oct 9, 2018

every_line looks better than each_line for sure. I'd like to hear from @jonas-schievink about those names before making a final commit.

Yeah, agreed.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 9, 2018

I don't like super like adding timeouts on wall clock time or doing anything that gives the user the impression that there's more safety than there really is. There is an obvious implementation of such a function with respect to Lua::set_hook, but the obvious implementation has sort of fiendishly subtle real world behavior. If you set an execution timeout on X seconds there's no guarantee that you will trigger the error within X seconds because Lua is allowed to do so much. I also don't like that calling Lua::set_exec_time_limit would have to clear any existing hook. I think if you let the user write the simple obvious implementation then they're responsible for the subtle behavior that it actually has :)

It's true that the precision would be very low if the time between hooks is large. I agree this would be a bad idea.

Scoped hooks would allow you to expire the hook in a natural way, and would allow for hooks that aren't 'static, but I think adding it would be a bit weird without support for multiple hook functions, since adding a scoped callback would remove any global one. I think it might be very tricky to support multiple hook functions as well, and I'm not sure how much of a use non-'static hook functions would have? Maybe I'm not imagining the use cases very well!

I'm especially interested for hooks that expires because I think it might look more "elegant" in some way to not use remove_hook after only one Lua::exec call? Not a very compelling argument. When the scope expires, the original hook would get restored.

The rather large problem I see with using multiple hooks is that Lua only allows one hook to be set at one given time and only one mask for it. Since Lua::set_hook forwards the triggers to lua_sethook, that means we'd need to find the lowest common denominator triggers for all current hooks, then for every call of the hook, check which functions are supposed to be called. Here's some steps of what I mean:

  • Hook A gets called every 10th instruction and on function calls
  • Hook B gets called every 5th instruction
  • Hook C gets called every line

This means the lowest common denominator would be: every line, every 5h instruction, function calls. Hook C would somehow need to be called when a line change is detected (possible through the Debug::curr_line field), Hook B every 5th invocation of the callback, and Hook A every other time Hook B is elligible for the call and on function calls. It's totally possible! But I'm worried about all the overhead this would have, especially if every_nth_instruction is set to 1.

But don't get me wrong, I'm willing to try to make this happen if you agree!

@kyren
Copy link
Contributor

kyren commented Oct 9, 2018

It's true that the precision would be very low if the time between hooks is large. I agree this would be a bad idea.

It's worse than that in the presence of 1) other callbacks, 2) lack of memory limits, 3) unsafe lua stdlib functions, etc etc etc... I'm trying to be ultra careful about not giving people a false sense of security.

I'm especially interested for hooks that expires because I think it might look more "elegant" in some way to not use remove_hook after only one Lua::exec call? Not a very compelling argument.

Yeah, I agree that having hooks expire based on a handle is in some way nicer, I just don't like how it interacts with being limited to a single global hook.

So yeah, you could add support for multiple hooks, but doing so would have surprising and weird or deeply magical behavior so I didn't want to suggest that you do it. I don't think it's a good place to start, at least, though we may want to look at it later.

The rather large problem I see with using multiple hooks is that Lua only allows one hook to be set at one given time and only one mask for it. Since Lua::set_hook forwards the triggers to lua_sethook, that means we'd need to find the lowest common denominator triggers for all current hooks, then for every call of the hook, check which functions are supposed to be called. Here's some steps of what I mean:

The immediate thing I thought about for implementing multiple hooks based on a single hook is exactly what you mentioned about having multiple Nth instruction hooks. What if you have two large Ns which are coprime, do you have to then have a hook on every instruction? (Edit: I guess you were thinking along the same lines because you mentioned the overhead of every_nth_instruction being set to 1) Do you do something magical like limit the instruction granularity to some fixed amount? I couldn't come up with anything that made sense to me to suggest, so I just didn't suggest that we do it... at least yet.

I think the simplest possible implementation of a single global callback hook is okay for now, because it's really clear what the translation to the Lua C API actually is, and it doesn't imply any safety that's not really there.

/// Before a function call.
pub on_calls: bool,
/// When Lua returns from a function.
pub on_returns: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to document that this does not include returns from tail-called functions:

For call events, event can be LUA_HOOKCALL, the normal value, or LUA_HOOKTAILCALL, for a tail call; in this case, there will be no corresponding return event.

@jonas-schievink
Copy link
Contributor

One more thing worth testing for might be that the hooks are always dropped when set_hook or remove_hook is called.

Regarding the hook trigger names, I think I like @kyren's last suggestion most:

on_calls, on_returns, every_line, every_nth_instruction

Although I also wouldn't mind on_call and on_return instead - the docs should make it immediately clear that this means every call and every return (except returns from tail-called functions, as I noted above).

I also think that timeouts belong in user code, since it would complicate rlua's internals quite a bit and add lots of assumptions. Execution time also doesn't always seem like a desirable thing to limit because it's very unpredictable, so instruction limits might be preferable.

Regarding support for multiple hooks: This also seems like something we could reasonably expect a user to implement themselves if they really want it, as it makes the implementation in rlua pretty complicated. A user would know exactly when the "base hook" needs to be called and can dispatch from there.

@kyren
Copy link
Contributor

kyren commented Oct 9, 2018

After thinking about it for a while, I think the current hook code is unsound, but it's hopefully pretty easy to fix.

I think the problem lies in the fact that hooks are FnMut, and by using something like Rc I believe I can get mutable aliasing in the hook function. We could either have just Fn hook types or maybe since it's so incredibly rare to recurse inside a hook, we could guard it with a RefCell. Even though it's rare, I'd lean towards just using Fn because otherwise technically it should be documented that callback hooks use RefCell and it's yet more strange corner case behavior to have to worry about.

As extra paranoia, there's a &mut ExtraData reference in hook_proc that should probably just be replaced with something like:

        let cb = (*extra_data(self.state)).hook_callback
             .as_ref()
             .expect("rlua internal error: no hooks previously set; this is a bug");
         cb(&debug)

I know constantly doing (*extra_data(self.state)) is dumb, but I'm also a bit paranoid about even technically violating mutable aliasing rules (so keeping &mut references alive for as little time is possible is preferred, or never even creating them).

So, actually that would leave us with another soundness issue, which is that a hook can call Lua::remove_hook inside itself and remove its own storage while executing, but this is just another way of describing another mutable aliasing violation. Basically, it's just too dangerous to keep a reference to ExtraData around across a callback as it's accessed everywhere, even a reference to just a field of ExtraData.

I propose that we make the hook_callback field into Option<Rc<Fn(&Debug) -> Result<()>>>, allowing us to avoid any mutable aliasing violations, leaving us with something like:

        let cb = (*extra_data(self.state))
            .hook_callback
            .expect("rlua internal error: no hooks previously set; this is a bug")
            .clone();
         cb(&debug)

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 10, 2018

Hello, wow lots to read!

It might be a good idea to document that this does not include returns from tail-called functions:
For call events, event can be LUA_HOOKCALL, the normal value, or LUA_HOOKTAILCALL, for a tail call; in this case, there will be no corresponding return event.

The Lua's documentation looks confusing to me: what "in this case" refers to? The latter? Then we have the hook being calls for returns even on tail calls. At least that's what I understand.

I did not add it since the list of event masks visible in lua.h:412-415 does not include the tail version and only LUA_MASKCALL. I eventually forgot about that detail. I should probably check if the behaviour would change.

One more thing worth testing for might be that the hooks are always dropped when set_hook or remove_hook is called.

I shall add a test for a second call of set_hook, but remove_hook is covered.

Regarding the hook trigger names, I think I like @kyren's last suggestion most:

Noted, that's what I'll put for the names. Maybe clarify the doc too, especially about overheads.

Although I also wouldn't mind on_call and on_return instead - the docs should make it immediately clear that this means every call and every return (except returns from tail-called functions, as I noted above).

I also think that timeouts belong in user code, since it would complicate rlua's internals quite a bit and add lots of assumptions. Execution time also doesn't always seem like a desirable thing to limit because it's very unpredictable, so instruction limits might be preferable.

Regarding support for multiple hooks: This also seems like something we could reasonably expect a user to implement themselves if they really want it, as it makes the implementation in rlua pretty complicated. A user would know exactly when the "base hook" needs to be called and can dispatch from there.

It's very cool that the user is not dependent on rlua's decision about it, so I wouldn't mind if we keep it the way it is now, which is nothing more than a fancy wrapper. Instruction limit is also a possibility thought depending on every_nth_instruction, the precision would vary.

I am still not against for such a system, but maybe inside some kind of module or even a separate crate. We/I could use a trait to add extra methods for that. You know, I like convenience. :) That being said I'm cool going your route.

So, actually that would leave us with another soundness issue, which is that a hook can call Lua::remove_hook inside itself and remove its own storage while executing, but this is just another way of describing another mutable aliasing violation. Basically, it's just too dangerous to keep a reference to ExtraData around across a callback as it's accessed everywhere, even a reference to just a field of ExtraData.

I propose that we make the hook_callback field into Option<Rc<Fn(&Debug)>>, allowing us to avoid any mutable aliasing violations, leaving us with something like

You know, I might be a foolish person but I'm chill working with raw pointers, mem::transmute and such, I semi-don't understand your worries about having a &mut reference. :D

What I do share is the safety concerns of remove_hook within a hook. I saw that too but I did not realise it would deallocate the memory being used by the function. I'm going to wrap it inside a Rc. As for usign Fn over FnMut, I'd prefer to keep the mutability advantage that comes with FnMut but that would require the RefCell you've mentioned. As for the rest, hmmm I did not catch it quite but I trust you. I appreciate the care you both put in the crate to ensure it's basically 100% stable for even the noobest user (which might be me!).

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 10, 2018

Just tried to call remove_hook within a hook but Rust outputs this error message:

error[E0277]: `*mut rlua::ffi::lua_State` cannot be shared between threads safely
   --> tests\hooks.rs:120:9
    |
120 |     lua.set_hook(HookTriggers {
    |         ^^^^^^^^ `*mut rlua::ffi::lua_State` cannot be shared between threads safely
    |
    = help: within `rlua::Lua`, the trait `std::marker::Sync` is not implemented for `*mut rlua::ffi::lua_State`
    = note: required because it appears within the type `rlua::Lua`
    = note: required because of the requirements on the impl of `std::marker::Send` for `&rlua::Lua`
    = note: required because it appears within the type `[closure@tests\hooks.rs:122:8: 126:6 lua:&rlua::Lua, call_count:&mut i32]`

I'm using this code:

    lua.set_hook(HookTriggers {
        every_line: true, ..Default::default()
    }, |_debug: &Debug| {
        lua.remove_hook();
        call_count += 1;
        Ok(())
    });

I might have made a mistake but that technically acts as a safety protection without the need to use Rc<RefCell<...>>. Did I missed something?

(*extra_data(self.state)).hook_callback = None;
ffi::lua_sethook(self.state, None, 0, 0);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hook callback function should take a &Lua otherwise this is unsound.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't remember if I did the change. Might have forgotten to push. I passed it by value since I thought it would not be used anyway in hook_proc.

@kyren
Copy link
Contributor

kyren commented Oct 12, 2018

I added some more comments for things I think still need fixing.

I think I jumped the gun a bit at the beginning when I first looked at the PR, I need to learn to slow down and fully carefully review as the very first thing I do, rather than trying to skim, especially when it comes to unsafe code.

Like everything around Lua, there was more subtlety around hooks than I had appreciated at first.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 12, 2018

I don't like that a method called ptr_to_str is doing trimming, it shouldn't be modifying the strings that Lua returns. Also, I didn't think about the fact that str::from_utf8 is fallible here, so now I'm not sure I like that the Debug structure returns str rather than [u8]. I was recently reminded that we shouldn't make the assumption that Lua strings are utf8 and had gotten into the habit of doing that in too many places.

It's more complicated than that, here's why:

  • Depending on where the hook occurs, some strings are set to full zeros because they don't need to have the information.
  • It's failable mostly because str requires it to, I did not take the time to look for a ASCII version. I preferred to avoid allocations
  • Trimming is done because in some cases, the string contained \u00 and similar characters at the right end. This just removes them so the string is clean
  • &str has the advantage it's convenient to use rather than an extra [u8] conversion, at least if you're comparing byte strings.

I don't mind these elements, at least in a controlled environment, but you are the one best placed to know how we should improve this. If it was me I'd be comfortable publishing this in a version.

I'll make the changes soon when I got time.

@kyren
Copy link
Contributor

kyren commented Oct 12, 2018

Trimming is done because in some cases, the string contained \u00 and similar characters at the right end. This just removes them so the string is clean

Trim removes all whitespace not just zeroes, there are other ways to get str out of [u8] that stops at the first zero that don't involve trimming whitespace.

It's failable mostly because str requires it to, I did not take the time to look for a ASCII version. I preferred to avoid allocations

I'm not suggesting allocation, I'm suggesting &[u8] with a set of convenient methods to go to &str, like LuaString has. I'm not sure exactly what it should look like yet.

&str has the advantage it's convenient to use rather than an extra [u8] conversion, at least if you're comparing byte strings.

I know str is convenient, but Lua source code is not utf8. A user should at least have the ability to discern between "Lua did not provide the 'source' string" and "The Lua provided 'source' string is not valid utf8"

@kyren
Copy link
Contributor

kyren commented Oct 12, 2018

Ehhhh, after reading the Lua source code for the implementation of lua_getinfo, I realize that hard-coding the "nSltu" parameter is probably a bad idea, because it will be very very costly to do high frequency hooks, more so than I realized at first. Probably what should happen is that specifying the set of required info should be specified along with the hook in Lua::set_hook OR the hook itself should have to call an auxiliary function to specify what information it needs. The first one is vastly simpler but the second one is more flexible, and also how the normal Lua C API works (probably there is a good reason for this, as well).

Okay, I know that I keep finding things and this is turning into a lot of work for you, and I don't really like that I'm putting so much work onto you. This is the biggest PR so far to rlua and it's for a very core feature and it's becoming a bit unwieldy.

Is there a way that I can do PRs for your PR? I want to help with all of these features rather than just tell you what to do, as this was something I had fully planned on doing anyway. Since it's such a core feature, I'm also discovering that the more I think about it the more and more picky I'm becoming. It would be easier if I could collaborate with code as well as just textual feedback.

BTW I'm not like exactly an expert at open source processes, I'm willing to take suggestions here about how to make things easier.

@lemarcuspoilus
Copy link
Contributor Author

Hello, it's true that you care more and more about the underlying code, but I don't mind. The only downside is that I have other things to do so my progress could be a bit slower than my usual 1-day response.

That being said, feel totally free to make PRs in my PR! Since I'm doing commits directly on my hooks branch, you might accidentaly pull invalid code. I don't know how Github fully works but I'll be happy to add write access to your account!

I need to do &Lua, new_ephermal thing and your suggestion for the lua_getinfo's flags. I suggest you work on the &str replacement!

I'll do the stuff later.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 12, 2018

Actually this PR deserves its own checklist because I'm not even sure what needs to be done :)
I have no idea if you can edit my comment but here's a quick checklist:

  • Mutable hook functions
  • "lazy" Debug structure to get information
  • better way to handle Lua strings
  • Validate doc and final stuff
  • insert something here

@kyren
Copy link
Contributor

kyren commented Oct 13, 2018

Couple of quick things:

I feel much better about the Debug struct now and getting rid of the hard coded parameter to lua_getinfo after your recent changes, this is a lot more reasonable to run with high frequency hooks now and the API seems very reasonable 👍

Names, Source, and Stack are really generic names, can those be something like DebugNames, DebugSource, and DebugStack? I'm not picky, and they can be in a submodule if you think that's better but they're a bit too generic to be exported as rlua::Names.

There's sort of a proliferation of pub(crate), and I feel a bit bad about having something like make_ephemeral be pub(crate), can hook_proc just be moved into lua.rs? I know lua.rs is huge and is a dumping ground, it's a symptom of the crate not having great organization atm, but I don't really want to tackle that right now. I think doing that removes the need for quite a few pub(crate) additions, but I might be missing something.

I think if you push again you'll probably fail travis, because I just recently added rustfmt checks on stable on travis. I know it's annoying, I'm sorry :(

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 14, 2018

Hello!

Names, Source, and Stack are really generic names, can those be something like DebugNames, DebugSource, and DebugStack? I'm not picky, and they can be in a submodule if you think that's better but they're a bit too generic to be exported as rlua::Names.

I've heard it was better to avoid prefixing those structures so I left them as-is. And I also preferred to not have to type the submodule before the struct, I find it not as clean. What we can do is re-export them in a submodule in lib, if that's feasible.

There's sort of a proliferation of pub(crate), and I feel a bit bad about having something like make_ephemeral be pub(crate), can hook_proc just be moved into lua.rs? I know lua.rs is huge and is a dumping ground, it's a symptom of the crate not having great organization atm, but I don't really want to tackle that right now. I think doing that removes the need for quite a few pub(crate) additions, but I might be missing something.

I put it there just to make the code more "organized". I don't mind the change but I also like the use of pub(crate) in some places (thought you're right I put a lot of them while making the PR). Do what you feel is best of the developers.

At the same time I think at some point it should go back as it feels kinda right at home, right next to the other structures that it directly accesses.

To make the crate more organized it would require many, many changes everywhere, wrapping some fields into structures and such. Definitively a large task, but theoretically we could also ignore all of that for now. The user won't see it anyway, the public API just needs to be nice.

I think if you push again you'll probably fail travis, because I just recently added rustfmt checks on stable on travis. I know it's annoying, I'm sorry :(

It's actually more for you. You know, I like to show a PR with a nice green checkbox right next to it. But to be honest it actually caught one failed build that saved me from letting a bad build stay for too long. I'll keep that in mind.

Btw I'm also sorry for my reduced activity, I have to program on something else before bringing back my attention. Fortunately those are minor details!

@kyren
Copy link
Contributor

kyren commented Oct 14, 2018

I've heard it was better to avoid prefixing those structures so I left them as-is. And I also preferred to not have to type the submodule before the struct, I find it not as clean. What we can do is re-export them in a submodule in lib, if that's feasible.

Well, you could have pub mod hook and then only export HookTriggers at the top level, that would be okay I suppose, it would just be the first exported submodule.

I put it there just to make the code more "organized". I don't mind the change but I also like the use of pub(crate) in some places (thought you're right I put a lot of them while making the PR). Do what you feel is best of the developers.

I understand, I'm just trying to put off solving the problem and then solve it "properly", but I don't know what that looks like yet.

It's actually more for you. You know, I like to show a PR with a nice green checkbox right next to it. But to be honest it actually caught one failed build that saved me from letting a bad build stay for too long. I'll keep that in mind.

What I was trying to say was that before merging you're going to have to format everything with stable rustfmt, hence it would be mildly annoying for you.

Btw I'm also sorry for my reduced activity, I have to program on something else before bringing back my attention. Fortunately those are minor details!

It's fine, you can work on this at your own pace, there's not a deadline, I'm not going to even release a new version of this crate for a while.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 15, 2018

Hello! The suggestions are noted! So actually Travis will check for how I wrote the code? Hmmm I'll satisfy it, hopefully he's not too strict!

EDIT: Just tried rustfmt lua.rs from rustfmt-preview (where is stable?) and it gives me interesting changes, including code of yours. Take for instance:

            ffi::LUA_TNUMBER => if ffi::lua_isinteger(self.state, -1) != 0 {
                let i = Value::Integer(ffi::lua_tointeger(self.state, -1));
                ffi::lua_pop(self.state, 1);
                i
            } else {
                let n = Value::Number(ffi::lua_tonumber(self.state, -1));
                ffi::lua_pop(self.state, 1);
                n
            },

After the command I get:

            ffi::LUA_TNUMBER => {
                if ffi::lua_isinteger(self.state, -1) != 0 {
                    let i = Value::Integer(ffi::lua_tointeger(self.state, -1));
                    ffi::lua_pop(self.state, 1);
                    i
                } else {
                    let n = Value::Number(ffi::lua_tonumber(self.state, -1));
                    ffi::lua_pop(self.state, 1);
                    n
                }
            }

I don't see a format configuration file or something like that either. Technically that means the whole crate would need to be formatted somehow... I might be exaggerating things.

@kyren
Copy link
Contributor

kyren commented Oct 15, 2018

Hello! The suggestions are noted! So actually Travis will check for how I wrote the code? Hmmm I'll satisfy it, hopefully he's not too strict!

It's extremely strict, that's kind of the point. Don't think about formatting, just run the tool.

EDIT: Just tried rustfmt lua.rs from rustfmt-preview (where is stable?) and it gives me interesting changes, including code of yours. Take for instance:

I'm not sure why, cargo fmt using the stable toolchain doesn't make any changes to the source right now for me.

I don't see a format configuration file or something like that either. Technically that means the whole crate would need to be formatted somehow... I might be exaggerating things.

That's exactly how it works, afaik you can just install the rustfmt-preview component and type cargo fmt. It's always been formatted that way, I just never had a travis check until now.

I know rustfmt is still in preview, but doing things this way is cleaner because if you didn't do it, I was just going to do it to the code you submitted for PR the moment I accepted it anyway.

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 15, 2018

Never knew cargo fmt existed, I might try that too. I get way too many changes on the preview version. I was using a nightly toolchain, however. Just checked Travis log and normally that means it should be a really easy step to do.

EDIT: Hmmm I'm getting This version of rustfmt is deprecated. Use rustfmt-nightly. Use --force to run deprecated rustfmt even thought the Travis log says nothing... I'm running same Rust version (stable 1.29.1), I guess I'll just force it. EDIT 2: Line limit becomes an issue. Interesting... It works on the nightly toolchain however, could try it again later.

@kyren
Copy link
Contributor

kyren commented Oct 15, 2018

EDIT: Hmmm I'm getting This version of rustfmt is deprecated. Use rustfmt-nightly. Use --force to run deprecated rustfmt even thought the Travis log says nothing... I'm running same Rust version (stable 1.29.1), I guess I'll just force it.

I don't know what version of rustfmt you're using but I believe it is an extremely old version, here is what mine prints:

 % rustfmt --version                                                                                                                                                                         
rustfmt 0.99.1-stable (da17b689 2018-08-04)

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 16, 2018

Wow, I'm using 0.10.0 ( ) DEPRECATED: use rustfmt-nightly...
I discovered that, even with the tool uninstalled, it's still available on the command line. I have no idea how it got installed, I guess I'd need to remove it first so I can get a good version.

EDIT: I manually deleted rustfmt from my cargo directory and now it looks much more promising. I should be ready to make a commit soon with formatted code.

@lemarcuspoilus
Copy link
Contributor Author

Hello, I don't know if you got stuck too, but running cargo fmt --all yields this error (I have the same rustfmt version as you):

internal error: left behind trailing whitespace
   --> \\?\E:\rlua\src\conversion.rs:238

From some quick searches, this looks like a known bug. I'm not sure how I should tackle this (other than trying the nightly version) without making too many changes to the PR.

By the way, how is the string converter thing doing? I believe we could consider all characters to be ASCII, but that would be limited. As I understood it, it depends on the encoding you're using on the running system. I assume Rust's UTF-8 strings will work just fine but strings returned from a file won't be parsed properly... Would you like me to work on a basic foundation?

@kyren
Copy link
Contributor

kyren commented Oct 18, 2018

Hello, I don't know if you got stuck too, but running cargo fmt --all yields this error (I have the same rustfmt version as you):

I'm not stuck, I was just waiting for you to push a rustfmt update before pushing to your PR. It's dumb, but lean so heavily on rustfmt that it's mildly painful to edit source where the auto formatting causes edits that I didn't intend. Normally I would just turn the auto formatting off or manually invoke it for only my edited sections, but hey, this my crate right so I added rustfmt checks to travis instead :D

internal error: left behind trailing whitespace
--> \?\E:\rlua\src\conversion.rs:238

What version of rustfmt are you using, I just tried it on your branch and it works fine? You didn't even touch conversion.rs, so I'm confused. Can you tell me what rustfmt --version at the command line prints?

By the way, how is the string converter thing doing? I believe we could consider all characters to be ASCII, but that would be limited. As I understood it, it depends on the encoding you're using on the running system. I assume Rust's UTF-8 strings will work just fine but strings returned from a file won't be parsed properly... Would you like me to work on a basic foundation?

[u8] is not ASCII, it doesn't specify any encoding, it's just plain bytes. Lua rules around strings (and identifiers, etc, just code in general) are simplistic, you can shove arbitrary bytes in them and it's not based on the system locale or anything at all. You can write valid Lua that contains non-utf8 strings and then it's a problem when rust code assumes all strings are utf8. If all Lua strings were all ASCII that would be fine, because all ASCII is valid utf8 by design. There is no parsing involved. You can work on it if you want, but you need to understand what the problem is exactly first.

@lemarcuspoilus
Copy link
Contributor Author

Hello!

What version of rustfmt are you using, I just tried it on your branch and it works fine? You didn't even touch conversion.rs, so I'm confused. Can you tell me what rustfmt --version at the command line prints?

I'm sorry I was actually not pushing anything. I actually have two separate installations with different behaviors. I'm going to try with rustfmt 0.99.1-stable (da17b689 2018-08-04) on the latest stable Rust. Hmmm got the same error. I think I was using nightly on the other station, which was a success but had massive changes everywhere. I have no idea how to fix it unless going full nightly.

For the stable version, I'm using these commands:

rustup default stable
rustup component add rustfmt-preview
cargo fmt -- --version

That's how I'm getting a few internal error: left behind trailing whitespace.

[u8] is not ASCII, it doesn't specify any encoding, it's just plain bytes. Lua rules around strings (and identifiers, etc, just code in general) are simplistic, you can shove arbitrary bytes in them and it's not based on the system locale or anything at all. You can write valid Lua that contains non-utf8 strings and then it's a problem when rust code assumes all strings are utf8. If all Lua strings were all ASCII that would be fine, because all ASCII is valid utf8 by design. There is no parsing involved. You can work on it if you want, but you need to understand what the problem is exactly first.

Yes I've read about that strings are pretty much just byte arrays. Just got an idea, since normally it should just be some kind of text, could be rely on the existing String type of rlua? If so then the changes should be minimal.

Please let me know if you'd like a commit that uses the nightly version! I'll check soon for a reply, I've not been reliable these times.

@kyren
Copy link
Contributor

kyren commented Oct 20, 2018

That's how I'm getting a few internal error: left behind trailing whitespace.

Hmm, not sure why, but that apparently is only happening on your unpushed changes because it works on the pushed ones. Sorry for making you go through this, the code in your PR before this had a lot of mismatched style, but the whole point of something like rustfmt is not to get into style discussions, so I thought this way would be easier. Clearly it's still not easy though. Just push your changes and don't worry about the travis failure and then I can help you figure out why rustfmt is failing. At the very worst, I can fix up the PR myself either before or after merging.

Yes I've read about that strings are pretty much just byte arrays. Just got an idea, since normally it should just be some kind of text, could be rely on the existing String type of rlua? If so then the changes should be minimal.

No, you don't want to do that, because the strings that Lua gives you via the lua_Debug struct are not (or at least they're not presented as) normal internal lua strings, so that would involve copying them. I can fix this to not assume utf8 encoding before or after merging the PR, so just don't worry about it.

Please let me know if you'd like a commit that uses the nightly version! I'll check soon for a reply, I've not been reliable these times.

No, don't commit with nightly rustfmt, just commit and push so that I can see the error you're getting.

Also, responding within a day or two is perfectly fine, this isn't your job or anything, like I said there's not a rush.


So, this PR has been going on a while, and I'd like to propose a new strategy here. This is quite a large change, and there are a lot of very very small things I would like to add / change in it, wording that I would like to change, just lots of very small things, and also this is a feature that was on my list of things to do already, so I (very clearly) have a lot of all caps OPINIONS about how it should be done.

This crate has, up until this point, been basically run by just me. This is not to discount all the great help that I've gotten up until this point, especially with some tricky points around API design, errors, and especially documentation, but though other people have given me super awesome PRs, I don't actually develop that way, I just push to the crate as though it's my personal project, because it is.

I'm not actually ready to change that process either, because I don't think it's quite the right time for API changes on this crate to slow down, because it's not (in my opinion at least) "feature complete" yet. I don't know when that point will be, actually it might be pretty soon, but it's not there yet. In fact, there is still the very real possibility that I will solve some of the tricky problems around callback lifetimes, or Fn argument conversion, or just come up with some really much better way of expressing bindings that represents a fundamental shift, and I'll have to rewrite the whole interface.

Once the API settles down (again, it may be close to that or it may be far away, not sure), I'll think about transitioning to 1.0 and then maybe think about formalizing development a bit more. Until then, this is just my personal project, so I don't really have the energy to have everything go through a sort of RFC-lite like process, like some more stable projects do.

In light of that, what I propose is that I just simply merge your PR in whatever state you're comfortable, and then I make the changes to it I'd like to before releasing the next version of rlua. As long as the PR doesn't introduce unsoundness, I can iterate on the API until I'm happy with it, and if you see me do something that you don't like or would like to improve it in other ways, you can make additional PRs and we can discuss it.

Please don't take this as me saying that your PR is in some way "lacking" or "broken", and I'm "fixing" it, I simply want to iterate on it, which is exactly what I do with features that I write as well. This is turning into an epic RFC level (or, at least it feels like it to me) patch, and I'm not sure I'm ready for this level of formality and process.

If you don't like this plan and would like to retain more formal control over the feature that you write, and would like any changes to it to go through PR like discussion, I both completely understand that desire, but also am personally just not ready to give up that level of control yet. This has been a bit awkward for me so far because you seem very invested in this change, which is wonderful and that's a wonderful attitude to have, but I don't want you to give you the wrong impression about the process or lack thereof around this project at this time.

Even if accept your PR then iterate on it, that doesn't mean you haven't still saved me a lot of time and energy exploring the API design space, and that doesn't mean that I'm not extremely grateful.

@lemarcuspoilus
Copy link
Contributor Author

Hmm, not sure why, but that apparently is only happening on your unpushed changes because it works on the pushed ones. Sorry for making you go through this, the code in your PR before this had a lot of mismatched style, but the whole point of something like rustfmt is not to get into style discussions, so I thought this way would be easier. Clearly it's still not easy though. Just push your changes and don't worry about the travis failure and then I can help you figure out why rustfmt is failing. At the very worst, I can fix up the PR myself either before or after merging.

I do have some small minor style preferences, but I actually don't mind using rustfmt! It's the errors that prevent me from pushing a half-baked commit. As I said nightly works fine but means the PR is going to be a huge PR compatibility destroyer... thought you're the ownerso you do what you want :)

No, you don't want to do that, because the strings that Lua gives you via the lua_Debug struct are not (or at least they're not presented as) normal internal lua strings, so that would involve copying them. I can fix this to not assume utf8 encoding before or after merging the PR, so just don't worry about it.

I'll leave the code I've originally written as-is for now. I'm still curious to what your solution would actually be, if you can't know for sure how to convert it into a Rust string without failure on many platforms... I'll see when you push the changes.

No, don't commit with nightly rustfmt, just commit and push so that I can see the error you're getting.

I'm using the latest code on the hooks branch, same as you may clone on GitHub right now. I'm doing cargo fmt --all right on unmodified sources.


I read your confession and I totally agree with you! This is the very first time I actually made a PR in a pretty large project without actually knowing what I'm doing. At the same time I feel what you mean by wanting to control it; I'm the same. I like having a nice, neat API that I made and that I fully understand rather than working with someone not thinking the same way and causing inconsistencies.

This, indeed, is close to RFC-class commit (expect I don't have to work in a gigantic code base like it scares me big time) and as time went on I felt your need to tweak the code to your feelings. Those changes are probably more thought out than I would and, subsequently, would be much easier in individual commits.

I'm happy with the code I've written. I think it fits pretty well with the existing API and contains a decent feature set for hooks that should be enough for the average crate user. I never really know if I'll ever be 100% confident with my code but I don't see how to improve it either.

Now, I'd like to go through the paragraphs so I can better structure my thought process.

so I don't really have the energy to have everything go through a sort of RFC-lite like process, like some more stable projects do.

That discourages me from doing a second, minuscule, PR for another small feature, but I might try anyway. I don't know if there is a PM feature but if so, then I might discuss about it. Nonetheless more details about my motivations below.

Please don't take this as me saying that your PR is in some way "lacking" or "broken", and I'm "fixing" it, I simply want to iterate on it, which is exactly what I do with features that I write as well. This is turning into an epic RFC level (or, at least it feels like it to me) patch, and I'm not sure I'm ready for this level of formality and process.

I did not take it badly and I don't either :) I expect reviewers to have criticism with the goal to make the source code as best as possible (for the project) since I'll be used for quite some time and probably not changed (in a stable situation thought). My previous PR was accepted right away, no questions asked, which I actually didn't appreciate that much.

I do expect about 50% of the discussion of this PR in subsequent PRs because that's how I feel when a dude comes around with code to give free, but I do not require it either.

In light of that, what I propose is that I just simply merge your PR in whatever state you're comfortable, and then I make the changes to it I'd like to before releasing the next version of rlua. As long as the PR doesn't introduce unsoundness, I can iterate on the API until I'm happy with it, and if you see me do something that you don't like or would like to improve it in other ways, you can make additional PRs and we can discuss it.

If that wasn't obvious enough I accept you merge it right away (but not without a little comment :D). At the same time I don't see why I should "force" project owners to think my way when I do not own the project or I'm not part of the management group. I tried to mimic the public API that was already existent so the new code feels right at home.

If you don't like this plan and would like to retain more formal control over the feature that you write, and would like any changes to it to go through PR like discussion, I both completely understand that desire, but also am personally just not ready to give up that level of control yet. This has been a bit awkward for me so far because you seem very invested in this change, which is wonderful and that's a wonderful attitude to have, but I don't want you to give you the wrong impression about the process or lack thereof around this project at this time.

Two parts:

  • I feel I gave my fair two cents with the project by defending the FnMut choice for the hook function trait requirement and naming of the HookTriggers fields, and I can't see myself going much further than that, since I might shown lack of openness.
  • I actually have personal interests in using your crate for another of my crazy personal projects. I wanted to modify it to embed it in a nostd context but I remembered recently that it would be a bit harder than I thought. The hooks are there to control execution speed and the brief memory suggestion is also related to this. At the same time I didn't want those changes to not be public since they are not specific to such a project.

Thanks too :) I want to give a good impression to the people that I interact with since it makes it easier for me to keep working with them.

Even if accept your PR then iterate on it, that doesn't mean you haven't still saved me a lot of time and energy exploring the API design space, and that doesn't mean that I'm not extremely grateful.

I'm happy it'll help you go faster with tweaking the API rather than adding extra features! Still surprised, again, since I made it without even knowing how your API really works. I'm also glad that you were open to discuss with me for now an almost 2-week PR! Makes me want to stay here. :)

By the way, I looked up about GCs in general and I now understand why it's a bit harder than Arcs all over the place! Haven't looked up your code, I'm still experimenting with some code. Also very tempting to use C++ instead.

@kyren
Copy link
Contributor

kyren commented Oct 25, 2018

Hey, so I haven't forgotten about this PR, but I'm possibly about to break it horribly with invasive API changes to fix #97.

Since I was planning on doing some extra work on this when I merged this anyway, you don't have to worry about fixing it (unless you just really want to), I'll fix it up when I merge it. Just letting you know so you don't think I forgot about it.

Also just so you know, I can run cargo fmt just fine on your PR, and even did so and pushed it to a local hooks branch. I don't know at all why it's not working for you. Maybe there is some platform specific rustfmt bug and you and I are on different platforms?

@lemarcuspoilus
Copy link
Contributor Author

lemarcuspoilus commented Oct 25, 2018

Hello! Yes, go ahead and merge it before you have to do massive Git merging nonsense! :) About the formatting, I'm running on Windows 8.1 x64 and I'm editing the files with IntelliJ IDEA and its Rust plugin. I have not tested in a Fedora VM or similar. Unfortunately I don't know about such a bug, maybe a very old one, not quite sure... Not sure either if it's my OS installation getting nice and dirty.

I'm going to read #97. By the way, good luck making your changes! We might see each other quite soon with a smaller PR. ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants