-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Inline Parameters #7772
Comments
Hmm, interesting. I feel that the meaning of Let me see if I can fumble my way through this... So the difference is that with |
Yeah, but it's not just optimization, it's also semantic. The value from the call site is inlined into the function, and then the function is semantically analyzed. So it's possible to have code that behaves differently depending on whether the parameter is comptime known.
I think |
Edit: I misunderstood the intent of the proposal, so this comment is mostly off-topic. Couldn't the same thing be done with callsite inlining? For example, given
Thus, if the input This approach has the disadvantage of having to type |
@zzyxyzz Obligatory not OP. It is stated in the issue why this will not work. |
Ah, yes. I may have misread the issue. Let me try again: fn foo(inline x: i32) i32 {
// ... do something with x
return 0; // "inline" return value, because constant
}
fn bar(y: i32) {
const _ = foo(10); // non-inlined call to specialized function `foo_10()`,
// shared with other use-sites of `foo(10)`.
const z = foo(y); // Ordinary call to non-specialized function.
// Result is known at comptime, even when `foo()` is neither inlined nor eliminated.
}
Is that correct? @SpexGuy, @EleanorNB |
Unless I misunderstood something again, the inline parameter proposal is in essence a return to C inlining semantics: instead of being forced to inline a function or throw an error, the compiler is given discretion to inline, specialize or call the function normally. Only partly related to this is the inline return value proposal, which requires the compiler to analyze function bodies and see if the result is comptime-known independently of the arguments. Which would ordinarily almost never be the case, but in curried functions it might be. AFAICT, it is that last part that actually solves the original issue with It is also true that globally inlining a function is a much too heavy-handed way of peeling off a function boundary for the benefit of precomputation. What we really want is a way to tell the compiler that precomputation of a particular function is both important and likely to be successful. This is useful for ensuring that build configuration constants get propagated and all unnecessary code statically eliminated. It is also useful for math functions, simple string parsers and surely many other areas. In short, I'm proposing to scrap the inline parameter idea, because its behavior is too vague and there is not enough evidence for its usefulness. I'm also proposing to scrap the implicit "inline return" idea, because it is too expensive relative to the benefit it is likely to provide on an average function. Instead, I propose to have explicit precomputability annotations: fn isDarwin() inline bool { ... }
fn cos(x: f64) inline f64 { ... }
fn parseDate(str: const[] u8) inline ?Date { ... } A better keyword would be welcome, though. |
You're close, but the isDarwin case is not quite solved by inline returns alone. Sorry, I feel like I should have expanded more on this in the original post and included some examples. Let's take a detailed look at the behavior here. I'll use an explicit pub fn isDarwin(tag: Tag) inline bool {
return switch (tag) {
.ios, .macos, .watchos, .tvos => true,
else => false,
};
} Since the tag parameter is not marked pub fn isDarwin(inline tag: Tag) inline bool {
return switch (tag) {
.ios, .macos, .watchos, .tvos => true,
else => false,
};
} Because of this boundary, inline return values are not very useful without inline parameters. This also limits the amount of work the compiler needs to do. The pub fn applyTriangleOp(inline op: SomeBigEnum, start: u32, end: u32) u64 {
return switch (op) {
// using #7224 syntax to make this a smaller example, but this could be written out.
inline else => |ct_op| applyTriangleOpInner(ct_op, start, end),
};
}
pub fn applyTriangleOpInner(comptime op: SomeBigEnum, start: u32, end: u32) u64 {
var i = start;
while (i+1 < end) : (i += 1) {
var j = i+1;
while (j < end) : (j += 1) {
switch (op) {
// do something dependent on op
// for performance it's very important that op is comptime known
}
}
}
} Here, The Personally I also still feel like the extra work needed to make all returns implicitly inline may be a problem for compilation time, and maybe also in other ways. However, Andrew believes that this can be implemented in stage 2 without running into significant problems, so we've decided to go ahead with the experiment. If there are significant problems here, we are prepared to back it down to an explicit keyword or even remove it entirely. For more info on the potential problems we discussed, please check out the design meeting notes. |
@SpexGuy, Thanks for the explanation. Though if I may bother you with more questions...
If we have already given the compiler licence to try to compute a comptime result, why do we need to separately give it permission to determine which of the arguments are comptime at a particular call-site? How exactly the compiler goes about its business (re-analyze the function at every call site; produce multiple instantiations; memoize known results, etc.) should be an implementation detail. Or so it seems to me. I'm still trying to wrap my head around your second example. Not so much what it does, but where I would realistically want to use something like that. The use cases where neither a normal nor inline nor generic function is appropriate, but a lazily instantiated one is, seem to be targeted rather narrowly towards unrolled switches like that. |
Regardless of whether inline parameters are a good idea, I think that a lot of potential would be left on the table if the inline return feature is implicitly tied to it. For example, trig functions and other math are good candidates for having inline returns, so that |
By default, we have only given it permission to try to compute a comptime result assuming all arguments are not comptime-known (unless marked
The optimizer is certainly allowed to do this. The limit here is specifically for semantic analysis. It limits when the compiler is allowed to formally label the returned value as "comptime known", and e.g. use it as the length of an array type. If you instantiate a type inside the function, it determines how many versions of that type there are. These things are not implementation details, they are important to know when writing code because they change the meaning.
This is reasonable and should work. In this case the compiler should be able to recognize that the cos_0_7 function has no runtime code and not emit it, even in debug mode. Though if the implementation is just
I may have been unclear here as well. All functions are planned to have inline return values, not just functions with inline parameters. It's just that in most cases return values will not be comptime known unless some parameters are marked inline. The optimizer is free to inline the function and do constant propagation as it sees fit, but semantically the returned value is not comptime known if it depends on a runtime parameter. |
Here's another example from some code I wrote recently: fn doWholeArray(comptime mode: Mode, block_len: u32, array_len: u32) void {
// junk_len is either comptime or runtime known based on mode
const junk_len = if (mode == .junk_buffer) block_len else 0;
var block_start = junk_len;
while (block_start + block_len < array_len) : (block_start += block_len) {
doBlock(mode, block_len, block_start);
}
}
fn doBlock(comptime mode: Mode, block_len: u32, base_index: u32) void {
const junk_len = if (mode == .junk_buffer) block_len else 0;
// ... do more stuff that's not important.
// we don't use mode or block_len here.
if (junk_len != 0) {
swapJunk(base_index, junk_len);
}
} Here, there is currently no way to pass the fn doWholeArray(comptime mode: Mode, block_len: u32, array_len: u32) void {
// junk_len is either comptime or runtime known based on mode
const junk_len = if (mode == .junk_buffer) block_len else 0;
var block_start = junk_len;
while (block_start < array_len) : (block_start += block_len) {
doBlock(junk_len, block_start);
}
}
fn doBlock(inline junk_len: u32, base_index: u32) void {
// ... do more stuff that's not important
if (junk_len != 0) {
swapJunk(base_index, junk_len);
}
} |
Sorry for dragging out this discussion, but is it really more efficient or more "semantic" to require functions to be instantiated with specific comptime arguments before trying to compute their comptime result? In terms of implementation, the calls with specific comptime parameters need to be analyzed lazily in both cases, at which point it shouldn't make any difference whether the function is "instantiated" with the given arguments or merely "computed". Below is an example to illustrate the issue. Status quo: fn twice(x: usize) usize { return x * 2; }
inline fn thrice(x: usize) usize { return x * 3; }
fn foo() {
buffer1: [twice(100)]i32 = undefined; // error! twice() is not formally comptime
buffer2: [thrice(100)]i32 = undefined; // Works, but what if we don't want to make thrice inline?
...
} New behavior: fn twice(inline x: usize) usize { return x * 2; } // inline argument version
fn thrice(x: usize) inline usize { return x * 3; } // inline return version
fn foo() {
buffer1: [twice(100)]i32 = undefined;
// compiler actions:
// 1. twice() has an inline argument and 100 is comptime
// 2. Does twice_100() already exist?
// 3. No, generate twice_100() = fn() usize { const x = 100; return x * 2; }
// 4. Analyze twice_100(); Note the fact that the result is always 200.
// 5. Since twice_100() is comptime, rewrite statement as `buffer1: [200]i32 = undefined;`
buffer2: [twice(100)]i32 = undefined;
// Same as above, except that steps 3 and 4 can be skipped
buffer3: [twice(200)]i32 = undefined;
// Do steps 1 to 5 and instantiate twice_200() in the process.
// ----------
buffer4: [thrice(100)]i32 = undefined;
// compiler actions:
// 1. Is the array length expression comptime?
// 2. Well, 100 is comptime and thrice() has an inline return value. Try to compute that.
// 3. thrice(100) == 300. Memoize this result for future reference. Or not.
// 4. Rewrite expression as `buffer4: [300]i32 = undefined`
buffer5: [thrice(100)]i32 = undefined;
// Retrieve memoized result for thrice(100) or recompute if none exists.
buffer6: [thrice(200)]i32 = undefined;
// Recompute in any case
...
} |
Why should this not be what happens by default?
I think we should discourage this. |
Concerning the doWholeArray/doBlock example, the intent seems to be that doBlock should exist in two versions: one that performs the swap at the end and one that doesn't. The former needs an extra length parameter to do that, while the latter doesn't. The improved implementation of doBlock takes an inline length parameter with the understanding that it will be a comptime zero if the swap does not need to be made and a runtime length otherwise. Personally, I think this is a solution that is too subtle and brittle. Unlike the original implementation with the comptime mode parameter, it is not apparent that doBlock is intended to exist in just two different versions. In fact, if a comptime nonzero parameter is passed to it by accident, the compiler will happily instantiate a third version of doBlock. And a forth. And so on. For cases like this, it would be safer and more expressive to have manual argument inlining, akin to Jai's fn doBlockSwap(junk_len: u32, base_index: u32) void {
// ... do more stuff that's not important
if (junk_len != 0) {
swapJunk(base_index, junk_len);
}
}
fn doBlock(base_index: u32) {
doBlockSwap(inline 0, base_index);
// or: inline doBlockSwap(0, base_index);
} A bit longer, but clear as day and without any unnecessary instantiations. (Edit: you would also have to insert a omptime if into doWholeArray to dispatch to doBlock or doBlockSwap as appropriate. This is the price for eliminating the length parameter from doBlock. May or may not be worth it.) So color me unconvinced. |
Er, I hope you are not suggesting that that for all ordinary functions |
yes actually. To reduce the work done, I've proposed various forms of de-duping based on the characteristics of the argument used. e.g. if you only use |
That's an interesting idea but I think it needs to be backed up by some proof that the exponential explosion caused by the cartesian product of parameter values is controlled by these de-duping traits. It's not obvious to me that this is the case, and if it's not then Zig will lose the ability to produce small code. This would also completely break things like |
That sounds like on of those supercompiler projects. To my knowledge, such attempts have never really succeeded even in pure functional languages. Uncontrollable compile times and code bloat being the main problems. |
Maybe I just don't know enough about Zig compiler internals. The separate instantiation thing is just a hack to be able to forward the computation from the comptime stage to the semantic analysis stage, isn't it? I sort of assumed that the two can be interleaved. |
This would make Zig come a bit closer to the expressive capabilities in C++. Since |
This proposal does not make sense when you read the code and requires a lot of mental power to understand what is happening. Maybe its a cool compiler feature and maybe it has a niche use-case, but the proposed syntax will confuse 99% of people when they see it for the first time, especially since I think zig has a lot of ex-C developers. Maybe another syntax/keyword can be used. Update: I understand that seeing something for the first time is a weak argument. I guess the main problem is that it re-uses industry-wide established "inline" terminology in a bit different/new way. |
This proposal seems pretty cool. I am very, very new to zig but from a decently outside perspective (with a background in other languages that use the inline keyword), I agree with previous comments that the word inline is really confusing (especially to a new user such as myself). I think the use of the suffix "-time" is pretty well understood so it would be an improvement over inline. Something like "trytime" (try comptime and if not default to runtime) or "autotime" (automatically decide on which to pick dependent on situation) is more understandable. There are probably better "-time" suffix words but hopefully my point comes across. This also leave the inline keyword open for future use in actual inline based functionality. |
I'm happy to see this proposal here, since I had a similar idea but don't think I could have communicated the value as clearly. The syntax I imagined was pub fn isDarwin(?comptime tag: Tag) bool {
// ...
}
|
|
This issue came from a discussion of #425 at a recent design meeting. This proposal is accepted as an experiment. We think it's probably a good idea, but think that the issues with it won't appear until after it's really in use. So we're planning to add it and see how it affects the balance of the language, and then reevaluate.
Motivation
In stage 2, inline functions are planned to have different semantics from standard functions. If arguments are comptime known at the call site of an inline function, those parameters will be comptime known within that instantiation of the function. Similarly, if the returned value is comptime known, that is known at the call site as well.
This means that you can write inline functions which accept either comptime known or runtime values as input, and produce a value which may be comptime known as a result. A good example of where this is useful is
std.builtin.Target.isDarwin(..)
. Not all targets are comptime known, but if a target is comptime known, the result of calling isDarwin() should also be comptime known.The downside of
inline
for this is that it also forces the machine code to be inlined into every call site, bloating the generated machine code. This is undesirable for large functions.Proposal
Inline Parameters
Allow the
inline
keyword on individual parameters of a non-inline function. If a parameter is inline, then multiple instantiations of the function may be generated - one for each unique comptime value that is passed to it, and one extra one for "any runtime value".Functions accepting inline parameters are not necessarily considered to be generic. If you take a runtime function pointer to such a function, it will use the version for which all inline parameters are runtime values.
Inline Return Values
Similarly, we could consider having inline return values, where the function's return value is comptime known at the call site if the return expression in the function is comptime known. For the experiment, we've decided that inline returns should not require a keyword. All direct function calls will have inlined returns if possible.
A downside of inline returns is that the called function must be semantically analyzed before the caller can be semantically analyzed, in order to know if the return value is comptime known. This is especially difficult in the case of recursion. For the purpose of the experiment, we will allow the compiler to break these loops by deciding that the return type is runtime known, but if we decide to keep this behavior we will need to develop more concrete rules for how this works.
Calls through function pointers will not be able to inline their return values. To prevent having to generate multiple versions of a function, with or without return code, we can just generate one function which has return code, and have call sites ignore the return value if it is comptime known. In optimized builds, we could do more aggressive analysis to determine if a function needs return code, or consider emitting a separate version of the function for direct calls.
Another implication is that this allows functions which return a comptime-known value but have runtime side effects. This is kind of weird, but it already exists with inline functions.
The text was updated successfully, but these errors were encountered: