Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: int semantics #1

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

RFC: int semantics #1

wants to merge 4 commits into from

Conversation

cometkim
Copy link
Member

@cometkim cometkim commented Nov 18, 2024

@cometkim cometkim self-assigned this Nov 18, 2024
@cometkim
Copy link
Member Author

This is still TBD, but the suggested actions to the compiler are:

  1. Fix the assumption broken by FFIs. We need to explicitly do fromNumber when referencing external sources, like this:
    @val external extNumber: int = "extNumber"
    
    let fromExt = extNumber // here
  2. Change the implementation of multiply from Math.imul to fromNumber
  3. Add Overflow_value built-in exception to resolve ambiguity of abs(min_value)

Any early feedback and additional suggestions are welcome :)

@cristianoc
Copy link

Have you measured the perf implications of changing multiplication from the current one?
Also x+y needs similar changes if it wants to satisfy the definition.

@shulhi
Copy link
Member

shulhi commented Nov 19, 2024

I don't think it's a good idea to implicitly convert infinity to min/max value of int. Infinity has a different semantic compared to the min/max bound of int.

@cometkim
Copy link
Member Author

Infinity has a different semantic compared to the min/max bound of int.

So does 0, and more problematic. The original resolution I mentioned in the issue was about stdlib, so there will be more suggestion about it into the "API Consideration" section. Something like, "consider using option on the conversion API like Float.toInt"

@cometkim
Copy link
Member Author

cometkim commented Nov 19, 2024

Have you measured the perf implications of changing multiplication from the current one?

It depends on the environment.

In my measurements, in Node.js, inlined mul (multiply & truncate) is slightly faster than Math.imul. Both are so fast that it doesn't seem to mean much, but on an amd64 Linux machine, inlined mul is 8% faster.

I ran it on quickjs to see the impact of the optimization. The results are 30% faster for inlined mul. This implies that optimizations by simple inlining are more achievable than by JIT.

On the other hand, in Bun, Math.imul and mul are identical, but when I run on a non-inlined function, it's 45% faster than others.

suite
import { summary, run, bench } from './node_modules/mitata/src/main.mjs';

const min_value = -(2 ** 31);
const max_value = 2 ** 31 - 1;

function* randomIntegers(n) {
  for (let i = 0; i < n; i++) {
    yield Math.floor(Math.random() * 2 ** 32) + min_value;
  }
}

const length = 1000;
const x = [...randomIntegers(length)];
const y = [...randomIntegers(length)];

summary(() => {
  bench('Math.imul', () => {
    for (let i = 0; i < length; i++) {
      void Math.imul(x[i], y[i]);
    }
  });

  // function mul(x, y) {
  //   return x * y | 0;
  // }
  // bench('mul', () => {
  //   for (let i = 0; i < length; i++) {
  //     mul(x[i], y[i]);
  //   }
  // });

  bench('mul  (inline)', () => {
    for (let i = 0; i < length; i++) {
      void (x[i] * y[i] | 0);
    }
  });
});

await run();

@cristianoc
Copy link

Have you measured the perf implications of changing multiplication from the current one?

It depends on the environment.

In my measurements, in Node.js, inlined mul (multiply & truncate) is slightly faster than Math.imul. Both are so fast that it doesn't seem to mean much, but on an amd64 Linux machine, inlined mul is 8% faster.

I ran it on quickjs to see the impact of the optimization. The results are 30% faster for inlined mul. This implies that optimizations by simple inlining are more achievable than by JIT.

On the other hand, in Bun, Math.imul and mul are identical, but when I run on a non-inlined function, it's 45% faster than others.

suite

This is great though I now don't understand what "Change the implementation of multiply from Math.imul to fromNumber" means. Concretely the compilation of (x,y) => x*y should change from what generated code to what generated code?

@cometkim
Copy link
Member Author

Have you measured the perf implications of changing multiplication from the current one?

It depends on the environment.
In my measurements, in Node.js, inlined mul (multiply & truncate) is slightly faster than Math.imul. Both are so fast that it doesn't seem to mean much, but on an amd64 Linux machine, inlined mul is 8% faster.
I ran it on quickjs to see the impact of the optimization. The results are 30% faster for inlined mul. This implies that optimizations by simple inlining are more achievable than by JIT.
On the other hand, in Bun, Math.imul and mul are identical, but when I run on a non-inlined function, it's 45% faster than others.
suite

This is great though I now don't understand what "Change the implementation of multiply from Math.imul to fromNumber" means. Concretely the compilation of (x,y) => x*y should change from what generated code to what generated code?

let f = (a, b) => a * b

currently emit this code

function f(a, b) {
  return Math.imul(a, b);
}

I'm requesting to change it to this code

function f(a, b) {
  return a * b | 0;
}

just like we do for add

@cometkim
Copy link
Member Author

cometkim commented Nov 20, 2024

Since both a and b is type-checked, I expect fromNumber(x) outputs x | 0.

Runtime dispatch is required when it is outside the compiler's capabilities.

@cristianoc
Copy link

OK I think the entire RFC can be written in a few lines if one defines operations using | 0 and only comments and explains on the deviations from this.
For FFI, +-Infinity is treated specially, for division/mod one has div0 check, for abs there's an exception.
Then one can discuss the reason for these special treatments and comment on differences w.r.t. how things are done in master.

@cristianoc
Copy link

cristianoc commented Nov 20, 2024

For example, in the case of float to int conversion, one can have the baseline (num) => num | 0 and only discuss the deviations from this baseline:
1 NaN: currently no deviation suggested. That would mean NaN goes to 0. I doubt that's the intention and it should to go None instead. (Or an exception, but let's ignore the distinction here).
2 Infinity: current suggestion is to go to max/min int32. That is one possibility. The other possibility is to use None. What comes to mind is +Infinity + 1, whether it should go to min_int (because max_int+1) or to None/exception.

@cometkim
Copy link
Member Author

Oh sorry, I had to focus on explaining the current behavior as it is before making changes 😞

@cometkim
Copy link
Member Author

1 NaN: currently no deviation suggested. That would mean NaN goes to 0. I doubt that's the intention and it should to go None instead. (Or an exception, but let's ignore the distinction here).

IMO we can leave it as the responsibility of stdlib.

2 Infinity: current suggestion is to go to max/min int32. That is one possibility. The other possibility is to use None. What comes to mind is +Infinity + 1, whether it should go to min_int (because max_int+1) or to None/exception.

As @glennsl originally pointed out, Infinity and NaN are not included in int semantics, which would invalidate reasons for having int type. So does None

My intention in the fromNumber definition was not to try representing the Infinity value, but to select a more useful "garbage value" in the integer range. If that doesn't work, then we should return to 0.

@cometkim
Copy link
Member Author

I added the Overflow_value to the abs definition because I assumed there would be no compatibility issues since it is a primitive that does not currently exist. If it doesn't work universally, it can also be left out.

@cristianoc
Copy link

1 NaN: currently no deviation suggested. That would mean NaN goes to 0. I doubt that's the intention and it should to go None instead. (Or an exception, but let's ignore the distinction here).

IMO we can leave it as the responsibility of stdlib.

2 Infinity: current suggestion is to go to max/min int32. That is one possibility. The other possibility is to use None. What comes to mind is +Infinity + 1, whether it should go to min_int (because max_int+1) or to None/exception.

As @glennsl originally pointed out, Infinity and NaN are not included in int semantics, which would invalidate reasons for having int type. So does None

My intention in the fromNumber definition was not to try representing the Infinity value, but to select a more useful "garbage value" in the integer range. If that doesn't work, then we should return to 0.

Actually I was referring to the stdlib function to convert float to int.
And asking the question of: on what values it should return a result (and what result), otherwise return None or an exception.

Then similar questions for all the operators.

@cometkim
Copy link
Member Author

Actually I was referring to the stdlib function to convert float to int.
And asking the question of: on what values it should return a result (and what result), otherwise return None or an exception.

Then I think stdlib should always use option for any kind of float to int conversions, but it seems not possible without making breaking changes to existing interfaces of Core/Belt/Js, or we have to add another migration requirement since v12. I'm curious to hear @zth's thoughts on this.

@zth
Copy link

zth commented Nov 21, 2024

The way we've dealt with these things before is to have 2 or more variants in the stdlib. In this case, we could for example have toIntOpt return an option if it's NaN, and toInt raise if it's NaN.

@cristianoc
Copy link

The way we've dealt with these things before is to have 2 or more variants in the stdlib. In this case, we could for example have toIntOpt return an option if it's NaN, and toInt raise if it's NaN.

That makes sense.
@cometkim what do you think is better for +/- infinity: return None/exception or max/min int?

@cometkim
Copy link
Member Author

Using result for detail errors for each case is too much?

@cometkim
Copy link
Member Author

For example, we can have Safe version APIs as a module

type errorKind = [#Overflow_value]

exception ConversionError(errorKind)

let toInt: float => int // raise ConversionError

module Safe = {
  let toInt: float => result<int, errorKind>
}

We could also create both Safe and Unsafe modules and provide a flag to open one by default.

@cometkim
Copy link
Member Author

what do you think is better for +/- infinity: return None/exception or max/min int?

If there is an alternative API, then assuming conversion Infinity to int is done at least via the API, and users do not use primitives directly unless they are intended.

I prefer using result type in this case too. If you think that is overkill, None is fine. That is also the choice of many existing libraries like tablecloth.

@cristianoc
Copy link

what do you think is better for +/- infinity: return None/exception or max/min int?

If there is an alternative API, then assuming conversion Infinity to int is done at least via the API, and users do not use primitives directly unless they are intended.

I prefer using result type in this case too. If you think that is overkill, None is fine. That is also the choice of many existing libraries like tablecloth.

I guess in this case, it f there are 2-3 kinds of errors, result makes sense to document what they are.
I would consider as payload a variant, whose cases have identical names to the corresponding exceptions that are raised by the other conversion function that raises instead of returning Error.

@cristianoc
Copy link

what do you think is better for +/- infinity: return None/exception or max/min int?

If there is an alternative API, then assuming conversion Infinity to int is done at least via the API, and users do not use primitives directly unless they are intended.

I prefer using result type in this case too. If you think that is overkill, None is fine. That is also the choice of many existing libraries like tablecloth.

I guess in this case, it f there are 2-3 kinds of errors, result makes sense to document what they are.
I would consider as payload a variant, whose cases have identical names to the corresponding exceptions that are raised by the other conversion function that raises instead of returning Error.

So if one uses the version returning result one would do eg

| Error(NaN) => ...

and when converting to using the function that raises:

| exception NaN => ...

@cometkim
Copy link
Member Author

what do you think is better for +/- infinity: return None/exception or max/min int?

If there is an alternative API, then assuming conversion Infinity to int is done at least via the API, and users do not use primitives directly unless they are intended.
I prefer using result type in this case too. If you think that is overkill, None is fine. That is also the choice of many existing libraries like tablecloth.

I guess in this case, it f there are 2-3 kinds of errors, result makes sense to document what they are. I would consider as payload a variant, whose cases have identical names to the corresponding exceptions that are raised by the other conversion function that raises instead of returning Error.

One minor problem is that if we introduce this API style for float to int conversion, it will be expected for other types of conversion like string to int.

Then we'll need more error definitions per conversion. We need to propose a module convention to split these errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

4 participants