
proposal: spec: change int to be arbitrary precision #19623

Open
robpike opened this issue Mar 20, 2017 · 226 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee Proposal Proposal-FinalCommentPeriod
Milestone

Comments

@robpike
Contributor

robpike commented Mar 20, 2017

An idea that has been kicking around for years, but never written down:

The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

  • overflow when constants like math.MaxUint64 are automatically promoted to int type
  • maximum size of a byte slice is only half the address space on a 32-bit machine
  • int values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)
  • great care must be taken with conversion between potentially large values, as information can be lost silently
  • many more

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)

The advantages are many:

  • int (and uint, but I'll stop mentioning it now) become very powerful types
  • overflow becomes impossible, simplifying and securing many calculations
  • the default type for len etc. can now capture any size without overflow
  • we could permit any integer type to be converted to int without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.

@robpike robpike added the v2 An incompatible library change label Mar 20, 2017
@bradfitz bradfitz added the LanguageChange Suggested changes to the Go language label Mar 20, 2017
@griesemer
Contributor

I'm a big fan of this, myself. It would elevate int (and uint) to "unrestricted" (for lack of a better word) types, and only the types with explicit sizes (int16, int32, etc.) would be subject to wrap around.

In many cases (loop iteration variables) the compiler may be able to prove that an int is in a certain range and could produce optimal code. In many other cases, the use of int is not speed-critical.

@randall77
Contributor

Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...

@griesemer
Contributor

A minor related point about this: The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform (strconv.Itoa comes to mind). We may want to consider introducing an unsigned "word" type instead (in the past we have also used uintptr in that role, but it may not necessarily be guaranteed to be of register size on all platforms).

@randall77
Contributor

randall77 commented Mar 20, 2017

Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't.
The only easy solution I see is to reserve some top bit(s) to set for raw integers and disallow those top bits for any heap pointer.

@bcmills
Contributor

bcmills commented Mar 20, 2017

@randall77

Two proposals enter, one proposal leaves...

Actually, I think they're completely compatible. I recommend both!

@bcmills
Contributor

bcmills commented Mar 20, 2017

@randall77

the GC sees a word which is sometimes a pointer and sometimes isn't

Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though ― might bloat the maps a bit.

FWIW, the ML family of languages worked around that issue by making the native types int31 and int63 instead of int32 and int64. I do not recommend that approach.

@randall77
Contributor

@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.

@rasky
Member

rasky commented Mar 20, 2017

I love this proposal in the abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". Bounds-checking elimination has a similar problem, and the Go compiler isn't very good at it even nowadays: it basically just handles obvious cases, and doesn't even have a proper VRP pass (the one proposed was abandoned because of compile-time concerns). Stuff like a simple multiplication would become a call into the runtime in the general case, and I would be surprised if the Go compiler could avoid that in most cases, if we exclude obvious ones like clearly bounded for loops.

@griesemer
Contributor

@rasky Languages like Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers - they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags" - making an int on a 64-bit platform effectively 62-bit in the common (small) case - which is likely more than plenty in most cases (and where it isn't, it's either because we need very large integers, or we should be using int64).

One way of using the tag bits is as follows:

  • 00 means small integer (smi)
  • 01 means the value is a pointer p to an arbitrary precision integer (at p-1)
  • 10 typically the result of an operation (see below)
  • 11 reserved for other uses or unused

Given this encoding, if we have two int values x and y, we can optimistically assume they are smis. For instance x + y would be translated into a single ADD machine instruction, followed by a test of the tag bits. If they are not 00, one (or both) of the operands were large ints and then a runtime call is needed. Also, if there was an overflow, a runtime call is needed. This does add 3 instructions in the fast path, but not more (one conditional jump if overflow, one test of tag bits, one conditional jump if tag bits are not 0 - perhaps more depending on how the runtime call is achieved or if there's a conditional call instruction). It's not cheap, but it's much less expensive than a runtime call in the common case. If both operands were smis, the tag bits remain 00, and the result doesn't need further work. The same is true for subtraction. Multiplication requires a bit more work but is also more expensive. Finally, division is the most complicated one, but also the most expensive operation in general.

It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.

@DmitriyMV
Contributor

Not a fan of this proposal - currently it's quite simple to reason about the resulting code and its performance characteristics when doing simple arithmetic. Also - even if losing two bits on a 64-bit platform is not important, on a 32-bit one it is.

Maybe we could have arbitrary precision ints implemented as a new built-in type (like we do with complex currently)?

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017
@dmitshur
Contributor

dmitshur commented Mar 21, 2017

Can you discuss how such int variables would be represented in mediums other than RAM, and marshaled/unmarshaled?

Encoding to JSON should be easy and map really well. As far as I know, the JSON spec does not place restrictions on the size of numbers, so a really large int would encode as a base-10 number (and vice versa for decoding). E.g.:

{"number": 12312323123131231312312312312312321312313123123123123123123}

Would map to an int with value 12312323123131231312312312312312321312313123123123123123123.

What about something like encoding/gob or encoding/binary?

@griesemer
Contributor

@shurcooL

  1. printing is obvious (the usual human-readable forms) - applies to JSON
  2. encoding/gob could use a private internal representation
  3. encoding/binary is for fixed-width numbers at the moment, and the proposed ints wouldn't be - but a var-int encoding could work (though the current format would probably be inefficient).

Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.

@magical
Contributor

magical commented Mar 21, 2017

@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.

@jasonwatkinspdx

jasonwatkinspdx commented Mar 21, 2017

Go should have an arbitrary precision number type that is more convenient than math.Big. That type should not attempt to masquerade as int/uint, as these aren't just used semantically as "number" but more so as "number compatible with c/foolang code that uses the natural local word size".

The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and kludges. Solve that problem with golang 2.0, or instead we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language to replace c/c++/java. I absolutely think we should have a great and convenient arbitrary precision number type, and think the road to that is a golang where library defined types are not second class to ad hoc boundary flaunting extensions of the runtime and compiler provided types.

@bcmills
Contributor

bcmills commented Mar 21, 2017

@jasonwatkinspdx

It's true that one of the roles of int in Go 1 is as an analogue to C's ssize_t. But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

Perhaps more importantly, C APIs don't actually use ssize_t all that often: they use size_t instead.
You have to range-check conversions from C.size_t to int, because the latter is potentially only half the range of the former. Under this proposal, you'd need the range check in the other direction instead, because C.size_t would now be smaller than int. The only way to avoid the need for a range check entirely is by making the Go "size" type have exactly the same range as C.size_t, and at that point you're looking at a fairly high risk of overflow bugs (especially from decrementing past zero).

@jasonwatkinspdx

@bcmills

But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arb precision offloads many complexities to anyone that wants to ship data to other systems.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision? Is it really better if the golang side is blind to any issue at the type level and the c side has no choice but panic?

I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower in golang being the default numeric type, I just don't want it to pretend to have the same name as the 32bit/64bit machine/isa determined types.

@bcmills
Contributor

bcmills commented Mar 21, 2017

People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values.

Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision?

The same thing it looks like today: using C.size_t instead of int.

@bronze1man
Contributor

bronze1man commented Mar 21, 2017

@robpike
It looks harder to reduce CPU of golang program when ints become arbitrary precision. 😝
I do not need arbitrary precision ints by default, but I do need golang program to use less CPU to save my money with gce vm server.

@jasonwatkinspdx

jasonwatkinspdx commented Mar 21, 2017

programs using raw syscalls generally need to validate their arguments anyway.

I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for c/c++/java for systems programming.

@cznic
Contributor

cznic commented Mar 21, 2017

I honestly thought it's 11 days more than it really is.

I don't want to lose normal int performance. I don't believe there's a space/time effective way for an arbitrary precision int to be, in many cases, comparable in performance to the fixed width int.

I have nothing against adding an arbitrary precision mpint type (name does not matter), which the compiler accepts mixed in expressions with normal ints providing the conversions as needed. IOW, it would be really nice to be able to use standard operators with arbitrary precision integers, yes, but please only explicitly. Please leave int alone.

@ainar-g
Contributor

ainar-g commented Mar 21, 2017

What about floats? JSON can have values like 12345678901234567890.12345678901234567890, with many digits before and after the dot, but IIRC no Go type can accurately represent those. That is, no Go type can do maths with them (I know about json.Number).

Personally, I would keep int as it is and instead add a number type that could represent any number with or without a fractional part.

@bronze1man
Contributor

Please do not make int operation slower by default.
I do not need arbitrary precision ints by default, but I do need golang program to use less CPU to save my money with gce vm server.

I do not need something like mpint with infix expressions either; *big.Int is enough for me.

@mvdan
Member

mvdan commented Mar 21, 2017

I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32 or the unsigned word (like the current uint) mentioned by @griesemer?

@faiface

faiface commented Mar 21, 2017

I am 100% for this proposal. Worrying about the performance impact of arbitrary precision ints is like worrying about the performance impact of array bounds checks, in my opinion. It's negligible if implemented correctly.

I also like this because of how inconvenient the big package is and how I always fall back to Python when I'm dealing with big integers.

@bakul

bakul commented Feb 6, 2023

Re math.MaxInt: IMHO breaking compatibility in 2.0 for this would be worth the benefit. But if you must have it, given that arbitrary precision ints would require a variable amount of space, I would use some bit pattern to mean +infinity (and similarly -infinity). They could even be the same bit patterns as fl.pt. +/- infinity but of "type" int. :-)

@doggedOwl

Is this proposal still being considered? How does it play with the decision that there will be no Go2?

@josharian
Contributor

In order to be backwards compatible, it would require adding a new type (integer?). int cannot and will not change. I don't know whether it is still being seriously considered.

@nathany
Contributor

nathany commented Dec 4, 2023

“There will not be a Go 2 that breaks Go 1 programs.” — Backward Compatibility

How does this statement impact this proposal?

I assume that it wouldn't be possible to use the go.mod go directive approach - #19623 (comment)

I'm wondering the same in light of Go 1.21's new forward compatibility features.

A few people have suggested a distinct integer type (or bigint or some other name). This would alleviate any concerns over serialization, performance, or breaking existing programs -- but may require explicit conversion to/from int and uint.

For cases where arbitrary precision math is important, what @griesemer outlined with 2 reserved bits should be faster than the current big.Int in many cases. Correct me if I'm wrong?

If that is possible to build as a library, we would be one step closer to making this proposal a reality. The further step of providing a built-in type in the language would make arbitrary precision math more ergonomic for complex calculations. As a thought experiment, let's imagine we do that.

"An untyped constant has a default type which is the type to which the constant is implicitly converted in contexts where a typed value is required" - spec

Then, would it be possible, down the road, to modify the type inference to make bigint/integer the default type instead of int, but without changing the meaning of int and uint?

If that change was restricted to a module based on go.mod (e.g. go 1.24.0), what would that mean for compiling modules together? It seems to me that many function definitions would continue to explicitly use int, which could make adoption of bigint/integer as a default quite challenging.

If this is feasible, the transition could be done if/when we are satisfied with the performance of the bigint/integer implementation.

If it turned out to work really well, then perhaps we could even deprecate int and uint eventually, arriving with a very similar outcome to the original proposal (but without ever altering their meaning).

I'm sure there are things I haven't thought of, so I look forward to your thoughts.

@Merovius
Contributor

Merovius commented Dec 4, 2023

@nathany

How does this statement impact this proposal?

It means it is extremely unlikely to happen. The value of the proposal only really becomes realized if we'd change how int works. But we can't do that, as it would be a redefinition, which we don't want to do.

If that is possible to build as a library, we would be one step closer to making this proposal a reality.

I don't think it is practical to build that as a library. It requires interaction with the garbage collector, which has to know which memory contains pointers and which doesn't. So, at the very least, the compiler has to "magically" recognize if you use that type and insert instrumentation for the GC. Given that you could then also do type MyBigInt bigint.Int, I'm not sure how that would work in practice.

Then, would it be possible, down the road, to modify the type inference to make bigint/integer the default type instead of int, but without changing the meaning of int and uint?

I don't think so. For example

func main() {
    for v := (1<<63)-1; v > 0; v++ {
    }
}

Currently (when int has 64 bits) this terminates, but if v was bigint, it would loop forever. Thus that change would be a redefinition, which is disallowed even with a go.mod guard.

If it turned out to work really well, then perhaps we could even deprecate int and uint eventually, arriving with a very similar outcome to the original proposal (but without ever altering their meaning).

I agree with @robpike that a lot (probably most) of the value comes from redefining what len returns as well. And what io.Reader.Read etc. return. I don't think just changing the default type of untyped integer literals really would be that similar to the original proposal in outcome.

@nathany
Contributor

nathany commented Dec 4, 2023

Thanks @Merovius.

I think there may still be some value in arbitrary precision integers being first class as a separate type (e.g. bigint) with all the usual operators. Even if only for the ergonomics. But perhaps that is a different proposal.

If this proposal isn't feasible, here's hoping for checked integers. #30613

@ianlancetaylor ianlancetaylor added LanguageChangeReview Discussed by language change review committee and removed v2 An incompatible library change NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 6, 2024
@josharian
Contributor

josharian commented Aug 7, 2024

We can't change int to be arbitrary precision because backwards compatibility. But we could add a new type, which (imho) would be really useful.

Finding a name is hard, though. It should be short, memorable, etc. integer is too long. But but but but but. There's an obvious name for an Arbitrary (precision) Integer.

I give you: type ai.

@Bjohnson131

FWIW, I believe this has been mentioned before, but adding u128/i128 and u256/i256, and maybe u512/i512 would satisfy 99% of what folks really want here.

@lpar

lpar commented Aug 7, 2024

Borrow from the Pascal/Modula world, and call it cardinal. Abbreviate to car. The only downside would be getting Lisp programmers' hopes up.

@josharian
Contributor

josharian commented Aug 7, 2024

We could borrow from math/German and call it Zahlen, or Z for short. :P

(ℤ is too hard to type.)

@adonovan
Member

adonovan commented Nov 6, 2024

This is a really interesting idea, but the change as proposed is very much an incompatible one: today, users can assume that x + y has finite field semantics, with an implicit modulo operation, but if this proposal were implemented, any code that depends on that assumption would no longer work correctly.

More practical objections include:

  • cheap operations such as x+1 (a single CPU cycle today) become at the very least a (predictable) conditional branch, followed in the worst case by an allocation.
  • the ABI would be incompatibly changed and all assembly code would need to be audited and perhaps rewritten.
  • runtime contexts where allocation is prohibited would not be allowed to use int arithmetic.

Adding a new bigint built-in type would be a much more tractable problem, albeit less exciting.

Therefore this is a likely decline. Leaving open for four weeks for final comments.

@adonovan, for the language change review committee.

@griesemer
Contributor

I don't fully agree with this assessment:

  • Any code relying on finite field semantics of int is likely incorrect because the size of an int is not fixed (it can be 32bit or 64bit). Code that relies on finite field semantics must use the sized types (int32, int64, etc.) for correct behavior across all platforms.
  • Cheap operations such as x+1 would become slightly more expensive. The only place where that really matters is in loops. For one, with Go's expanded range construct, typical for i := 0; i < n; i++ loops are not really needed anymore. Also, in many cases (and especially the before-mentioned case) the compiler can determine the bounds of such iteration variables and thus prove that they never exceed the efficient int range.
  • Assembly code operating with int/uint at the limits of the efficient int range should probably use sized ints. I note that the math/big assembly code works with sized integer types.
  • Likewise, runtime contexts where allocation is prohibited could use sized ints if the compiler cannot prove the limits of int ranges. I bet there's very few such places.

In short, I believe this change is possible; the primary unknown is the performance impact. That impact may be small with sufficient engineering effort.

The latter is perhaps the crux: we don't have the bandwidth/capacity for this engineering effort, so this may be our limiting factor.

@adonovan
Member

Any code relying on finite field semantics of int is likely incorrect because the size of an int is not fixed (it can be 32bit or 64bit). Code that relies on finite field semantics must use the sized types (int32, int64, etc.) for correct behavior across all platforms.

The size of int may vary across architectures, but it's easy to know which size it is, and existing code may assume that the value is representable in a field of that size.

Cheap operations such as x+1 would become slightly more expensive. The only place where that really matters is in loops. For one, with Go's expanded range construct, typical for i := 0; i < n; i++ loops are not really needed anymore.

Any function may be called from inside the loop of another function (and perhaps inlined into it), so I don't really understand this argument. My point was that any arithmetic expression formerly as cheap as a single cycle now costs a conditional branch. The choice of looping construct is not important.

Assembly code operating with int/uint at the limits of the efficient int range should probably use sized ints.

I agree; still, it would require an audit and a clean-up of existing correct code.

Likewise, runtime contexts where allocation is prohibited could use sized ints if the compiler cannot prove the limits of int ranges.

Agreed; let's ignore this objection since the runtime is entirely within our control.

@josharian
Contributor

josharian commented Nov 11, 2024

Had we but world enough and time, we could check some of these things empirically. Knowing the shape and frequency of the behavior and performance changes might provide clarity, even with a naive, relatively unoptimized implementation. We even have the GOEXPERIMENT mechanism to enable it. (But at my back I always hear…)

@Merovius
Contributor

The size of int may vary across architectures, but it's easy to know which size it is, and existing code may assume that the value is representable in a field of that size.

In particular, you can currently assume that int->int64 conversions don't lose precision. I use that all the time when encoding/decoding integer values.

@lpar

lpar commented Nov 12, 2024

As far as loop efficiency goes, independent of this proposal it would be good if you could declare the type of a loop index, like you can in other languages. E.g. in Java you can write for (byte i=1; i<11; i++). That would allow efficient looping even if int were potentially expensive.

@bitspill

it would be good if you could declare the type of a loop index, like you can in other languages

You can achieve that desire by adding a type cast

package main

import "fmt"

func main() {
	var sum uint8 = 0
	for i := uint8(0); i < 10; i++ {
		sum += i
	}
	fmt.Println(sum)
}

@lpar

lpar commented Nov 12, 2024

Thanks, I don't think I've ever seen that done, though it's obvious in retrospect. (It would probably be worth mentioning in A Tour of Go or Effective Go.)

So that suggests that the possible loop performance issue doesn't have to be a deal-killer for this proposal.

@Bjohnson131

Question- is this proposal for int to be as many bits as needed at run-time, or is it to add arbitrary length int types for any 2^n bit length, ie int128, int512, int1024, etc? I like one, and am not sure about the other.

@ianlancetaylor
Contributor

ianlancetaylor commented Nov 12, 2024

This proposal is for the type int to represent any size of integer.

int128 is #9455.

@ianlancetaylor
Contributor

Although there are reasons for making this change, it could break existing Go packages. Any package that serializes an integer value, that takes an int and then converts to int64, would work reliably today but would fail silently for large values if we adopted this proposal.
