Unify String and Array maximum lengths #641

Open

msaboff
Contributor

@msaboff msaboff commented Jul 21, 2016

Currently Strings are defined to consist of up to 2^53 - 1 elements (see 6.1.4 The String Type). Arrays, on the other hand, can have up to 2^32 - 1 indexed elements (see 6.1.7 The Object Type and the definition of array index). Given that there are functions that convert readily between Arrays and Strings, it makes sense that their limits be the same. Another way of looking at this is that a String is an array of elements.
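
For readers who want the two limits as concrete numbers, here is a minimal sketch. The constant names `MAX_STRING_LENGTH` and `MAX_ARRAY_LENGTH` are made up for illustration and are not spec or engine identifiers; the conversions shown are just ordinary built-ins of the kind the comment alludes to.

```javascript
// Spec limits quoted in the comment above (names are illustrative only).
const MAX_STRING_LENGTH = 2 ** 53 - 1; // 6.1.4 The String Type
const MAX_ARRAY_LENGTH = 2 ** 32 - 1;  // largest legal Array length

console.log(MAX_STRING_LENGTH); // 9007199254740991
console.log(MAX_ARRAY_LENGTH);  // 4294967295
console.log(MAX_STRING_LENGTH === Number.MAX_SAFE_INTEGER); // true

// A few built-ins that convert between Strings and Arrays.
const chars = "a,b,c".split(",");  // String -> Array
const joined = chars.join("");     // Array -> String
const units = Array.from("abc");   // String -> Array
console.log(chars.length, joined.length, units.length); // 3 3 3
```

Real engines cap string length far below the spec limit, so the sketch only compares the specified values and never tries to allocate anything large.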

@ljharb
Member

ljharb commented Jul 21, 2016

Firefox appears to have a max length of 2 ** 28 - 1, Safari 2**31 - 1, Chrome around 2**27 - not sure about Edge or IE but this doesn't seem like it will conflict with any implementations.

@rwaldron
Contributor

This should be an agenda item, as I'm sure implementers will want to discuss

@rwaldron
Contributor

rwaldron commented Jul 21, 2016

@msaboff
Contributor Author

msaboff commented Jul 21, 2016

Just added it to the agenda.

@allenwb
Member

allenwb commented Jul 22, 2016

During development of ES6 a decision was made to extend the specified max length of Strings and Typed Arrays to 2^53-1. The rationale was that 4 gig strings and arrays were likely to become too limiting in a world where address spaces could grow to be thousands of gigabytes. We would have also extended the max length of Array except for the strange legacy 2^32-1 wrapping behavior of Array property indexing. The fear was that extending the max size of Array (and removing the wrapping) might break some existing code. However, we did decide that most of the Array.prototype methods that are written to operate on generic integer indexed collections (not just Array instances) could safely be rewritten to not have dependencies upon the 2^32-1 limit or index wrapping. All of this and the trade-offs involved were extensively discussed at TC39 meetings and should be in meeting notes (most likely 2011-2014 timeframe).

We really shouldn't want to revert the max size of strings or Typed Arrays/ArrayBuffer. Someday that is going to be a problem and the best way to fix it is to allow for it now, before implementations are running into it as an issue.

The real challenge is how to make it possible to extend Array beyond that limit. From that perspective I think we came to the wrong conclusion at the July 2015 meeting WRT split (https://github.com/rwaldron/tc39-notes/blob/master/es7/2015-07/july-28.md#conclusionresolution-11). Remember the only reason that the max length of Array wasn't increased was because of web breakage concerns. But note that part of the rationale TC39 used to justify removing the 2^32-1 limits within most Array prototype functions was that, as of that time, no implementation supported Arrays (or strings) of that size, so there could be no legacy usage of those functions that depended upon that size limit.

Generally, there is no need to legacy protect code that didn't exist prior to the spec edition that first makes the change. In the case of split, there could be no valid legacy code that is dependent upon splitting a string whose length is > 2^32-1. So how should split be specified to behave in a future where some implementations allow such a length? The simple and immediate fix (rather than reverting to using ToUint32 on the length) would be to simply throw if split needs to create an array whose length is greater than 2^32-1. In future code, if it isn't possible to create an Array instance of the necessary size, throwing is the appropriate behavior rather than creating a too-short Array that doesn't match the expected results of split.
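
The "throw instead of truncating" idea can be sketched in user code roughly as follows. `splitOrThrow` and its `maxArrayLength` parameter are hypothetical names invented here, not spec text or an engine API; the limit is a parameter only so that the behavior can be exercised with small inputs, and the sketch handles non-empty string separators only.

```javascript
// Hypothetical sketch: split that refuses to produce a result longer than
// maxArrayLength, rather than returning a silently shortened array.
function splitOrThrow(str, separator, maxArrayLength = 2 ** 32 - 1) {
  if (typeof separator !== "string" || separator.length === 0) {
    throw new TypeError("this sketch only handles non-empty string separators");
  }
  // Count the pieces first so that no partial result is ever built.
  let pieces = 1;
  let pos = str.indexOf(separator);
  while (pos !== -1) {
    pieces += 1;
    if (pieces > maxArrayLength) {
      throw new RangeError("split result would exceed the maximum Array length");
    }
    pos = str.indexOf(separator, pos + separator.length);
  }
  return str.split(separator);
}

console.log(splitOrThrow("a,b,c", ",")); // [ 'a', 'b', 'c' ]

try {
  splitOrThrow("a,b,c", ",", 2); // artificially small limit for the demo
} catch (e) {
  console.log(e.name); // RangeError
}
```

With the default limit the check can never fire in today's engines, since no current implementation allows strings that long.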

But I think there is possibly a better solution. Remember that the only reason we didn't extend the length limit for Array was the fear that removing the wrapping behavior of array indexing would break some legacy code. Maybe there is a way around that legacy trap. What if we allow for two variants of Array instances: instances that have the 2^32-1 length limit and wrap indexing, and instances that have a 2^53-1 length limit and don't wrap indexing? (Note that both would be considered instances of the Array constructor, share the same prototype, etc. The different semantics would be imposed at the MOP level rather than at the class level.)

To distinguish the two kinds of Array instances, let's call them "legacy arrays" and "huge arrays". To maintain legacy compatibility with new Array(aValueGreaterThan2raisedTo32), new Array would have to continue to create legacy array instances. There would have to be some new mechanism for creating huge array instances. As a strawman, let's assume that could be done by something like:

   new Array.huge(len)  //len may be aValueGreaterThan2raisedTo32

It should also be possible to say things like:

Array.huge.from(anotherPossibleHugeCollection)

New ES code that wants to accommodate very large arrays (or use smaller arrays that don't wrap large indices) would use Array.huge based construction. split and any other built-ins that need to create possibly huge arrays could then be specified to do the equivalent of new Array.huge when they encounter a length value > 2^32-1.
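
To make the difference between the two kinds of instance tangible, here is a rough user-land approximation. `Array.huge` does not exist; `makeHugeArrayLike` is a hypothetical stand-in built on a Proxy, it is only array-like (it does not share Array.prototype), and it models just one aspect of the strawman: `length` keeps tracking integer keys past 2^32 - 2.

```javascript
// Hypothetical stand-in for a "huge array": an array-like object whose length
// follows integer keys up to 2^53 - 2 instead of stopping at 2^32 - 2.
function makeHugeArrayLike() {
  return new Proxy({ length: 0 }, {
    set(target, key, value) {
      target[key] = value;
      if (typeof key === "string") {
        const n = Number(key);
        const isHugeIndex = Number.isInteger(n) && n >= 0 &&
          n < Number.MAX_SAFE_INTEGER && String(n) === key;
        if (isHugeIndex && n >= target.length) target.length = n + 1;
      }
      return true;
    },
  });
}

// Legacy array: 2^32 is not an array index, so length is left alone.
const legacy = [];
legacy[2 ** 32] = "x";
console.log(legacy.length); // 0

// Huge array-like: the same assignment moves length past 2^32.
const huge = makeHugeArrayLike();
huge[2 ** 32] = "x";
console.log(huge.length); // 4294967297
```

The strawman above would impose these semantics at the MOP level on real Array instances; the Proxy here is only a way to observe the intended behavior without engine support.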

@concavelenz

Is there actually any data on code intentionally relying on wrapping of
indexes? Or is this just a theoretical concern?


@allenwb
Member

allenwb commented Jul 23, 2016

> Is there actually any data on code intentionally relying on wrapping

This was discussed at past TC39 meetings, but I don't recall whether any actual evidence was presented. Need to search past TC39 meeting notes.

@claudepache
Contributor

claudepache commented Jul 25, 2016

I’ve found the following old bug about removing the uint32 length restriction on arrays. But it was WONTFIXed without any explanation other than "we didn't do that":

https://bugs.ecmascript.org/show_bug.cgi?id=145

@claudepache
Contributor

I've tested various ways of attempting to construct an array of out-of-bound length. The results are in the table below. The most interesting cases are in the last two columns, namely the behaviour of the .concat() method.

In ECMA262, the RangeError is due to the final step in the respective algorithms, which attempts to set an illegal value as the length of the array. (That final step was accidentally removed from .concat() in ES5, see ecmascript:bug#129, but it was present in ES3 and again in ES6.) Without that final step, the length of the array would be stuck at 2^32-1 at most, if I read the spec correctly.

Relevant nontrivial section in the spec: the [[DefineOwnProperty]] internal method of Array exotic objects

(NB: 0xffffffff = 2^32-1 = 4294967295.)

| | [].length = -1 | [].length = 0x123456789 | Array(0xffffffff).push(42) | Array(0xffffffff).unshift(42) | Array(0xffffffff).splice(0xfffffffd, 0, 42) | Array(0xfffffffe).concat([42,43]).length | Array(0xfffffffe).concat(Array(2)).length |
|---|---|---|---|---|---|---|---|
| ECMA262 | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError |
| Firefox | RangeError | RangeError | RangeError | (stop responding) | RangeError | 0 | 0 |
| Chrome | RangeError | RangeError | RangeError | RangeError | RangeError | RangeError | 0xffffffff |
| Safari | RangeError | RangeError | RangeError | Error("Out of memory") | Error("Out of memory") | Error("Out of memory") | Error("Out of memory") |
| Edge | RangeError | RangeError | RangeError | RangeError | RangeError | 0xffffffff | 0xffffffff |

So, I think it is feasible to extend array's max length to 2^53-1.
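
The two length-assignment cases from the test results can be checked cheaply with a sketch like the one below. `errorName` is a helper name invented here. The remaining cases are deliberately left out: according to the reported results they are engine-specific and can hang or exhaust memory.

```javascript
// Returns the name of the error thrown by fn, or null if fn completes.
function errorName(fn) {
  try {
    fn();
    return null;
  } catch (e) {
    return e.name;
  }
}

console.log(errorName(() => { [].length = -1; }));          // RangeError
console.log(errorName(() => { [].length = 0x123456789; })); // RangeError

// 0xffffffff (2^32 - 1) is still a legal length, so this does not throw.
console.log(errorName(() => { [].length = 0xffffffff; }));  // null
```

Both failures come from the length check in the [[DefineOwnProperty]] internal method of Array exotic objects mentioned above.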

@claudepache
Contributor

Repeating the most interesting columns of the preceding table for easier reading:

| | Array(0xfffffffe).concat([42,43]).length | Array(0xfffffffe).concat(Array(2)).length |
|---|---|---|
| ECMA262 | RangeError | RangeError |
| Firefox | 0 | 0 |
| Chrome | RangeError | 0xffffffff |
| Safari | Error("Out of memory") | Error("Out of memory") |
| Edge | 0xffffffff | 0xffffffff |

@allenwb
Member

allenwb commented Jul 25, 2016

On Jul 25, 2016, at 8:53 AM, Claude Pache notifications@github.com wrote:

> I've tested various ways of attempting to construct an array of out-of-bound length. The results are in the table below. The most interesting cases are in the last two columns, namely the behaviour of the .concat() method.
>
> ...
>
> So, I think it is feasible to extend array's max length to 2^53-1.

The legacy concern wasn’t about the cases where such huge arrays might be created. The concern was about what happens when a value greater than 2^32-2 is used as an array index. In particular, consider:

var a = new Array(0);
console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
//length: 0 keys:

a[Math.pow(2,32)-2]="x";
console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
//length: 4294967295 keys: 4294967294
//note length auto updated

a[Math.pow(2,32)]="x";
console.log(`length: ${a.length} keys: ${Object.keys(a)}`);
//length: 4294967295 keys: 4294967294,4294967296
//note length not updated, property beyond length added.

This can happen today. Would changing this behavior (such that array lengths auto-update beyond 2^32-1) break anything? Nobody knows.

Allen
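
The cut-off in this example follows from the spec's definition of an array index: a canonical numeric string whose value is between 0 and 2^32 - 2. A small sketch of that check is below; `isArrayIndex` is a helper name made up for illustration, not a built-in.

```javascript
// Hypothetical helper: true when key is an "array index" in the spec's sense,
// i.e. a canonical numeric string with a value in the range 0 .. 2^32 - 2.
function isArrayIndex(key) {
  const s = String(key);
  const n = Number(s) >>> 0; // same conversion as ToUint32
  return String(n) === s && n !== 2 ** 32 - 1;
}

console.log(isArrayIndex(2 ** 32 - 2)); // true
console.log(isArrayIndex(2 ** 32 - 1)); // false
console.log(isArrayIndex(2 ** 32));     // false

// Keys that are not array indices become ordinary properties,
// so assigning to them leaves length untouched.
const arr = [];
arr[2 ** 32] = "x";
console.log(arr.length); // 0
```

Raising the Array limit would amount to widening this range, which is exactly the change whose compatibility impact is unknown.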

@claudepache
Contributor

> Would changing this behavior (such that array lengths auto-update beyond 2^32-1) break anything? Nobody knows.

So, what is the risk concretely? I imagine that some rare broken script could get even more broken...

But sure, the only way to know is to try. It would be a shame if nobody wanted to try.

@mk-pmb

mk-pmb commented Aug 8, 2016

Array wrapping sounds useful for tricking a nodejs/iojs server into writing to fields that do not expect user-provided data. Not sure though whether a higher or lower wrap index is more useful, probably depends on how that server reads or calculates the target index number.

@bterlson
Member

bterlson commented Aug 9, 2016

@msaboff should I leave this open or has this proposal been withdrawn since last TC39?

@msaboff
Contributor Author

msaboff commented Aug 9, 2016

@bterlson I still think it is an open issue.

@bterlson added the "needs consensus" label (this needs committee consensus before it can be eligible to be merged) on Aug 9, 2016
@littledan
Member

@msaboff What are the next steps for this proposal?

@c69

c69 commented Mar 15, 2018

No example of real code has been provided over the years.
So, imho, all concerns against increasing the max size in this thread are purely hypothetical fears...

On the other hand, the longer this change waits, the higher the chance that such legacy code gets written and that something important actually comes to depend on this old limitation.
