How Should We Implement Set and a first look at how potential API changes impact code. #84
Thanks for putting together a real preliminary implementation—I think it'll help clarify a lot of the things we were discussing in the abstract in #80. I'll go through this later today or tomorrow and also catch up on the other issue. |
Should be noted that not all of them are implemented. Refactoring for example. We also might consider collapsing all of the stuff we put into core into numbers itself. |
Took a stab at implementing Set: https://github.com/ethanresnick/numeric.ly/blob/master/Set.js Few thoughts:
|
Cool,
This is great that we have the basics of both types of implementations to look at and compare between. When we design this we need to keep three things in mind.
Based off of the +/- to both I really don't know which one we should go with. Notes on ECMAScript 6 Set support: |
Totally agree that we have to optimize for search/comparison, which using an Array doesn't do. At the same time, I don't think trying to hash objects is necessarily the way to go: it introduces too much extra complexity, and even the best hashing function will require sacrifices (e.g. no way to make a Set of functions, because hashing the function will destroy the scope). I wonder if instead the Set could have two internal data stores, an object for primitives and an array for objects. Something like:

/**
 * Returns whether arg is a string or a number (except -0), which are the only things that
 * can be preserved losslessly as object keys (Infinity, NaN, false, null, etc. can't be).
 */
function _isPrimitiveData(arg) {
    var type = typeof arg;
    return type === "string" || (type === "number" && 1/arg !== -Infinity);
}

function has(value) {
    if(_isPrimitiveData(value)) {
        return this.__primitiveData.hasOwnProperty(value);
    }
    return numbers.util.inArray(value, this.__objectData) !== -1;
}

function add(value) {
    if(_isPrimitiveData(value)) {
        this.__primitiveData[value] = true;
    }
    else {
        //append to the internal object store
        this.__objectData[this.__objectData.length] = value;
    }
    //put the value on the array for interoperability
    this[this.length] = value;
    return this;
}

function remove(value) {
    var internalIndex = -1, publicIndex;
    //remove the value from the appropriate internal collection
    //and set internalIndex if the value exists on the public array.
    if(_isPrimitiveData(value)) {
        if(this.has(value)) {
            //any value other than -1 marks the value as found
            internalIndex = true;
            delete this.__primitiveData[value];
        }
    }
    else if((internalIndex = numbers.util.inArray(value, this.__objectData)) !== -1) {
        this.__objectData.splice(internalIndex, 1);
    }
    //remove value from the public array if we found it earlier
    if(internalIndex !== -1) {
        if((publicIndex = numbers.util.inArray(value, this)) !== -1) {
            this.splice(publicIndex, 1);
        }
    }
    return this;
}

Thoughts? Obviously some extra overhead in keeping the public array up to date, but that's heaviest in |
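One thing that may be worth checking in the two-store idea: property keys are always strings, so the number 1 and the string "1" end up on the same key. The snippet below is only a sketch; `primitiveData` stands in for `__primitiveData`, and `typedKey` is a hypothetical workaround that is not part of the proposal.

```javascript
// Stand-in for the proposed __primitiveData store.
var primitiveData = {};

// Add the number 1, as add() would.
primitiveData[1] = true;

// Keys are coerced to strings, so the string "1" is reported as present too.
var hasNumberOne = Object.prototype.hasOwnProperty.call(primitiveData, 1);
var hasStringOne = Object.prototype.hasOwnProperty.call(primitiveData, "1");

// Hypothetical workaround (an assumption): prefix each key with its type.
function typedKey(value) {
    return (typeof value) + ":" + value;
}

var typedData = {};
typedData[typedKey(1)] = true;
var typedHasNumberOne = Object.prototype.hasOwnProperty.call(typedData, typedKey(1));
var typedHasStringOne = Object.prototype.hasOwnProperty.call(typedData, typedKey("1"));
```

With the type prefix, `typedHasNumberOne` is true while `typedHasStringOne` is false, at the cost of slightly longer keys.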
The above shows that comparing two objects only works when both references point to the same object in memory. For most primitive values that's fine, but Objects get tricky (and Objects based off of arrays get even trickier). We would either need to do: Run
|
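The reference-equality point can be seen in a few lines. This is a minimal sketch; `looksEqual` is a hypothetical helper that uses `JSON.stringify` purely for illustration (it is sensitive to key order and drops functions).

```javascript
var a = { x: 1 };
var b = { x: 1 };   // same contents, different object
var list = [a];

// indexOf compares with ===, so only the identical reference is found.
var foundSameReference = list.indexOf(a) !== -1;
var foundSameContents = list.indexOf(b) !== -1;

// A structural check has to look inside the objects.
function looksEqual(p, q) {
    return JSON.stringify(p) === JSON.stringify(q);
}

var foundStructurally = list.some(function (item) {
    return looksEqual(item, b);
});
```

Here `foundSameReference` is true, `foundSameContents` is false, and only the structural scan finds `b`, which is why a plain array search is not enough for objects.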
Ok, ok, I think I understand the problem now. Thanks for walking me through it. Just to paraphrase what you said to make sure we're on the same page: We need a check more sophisticated than So, to check object equality, we could run If I'm understanding you right, then I agree with all of that. In addition, I think:
Details of the hashing function
Here's a first draft, with comments on the issues I see:
Some tests:
function has(arg) {
    // `hash` and `type` are assumed to be computed from arg by the hashing draft
    return Object.prototype.hasOwnProperty.call(this._data, hash) && (type !== "function" || this._data[hash] === arg);
}
|
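For reference, here is a self-contained sketch of how a hash-keyed store with an identity check for functions could fit together. The hashing draft itself is not reproduced in this thread, so `hashValue` and `HashSet` are hypothetical names and the hashing scheme (type tag plus string form) is an assumption.

```javascript
// Hypothetical hash: a type tag plus a string form of the value.
function hashValue(arg) {
    var type = typeof arg;
    if (type === "object" && arg !== null) {
        return type + ":" + JSON.stringify(arg);
    }
    // For functions this is only the source text; the closure scope is lost.
    return type + ":" + String(arg);
}

function HashSet() {
    this._data = {};
}

HashSet.prototype.add = function (arg) {
    this._data[hashValue(arg)] = arg;
    return this;
};

HashSet.prototype.has = function (arg) {
    var type = typeof arg, hash = hashValue(arg);
    // Functions with identical source hash alike, so fall back to identity.
    return Object.prototype.hasOwnProperty.call(this._data, hash) &&
        (type !== "function" || this._data[hash] === arg);
};

function makeAdder(n) {
    return function (x) { return x + n; };
}

var add1 = makeAdder(1), add2 = makeAdder(2);
var s = new HashSet();
s.add({ a: 1 }).add(3).add(add1);
```

In this sketch `s.has({ a: 1 })` and `s.has(add1)` are true, while `s.has("3")` and `s.has(add2)` are false. One limitation: adding `add2` would overwrite `add1`, because both share the same hash.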
Sorry if I wasn't getting my point across properly; I'll work on being more concise and on talking about the first principles of the problem. I like that we've agreed on the need to hash, but some things to note: 1. We should consider a function to extend the set to only work quickly in addition/deletion.
this creates a set that will store everything in the object and will not extend an array. While this is not ideal from an API standpoint, it will be a nice alternative for working on cases where an internal array will not be necessary. 2.
This is not necessary, since the values are going to be in the array we are extending. Also the only time obtaining the value is useful is when we are converting the Set to another type (say object). The time complexity of your current implementation is, which is limited given the fact we are extending add:
remove:
has:
toArray:
Based off of the above, you can see that having the object store the values too has no benefit; if this were written by extending an object, then it would have a benefit. It might be best if they just store references to If you are not happy with the performance of the remove function, we could keep the internal array sorted. This would only affect the
3.JSON is an unordered set of values, so each javascript engine will sort values differently and when we call 4.I'm confused as to what is meant by "function." Functions, such as |
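The key-order concern in point 3 can be illustrated as follows. Serialized output follows property order, so two objects with the same contents but different key order produce different strings. `canonicalStringify` is a hypothetical helper, sketched under the assumption that values are plain JSON data (no functions or `undefined`).

```javascript
var first = { a: 1, b: 2 };
var second = { b: 2, a: 1 };

// Same contents, different key order: the strings differ.
var naiveMatch = JSON.stringify(first) === JSON.stringify(second);

// Hypothetical helper: serialize with keys sorted so key order stops mattering.
// Assumes plain JSON data; functions and undefined are not handled.
function canonicalStringify(value) {
    if (value === null || typeof value !== "object") {
        return JSON.stringify(value);
    }
    if (Array.isArray(value)) {
        return "[" + value.map(canonicalStringify).join(",") + "]";
    }
    var keys = Object.keys(value).sort();
    return "{" + keys.map(function (key) {
        return JSON.stringify(key) + ":" + canonicalStringify(value[key]);
    }).join(",") + "}";
}

var canonicalMatch = canonicalStringify(first) === canonicalStringify(second);
```

`naiveMatch` is false and `canonicalMatch` is true, so a hash built on sorted keys would treat both objects as the same member.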
I'm open to also having a Set that extends Object, which would look very similar to our current one. API for creating it might be something like: Having As for sorting in |
|
I don't understand what you mean by "if you push the logic of deleting into remove then you would essentially run into a complexity limit in toArray". Isn't the logic for deleting already in remove, and won't toArray always be O(1)? |
If you add elements to an array you must remove them at the same time. Searching (see above remove function in an earlier comment) is a necessity. If you wish to remove that as a necessity, don't store any values in an array and when an array is needed, convert from an object to an Array. Not sure if this is possible if we are extending the array class. |
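The trade-off described here can be sketched as an object-only store: nothing is kept in an array, so `remove` needs no search, and the array is built only when asked for. `ObjectSet` is a hypothetical name, and the sketch assumes members are strings or numbers only.

```javascript
// Hypothetical object-backed set for strings and numbers only.
function ObjectSet() {
    this._data = {};
}

// Type-prefixed key so 2 and "2" stay distinct.
function keyFor(value) {
    return (typeof value) + ":" + value;
}

ObjectSet.prototype.add = function (value) {
    this._data[keyFor(value)] = value;   // no array to keep in sync
    return this;
};

ObjectSet.prototype.remove = function (value) {
    delete this._data[keyFor(value)];    // no search needed
    return this;
};

ObjectSet.prototype.has = function (value) {
    return Object.prototype.hasOwnProperty.call(this._data, keyFor(value));
};

ObjectSet.prototype.toArray = function () {
    var data = this._data;
    // Walks every key, so the conversion is O(n).
    return Object.keys(data).map(function (key) { return data[key]; });
};

var objectSet = new ObjectSet().add(1).add(2).add("2").remove(1);
var asArray = objectSet.toArray();
```

The cost moves from `remove` to `toArray`, which is the conversion step the comment refers to; whether that fits a Set that extends Array is the open question.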
Hey, Just getting back to this; needed to take a break for a few days. I see what you're saying about |
Also, there's an issue that's been nagging at me throughout this conversation (see code):
Am I right in thinking that's how the code would work, or am I missing something? If this is what would happen, you could say that the One solution, then, might be to require a re-hash, i.e.:
But I really don't like the API of that; too inconvenient, and error-prone. So I'd be tempted to give up a fair amount of performance (not the strong suit of this library anyway) for convenience. So
I think this'll be faster than the
Further, we could go back to an But, of course, Any better ideas? |
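To make the concern in this comment concrete, here is a minimal sketch of a hash going stale after a member is mutated, followed by the re-hash idea. `hashOf` and `rehash` are hypothetical helpers; the content hash is just `JSON.stringify`, used for illustration.

```javascript
// Hypothetical content hash, for illustration only.
function hashOf(obj) {
    return JSON.stringify(obj);
}

var store = {};
var x = { value: 1 };
store[hashOf(x)] = x;   // add x under its current hash

var foundBefore = Object.prototype.hasOwnProperty.call(store, hashOf(x));

x.value = 2;            // mutate x after it was added

// The store still files x under the old hash, so the lookup misses.
var foundAfter = Object.prototype.hasOwnProperty.call(store, hashOf(x));

// Re-hash: rebuild the store from the current state of each member.
function rehash(oldStore) {
    var fresh = {};
    Object.keys(oldStore).forEach(function (key) {
        var item = oldStore[key];
        fresh[hashOf(item)] = item;
    });
    return fresh;
}

store = rehash(store);
var foundAfterRehash = Object.prototype.hasOwnProperty.call(store, hashOf(x));
```

`foundBefore` is true, `foundAfter` is false, and `foundAfterRehash` is true again, which shows both the problem and why a manual re-hash step is easy to forget.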
So Ethan, if you're checking whether something exists or is contained in the set, then the change in state of x is not a bad thing. It should result in
This is fine, if not ideal behavior. The problem is if the data in the array is not congruent to the hash due to referencing issues:
If this is the case, we need to find a way to ensure that the expected If a developer wants to remove x from the set if they modify it, then:
Can you rephrase the benefits from |
Yes, obviously this only applies to references (which is what I was alluding to with the mention of the Copy on add is an option. It's a little unexpected maybe (has to be clearly documented), but it's probably the best option. |
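A rough sketch of what copy on add might look like follows. `CopySet` and `contentHash` are hypothetical names, and the copy is made with a JSON round trip, which is an assumption that only holds for plain JSON-style data.

```javascript
// Hypothetical content hash, for illustration only.
function contentHash(obj) {
    return JSON.stringify(obj);
}

function CopySet() {
    this._data = {};
}

CopySet.prototype.add = function (value) {
    // Store a private copy so later changes to the caller's object
    // cannot put the stored member out of step with its hash.
    var copy = JSON.parse(JSON.stringify(value));
    this._data[contentHash(copy)] = copy;
    return this;
};

CopySet.prototype.has = function (value) {
    return Object.prototype.hasOwnProperty.call(this._data, contentHash(value));
};

var copySet = new CopySet();
var member = { value: 1 };
copySet.add(member);
member.value = 2;   // mutate the caller's object after adding

var hasOriginalState = copySet.has({ value: 1 });
var hasMutatedState = copySet.has(member);
```

The set keeps answering for the state that was added (`hasOriginalState` is true, `hasMutatedState` is false), which is the behavior that would need to be clearly documented.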
Hey all, sorry I've been MIA for a bit, busy with the holidays and now work. Finally free to get on this for a bit. I'll be reading this over and getting back shortly. |
Hey all,
I was trying to test out some stuff from #80, and rather than changing up everyone's work on the Matrix data structure, I thought I'd give Set a try. The reason this is not a PR is that there is still work to be done (namely testing and potential problems below).
problems with Set currently:
(rather than creating it from Object.keys() each time the values are necessary).
(If not this should be redeveloped to work more efficiently with numbers only).
My bias is for Set to work with JSON (potentially too large of an overhead)
and objects developed within the library. Set will be primarily useful for
working with numbers, but has a lot of strength as a basic data structure.
(I use it all the time in python)
This becomes problematic when working with the methods that accept input
from multiple types (basically the developer would have to wrap any arrays that they'd
like to add to a set in another array, and the functions below would need to be changed).
My recommendation is to have it be part of method chaining and then have the static method return T/F.
There's some code repetition but it helps keep the API following a similar structure.
And if someone needs to determine if all were in the set then they could do:
if (number.set.remove(A, [1,2,3]).reduce(function(a, b) { return !!a && !!b; }))
A.remove([1,2,3]);
complications (a self balancing binary search tree or a trie, handled in a javascript array
could cause a lot of array recreation). It might be possible to extend the Array object and
contain similar logic laid out below (see 1). This is also useful because we could implement:
map, reduce, splice, join, etc. on the set with minimal effort.
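The static-method idea above (one true/false per value) can be sketched with a plain array standing in for the set. `removeEach` is a hypothetical helper, not the library's API. `every` is used instead of `reduce` because `reduce` without an initial value throws on an empty array.

```javascript
// Hypothetical static helper: removes each value from `members` and
// reports, per value, whether it was present.
function removeEach(members, values) {
    return values.map(function (value) {
        var index = members.indexOf(value);
        if (index === -1) {
            return false;
        }
        members.splice(index, 1);
        return true;
    });
}

var A = [1, 2, 3, 4];
var results = removeEach(A, [1, 2, 9]);

// True only if every requested value was in the set.
var allRemoved = results.every(Boolean);
```

Here `results` is `[true, true, false]`, `A` is left as `[3, 4]`, and `allRemoved` is false.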