Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array#uniq considers each element eql? method to check if exists #2809

Closed
wants to merge 1 commit into from
Closed

Array#uniq considers each element eql? method to check if exists #2809

wants to merge 1 commit into from

Conversation

fernandes
Copy link
Contributor

The motivation for this PR is when you have an array with two elements:

array = [Element.new(name: 'foo'), Element.new(name: 'bar')]

In this case an array.uniq will always return two elements, but consider you want to return only on Element no matter the name, how should we proceed?

The idea is to implement a #eql? method on Element class the computes the hash based on instance information, so we can return unique elements, consider in our case we just want to return the first Element

class Element
  def eq?(other)
    return true if other.is_a?(Element)
  end
end

How the returned hash array.uniq will return

array = [Element.new(name: 'foo'), Element.new(name: 'bar')]
array.uniq
# [Element(name: foo)]

ps: I don't believe this is the best implementation, specially because this algorithm is O(n2), I'm justing opening this PR to check if makes sense this addition so I can work on a better solution to this problem.

Verified

This commit was signed with the committer’s verified signature.
jamesaimonetti James Aimonetti
Actually Array#uniq is considering only value of elements, so is
impossible to reduce instances when applying uniq.

Considering `eql?` method, if a object is equal any other object that
already exists on array, it's automatically removed.
@oprypin
Copy link
Member

oprypin commented Jun 12, 2016

wat

record Element, name : String
array = [Element.new(name: "foo"), Element.new(name: "bar")]
p array.uniq &.class

@jhass
Copy link
Member

jhass commented Jun 12, 2016

Hash, whose keyset is used here to determine the unique elements, already calls == for the equality check, it just depends on hash to return the same bucket. https://github.com/crystal-lang/crystal/blob/master/src/hash.cr#L935-L939 This needs to be better documented, contributions welcome.

That's also the reason for the existence of the def_equals_and_hash macro: http://crystal-lang.org/api/Object.html#def_equals_and_hash%28*fields%29-macro

So all we need to do is define hash and == for Element:

record Element, name : String do
  delegate hash, self.class

  def ==(other : self)
    true
  end
end
array = [Element.new(name: "foo"), Element.new(name: "bar")]
p array.uniq

https://carc.in/#/r/11de

@jhass jhass closed this Jun 12, 2016
@asterite
Copy link
Member

Does anyone know why ruby has the eql? method and doesn't use just == for hashes?

@jhass
Copy link
Member

jhass commented Jun 12, 2016

Probably just so you can have different behavior for hashes, or perhaps there was a time where == was syntax and not a method.

@jhass
Copy link
Member

jhass commented Jun 12, 2016

Okay one important difference is:

[57] pry(main)> 1.0.eql?(1)
=> false
[58] pry(main)> 1.0 == 1
=> true
[59] pry(main)> 1.eql?(1.0)
=> false
[60] pry(main)> 1 == 1.0
=> true

So that makes {1 => :a, 1.0 => :b} work as expected in Ruby.

@oprypin
Copy link
Member

oprypin commented Jun 12, 2016

For an alternative perspective, Python goes out of its way to make this work as expected. And our expectations are the opposite.

>>> hash(234), hash(234.0), hash(Decimal('234.0'))
(234, 234, 234)
>>> hash(0.5), hash(Fraction(1, 2)), hash(Decimal('0.5'))
(1152921504606846976, 1152921504606846976, 1152921504606846976)
>>> {234: 1, 234.0: 2, Decimal('234.0'): 3}
{234: 3}

If two equal objects have a different hash, that's a violation of the definition of hashes. I'm quite disappointed about this in Ruby. But it's not so important in Crystal.

@asterite
Copy link
Member

Yes, I think I like Python's behaviour better. I don't know what's a use case for having a Hash with int and float keys where you want to distinguish them (from a purist from of view I think it does make sense, though). We should probably use Python's hash algorithm too.

@fernandes
Copy link
Contributor Author

@BlaXpirit mine was a different case, but I think now you got it

@jhass thank you for pointing me to a solution; I'm porting a ruby lib to crystal, so I thought would make sense the eql? behavior, I checked the hash stuff, but instead of use eql? I just need to use ==

@asterite I just faced a situation where hash for different instances with same values are the same and need to make the check on "instance values level"

I'm pretty satisfied with the outcome here, thank you all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants