Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Message hashes don't change with contents #73

Closed
mikepurvis opened this issue Feb 8, 2017 · 5 comments
Closed

Message hashes don't change with contents #73

mikepurvis opened this issue Feb 8, 2017 · 5 comments

Comments

@mikepurvis
Copy link
Member

Eg:

$ python
Python 2.7.11 (default, Jan 22 2016, 08:29:18) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from std_msgs.msg import Int8
>>> foo = Int8()
>>> foo.data = 1
>>> hash(foo)
275653405
>>> foo.data = 2
>>> hash(foo)
275653405  # Should be different!
>>> bar = Int8()
>>> bar.data = 1
>>> hash(bar)
275876165  # Should be the same as the first one!

The default hash method supplied with object is based on identity rather than fields, hence the above behaviour.

Instead, generated messages should supply a __hash__ function which hashes the data, possibly as simple as just passing the message's string representation to hash.

@dirk-thomas
Copy link
Member

Neither the generated message class nor its base class override the __hash__ method. If you run the same code with any other Python object the result is the same for me, e.g.:

$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> class Foo(object):
...     def __init__(self):
...         self.bar = None
... 
>>> f = Foo()
>>> f.bar = 1
>>> hash(f)
8751631690301
>>> f.bar = 2
>>> hash(f)
8751631690301
>>> g = Foo()
>>> g.bar = 1
>>> hash(g)
8751631690309

@mikepurvis
Copy link
Member Author

Oh yes, no argument there. This argument is that they should, so that message objects may be added to a set for de-duplication purposes, etc.

@dirk-thomas
Copy link
Member

I don't think we should change the default semantic of the Python hash function for objects.

@mikepurvis
Copy link
Member Author

mikepurvis commented Feb 9, 2017

We already do it as a one-off for Time and Duration objects:

genpy/src/genpy/rostime.py

Lines 127 to 131 in e176e70

def __hash__(self):
"""
Time values are hashable. Time values with identical fields have the same hash.
"""
return hash((self.secs, self.nsecs))

The proposal here is simply to extend that behaviour to all messages.

Quoting the documentation:

The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple. Example:

   def __hash__(self):
       return hash((self.name, self.nick, self.color))

Given this, the default implementation based on identity seems bizarre.


Further down:

If a class defines mutable objects and implements an eq() method, it should not implement hash(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

So that seems conclusive that Msg objects probably shouldn't be hashable, since they all have mutable fields (including Time and Duration). I can certainly see the argument against use as dictionary keys, but I still think there'd be value in being able to put them in sets. In practice, few message objects are mutated after original creation anyway.

@dirk-thomas
Copy link
Member

The Duration and Time classes are considered to be a fancy representation of a float value. That is why they the hash based on the field values.

Considering that the fields are still mutable I agree that having a hash for those might potentially be "risky".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants