Make CollectionIDs a 32bit hash value of the collection name #412
How does this fare with reading files prior to this? It should work because we just replace the calculation of the ID, right? Can we easily verify that (in CI)? |
@tmadlener - while fixing the undetected narrowing from 64 to 32 bits I found quite a few signed/unsigned inconsistencies that had not been caught before. The collectionID is now uint64_t throughout |
What's the final take on the hash size, 64 bits? With 64 bits there is some small collision probability with many collections, right? |
Yes, 64-bit hash size. 32 bits were too few to be safe from collisions for a huge number of collections. With 64 bits we are in very safe territory |
Making the collectionID have 64 bits (and planning to use all of them) potentially has a few more implications, e.g. the id method will no longer be very useful:
unsigned int id() const { return getObjectID().collectionID * 10000000 + getObjectID().index; } |
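To illustrate why the id method loses its usefulness once the collection ID is a hash, here is a minimal sketch. The function name legacyId is hypothetical, and the return type is fixed to std::uint32_t on the assumption that unsigned int is 32 bits wide, as it is on common platforms. Because 10000000 = 2^7 * 78125, two collection IDs that differ by 2^25 wrap around to the same value.

```cpp
#include <cstdint>

// Hypothetical stand-in for the id() method quoted above.
// Assumption: unsigned int is 32 bits wide, so the narrowing is spelled
// out with std::uint32_t to make the wrap-around explicit.
std::uint32_t legacyId(std::uint64_t collectionID, int index) {
  // Unsigned arithmetic wraps modulo 2^64 here, and the cast then keeps
  // only the lowest 32 bits of the result.
  return static_cast<std::uint32_t>(collectionID * 10000000 +
                                    static_cast<std::uint64_t>(index));
}
```

With small sequential IDs the result is unique per object, but for hash-valued IDs legacyId(1, 0) and legacyId(1 + (1ull << 25), 0) are identical, so the value can no longer identify an object.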
Regarding backwards compatibility, for SIO we would need to increase the version number for the podio/include/podio/SIOBlock.h Lines 101 to 108 in 705721d
Line 16 in 705721d
This could then be handled accordingly on the reading side: Lines 35 to 42 in 705721d
There is also a missing conversion to Line 49 in 705721d
|
Thanks. The SIO I really didn't test properly yet. |
This looks like it needs a rebase (and potentially some conflict resolution). To me this looks good; the main points that still need discussion IMHO are:
|
Yes, that would be something for the next EDM4hep meeting to discuss. |
53f29e6
to
6d86c3f
Compare
There are some conflicts. I think they should resolve themselves with a rebase onto master |
rebase didn't do it |
@tmadlener thanks |
One of the sanitizer workflows seems to be picking up on #174 and |
I added another tag to the failing test to ignore it in UndefinedBehavior sanitizer runs since it is picking up on a known issue and the test will be obsolete once the EventStore is removed in any case. |
Just to confirm. This is no longer the case and this PR is ready as it is, right? |
It is 64 bits now, or was I mistaken in changing the title of the PR? |
During the last call one week ago it was agreed that it would be 32 bits and that collisions would be dealt with, because of concerns about increasing sizes of classes (for example due to padding) and files (?). Personally, I would like to see some numbers first, in case dealing with collisions turns out to be some work... |
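The padding concern can be made concrete with a small sketch. Both structs below are hypothetical illustrations of an index-plus-ID pair and are not the actual podio types; the exact sizes depend on the platform's alignment rules.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layouts, for illustration only.
struct ObjectID32Sketch {
  int index;
  std::uint32_t collectionID;
};

struct ObjectID64Sketch {
  int index;
  std::uint64_t collectionID; // usually forces 8-byte alignment
};

// Bytes in the 64-bit variant that hold neither member, i.e. padding.
constexpr std::size_t paddingBytes64() {
  return sizeof(ObjectID64Sketch) - sizeof(int) - sizeof(std::uint64_t);
}
```

On a typical 64-bit platform the first struct occupies 8 bytes and the second 16, of which 4 are padding, so every object carrying such an ID would double that part of its footprint.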
I have added a small utility tool that takes a list of collection names and checks for collisions using the chosen hash function. It should be fairly straightforward to come up with a list of all currently used collection names and see whether we already have collisions in there. Speaking of collisions @hegner, I think we could also use this PR to make collisions more visible in the Frame; currently this is effectively silent (and potentially incomplete): Lines 318 to 327 in 76c98a6
which calls Lines 410 to 427 in 76c98a6
|
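A collision check over a list of names, as described in the comment above, could look roughly like the following sketch. It is not the utility added in this PR: the names fnv1a32, Collision and findCollisions are hypothetical, and 32-bit FNV-1a is used purely as a stand-in for whichever hash function is finally chosen.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// 32-bit FNV-1a. Assumption: a stand-in hash for illustration, not
// necessarily the function used for the collection IDs.
std::uint32_t fnv1a32(const std::string& name) {
  std::uint32_t hash = 2166136261u; // FNV offset basis
  for (unsigned char c : name) {
    hash ^= c;
    hash *= 16777619u; // FNV prime
  }
  return hash;
}

// Two different names that map to the same hash value.
struct Collision {
  std::string first;
  std::string second;
  std::uint32_t hash;
};

// Report every name whose hash was already claimed by a different name.
// Repeated occurrences of the same name are not counted as collisions.
std::vector<Collision> findCollisions(const std::vector<std::string>& names) {
  std::unordered_map<std::uint32_t, std::string> seen;
  std::vector<Collision> collisions;
  for (const auto& name : names) {
    const std::uint32_t hash = fnv1a32(name);
    const auto result = seen.emplace(hash, name);
    if (!result.second && result.first->second != name) {
      collisions.push_back({result.first->second, name, hash});
    }
  }
  return collisions;
}
```

Feeding the function the lines of a file with one collection name per line and reporting a non-empty result loudly would give the kind of visibility asked for here.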
Test that will be deprecated with EventStore, so should be OK
@tmadlener - thanks for the rebase |
Alright, I collected this list of collection names unique_coll_names.txt from the following sources:
This currently yields 458 unique collection names, none of which collide, and puts us somewhere between a 1:10000 and a 1:100000 collision probability according to this table here: |
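The quoted probability range can be cross-checked with the usual birthday-problem approximation, P ≈ 1 - exp(-n(n-1) / (2 * 2^bits)), which assumes an ideally uniform hash. The function name collisionProbability below is hypothetical and only serves this estimate.

```cpp
#include <cmath>
#include <cstdint>

// Birthday-bound estimate of the probability that at least two of
// nNames distinct names share a hash value of hashBits bits.
// Assumption: the hash distributes names uniformly.
double collisionProbability(std::uint64_t nNames, unsigned hashBits) {
  const double space = std::ldexp(1.0, static_cast<int>(hashBits)); // 2^bits
  const double n = static_cast<double>(nNames);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
  return -std::expm1(-n * (n - 1.0) / (2.0 * space));
}
```

For 458 names and 32 bits this gives about 2.4e-5, i.e. roughly 1 in 41000, which falls inside the range stated above; with 64 bits the same estimate drops below 1e-14.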
I think (and hope) we have now considered pretty much all the collection names that are currently in use. I didn't find any collisions among them, so at least for now 32 bits seem to be enough.
BEGINRELEASENOTES
ENDRELEASENOTES
This PR is a work in progress. The frame interface works. However, the legacy interface relied on the previous structure of IDs for some optimizations, which I have to remove.