
sis_hash_helper more efficient #86

Draft · wants to merge 2 commits into master
Conversation

@albertz (Member) commented Jun 20, 2022

This gives me a huge speedup on a big GMM pipeline, from 12 sec runtime down to 2 sec.

Edit: Sorry, wrong numbers...

@albertz (Member, Author) commented Jun 20, 2022

Hm, OK, I need to do some debugging; this seems to change the hash.

@albertz albertz marked this pull request as draft June 20, 2022 12:39
@JackTemaki (Contributor)

Yes, my setup before the change:
error(7) runnable(5) waiting(649)

My setup after the change:
error(3) runnable(12) waiting(247)

@albertz (Member, Author) commented Jun 20, 2022

Ok, similar fix as in #89. Now for my simple test I get the same hash.

@albertz albertz marked this pull request as ready for review June 20, 2022 12:48
@albertz (Member, Author) commented Jun 20, 2022

However, the speedup is much less now. Need to do some benchmarks. I think it should still be faster than before though.

@albertz (Member, Author) commented Jun 20, 2022

> However, the speedup is much less now. Need to do some benchmarks. I think it should still be faster than before though.

Actually, no? It seems slightly slower. Maybe due to higher memory consumption? I need to investigate this a bit more...

@albertz albertz marked this pull request as draft June 20, 2022 12:53
sisyphus/hash.py — 3 resolved review threads
@albertz (Member, Author) commented Jun 21, 2022

> However, the speedup is much less now. Need to do some benchmarks. I think it should still be faster than before though.
>
> Actually, no? It seems slightly slower. Maybe due to higher memory consumption? I need to investigate this a bit more...

I'm also thinking about doing more clever caching, but I'm not exactly sure where. Unfortunately, the objects we get here are often dicts, lists, or some other config objects, and they are all mutable, so you cannot really cache the hash unless you make sure the cache is correctly invalidated, which is not always easily possible (e.g. no idea how you would do that for a dict). Maybe in JobSingleton.__call__ we can do it for the args, because afterwards the args should not change anymore.
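To illustrate the idea: if the args are frozen at job-creation time, the hash can be computed once and reused. This is only a hypothetical sketch (the class and function names are illustrative, not the sisyphus API), and it is only valid under the stated assumption that the args are never mutated after construction:

```python
import hashlib
import pickle

def sis_hash_helper_sketch(obj):
    # Stand-in for the real sis_hash_helper: hash a pickled representation.
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()

class JobSketch:
    # Hypothetical job wrapper: compute the hash of the args once at
    # creation time (analogous to doing it in JobSingleton.__call__) and
    # reuse it afterwards. Invalid if args are mutated later.
    def __init__(self, args):
        self.args = args
        self._cached_hash = sis_hash_helper_sketch(args)

    def sis_hash(self):
        return self._cached_hash
```

Two jobs created from equal args then get equal hashes without re-hashing on every lookup.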

@albertz (Member, Author) commented Jun 21, 2022

Also, I think the current change in this PR does not fully capture all the recursive calls of sis_hash_helper. The recursive calls often go through obj._sis_hash calls.

@albertz albertz force-pushed the albert-sis-hash-helper-more-efficient branch from 4e6fb53 to 2f4cbe0 Compare June 21, 2022 09:01
@critias (Contributor) commented Jun 21, 2022

> Also, I think the current change in this PR does not fully capture all the recursive calls of sis_hash_helper. The recursive calls often go through obj._sis_hash calls.

You could also cache the result of obj._sis_hash.
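A minimal sketch of that suggestion, memoizing the hash on the object itself (class name and payload attribute are hypothetical, not the sisyphus API; safe only under the assumption that the object is effectively immutable once hashed, since mutating it afterwards would leave a stale cached value):

```python
import hashlib
import pickle

class CachedHashObject:
    # Hypothetical object with a memoized _sis_hash: the digest is
    # computed on first call and returned from the cache afterwards.
    def __init__(self, payload):
        self.payload = payload
        self._sis_hash_cache = None

    def _sis_hash(self):
        if self._sis_hash_cache is None:
            self._sis_hash_cache = hashlib.sha256(
                pickle.dumps(self.payload)).hexdigest()
        return self._sis_hash_cache
```

Repeated calls then skip the pickle+hash work entirely, which is where the recursive-hashing cost lives.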

@albertz albertz mentioned this pull request Jun 21, 2022
3 participants