Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task graph key values are not interpolated #16

Closed
zgornel opened this issue May 30, 2018 · 3 comments
Closed

Task graph key values are not interpolated #16

zgornel opened this issue May 30, 2018 · 3 comments
Labels
bug Something isn't working

Comments

@zgornel
Copy link
Contributor

zgornel commented May 30, 2018

The values of keys are not correctly retrieved when the value is a key reference. Hashing is also broken as it does not return the hash of the refefenced key but the hash of the string itself (in the example below, graphchain.funcutils.get_hash('b', ...) != graphchain.funcutils.get_hash(1, ...)

import graphchain                                                                                                                                                                                                                                                                                                                                                                                                          
dsk = {'a':1, 'b': 'a', 'c': 'b'}                                                                                                                                                         
graphchain.get(dsk, 'c')
# 2018-05-30 11:21:57,162 - graphchain.funcutils - INFO - Creating the cache directory ...
# 2018-05-30 11:21:57,171 - graphchain.funcutils - INFO - Creating a new hash-chain file graphchain.json
# 2018-05-30 11:21:57,183 - graphchain.funcutils - INFO - * [a] (hash=fe750c1e1b562ba7c4c20fcb345e16b8)
# 2018-05-30 11:21:57,183 - graphchain.funcutils - DEBUG - `--> HASH MISS: src=NONE, arg=NONE dep=NONE has 0 candidates.
# 2018-05-30 11:21:57,184 - graphchain.funcutils - INFO - * [b] (hash=f2cd799a97dfb522a45ba33e5c16bbac)                                                                                                              
# 2018-05-30 11:21:57,184 - graphchain.funcutils - DEBUG - `--> HASH MISS: src=  OK, arg=MISS dep=  OK has 1 candidates.                                                                                             
# 2018-05-30 11:21:57,184 - graphchain.funcutils - INFO - * [c] (hash=4e9d3211321733b10db94cb52eabd8e1)                                                                                                              
# 2018-05-30 11:21:57,185 - graphchain.funcutils - DEBUG - `--> HASH MISS: src=  OK, arg=MISS dep=  OK has 2 candidates.                                                                                             
# 2018-05-30 11:21:57,185 - graphchain.graphchain - DEBUG - --- GraphChain Optimization complete ---                                                                                                                 
# 2018-05-30 11:21:57,193 - graphchain.funcutils - INFO - * [key=c constant=<class 'str'>] EXEC-STORE (hash=4e9d3211321733b10db94cb52eabd8e1)                                                                        
# 'b'  # <-- Output                                                                                                                                                                                                       
                                                                                                                                                                                                                   
graphchain.get(dsk,'b')                                                                                                                                                                                   
# 2018-05-30 11:22:11,745 - graphchain.funcutils - INFO - * [a] (hash=fe750c1e1b562ba7c4c20fcb345e16b8)                                                                                                              
# 2018-05-30 11:22:11,745 - graphchain.funcutils - DEBUG - `--> HASH MISS: src=  OK, arg=MISS dep=  OK has 1 candidates.                                                                                             
# 2018-05-30 11:22:11,745 - graphchain.funcutils - INFO - * [b] (hash=f2cd799a97dfb522a45ba33e5c16bbac)                                                                                                              
# 2018-05-30 11:22:11,745 - graphchain.funcutils - DEBUG - `--> HASH MISS: src=  OK, arg=MISS dep=  OK has 2 candidates.                                                                                             
# 2018-05-30 11:22:11,770 - graphchain.graphchain - DEBUG - --- GraphChain Optimization complete ---                                                                                                                 
# 2018-05-30 11:22:11,771 - graphchain.funcutils - INFO - * [key=b constant=<class 'str'>] EXEC-STORE (hash=f2cd799a97dfb522a45ba33e5c16bbac)                                                                        
#  'a'  # <-- Output                                                                                                                              
@zgornel zgornel added the bug Something isn't working label May 30, 2018
@lsorber lsorber mentioned this issue May 31, 2018
6 tasks
@lsorber
Copy link
Member

lsorber commented Jun 1, 2018

Related to this issue is dask/dask#3523. Specifically, Matthew Rockling clarified that:

we only traverse lists and dict values [to interpolate tasks]

This means that tasks appearing in lists or dict values should be interpolated by the output of those tasks. For us that means that get_hash should add those appearances to dephash_list instead of arghash_list.

In PR #18 the get_hash function is already upgraded to reflect this somewhat, but it is still incorrect (!) because it does not traverse lists and dict values, it only looks at the first level for other tasks.

@lsorber
Copy link
Member

lsorber commented Jun 1, 2018

In PR #18 the get_hash function uses this logic twice (once for literals, once for function arguments in the graph):

try:
    dephash_list.append(keyhashmap[task])
except Exception:
    arghash_list.extend(recursive_hash(task))

To fix the above comment, we should probably implement a _hash_literal function that replaces both of the above uses and like dask traverses lists and dict values.

@zgornel
Copy link
Contributor Author

zgornel commented Jun 2, 2018

Yes, this would actually simplify conceptually the whole hashing approach. BTW, there is also an additional hidden issue in the fact that the literal type is not taken into account right now during traversal. Most probably the two separate issues of traversing literals during hashing and the graph (for value interpolation) could be effectively combined into a single search.

@lsorber lsorber mentioned this issue Jul 31, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants