-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update worker hash ring #58
Conversation
Add and create the test file via adding following code to dora
@ChunxuTang @jja725 Cannot get the hash ring result correct so far |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me. Just left some minor comments.
BTW, I saw there are quite a number of commits in this PR. Could you squash them into fewer comments?
alluxio/alluxio_file_system.py
Outdated
worker_hosts (str, optional): | ||
The worker hostnames in host1,host2,host3 format. Either etcd_hosts or worker_hosts should be provided, not both. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest keeping the worker_hosts
for compatibility, although the functionality is probably broken at this moment. Instead, we can add a warning when users only provide workers without etcd host(s).
This is a change of the AlluxioFileSystem
, the interface used by other applications. If we remove the variable (backward compatibility breaks), we have to update the fsspec. Afterward, when we add worker hosts back, we'll have to update the fsspec again (break again).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, i will leave it here and fix it in the later static membership PR
alluxio/alluxio_file_system.py
Outdated
if etcd_hosts is None and worker_hosts is None: | ||
raise ValueError( | ||
"Must supply either 'etcd_hosts' or 'worker_hosts'" | ||
) | ||
if etcd_hosts and worker_hosts: | ||
raise ValueError( | ||
"Supply either 'etcd_hosts' or 'worker_hosts', not both" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest changing it to a warning if users only provide worker hosts without etcd hosts.
self.host = host | ||
self.port = port | ||
self._host = host | ||
self._port = port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing other parts of the code, I would say we generally use names without the leading underscore for data members. Changing them back to self.host
and self.port
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the google python convention is using underscore to indicate private member values. I remember some other parts have changed to this new convention
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about i create another PR to fix other parts of the code base after this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Thanks.
self.etcd_username = None | ||
self.etcd_password = None | ||
self.prefix = ETCD_PREFIX_FORMAT.format( | ||
self._etcd_username = None | ||
self._etcd_password = None | ||
self._prefix = ETCD_PREFIX_FORMAT.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
alluxio/worker_ring.py
Outdated
worker_addresses = [] | ||
for worker_identity in worker_identities: | ||
worker_address = self._worker_info_map.get(worker_identity) | ||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary to have the parenthesis around the worker_address
?
alluxio/worker_ring.py
Outdated
if worker_addresses != self._worker_addresses: | ||
self._update_hash_ring(worker_addresses) | ||
worker_info_map = {} | ||
detect_diff_in_worker_info = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=> diff_in_worker_info_deteced
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the improvement! Make some comments
self.logger = logger or logging.getLogger("AlluxioFileSystem") | ||
if etcd_hosts and not worker_hosts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use Union type in python to indicate OR semantic
self.session = self._create_session(concurrency) | ||
|
||
# parse options | ||
page_size = ALLUXIO_PAGE_SIZE_DEFAULT_VALUE | ||
hash_node_per_worker = int(ALLUXIO_HASH_NODE_PER_WORKER_DEFAULT_VALUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we put default value as string and convert here? We can just define as int at the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChunxuTang previously put Alluxio related key=value string pair into the const file, we can discuss about what are the best way for dealing with property key and values
if options: | ||
if ALLUXIO_PAGE_SIZE_KEY in options: | ||
page_size = options[ALLUXIO_PAGE_SIZE_KEY] | ||
self.logger.debug(f"Page size is set to {page_size}") | ||
if ALLUXIO_HASH_NODE_PER_WORKER_KEY in options: | ||
hash_node_per_worker = int( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The const is all string format, the const file is kind of Alluxio property key files now
worker_entity.worker_identity | ||
] = worker_entity.worker_net_address | ||
if worker_entity.worker_identity not in self._worker_info_map: | ||
diff_in_worker_info_detected = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can early break here if we detect diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worker_info_map[
worker_entity.worker_identity
] = worker_entity.worker_net_address
here we also need to update the local worker_info_map for update hash ring
The docker related tests failed with cannot launch worker issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Just left some minor comments.
alluxio/alluxio_file_system.py
Outdated
@@ -141,20 +143,32 @@ def __init__( | |||
"Supply either 'etcd_hosts' or 'worker_hosts', not both" | |||
) | |||
self.logger = logger or logging.getLogger("AlluxioFileSystem") | |||
if etcd_hosts and not worker_hosts: | |||
self.logger.warning( | |||
"Does not supply 'etcd_host'. Cannot tolerate Alluxio cluster changes." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
etcd_host
=> etcd_hosts
Cannot tolerate Alluxio cluster changes.
=> A etcd cluster is required for dynamic cluster changes.
self.host = host | ||
self.port = port | ||
self._host = host | ||
self._port = port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Thanks.
Update worker hash ring to use worker identity, align with AOS 309