-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CLIs for managing workers. #11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, can you also add the descriptions for the new commands in the main README.md
file?
worker_ids = list(ids) | ||
|
||
# read ids from file (adds to provided ids) | ||
if file is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like file
might be more suitable as a default argument. As well as some other arguments such as which can take the value None
.
is_flag=True, | ||
help='View the status of HITs from the live MTurk site.') | ||
def associate_qual(file, ids, qual, name, value, notify, live): | ||
"""Associate workers with a qualification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be helpful if you could provide a more complete description of the args as well as what the method returns.
amti/clis/associate.py
Outdated
def associate_qual(file, ids, qual, name, value, notify, live): | ||
"""Associate workers with a qualification. | ||
|
||
Given a space seperated list of WorkerIds and/or a path to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"separated" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for submitting this PR! It implements a few much-needed features.
I left a some comments inlined with the code, but something I'm generally not sure about is how to handle the UX differences between performing these actions with one worker ID versus with several. It seems useful to have batch and one-off modes because it's much more performant to batch the requests. Would it make sense to implement these in separate commands though? Also, in the batch mode, can things like the qual to assign or take away / the reason for a block be passed per worker by using a CSV format with those columns as well (instead of just worker IDs)?
is_flag=True, | ||
help='View the status of HITs from the live MTurk site.') | ||
def associate_qual(file, ids, qual, name, value, notify, live): | ||
"""Associate workers with a qualification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation conventions for the code base are implicit at the moment, and should probably be put in a document somewhere, but the gist is that:
- click commands and command groups
- Doc strings should begin with 1 line summarizing the function of the script.
- After the summary line, optionally several paragraphs can explain the arguments and usage of the script in more detail.
- Since arguments are described in the main text, they don't need a separate section describing them.
- Arguments should be referred to in all caps with underscores (matching how they'll appear in help output).
- Options are documented with their own help strings and should not be referred to in the main text (unless there's a really good reason).
- python functions and classes (that are not click commands)
- First follow numpy doc string conventions.
- Then follow pep257 (python doc string conventions).
- Then pep8 (python style guide).
The click command doc strings here mostly just need the arguments described in the running text, the parameters sections removed, and the argument names put in all caps.
amti/utils/workers.py
Outdated
import csv | ||
from typing import List, Optional | ||
|
||
def create_batches(items: List, n=100) -> List: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create_batches
might be easy to confuse with the create_batch
command, which can be avoided by naming this function something like batch_list
or chunk_list
.
Similarly, for the n
argument, naming it batch_size
and having no default forces people to explicitly pass it and could make the loops much more readable. Since the right batch size varies a lot from situation to situation, it's probably worth forcing callers to explicitly consider it.
Lastly, since the function is pretty general it might be better to put it in a new module (amti.utils.misc
for example).
amti/utils/workers.py
Outdated
from typing import List, Optional | ||
|
||
def create_batches(items: List, n=100) -> List: | ||
""" Create generator that splits items into batches of size n. """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the leading and trailing space from the first line of these doc strings, and add Parameters
and Returns
sections (see this function for example).
for i in range(0, len(items), n): | ||
yield items[i:i + n] | ||
|
||
def read_workerids_from_file(file: click.Path) -> List: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the expected file format for this function is one worker id per line, possibly with a header. If that's the case, it might be simpler to just require these files not have headers and read the file directly:
with open(worker_ids_path, 'r') as worker_ids_file:
worker_ids = [worker_id.strip() for ln in worker_ids_file]
If it is important to read CSVs, I'd suggest having read_workerids_from_file
take the file-like object as an argument. The advantage over passing around the path is that then the CLI commands can open the files using click.open_file
which supports things like using -
to represent standard out. This feature enables amti
to be used in unix pipelines.
amti/utils/workers.py
Outdated
|
||
return worker_ids | ||
|
||
def get_qual_by_name(client: boto3.client, qual_name: str) -> Optional[dict]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this utility can go in amti.utils.mturk
.
For UI changes - we could break out commands into groups. For example:
I think click allows you to group commands pretty easily. |
Also I can't think of a better way to handle single vs multiple WorkerIds. It's a bit clunky, but I feel like you should be able to handle them with the same command, since it's the same action. This way, you can provide any number of WorkerIds on the command line (makes it easy to handle one or two workers). And if you want to do a large batch, you can leave the |
Sounds good to me. I really like the idea of having Having one command per action makes sense to me UI-wise. We should handle the additional columns for other arguments in the case that it takes CSV input (like
Or something like that. This issue discusses implementing mutually exclusive options in click, though I think the simplest and probably best approach is to just put a little validation logic at the beginning of the command rather than trying to fit it into click's callbacks / parameter validation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, let's get this merged 😄!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
1419ff7
Added commands for:
Added new utils file for worker management functions.