Add CLIs for managing workers. #11

Merged
merged 7 commits into from
May 1, 2019

Conversation

rob-dalton
Contributor

Added commands for:

  • Blocking workers.
  • Unblocking workers.
  • Sending notification messages to workers.
  • Associating qualifications with workers.
  • Disassociating qualifications from workers.

Added new utils file for worker management functions.

@nalourie-ai2 nalourie-ai2 requested review from chaitanyamalaviya and removed request for csbhagav April 2, 2019 05:22

@chaitanyamalaviya chaitanyamalaviya left a comment

Looks good, can you also add the descriptions for the new commands in the main README.md file?

worker_ids = list(ids)

# read ids from file (adds to provided ids)
if file is not None:

It seems like file might be more suitable as an argument with a default value, as might some of the other arguments that can take the value None.

is_flag=True,
help='View the status of HITs from the live MTurk site.')
def associate_qual(file, ids, qual, name, value, notify, live):
"""Associate workers with a qualification.

Would be helpful if you could provide a more complete description of the args as well as what the method returns.

def associate_qual(file, ids, qual, name, value, notify, live):
"""Associate workers with a qualification.

Given a space seperated list of WorkerIds and/or a path to

"separated" :)

Contributor

@nalourie-ai2 nalourie-ai2 left a comment

Thanks so much for submitting this PR! It implements a few much-needed features.

I left some comments inline with the code, but something I'm generally not sure about is how to handle the UX differences between performing these actions with one worker ID versus with several. It seems useful to have batch and one-off modes because it's much more performant to batch the requests. Would it make sense to implement these in separate commands though? Also, in the batch mode, can things like the qual to assign or take away / the reason for a block be passed per worker by using a CSV format with those columns as well (instead of just worker IDs)?
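To make the CSV question concrete, here is a minimal sketch of what per-worker arguments could look like. The column names (WorkerId, Reason) and the helper read_worker_rows are hypothetical and not part of amti; the sketch only assumes the file has a header row.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_worker_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per worker from CSV lines with a header row.

    Hypothetical helper: the keys are whatever the header row declares.
    """
    return [dict(row) for row in csv.DictReader(lines)]


# Example input: one worker per row, each with its own block reason.
example = io.StringIO(
    "WorkerId,Reason\n"
    "A1EXAMPLE,Submitted spam\n"
    "A2EXAMPLE,Failed attention checks\n"
)
rows = read_worker_rows(example)
for row in rows:
    print(row["WorkerId"], "->", row["Reason"])
```

With a layout like this, the batch mode could take the reason (or qual) from each row rather than from a single command line option.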

is_flag=True,
help='View the status of HITs from the live MTurk site.')
def associate_qual(file, ids, qual, name, value, notify, live):
"""Associate workers with a qualification.

Documentation conventions for the code base are implicit at the moment, and should probably be put in a document somewhere, but the gist is that:

  1. click commands and command groups
    • Doc strings should begin with 1 line summarizing the function of the script.
    • After the summary line, optionally several paragraphs can explain the arguments and usage of the script in more detail.
    • Since arguments are described in the main text, they don't need a separate section describing them.
    • Arguments should be referred to in all caps with underscores (matching how they'll appear in help output).
    • Options are documented with their own help strings and should not be referred to in the main text (unless there's a really good reason).
  2. python functions and classes (that are not click commands)

The click command doc strings here mostly just need the arguments described in the running text, the parameters sections removed, and the argument names put in all caps.
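As an illustration of the click conventions above, a doc string could look roughly like the sketch below. The command name block_workers, its option, and its placeholder body are assumptions made up for this example, not the code in this PR.

```python
import click


@click.command()
@click.argument('ids', nargs=-1)
@click.option(
    '--file', '-f',
    type=click.Path(exists=True, dir_okay=False),
    help='Path to a file of WorkerIds, one per line.')
def block_workers(ids, file):
    """Block workers.

    Block each worker given in IDS, a space separated list of
    WorkerIds.
    """
    # Placeholder body: a real command would call MTurk here.
    click.echo(f'Would block {len(ids)} worker(s).')
```

The summary line comes first, the argument is described in the running text in all caps, and the option is documented only through its help string.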

import csv
from typing import List, Optional

def create_batches(items: List, n=100) -> List:

create_batches might be easy to confuse with the create_batch command, which can be avoided by naming this function something like batch_list or chunk_list.

Similarly, for the n argument, naming it batch_size and having no default forces people to explicitly pass it and could make the loops much more readable. Since the right batch size varies a lot from situation to situation, it's probably worth forcing callers to explicitly consider it.

Lastly, since the function is pretty general it might be better to put it in a new module (amti.utils.misc for example).
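A minimal sketch of the renamed helper, assuming the name chunk_list and a required batch_size argument as suggested above. The doc string layout with Parameters and Returns sections is also an assumption about the code base's style, and the positivity check is an addition for illustration.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar('T')


def chunk_list(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield successive batches of items.

    Parameters
    ----------
    items : Sequence[T]
        The items to split into batches.
    batch_size : int
        The maximum number of items per batch. Must be positive.

    Returns
    -------
    Iterator[Sequence[T]]
        An iterator over consecutive slices of ``items``, each of
        length ``batch_size`` except possibly the last.
    """
    if batch_size < 1:
        raise ValueError('batch_size must be a positive integer.')
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


worker_ids = ['A1', 'A2', 'A3', 'A4', 'A5']
batches = list(chunk_list(worker_ids, batch_size=2))
print(batches)  # [['A1', 'A2'], ['A3', 'A4'], ['A5']]
```

Because batch_size has no default, every call site states the batch size it is using.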

from typing import List, Optional

def create_batches(items: List, n=100) -> List:
""" Create generator that splits items into batches of size n. """

Please remove the leading and trailing space from the first line of these doc strings, and add Parameters and Returns sections (see this function for example).

for i in range(0, len(items), n):
yield items[i:i + n]

def read_workerids_from_file(file: click.Path) -> List:

It looks like the expected file format for this function is one worker id per line, possibly with a header. If that's the case, it might be simpler to just require these files not have headers and read the file directly:

with open(worker_ids_path, 'r') as worker_ids_file:
    worker_ids = [ln.strip() for ln in worker_ids_file]

If it is important to read CSVs, I'd suggest having read_workerids_from_file take the file-like object as an argument. The advantage over passing around the path is that the CLI commands can then open the files using click.open_file, which supports things like using - to represent standard input or output. This feature enables amti to be used in Unix pipelines.
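A rough sketch of the file-like variant, assuming one WorkerId per line with no header. Skipping blank lines is an extra assumption, and an in-memory file stands in for the real one so the example runs on its own.

```python
import io
from typing import IO, List


def read_workerids_from_file(worker_ids_file: IO[str]) -> List[str]:
    """Return the non-empty lines of an open text file as WorkerIds."""
    return [ln.strip() for ln in worker_ids_file if ln.strip()]


# In a click command the file object could come from
# click.open_file(path, 'r'), which treats '-' as standard input.
# Here an in-memory file stands in for it.
example_file = io.StringIO('A1EXAMPLE\nA2EXAMPLE\n\n')
print(read_workerids_from_file(example_file))
```

Since the function only iterates over lines, it works the same for a regular file, standard input, or a test fixture.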


return worker_ids

def get_qual_by_name(client: boto3.client, qual_name: str) -> Optional[dict]:

I think this utility can go in amti.utils.mturk.

@rob-dalton
Contributor Author

For UI changes - we could break out commands into groups. For example:

  • amti batch <command>
  • amti workers <command>

I think click allows you to group commands pretty easily.
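For reference, nesting groups in click could look something like this sketch. The group and command names follow the example above, and the command body is a placeholder rather than the code in this PR.

```python
import click


@click.group()
def amti():
    """Run amti commands."""


@amti.group()
def workers():
    """Manage workers."""


@workers.command()
@click.argument('ids', nargs=-1)
def block(ids):
    """Block the workers given in IDS."""
    # Placeholder body: a real command would call MTurk here.
    click.echo(f'Would block {len(ids)} worker(s).')
```

With this layout the command would be invoked as amti workers block followed by the WorkerIds.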

@rob-dalton
Contributor Author

Also I can't think of a better way to handle single vs multiple WorkerIds. It's a bit clunky, but I feel like you should be able to handle them with the same command, since it's the same action.

This way, you can provide any number of WorkerIds on the command line (makes it easy to handle one or two workers). And if you want to do a large batch, you can leave the IDS arg empty and just provide a file path.

@nalourie-ai2
Contributor

Sounds good to me.

I really like the idea of having amti batch and amti workers command groups, but I think we could handle that in a follow-up PR to keep things moving. Changing the UI like that might also justify a larger refactor of the code base, with subpackages for the different command groups.

Having one command per action makes sense to me UI-wise. We should handle the additional columns for other arguments in the case that it takes CSV input (like --reason for example). Similarly, we should allow the delimiter to be specified by the user, since at a minimum people will probably have both CSV and TSV files. Also, with two input modes, we should have more thorough validation, i.e.

  1. if --file is passed, we should check that ids is None and --reason is not used.
  2. if ids is passed, we should validate that --file is not passed and --reason is present.

Or something like that. This issue discusses implementing mutually exclusive options in click, though I think the simplest and probably best approach is to just put a little validation logic at the beginning of the command rather than trying to fit it into click's callbacks / parameter validation.
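The validation logic at the start of a command could be as small as the sketch below. The function name validate_block_inputs and the error messages are hypothetical; inside a click command one might raise click.UsageError instead of ValueError.

```python
from typing import Optional, Sequence


def validate_block_inputs(
        ids: Sequence[str],
        file: Optional[str],
        reason: Optional[str]
) -> None:
    """Raise ValueError unless exactly one input mode is used.

    Hypothetical helper. Note that click passes an empty tuple (not
    None) for a variadic argument that received no values, so ``ids``
    is tested for emptiness.
    """
    if file is not None:
        # File mode: ids and --reason must not be given.
        if ids:
            raise ValueError('Pass either IDS or --file, not both.')
        if reason is not None:
            raise ValueError('--reason cannot be combined with --file.')
    else:
        # Command line mode: ids and --reason are both required.
        if not ids:
            raise ValueError('Pass IDS or --file.')
        if reason is None:
            raise ValueError('--reason is required when IDS are passed.')


validate_block_inputs(['A1EXAMPLE'], file=None, reason='Submitted spam')
validate_block_inputs((), file='workers.csv', reason=None)
```

Both calls at the end pass silently; any other combination of the three inputs raises.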

nalourie-ai2
nalourie-ai2 previously approved these changes Apr 26, 2019
Contributor

@nalourie-ai2 nalourie-ai2 left a comment

Thanks for the changes, let's get this merged 😄!

@chaitanyamalaviya chaitanyamalaviya left a comment

LGTM!

@rob-dalton rob-dalton dismissed stale reviews from chaitanyamalaviya and nalourie-ai2 via 1419ff7 May 1, 2019 18:12
@rob-dalton rob-dalton merged commit 158a9a1 into allenai:master May 1, 2019