Add examples telemetry #17552
Conversation
The documentation is not available anymore as the PR was closed or merged.
```python
# Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
# information sent is the one passed as arguments along with your Python/PyTorch versions.
model_name = None if os.path.isdir(model_args.model_name_or_path) else model_args.model_name_or_path
if data_args.task_name is not None:
    dataset_name = f"glue-{data_args.task_name}"
elif data_args.dataset_name is not None:
    dataset_name = data_args.dataset_name
else:
    dataset_name = None
send_example_telemetry("run_glue", model_name=model_name, dataset_name=dataset_name)
```
I'd rather have as few lines of code as possible for the telemetry, so that it doesn't get in the way of understanding the example. Thanks to the diff I can see that this whole block exists only for the `send_example_telemetry` call, but if I were reading the example on its own, I would assume that `model_name` and `dataset_name` are re-used afterwards.

How about passing `model_args`/`data_args` directly to the method? Or, if you think there is too much information in there, how about defining a local method so that the scope of these variables is clear?
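A rough sketch of that alternative, assuming the helper inspects the argument dataclasses itself so each call site stays a single line; the body below is illustrative, not the actual implementation:

```python
import os
from dataclasses import fields

def send_example_telemetry(example_name, *example_args):
    # Sketch: derive the reportable fields from the argument dataclasses
    # instead of computing them at the call site.
    data = {"example": example_name}
    for args in example_args:
        args_as_dict = {f.name: getattr(args, f.name) for f in fields(args)}
        # Skip local paths: only report Hub model identifiers.
        model_name = args_as_dict.get("model_name_or_path")
        if model_name is not None and not os.path.isdir(model_name):
            data["model_name"] = model_name
        if args_as_dict.get("task_name") is not None:
            data["dataset_name"] = f"glue-{args_as_dict['task_name']}"
        elif args_as_dict.get("dataset_name") is not None:
            data["dataset_name"] = args_as_dict["dataset_name"]
    # ... attach Python/PyTorch versions and send `data` ...
    return data
```

With this shape, the readability concern above goes away: the example scripts contain only the one-line `send_example_telemetry(...)` call.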
That's perfect! Thanks for iterating, @sgugger
```diff
@@ -399,6 +399,10 @@ def main():
     else:
         model_args, data_args, training_args = parser.parse_args_into_dataclasses()

+    # Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
+    # information sent is the one passed as arguments along with your Python/PyTorch versions.
+    send_example_telemetry("flax_run_summarization", model_args, data_args)
```
Reading this, I was wondering if we shouldn't add the framework in front of all examples: `flax_run_summarization`, `pytorch_run_summarization`, etc.

But I wonder if we couldn't do better with `pytorch/run_summarization` -> would that lead to better analysis with Kibana? In any case I think having the framework name is super useful, so I'd personally add it to all examples.
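For illustration, passing the framework separately (as the merge commit list below phrases it) could look like the call sites in this sketch; the `framework=` keyword argument is an assumption here, not a confirmed signature:

```python
# Hypothetical call sites with the framework as its own field rather than a
# prefix baked into the example name (the `framework=` name is an assumption):
send_example_telemetry("run_summarization", model_args, data_args, framework="flax")
send_example_telemetry("run_summarization", model_args, data_args, framework="pytorch")
```

Keeping the framework as a separate field rather than a name prefix would let an analysis tool such as Kibana group and filter on it directly.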
Excellent!
Thanks a lot for working on this @sgugger - that's super useful!
* Add examples telemetry
* Alternative approach
* Add to all other examples
* Add to templates as well
* Put framework separately
* Same for TensorFlow
What does this PR do?
This PR adds a function to send telemetry to help us track example usage, and uses it in the current examples. For now I've only added it in the PyTorch `run_glue.py`, but I will paste it in all other examples if you agree with the format/data tracked.
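As a sketch of the data described above (the arguments passed to the script plus the Python/PyTorch versions), the payload could be assembled roughly as follows; `build_telemetry_payload` is a hypothetical name for this illustration, not the function added by the PR:

```python
import importlib.util
import platform

def build_telemetry_payload(example_name, *example_args):
    # Collect the example name plus the runtime versions mentioned in the PR description.
    payload = {
        "example": example_name,
        "python_version": platform.python_version(),
    }
    # Record the PyTorch version only if it is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch
        payload["torch_version"] = torch.__version__
    # Fold in every explicitly set argument from the dataclasses
    # (assuming plain dataclasses, where vars() returns the field dict).
    for args in example_args:
        payload.update({k: v for k, v in vars(args).items() if v is not None})
    return payload
```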