-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PARO-720] ENH Model sharing helpers #315
[PARO-720] ENH Model sharing helpers #315
Conversation
A CivisML model has File and Project run outputs, each of which need to be shared to effectively share the "model". This helper will transparently handle sharing the run outputs. Include JSONValue in case a future version of CivisML uses that. The functions are patterned after the autogenerated API sharing endpoints.
@elsander , see what you think of the naming and other design choices here. It's not obvious what the right interface would be. I considered adding a |
civis/ml/tests/test_helper.py
Outdated
|
||
|
||
def test_share_model_debug_log(): | ||
# Debug logs need "write" permission to be shared |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessarily the behavior I'd expect. Why do we only share debug logs to people with write permissions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was that the debug logs are more internal workings of the model. If you just want someone to be able to see your model, with the lowest levels of permissions, you might not want to expose all of the logging. I'm certainly open to discussion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right in most cases, a person who only needs read permissions won't need access to the debug logs. But I don't think it would ever be a problem to expose the debug logs, and there are some cases where you might expect read permissions to provide log access. Probably the most common is CS sharing one of us on a client job so that we can examine the logs and debug, but I could also imagine clients sharing jobs to debug internally, before they escalate to CS. My preference is to share all run outputs for consistency, but I'm open to discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this makes the code simpler, so I'm happy to make the change. I'm not thinking of anything sensitive in the logs which wouldn't also be in the other model artifacts which would be shared at read level.
elif _output['object_type'] == 'Project': | ||
_func = getattr(client.projects, "put_shares_" + entity_type) | ||
if permission_level == 'read': | ||
# Users must be able to add to projects to use the model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my own understanding, why is this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CivisML training job keeps a "scoring jobs" project as a run output. CivisML scoring jobs add themselves to this project so that users (and the Civis Platform UI) can link from training jobs to all dependent scoring jobs. Users need "write" permission to add things to projects.
I reasoned that users would expect that "read" permission on a model would give them the ability to make their own scoring jobs based on it. Does that make sense? The project is the only thing which would prevent you from scoring with only read permissions, and the error you get is a bit cryptic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I didn't think about whether a "read" user should be able to score based off the original model rather than a clone. That reasoning makes sense to me, thanks.
Follow-up question: what permissions does the original user have on a scoring job created by the new user in this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original user wouldn't have any special permissions on the scoring job by default. If you don't have "read" permission on the scoring job, it wouldn't be visible to you in the project. I think this is expected behavior -- I could imagine the "scoring" project being filled with scoring jobs, and no one user being able to see all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I agree that this is expected behavior.
"put_shares_" + entity_type) | ||
obj_permission = permission_level | ||
else: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to issue a message to the debug log if you hit this condition?
Also, are there any outputs where we'd expect to hit this condition? If you write OOS scores to a table, does that show up as a run output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a debug message definitely makes sense. I'll add that.
I don't expect to ever hit this condition. I believe that Tables are the only possible run output not covered in this loop, and you can't grant people permissions on tables through the API endpoints. CivisML doesn't add any tables as run outputs, AFAIK.
_func = getattr(client.projects, endpoint_name) | ||
elif _output['object_type'] == 'JSONValue': | ||
_func = getattr(client.json_values, endpoint_name) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same note here, would it be useful to log skipped outputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do.
|
||
Returns | ||
------- | ||
readers : dict:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the type annotation here. It looks like "users" and "groups" are lists of dicts, right? Should this say list[dict]
to specify that explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copy-pasted this directly from the corresponding doc strings for other sharing endpoints. I'll take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The readers
is a single dictionary. It contains keys users
and groups
, each of which is a list of dictionaries. In Python notation, I think you're right that the types of users
and groups
would be list[dict]
, but I think that what's show here is also correct, and consistent with the type notation used in the other API endpoints.
I think the function names make a lot of sense. I could see the list function being useful, but if it's complex to implement it's probably not worth doing right now. If we hear requests for it we could reevaluate. |
@elsander , I addressed your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
A CivisML model has File and Project run outputs, each of which need to be shared to effectively share the "model". This helper will transparently handle sharing the run outputs. Include JSONValue in case a future version of CivisML uses that. The functions are patterned after the autogenerated API sharing endpoints.