We need a way to enforce fine-grained permissions in ehrQL. The immediate use case is to control access to specific tables and replace the janky "regex the Python and create a GitHub issue" system that we've been limping along with. Another upcoming use case is controlling access to event-level outputs while we figure out the technical and social safeguards needed for using them appropriately. I'm sure we will end up discovering other needs for this as well.
This is going to involve touching several components of the system (job-server, job-runner and ehrQL) but as ehrQL is going to be the ultimate consumer of these permissions I thought it made sense to open the issue here.
At a sufficiently high, hand-wavy level, the production system has a fairly obvious design:
Job Server needs to have a list of ehrQL permissions that can be attached to a workspace/project/$APPROPRIATE_ENTITY.
The `JobRequestsAPI` (exposed here) needs to grow a new field called something like `permissions` which encodes the relevant permissions for each JobRequest as an opaque blob of JSON.
Job Runner needs to store this field on the `Job` object and pass it to any relevant jobs (just ehrQL jobs, or should we make it all jobs?) via an environment variable, say `OPENSAFELY_PERMISSIONS`.
ehrQL needs to read this environment variable, decode it, and throw an error if the dataset definition is trying to do anything it shouldn't.
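As a rough illustration of that last step, here is a minimal sketch of how ehrQL might read and check the variable. Only the name `OPENSAFELY_PERMISSIONS` comes from the proposal above; the payload shape (a JSON list of permission names), the `PermissionDeniedError` class and both function names are assumptions made up for this sketch, not existing ehrQL API.

```python
import json
import os


class PermissionDeniedError(Exception):
    """Hypothetical error for a dataset definition using something it may not."""


def load_permissions(environ=os.environ):
    # Assumption: the opaque JSON blob is a flat list of permission names,
    # and an unset variable means "no special permissions".
    raw = environ.get("OPENSAFELY_PERMISSIONS", "[]")
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PermissionDeniedError(
            f"Could not decode OPENSAFELY_PERMISSIONS: {exc}"
        ) from exc
    if not isinstance(decoded, list) or not all(isinstance(p, str) for p in decoded):
        raise PermissionDeniedError(
            "OPENSAFELY_PERMISSIONS must be a JSON list of strings"
        )
    return frozenset(decoded)


def check_permissions(required, granted):
    # Fail loudly, naming every missing permission at once.
    missing = sorted(set(required) - set(granted))
    if missing:
        raise PermissionDeniedError(
            "Missing permissions: " + ", ".join(missing)
        )
```

Treating an unset variable as an empty permission set keeps the default restrictive, which matches the "throw an error by default" property discussed for local development.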
The harder design challenge is the local development experience for researchers. Here are some properties I think the system ought to have:
It should be clear in the documentation which tables/features require special permission to use, why they require special permission, and how to go about asking for it.
Our tests or our docs build system should make it impossible to restrict a table or feature without also adding the required documentation.
Attempting to use a restricted table/feature should, by default, immediately throw a clear error.
If you do have permission to use the table/feature it should be obvious what you need to do to make this error go away.
If you do not (yet) have permission to use the table/feature it should also be obvious how you make the error go away so you can continue developing locally. But it must be clear that you won't be able to run your code in production until the relevant permissions have been granted.
Stretch goal: whatever mechanism we use to achieve the above two points should ideally be robust against copy/pasting from existing projects. That is, we want to guard against people not realising that they don't have access to table X just because they copied code from another project which did have access to table X.
Stretchier goal: I'd like, if possible, to avoid doing anything too "magical" or implicit. By this I'm thinking of solutions like having ehrQL locate and parse the `project.yaml` file and get permissions out of there. I'd like, if possible, the application of permissions to be something you do inside the dataset definition file. We'd need to give some thought to how you minimise repetition, though, in the case where you have many dataset definitions which all need the same permissions.
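To make the documentation and error-message properties above a little more concrete, here is one possible sketch, not a proposal for the final design. Everything in it is hypothetical: the `RestrictedFeature` record, the `RESTRICTED_FEATURES` registry, the placeholder entry `table_x`, and both helper functions. The idea is that a restriction can only be declared together with its documentation, a test can reject any entry with missing text, and the same text feeds the error shown to researchers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestrictedFeature:
    # Hypothetical record: a restriction cannot be declared without docs.
    name: str
    reason: str
    how_to_request: str


# Hypothetical registry; "table_x" and its text are placeholders.
RESTRICTED_FEATURES = {
    "table_x": RestrictedFeature(
        name="table_x",
        reason="Placeholder reason for the restriction.",
        how_to_request="Placeholder instructions for requesting access.",
    ),
}


def undocumented_features(registry):
    # A test suite could assert that this returns an empty list.
    return sorted(
        name
        for name, feature in registry.items()
        if not feature.reason.strip() or not feature.how_to_request.strip()
    )


def permission_error_message(feature):
    # Build the message from the same text the docs would be generated from.
    return (
        f"Use of '{feature.name}' requires special permission.\n"
        f"Why: {feature.reason}\n"
        f"How to request access: {feature.how_to_request}"
    )
```

A test asserting `undocumented_features(RESTRICTED_FEATURES) == []` would then fail the build whenever someone restricts a feature without filling in both fields.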
I've got some vague straw proposals of what a solution might look like here; but they need a bit more cogitation before I'm ready to commit them to ASCII.