Design and implement a permissions system for ehrQL #2199

evansd · 2024-11-01T12:38:38Z

We need a way to enforce fine-grained permissions in ehrQL. The immediate use case is to control access to specific tables and replace the janky "regex the Python and create a GitHub issue" system that we've been limping along with. Another use case, coming up shortly, is controlling access to event-level outputs while we figure out the necessary technical and social safeguards for using them appropriately. I'm sure we will end up discovering other needs for this as well.

This is going to involve touching several components of the system (job-server, job-runner and ehrQL) but as ehrQL is going to be the ultimate consumer of these permissions I thought it made sense to open the issue here.

At a sufficiently high, hand-wavy level, the production system has a fairly obvious design:

Job Server needs to have a list of ehrQL permissions that can be attached to a workspace/project/$APPROPRIATE_ENTITY.
The JobRequestsAPI (exposed here) needs to grow a new field called something like permissions which encodes the relevant permissions for each JobRequest as an opaque blob of JSON.
Job Runner needs to store this field on the Job object and pass it to any relevant jobs (just ehrQL jobs, or should we make it all jobs?) via an environment variable, say OPENSAFELY_PERMISSIONS.
ehrQL needs to read this environment variable, decode it, and throw an error if the dataset definition is trying to do anything it shouldn't.
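
As a rough illustration of the ehrQL end of this, here is a minimal sketch of reading and checking the environment variable. It assumes the blob is a JSON list of permission names and that an unset variable means "no permissions"; the actual format is undecided, and `load_permissions`, `check_permissions` and `PermissionDeniedError` are hypothetical names, not existing ehrQL APIs.

```python
import json
import os


class PermissionDeniedError(Exception):
    """Raised when a dataset definition uses something it lacks permission for."""


def load_permissions(environ=os.environ):
    # ASSUMPTION: the value is a JSON list of permission names, e.g. '["table_x"]'.
    # An unset or empty variable is treated as "no permissions granted".
    raw = environ.get("OPENSAFELY_PERMISSIONS", "")
    if not raw:
        return frozenset()
    decoded = json.loads(raw)
    if not isinstance(decoded, list) or not all(isinstance(p, str) for p in decoded):
        raise ValueError("OPENSAFELY_PERMISSIONS must be a JSON list of strings")
    return frozenset(decoded)


def check_permissions(required, granted):
    # Fail loudly, naming every missing permission in one go.
    missing = sorted(set(required) - set(granted))
    if missing:
        raise PermissionDeniedError("Missing permissions: " + ", ".join(missing))
```

Passing the environment in as an argument keeps the check easy to test without touching the real process environment.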

The harder design challenge is the local development experience for researchers. Here are some properties I think the system ought to have:

It should be clear in the documentation which tables/features require special permission to use, why they require special permission, and how to go about asking for it.
Our tests or our docs build system should make it impossible to restrict a table or feature without also adding the required documentation.
Attempting to use a restricted table/feature should, by default, immediately throw a clear error.
If you do have permission to use the table/feature it should be obvious what you need to do to make this error go away.
If you do not (yet) have permission to use the table/feature it should also be obvious how you make the error go away so you can continue developing locally. But it must be clear that you won't be able to run your code in production until the relevant permissions have been granted.
Stretch goal: whatever mechanism we use to achieve the above two points should ideally be robust against copy/pasting from existing projects. That is, we want to guard against people not realising that they don't have access to table X just because they copied code from another project which did have access to table X.
Stretchier goal: I'd like, if possible, to avoid doing anything too "magical" or implicit. By this I mean solutions like having ehrQL locate and parse the project.yaml file and get permissions out of there. I'd like, if possible, the application of permissions to be something you do inside the dataset definition file. We'd need to give some thought to how you minimise repetition, though, in the case where you have many dataset definitions which all need the same permissions.
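
To make the documentation and clear-error properties a bit more concrete, here is one possible sketch, not a proposal for the final design. It assumes restrictions are declared as objects that cannot be constructed without their documentation, so the error message can always say why permission is needed and how to ask for it. `RestrictedFeature` and `permission_error_message` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestrictedFeature:
    # All names here are hypothetical; nothing like this exists in ehrQL yet.
    name: str
    reason: str  # why special permission is required
    how_to_request: str  # how a researcher asks for it

    def __post_init__(self):
        # Refuse to register a restriction that lacks its documentation.
        for attr in ("name", "reason", "how_to_request"):
            if not getattr(self, attr).strip():
                raise ValueError(f"RestrictedFeature needs a non-empty {attr!r}")


def permission_error_message(feature):
    # Build the text shown when a restricted feature is used without permission.
    return (
        f"'{feature.name}' requires special permission.\n"
        f"Why: {feature.reason}\n"
        f"How to request access: {feature.how_to_request}\n"
        "Code using it will not run in production until permission is granted."
    )
```

Because the documentation lives on the same object as the restriction, a docs build could generate its "requires permission" notes from the same source, which is one way to keep the two from drifting apart.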

I've got some vague straw proposals of what a solution might look like here, but they need a bit more cogitation before I'm ready to commit them to ASCII.
