RN-400: Meditrak permissions based sync (server-side changes) #3992

Merged: 16 commits merged into dev from rn-400-permissions-based-sync-p1 on Aug 12, 2022

rohan-bes (Collaborator) commented Jun 23, 2022

Issue RN-400:

Server side changes for the new meditrak permissions based sync.

Sync logic:

  • The meditrak-app sends across a list of all previously synced countries and permission groups when syncing (none if first time sync)
  • The app then just combines these lists with the countries and permissions in the user's access policy
    • For countries and permission groups that have previously been synced, just sync new changes since last sync
    • For countries and permission groups that have never been synced, sync data for all time for those
    • Note: all models that we sync can be filtered by one or both of these permissions restrictions, see comments in PR for details
  • Once the sync is complete, the app then updates the list of synced countries and permission groups

This algorithm is hopefully sufficiently simple and flexible (can handle new user logins, as well as user permission changes) without syncing too much data, especially during the initial sync.
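
The sync logic described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: `splitSyncTargets`, the sample country codes and the permission group names are all hypothetical, and it assumes that only items in the user's current access policy are synced.

```javascript
// Hypothetical helper: names and data shapes are assumptions, not the PR's actual code
function splitSyncTargets(accessible, previouslySynced) {
  const synced = new Set(previouslySynced);
  return {
    // In the access policy and already synced: only changes since the last sync
    incremental: accessible.filter(item => synced.has(item)),
    // In the access policy but never synced: changes for all time
    full: accessible.filter(item => !synced.has(item)),
  };
}

// The user gained access to WS since the last sync, so only WS needs a full sync
const countries = splitSyncTargets(['TO', 'FJ', 'WS'], ['TO', 'FJ']);

// First time sync: nothing previously synced, so everything needs a full sync
const permissionGroups = splitSyncTargets(['Public', 'Donor'], []);
```

Once the sync completes, the app would store the union of the `incremental` and `full` lists as its new set of synced countries and permission groups.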

In order to filter the changes to sync by the countries and permission groups, I've created a new materialized view (permissions_based_meditrak_sync_queue) which the MeditrakSyncQueue manages.

I've tried to break this PR up into clean separate commits, if that's easier for the reviewer.

* }
* Responds to GET requests to the /changes/metadata endpoint
*/
export async function changesMetadata(req, res) {
Collaborator Author

Given that this endpoint returns more data than the /changes/count endpoint, and in a different shape, I decided to create a new one with a new name. This also makes maintaining support for the old logic a little easier.

});
const changesCount = await models.meditrakSyncQueue.count(filter);

const { query: dbQuery } = await getChangesFilter(
Collaborator Author

Re-wrote this to just build up an actual SQL query rather than a query object. I find once the queries get beyond a certain level of complexity then it's just much easier to work with direct SQL. Still feels a bit odd passing in the select statement here though so I understand if reviewer wants to push back.

const changes = await query.executeOnDatabase(database);
const changesByRecordType = groupBy(changes, 'record_type');
const recordTypesToSync = Object.keys(changesByRecordType);
const columnNamesByRecordType = Object.fromEntries(
Collaborator Author

Rewrote this to do all the database work upfront in batch. This made server-side processing of changes much faster. Now the bottleneck is the app-side write to the database.
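
A rough sketch of the batching idea, for readers who want to see the shape of it. The in-memory `fakeTables`, `fetchRecordsByIds` and `attachRecords` are hypothetical stand-ins, not the functions used in this PR; the point is only that records are fetched once per record type rather than once per change.

```javascript
// Stand-in for database tables: record_type -> rows. Purely illustrative data.
const fakeTables = {
  survey: [{ id: 's1', name: 'Basic clinic data' }, { id: 's2', name: 'Stock check' }],
  question: [{ id: 'q1', text: 'Is the clinic open?' }],
};

// Hypothetical batch lookup: one call per record type instead of one per change
async function fetchRecordsByIds(recordType, ids) {
  const wanted = new Set(ids);
  return (fakeTables[recordType] || []).filter(row => wanted.has(row.id));
}

async function attachRecords(changes) {
  // Group the record ids by record type (same idea as lodash's groupBy)
  const idsByRecordType = {};
  for (const { record_type: recordType, record_id: recordId } of changes) {
    if (!idsByRecordType[recordType]) idsByRecordType[recordType] = [];
    idsByRecordType[recordType].push(recordId);
  }

  // Fetch every record type upfront, in parallel
  const recordsByKey = {};
  await Promise.all(
    Object.entries(idsByRecordType).map(async ([recordType, ids]) => {
      const records = await fetchRecordsByIds(recordType, ids);
      records.forEach(record => {
        recordsByKey[`${recordType}:${record.id}`] = record;
      });
    }),
  );

  // Processing each change is now a cheap in-memory lookup
  return changes.map(change => ({
    ...change,
    record: recordsByKey[`${change.record_type}:${change.record_id}`] || null,
  }));
}

const sampleChanges = [
  { record_type: 'survey', record_id: 's2' },
  { record_type: 'question', record_id: 'q1' },
  { record_type: 'survey', record_id: 'missing' },
];
```

Calling `attachRecords(sampleChanges)` resolves to the same changes with a `record` property attached, or `null` where no matching row exists.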

export const supportsPermissionsBasedSync = version =>
semverCompare(version, PERMISSIONS_BASED_SYNC_MIN_APP_VERSION) >= 0;

export const getCountriesAndPermissionsToSync = async req => {
Collaborator Author

Here's where we do the little shimmy to work out which new countries and permission groups we need to sync. I'll add more comments around this code explaining why.
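
For context on the version gate at the top of this snippet, here is a self-contained sketch of how such a check behaves. `compareVersions` is a simplified stand-in for `semverCompare` that only handles plain `major.minor.patch` strings, and the minimum version value is made up, since the real `PERMISSIONS_BASED_SYNC_MIN_APP_VERSION` is not shown in this thread.

```javascript
// Minimal stand-in for a semver comparison; handles plain "major.minor.patch" only
function compareVersions(a, b) {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] || 0) - (partsB[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Hypothetical value, used here for illustration only
const MIN_APP_VERSION = '1.12.0';

const supportsPermissionsBasedSync = version =>
  compareVersions(version, MIN_APP_VERSION) >= 0;

// Older app versions would fall back to the original sync logic
const pickSyncMode = version =>
  supportsPermissionsBasedSync(version) ? 'permissions-based' : 'legacy';
```

Comparing numerically per segment matters here: a plain string comparison would rank `1.9.0` above `1.12.0`.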

* This is used to improve the speed of querying the meditrak_sync_queue when
* doing a permissions based sync
*/
const query = new SqlQuery(`
Collaborator Author

Big ol' SQL query yay! Lemme know if you'd prefer me to abstract this into something more high level? I don't mind it as it stands cos it's more readable than it would be otherwise.

Member

Aaaah 😱

Member

Actually it's really straightforward once you know what each line does and see how much repetition there is. I'd be keen to try it with some higher level functions pulled out, might give it a go while you're away if I get a chance. No need to though, agree this is pretty readable once you overcome the "shit that's dense" vibe

*/
if (isFeatureEnabled('MEDITRAK_SYNC_QUEUE')) {
const meditrakSyncQueue = new MeditrakSyncQueue(models);
await meditrakSyncQueue.createPermissionsBasedView(); // Ensure permissions based view has been setup
Collaborator Author

Always re-create on server startup? Simple: yes, efficient: maybe no... especially since this adds about 30 seconds to the server start time...

Member

But good call to go simple to start with, and optimise later if we need to

exports.up = function (db) {
return db.runSql(`
DROP AGGREGATE IF EXISTS array_concat_agg(anyarray);
CREATE AGGREGATE array_concat_agg(anyarray) (
Collaborator Author

Custom aggregate function to help with the permissions based meditrak sync queue

Member

Cool 😎

Member

@edmofro edmofro left a comment

This is some great work, thanks heaps for this @rohan-bes! Our users are going to absolutely love it, and it fixes a real point of embarrassment during initial sync (i.e. people's first impression of meditrak!)

I think you've gone for a good compromise in the design decision to not bother looking at the country x permission group combination, but instead just going for the simpler but mostly-just-as-good check for whether the user has the permission group in any country, and any permission group in the country.
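
To make the compromise concrete, here is a small sketch comparing the two checks. The access policy shape (country code mapped to a list of permission groups) and both function names are assumptions for illustration; the PR itself does the equivalent filtering in SQL.

```javascript
// Assumed access policy shape: country code -> permission groups the user holds there
const accessPolicy = {
  TO: ['Public', 'Donor'],
  FJ: ['Public'],
};

// Exact check: the user must hold this permission group in this country
const hasExactAccess = (policy, country, permissionGroup) =>
  (policy[country] || []).includes(permissionGroup);

// Simpler check: the permission group in any country, and any permission group in the country
const hasApproximateAccess = (policy, country, permissionGroup) => {
  const hasAnyGroupInCountry = (policy[country] || []).length > 0;
  const hasGroupInAnyCountry = Object.values(policy).some(groups =>
    groups.includes(permissionGroup),
  );
  return hasAnyGroupInCountry && hasGroupInAnyCountry;
};
```

With this policy, a record restricted to `Donor` in `FJ` fails the exact check but passes the simpler one, so the simpler check can sync somewhat more data than strictly needed, never less.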

const changes = await database.find(TYPES.MEDITRAK_SYNC_QUEUE, filter, {
sort: ['change_time'],
const msqColumns = await models.meditrakSyncQueue.fetchFieldNames();
const changeFilterFunction = supportsPermissionsBasedSync(appVersion)
Member

Can you think of a good verb to replace this with?
I don't always mind a function being named with a noun, but because "change" can also be a verb, I find this one particularly confusing!

Collaborator Author

I might do a bit of renaming around this area. I now recognise that the nature of this function has actually changed (it used to just build up a filter object, but now it builds a query to fetch the changes).

But I'll take note to ensure whatever happens this line reads more intuitively 👍

? getPermissionsBasedChangesFilter
: getChangesFilter;
const { query } = await changeFilterFunction(req, {
select: msqColumns.join(', '),
Member

Suggested change
- select: msqColumns.join(', '),
+ select: await models.meditrakSyncQueue.fetchFieldNames(),

Rather than a named variable here I find it easier to understand what's happening if it's all on one line

Comment on lines 28 to 29
JOIN survey s ON s.id = ss.survey_id
WHERE s.code = ?
Member

I'm a little confused here...why can't we use

Suggested change
- JOIN survey s ON s.id = ss.survey_id
- WHERE s.code = ?
+ WHERE ss.survey_id = ?

And provide this.id? Is this sometimes run before the survey has been committed to the db and therefore may not yet have an id generated?

Collaborator Author

Ah good catch! I'll fix this up

Comment on lines 176 to 194
if (sort) {
query = query.concat(`
ORDER BY ${sort}
`);
}

if (limit !== undefined) {
query = query.concat(`
LIMIT ?
`);
params.push(limit);
}

if (offset !== undefined) {
query = query.concat(`
OFFSET ?
`);
params.push(offset);
}
Member

Some of this stuff could be pulled out into a common DRY function above

Collaborator Author

Good point, will do 👍
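
One possible shape for that shared helper, sketched from the snippet under review. `appendSortLimitOffset` is a hypothetical name and this is not necessarily how the refactor was done in the PR.

```javascript
// Hypothetical helper that appends the optional clauses shared by both query builders
function appendSortLimitOffset(baseQuery, baseParams, { sort, limit, offset } = {}) {
  let query = baseQuery;
  const params = [...baseParams];

  if (sort) {
    // Interpolated directly, as in the snippet above, so `sort` must come from trusted code
    query = query.concat(`\nORDER BY ${sort}`);
  }

  if (limit !== undefined) {
    query = query.concat('\nLIMIT ?');
    params.push(limit);
  }

  if (offset !== undefined) {
    query = query.concat('\nOFFSET ?');
    params.push(offset);
  }

  return { query, params };
}

const built = appendSortLimitOffset('SELECT * FROM meditrak_sync_queue WHERE change_time > ?', [0], {
  sort: 'change_time',
  limit: 100,
});
```

Returning a new `{ query, params }` pair rather than mutating the inputs keeps the helper safe to call from both the legacy and the permissions based query builders.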

const typesWithoutPermissions = ['country', 'permission_group']; // Sync all countries and permission groups (needed for requesting access)
const permissionsClauses = [
{
query: `"type" = ? AND record_type IN ${SqlQuery.record(typesWithoutPermissions)}`,
Member

I'd prefer to separate these into two separate clauses, as it confused me initially thinking that we were only excluding deletes for the record types without permissions. I understand that is hard with the ORs between everything, so maybe we set up an initial query string outside of this first?

Collaborator Author

I think I can come up with something to address this 👍

`REFRESH MATERIALIZED VIEW CONCURRENTLY permissions_based_meditrak_sync_queue;`,
);
const end = Date.now();
winston.info(`permissions_based_meditrak_sync_queue refresh took: ${end - start}ms`);
Member

How long does this generally take, out of interest?

Collaborator Author

I'll double check this, but I think I remember it taking a few seconds. Since it's a standard materialized view it basically has to fully rebuild each time, but the underlying query doesn't take too long.

If it were a fast refresh materialized view however... 😉

Member

Hahaha nooo don't do it!!

Collaborator Author

Okay, so it takes 30 seconds to refresh, my bad. Makes sense as that's how long it takes to build the table initially. However, it's a concurrent refresh, which means the table can be safely queried while it's refreshing
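
For readers following along, a runnable sketch of the timed refresh using a stub in place of the real database. `stubDatabase`, its `executeSql` method and `refreshPermissionsBasedView` are hypothetical names; only the SQL statement and the log message come from the snippet above.

```javascript
// Stub database so the sketch runs on its own; a real one would send the SQL to Postgres
const stubDatabase = {
  executedStatements: [],
  async executeSql(sql) {
    this.executedStatements.push(sql);
  },
};

async function refreshPermissionsBasedView(database, log = console.log) {
  const start = Date.now();
  // CONCURRENTLY keeps the view readable during the refresh. Postgres requires a
  // unique index on the materialized view for this form of refresh.
  await database.executeSql(
    'REFRESH MATERIALIZED VIEW CONCURRENTLY permissions_based_meditrak_sync_queue;',
  );
  const durationMs = Date.now() - start;
  log(`permissions_based_meditrak_sync_queue refresh took: ${durationMs}ms`);
  return durationMs;
}
```

The trade-off with a concurrent refresh is that readers are not blocked while it runs, at the cost of the refresh itself generally taking longer than a plain one.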

...questionChanges,
...optionSetChanges,
...optionChanges,
];
Member

It would be good to run some stress testing on this - does it perform ok if e.g. all of the ssc's are being changed anyway (because they're deleted and created again) so they're also being added to the sync queue through the normal process? We don't hit too much locking if you do that with a bunch of surveys in a row?

Collaborator Author

Good call! I had sort of assumed that this change handler would run after all the associated changes had resolved due to the debounce time, but it's definitely possible that it could kick off halfway which might make things awkward. I'll make a note in the test plan 👍

edmofro (Member) commented Jun 29, 2022

Oh and one more general comment: I think this could do with more inline descriptions throughout. You've articulated the strategy really clearly in the PR description, and at a minimum I'd love to see that just copied into the top of one of the files.

@rohan-bes rohan-bes force-pushed the rn-400-permissions-based-sync-p1 branch from adf82df to 2e35a56 Compare July 27, 2022 23:45
@rohan-bes rohan-bes requested a review from edmofro July 28, 2022 06:48
const query = new SqlQuery(`
DROP MATERIALIZED VIEW IF EXISTS permissions_based_meditrak_sync_queue;
CREATE MATERIALIZED VIEW permissions_based_meditrak_sync_queue AS
SELECT msq.*,
Collaborator Author

@edmofro bit of a change here in the latest commit. Before we had a separate field for each record type permissions, eg. (survey_countries, survey_permission_groups, survey_screen_countries, etc.)

This was pretty ugly and tedious, so I'm merging them into a single value using COALESCE here. Since there's just a single value per record type I think it should all be fine, and makes things a lot more readable!

meditrak_sync_queue and permissions_based_meditrak_sync_queue
…missions to

- Introduced /changes/metadata route to let app know which countries and
permissions groups are being synced
- Rewrote getChangesFilter to build an SqlQuery rather than using our ORM
- buildMeditrakSyncQuery and buildPermissionsBasedMeditrakSyncQuery
- This is closer to what its new logic is
- Reworked permissions_based_meditrak_sync_queue to just have single 'country_ids' and 'permission_groups' columns
- Added comments explaining how permission based sync works
@rohan-bes rohan-bes force-pushed the rn-400-permissions-based-sync-p1 branch from 5eac074 to a9979fe Compare July 29, 2022 01:53
SFUNC = array_cat,
STYPE = anyarray
);
`);
Collaborator Author

@edmofro there's actually a little start up problem when trying to deploy this migration.

In order to run this migration, the MeditrakSyncQueue must be running.
In order to create the MeditrakSyncQueue the database must have the array_concat_agg function.

I guess this highlights a genuine dependency cycle in our code logic, however I feel it'll be acceptable for us to just manually install this function in the prod database prior to releasing this code?

Member

Yep I'm fine with that

Member

@edmofro edmofro left a comment

All looking great!

rohan-bes and others added 4 commits August 2, 2022 10:10
- countChanges wasn't using permissions based sync, so it thought there were 0 changes
- Fixed bug where not all changes were synced when user permissions changed
…t route

- Permissions based sync uses changes/metadata instead
@rohan-bes rohan-bes force-pushed the rn-400-permissions-based-sync-p1 branch from 02fb68b to 56df105 Compare August 9, 2022 05:43
…to deployment logic

- Adding 30 seconds to the central-server start time was proving too annoying
@rohan-bes rohan-bes merged commit 6b565d9 into dev Aug 12, 2022
@rohan-bes rohan-bes deleted the rn-400-permissions-based-sync-p1 branch August 12, 2022 05:33