Epic: Implement GitHub integration sync #1368
Comments
A thought on the queue: perhaps we could spawn a task every time an operation record is enqueued or processed. The task looks for operation records to process and exits if none are currently able to be processed. We would want to process every operation that does not have an operation for the same record currently being processed. By way of example, assume a set of queued operations across several records. If we were to spawn a task now, it would begin processing the oldest operation for each record that has no operation currently processing. Later still, as those complete, the next queued operation for each record becomes eligible. And so on. The queue is therefore asynchronous across resources but synchronous per resource.
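If it helps, here is a minimal in-memory sketch of that spawning idea in Elixir. Module, field, and function names are hypothetical; a real implementation would query the operations table instead of receiving a list:

```elixir
defmodule SyncQueue do
  @moduledoc """
  Sketch of the proposed queue behavior: a short-lived task is spawned on
  every enqueue/completion, claims operations whose record has nothing
  currently :processing, and exits when none remain.
  """

  # An operation here is a plain map, e.g. %{id: 1, record_id: 10, state: :queued}.

  @doc "Returns the operations that may be processed right now."
  def processable(operations) do
    busy_records =
      for %{state: :processing, record_id: record_id} <- operations,
          into: MapSet.new(),
          do: record_id

    operations
    |> Enum.filter(&(&1.state == :queued))
    |> Enum.reject(&MapSet.member?(busy_records, &1.record_id))
    |> Enum.sort_by(& &1.id)
    |> Enum.uniq_by(& &1.record_id) # at most one operation per record
  end

  @doc "Spawn a task that processes whatever is available, then exits."
  def spawn_worker(operations, process_fun) do
    Task.start(fn ->
      operations
      |> processable()
      |> Enum.each(process_fun)
    end)
  end
end

# Example: op 2 is blocked by op 1 (same record); op 3 can proceed.
ops = [
  %{id: 1, record_id: 10, state: :processing},
  %{id: 2, record_id: 10, state: :queued},
  %{id: 3, record_id: 20, state: :queued}
]

SyncQueue.processable(ops)
#=> [%{id: 3, record_id: 20, state: :queued}]
```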
Tightly grouped events happen fairly frequently, particularly when issues/pull requests are labeled, assigned, etc. at the time of creation: the image above shows multiple "simultaneous" events occurring on the same pull request. We'll likely need to consider some type of ordering system that is not solely timestamp-based, but some combination of the timestamp, the event, and the data that changed. In the short term, it can probably be resolved by dropping those events and simply using the fetch of the remote data to make our changes.
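For what it's worth, one possible shape for such a composite ordering key, sketched with assumed field names (`github_updated_at`, `action`, `changed_fields`) and an assumed event precedence:

```elixir
defmodule EventOrder do
  @moduledoc "Sketch of a composite ordering key; field names are assumptions."

  # Sort events by remote timestamp first, then by a coarse event
  # precedence, then by the changed data as a deterministic tie-break.
  def sort(events) do
    Enum.sort_by(events, fn event ->
      {DateTime.to_unix(event.github_updated_at, :microsecond),
       event_rank(event.action),
       event.changed_fields}
    end)
  end

  # Creation should sort before the labels/assignments attached to it,
  # which in turn sort before later edits.
  defp event_rank("opened"), do: 0
  defp event_rank("labeled"), do: 1
  defp event_rank("assigned"), do: 1
  defp event_rank(_other), do: 2
end
```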
Might I suggest a quick name change here: instead of …

I've updated the bits above to use …
Problem
We need to reliably and concurrently sync tasks and comments to and from GitHub in a non-blocking way.
By providing a timestamp-based concurrency control system, we can use a known algorithm to make our GitHub integration more robust.
More importantly, it will unblock our other objectives: we cannot proceed with onboarding projects or volunteers until GitHub sync is stable, since our overall strategy depends on connecting volunteers to tasks.
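The heart of such a timestamp rule is small. A minimal sketch, assuming each record keeps the remote timestamp of the last change we applied:

```elixir
defmodule TimestampRule do
  @moduledoc "Sketch only; assumes records store the last applied github_updated_at."

  @spec apply?(DateTime.t(), DateTime.t()) :: boolean()
  def apply?(record_updated_at, operation_updated_at) do
    # Only apply the operation if it is strictly newer; equal or older
    # timestamps are duplicates or stale writes and should be dropped.
    DateTime.compare(operation_updated_at, record_updated_at) == :gt
  end
end
```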
Tasks
In scope

- `TaskSyncOperation` model
- `CommentSyncOperation` model
- `TaskSyncOperation` when issue webhook is received
- `TaskSyncOperation` when pull request webhook is received
- `TaskSyncOperation` when the task is created/updated from the client
- `CommentSyncOperation` when issue comment webhook is received
- `CommentSyncOperation` when the comment is created/updated from the client

Out of scope
Outline
We would have a sync operation for each type of internal record we want to sync. For example:

- `TaskSyncOperation`
- `CommentSyncOperation`
Every sync operation record, regardless of type, would have:

- `direction` - `:inbound | :outbound`
- `github_app_installation_id` - the `id` of the app installation for this sync
- `github_updated_at` - the last updated at timestamp for the resource on GitHub
- `canceled_by` - the `id` of the `SyncOperation` that canceled this one
- `duplicate_of` - the `id` of the `SyncOperation` that this is a duplicate of
- `dropped_for` - the `id` of the `SyncOperation` that this was dropped in favor of
- `state`
  - `:queued` - waiting to be processed
  - `:processing` - currently being processed; limited to one per instance of the synced record, e.g. `comment_id`
  - `:completed` - successfully synced
  - `:errored` - should be paired with a reason for the error
  - `:canceled` - another operation supersedes this one, so we should not process it
  - `:dropped` - this operation was outdated and was dropped
  - `:duplicate` - another operation already existed that matched the timestamp for this one
  - `:disabled` - we received the operation but cannot sync it because the repo no longer syncs to a project

Then each type would have type-specific fields, e.g. a `CommentSyncOperation` would have:

- `comment_id` - the `id` of our `comment` record
- `github_comment_id` - the `id` of our cached record for the external resource
- `github_comment_external_id` - the `id` of the resource from the external provider (GitHub)

If the event is due to the resource being created, there will not be a conflict. If the resource was created from our own clients, then there is no external GitHub ID yet; the same is true of events coming in from external providers, where there is no internal record yet. I'm not yet clear on whether we should conduct any conflict checking on these event types, but my guess is no. It should likely jump straight to `:processing`.
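To make the shape concrete, a rough Ecto schema sketch for the comment case (module name, column types, and the use of plain integer references are assumptions, not a final design):

```elixir
defmodule CodeCorps.CommentSyncOperation do
  use Ecto.Schema

  schema "comment_sync_operations" do
    # Fields shared by every sync operation type
    field :direction, :string              # "inbound" | "outbound"
    field :state, :string                  # "queued", "processing", "completed", ...
    field :github_app_installation_id, :integer
    field :github_updated_at, :utc_datetime
    field :canceled_by, :integer           # id of the operation that canceled this one
    field :duplicate_of, :integer          # id of the operation this duplicates
    field :dropped_for, :integer           # id of the operation this was dropped for

    # CommentSyncOperation-specific fields
    field :comment_id, :integer                 # our comment record
    field :github_comment_id, :integer          # our cached GitHub record
    field :github_comment_external_id, :integer # GitHub's id for the resource

    timestamps()
  end
end
```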
When an event comes in from GitHub we should (using a `github_comment` as our example):

1. Record the operation (in `comment_sync_operations`).
2. Check for conflicting operations for the same `github_comment_external_id` where:
   - the `github_updated_at` is after our operation's last updated timestamp (limit 1)
     - mark ours as `:dropped` and stop processing; set `dropped_for` to the `id` of the operation in the `limit 1` query
   - the `github_updated_at` timestamp for the relevant _record_ is equal to our operation's last updated timestamp (limit 1)
     - mark ours as `:duplicate` and stop processing; set `duplicate_of` to the `id` of the operation in the `limit 1` query
   - the `modified_at` timestamp for the relevant _record_ is after our operation's last updated timestamp
     - mark ours as `:dropped` and stop processing; set `dropped_for` to the `id` of the operation in the `limit 1` query
3. Cancel older operations for the same `integration_external_id` where the `github_updated_at` is before our operation's last updated timestamp: mark them as `:canceled` and set `canceled_by` to the `id` of this event.
4. If there is already a `:queued` operation or `:processing` operation for the `integration_external_id`, remain `:queued`.
5. Otherwise, mark as `:processing`, check again to see if we can proceed, then create or update the `comment` through the relationship on the record for `comment_id`.
6. Mark as `:completed` and kick off a process to look for the next `:queued` item where the `github_updated_at` timestamp is the oldest.

We would also need, within the logic for updating the given record, to check whether the record's updated timestamp is after the operation's timestamp. If it is, then we need to bubble the changeset validation error up and mark the operation as `:dropped` per the above.

Some upsides of the approaches above that I wanted to document, in no particular order:

- we can process operations for `%Comment{id: 1}` and `%Comment{id: 2}` without any conflict
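To illustrate, here is the conflict-check decision from steps 2 and 4-5 of the outline as a pure function. It is a sketch under assumed field names; real code would run the checks as limit-1 queries inside a transaction rather than scanning an in-memory list:

```elixir
defmodule ConflictCheck do
  @moduledoc "Sketch of the state decision for a new inbound operation."

  # `op` is the new operation, `other_ops` are operations for the same
  # external id, and `record` is the local record (or nil if none yet).
  def decide(op, other_ops, record) do
    cond do
      # Step 2: a newer operation for the same external id exists -> drop ours
      newer = find_where(other_ops, op, :gt) ->
        {:dropped, newer.id}

      # Step 2: an operation with the same remote timestamp exists -> duplicate
      dup = find_where(other_ops, op, :eq) ->
        {:duplicate, dup.id}

      # Step 2: our local record was modified after this event -> drop ours
      record != nil and DateTime.compare(record.modified_at, op.github_updated_at) == :gt ->
        {:dropped, nil}

      # Step 4: something is already queued/processing for this resource -> wait
      Enum.any?(other_ops, &(&1.state in [:queued, :processing])) ->
        {:queued, nil}

      # Step 5: safe to start processing
      true ->
        {:processing, nil}
    end
  end

  defp find_where(ops, op, comparison) do
    Enum.find(ops, &(DateTime.compare(&1.github_updated_at, op.github_updated_at) == comparison))
  end
end
```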