-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues with AvailabilityStrategy and substream stream_slices #20891
Conversation
d7ec4bd
to
9cf4919
Compare
bad_branches_stream = Branches(repositories=[]) | ||
assert isinstance(branches_stream.availability_strategy, HttpAvailabilityStrategy) | ||
|
||
# If the stream_slices method returns {}, this fails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be removed before merging, it's just a note as to how I managed to reproduce the error while unit testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the docker/github-related changes are for testing and won't be merged, this seems reasonable.
Mainly have a question to make sure we're not unintentionally breaking a different edge case.
Also, can you clarify what about the thismonth.rocks
repo makes this fail? Is it because the commit_comment_reactions
is empty?
try: | ||
return next(slices) | ||
except StopIteration: | ||
return {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any insight into why previously we explicitly had this try/except? I don't fully understand what this was trying to do and if there's some edge case this was added for that should be directly tested via a new test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it actually makes more senes to
try:
return next(slices)
except StopIteration:
return None
Instead of removing it all together - right now it doesn't change the behavior either way since it's only used in the method above it, but I think this would be the actual behavior we'd want when using it elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's for the case that someone messed something up in what the stream slices returns, we wanted to address it, but we just addressed it incorrectly (empty slice instead of None)?
E.g. the case in my comment below where CommitCommentReactions' stream_slices
returns an empty generator
Yup, hence the draft - will clean up stuff that won't merge before turning it into an actual PR
The
Let me try to rewrite the test to better reflect that... The question therefore is, how is this logic working successfully during read but not during check_availability - I think it has do do with the stream slices there not returning the right kind of empty. We might handle this better when reading than when using The
So for
However we wouldn't expect to see any
This raises a However if you iterate ( Hypothetically this could be a problem on Github's end - maybe we expect the |
Pinterest behavior for
TL;DR I think PinterestSubstreams and github's ReactionStreams aren't handling well when the parent streams have no records - we tried to account for this case with a try/except but we didn't handle it quite right, by returning |
# bad_slice_return_value = {} | ||
# mocker.patch.object(RepoBasedStream, "stream_slices", return_value=bad_slice_return_value) | ||
# assert StreamHelper.get_stream_slice(branches_stream) == bad_slice_return_value | ||
# assert branches_stream.path(bad_slice_return_value) == f"repos/user/repo1/branches" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how changing bad_slice_return_value
from {}
to None
fixes this line. wouldn't it just raise a Type error (TypeError: 'NoneType' object is not subscriptable
)?
Able to reproduce the same error by trying to connect to a substream of an empty stream on master (so no AvailabilityStrategy) by using this
Looks like the issue was introduced during the implementation of check_stream, which functionality was reused to check streams for availability strategy. It was just never caught since no one tried to make a substream of a stream that is sometimes likely to be empty as a Going to fix that in the CDK now before putting AvailabilityStrategy back in. |
The edge case for substreams is being fixed separately in the CDK here (plus one bonus edge case!). Re-enabling the CDK will be done in a separate PR, so closing this one. |
To reproduce the KeyError issue in github locally:
Connect to the following Github repo (public, so any PAT should work)
Turn on the
commits
andcommit_comment_reactions streams
withFull refresh | Overwrite
. Note: you don't need thecommits
stream to reproduce, but I wanted to make sure that streams emitting records were still emitting records with the fix, and not just succeeding with 0 records.Run a sync on a working version -
0.3.11
(latest) or or0.3.8
(working version before broken version) should both work.Run a sync on
0.3.9
(broken version) - sync will fail with the following traceback:To test the potential fix locally:
Check out the branch, and add symlink to CDK:
Build the
source-github
image on the branch:Temporary changes make the docker build reference the local CDK (which is branched off of re-adding in the AvailabilityStrategy to the CDK). This contains no changes to the Github code (i.e. no implementation of an AvailabilityStrategy other than the default HttpAvailabilityStrategy) but with the potential fix in the CDK version. Run sync with this newly created
dev
image. Should succeed, with the same 12 records.To reproduce the issue in SAT
Create a copy of
secrets/config.json
intosecrets/test_config.json
. Update the contents as so:Now you should be able to run the SAT tests locally with the existing versions and
dev
version built with the CDK and get the same results as reproducing the issue above. E.g.0.3.8
should pass,0.3.9
should fail with the error, anddev
should pass again.Note that this line was added because it would fail on returning no records (it shouldn't return any - there are none in the repo) in the working versions.