feat(source-hardcoded-records): add new source #42434
[skip ci] Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
I remember @cgardens told me we've used something similar when we did platform and DB sources performance testing.
Not opposed to having a Python implementation here, but maybe we have an existing one? @davinchia / @evantahler do you know?
Re: this PR: @artem1205, can you remove the boilerplate? Realistically, we are not going to change this connector often, so the boilerplate (mostly templates) in integration_tests, acceptance-test-config, unit_tests and similar files is not needed.
@@ -0,0 +1,5 @@
#
Delete the boilerplate!
deleted
We have an E2E Testing source that generates fake records, but it's written in Java. We also have Faker (Python). Finally, @chandlerprall recently made a Node.js source that generates fake data too!
from airbyte_cdk.sources.streams import Stream


class HardcodedStream(Stream, ABC):
I would like to see a stream with 24 million rows of exactly this schema and these values:
{
  "field1": "valuevaluevaluevaluevalue1",
  "field2": "valuevaluevaluevaluevalue1",
  "field3": "valuevaluevaluevaluevalue1",
  "field4": "valuevaluevaluevaluevalue1",
  "field5": "valuevaluevaluevaluevalue1"
}
This is the format we use for all of our other performance testing. Doing exactly the same thing makes the results comparable to the other tests.
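The requested stream could look roughly like the minimal sketch below, assuming it simply yields the same five-field record a fixed number of times. The names `BENCHMARK_RECORD`, `DEFAULT_ROW_COUNT` and `read_benchmark_records` are hypothetical and not taken from the PR; a real connector would wrap this logic in an `airbyte_cdk` `Stream` subclass rather than a bare function.

```python
from typing import Any, Dict, Iterator

# Hypothetical constant: the exact record requested for performance tests.
BENCHMARK_RECORD: Dict[str, str] = {
    f"field{i}": "valuevaluevaluevaluevalue1" for i in range(1, 6)
}

# Row count requested in the review comment.
DEFAULT_ROW_COUNT = 24_000_000


def read_benchmark_records(count: int = DEFAULT_ROW_COUNT) -> Iterator[Dict[str, Any]]:
    """Lazily yield `count` copies of BENCHMARK_RECORD.

    A generator keeps memory flat regardless of `count`; each record is a
    shallow copy so downstream mutation cannot alter the template.
    """
    for _ in range(count):
        yield dict(BENCHMARK_RECORD)


if __name__ == "__main__":
    # Small count for a quick local check; a benchmark run would use the default.
    for record in read_benchmark_records(2):
        print(record)
```

Using a generator rather than a prebuilt list matters at 24 million rows: only one record is materialized at a time.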
Added a new stream with this exact record.
We still need to keep the other streams, because we want to test how Python/pydantic handles large and nested data structures.
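To illustrate the kind of payload the remaining streams are meant to exercise, here is a stdlib-only sketch that builds a record nested to a chosen depth and width. The helpers `make_nested_record` and `nesting_depth` are hypothetical, the PR's actual nested schemas are not shown here, and plain dicts plus `json` stand in for pydantic models.

```python
import json
from typing import Any, Dict


def make_nested_record(depth: int, width: int) -> Dict[str, Any]:
    """Build a record nested `depth` levels deep with `width` leaf fields.

    Hypothetical helper: it only illustrates a payload shape that stresses
    serialization and validation, not the PR's real schemas.
    """
    record: Dict[str, Any] = {f"leaf{i}": i for i in range(width)}
    for level in range(depth):
        # Wrap the previous level, so the innermost dict is the deepest one.
        record = {"level": level, "child": record, "items": list(range(width))}
    return record


def nesting_depth(value: Any) -> int:
    """Count the number of `child` links from the outermost dict inward."""
    depth = 0
    while isinstance(value, dict) and "child" in value:
        depth += 1
        value = value["child"]
    return depth


if __name__ == "__main__":
    sample = make_nested_record(depth=3, width=4)
    print(len(json.dumps(sample)), "bytes when serialized")
```

Scaling `depth` and `width` independently makes it possible to separate the cost of deep nesting from the cost of wide records.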
makes sense. thank you!
Approving to unblock; use your best judgement. I strongly suggest that you delete acceptance-test-config.yml, integration_tests, and schemas if they are only used for tests.
Basically, this is a dev tool that is unlikely to change much.
What
Resolving https://github.com/airbytehq/airbyte-internal-issues/issues/8471
Adds a new source with 3 streams of hardcoded records to track the performance of the Airbyte Python CDK.
Follow up
How
Records are hardcoded.
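Since the goal is to track performance, a rough way to measure the emission side of hardcoded records is to serialize each one as an Airbyte protocol RECORD message and time the loop. This is a minimal sketch under stated assumptions: it uses the standard library `json` module instead of the CDK's own serializer, so it measures plain JSON encoding rather than the CDK itself, and the function names and the `benchmark` stream name are hypothetical.

```python
import json
import time
from typing import Any, Dict, Iterable, Tuple


def to_record_message(stream: str, data: Dict[str, Any], emitted_at_ms: int) -> str:
    """Serialize one record as an Airbyte protocol RECORD message (one JSON line)."""
    return json.dumps(
        {
            "type": "RECORD",
            "record": {"stream": stream, "data": data, "emitted_at": emitted_at_ms},
        }
    )


def measure_throughput(stream: str, records: Iterable[Dict[str, Any]]) -> Tuple[int, float]:
    """Serialize every record and return (record_count, records_per_second)."""
    emitted_at_ms = int(time.time() * 1000)  # Epoch milliseconds.
    count = 0
    start = time.perf_counter()
    for data in records:
        to_record_message(stream, data, emitted_at_ms)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


if __name__ == "__main__":
    record = {f"field{i}": "valuevaluevaluevaluevalue1" for i in range(1, 6)}
    n, rate = measure_throughput("benchmark", (dict(record) for _ in range(100_000)))
    print(f"{n} records, {rate:,.0f} records/s")
```

Timing with `time.perf_counter` keeps the measurement independent of wall-clock adjustments, while `emitted_at` still uses wall-clock epoch milliseconds as the protocol expects.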
Review guide
airbyte-integrations/connectors/source-hardcoded-records/source_hardcoded_records/streams.py
User Impact
Can this PR be safely reverted and rolled back?