MongoDb Source: Increase performance of discover #17614
/test connector=connectors/source-mongodb-v2
Build Passed
Test summary info:
@VitaliiMaltsev do you have a before and after comparison to document how much improvement this change adds?
@grishick By default, `parallelStream` runs on the common ForkJoinPool, which uses (number of processors - 1) threads. Tested in my local environment: with `parallelStream`, discover is approximately 4x faster.
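The default parallelism mentioned above comes from the JDK's common `ForkJoinPool`, which parallel streams use unless they are started from inside another pool. A minimal sketch (not part of the PR) that prints the relevant values on the current machine; the numbers depend on the hardware and on the `java.util.concurrent.ForkJoinPool.common.parallelism` system property, if set.

```java
import java.util.concurrent.ForkJoinPool;

public class ParallelismInfo {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Unless overridden by a system property, the common pool's parallelism is
        // cores - 1 (at least 1). The thread that starts a parallel stream also
        // takes part in the work, so roughly `cores` threads end up busy.
        int commonParallelism = ForkJoinPool.commonPool().getParallelism();
        System.out.println("available processors: " + cores);
        System.out.println("common pool parallelism: " + commonParallelism);
    }
}
```

Because the pool is shared JVM-wide, other parallel streams running at the same time compete for the same worker threads.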
Added screenshots with a timing comparison.
LGTM. Please rebase (or merge) and make sure tests still pass afterwards.
# Conflicts:
# airbyte-integrations/connectors/source-mongodb-v2/src/main/java/io.airbyte.integrations.source.mongodb/MongoDbSource.java
/test connector=connectors/source-mongodb-v2
Build Passed
Test summary info:
/publish connector=connectors/source-mongodb-v2
If you have connectors that successfully published but failed definition generation, follow step 4 here.
/publish connector=connectors/source-mongodb-strict-encrypt
If you have connectors that successfully published but failed definition generation, follow step 4 here.
…vation
* master: (32 commits)
  fixed octavia position and z-index on onboarding page (#17708)
  Revert "Revert "Do not wait the end of a reset to return an update (#17591)" (#17640)" (#17669)
  source-google-analytics-v4: use hits metric for check (#17717)
  Source linkedin-ads: retry 429/5xx when refreshing access token (#17724)
  🐛 Source Mixpanel: solve cursor field none expected array (#17699)
  🎉 8890 Source MySql: Fix large table issue by fetch size (#17236)
  Test e2e testing tool commands (#17722)
  fixed escape character i18n error (#17706)
  Docs: adds missing " in transformations-with-airbyte.md (#17723)
  Change Osano token to new project (#17720)
  Source Github: improve 502 handling for `comments` stream (#17715)
  #17506 source snapchat marketing: retry failed request for refreshing access token (#17596)
  MongoDb Source: Increase performance of discover (#17614)
  Testing tool commands for run scenarios (#17550)
  Kustomize: Missing NORMALIZATION_JOB_* environment variables in stable-with-resource-limits overlays (#17713)
  Fix console errors (#17696)
  Revert: #17047 Airbyte CDK: Improve error for returning non-iterable from connectors parse_response (#17707)
  #17047 Airbyte CDK: Improve error for returning non-iterable from connectors parse_response (#17626)
  📝 Postgres source: document occasional full refresh under cdc mode (#17705)
  Bump Airbyte version from 0.40.12 to 0.40.13 (#17682)
  ...
* MongoDb Source: Increase performance of discover
* bump version
* fixed tests
* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
Schema discovery could be very slow for large MongoDB datasets.
How
Execute discovery in parallel (with `parallelStream`).
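A minimal sketch of the idea, assuming the unit of parallel work is one collection. `discover` and `inferSchema` are hypothetical names, not the connector's actual methods, and `inferSchema` only sleeps to simulate a slow query; it does not talk to MongoDB.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelDiscoverSketch {

    // Hypothetical stand-in for per-collection discover work
    // (e.g. sampling documents and inferring field types).
    static Map<String, String> inferSchema(String collectionName) {
        try {
            Thread.sleep(100); // simulate a slow round trip to the database
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Map.of("_id", "string", "source", collectionName);
    }

    // Runs inferSchema for every collection on the common ForkJoinPool.
    // Collectors.toMap is safe with parallel streams: partial maps built by
    // different workers are merged at the end (keys must be unique).
    static Map<String, Map<String, String>> discover(List<String> collectionNames) {
        return collectionNames.parallelStream()
            .collect(Collectors.toMap(name -> name, ParallelDiscoverSketch::inferSchema));
    }

    public static void main(String[] args) {
        List<String> collections = List.of("users", "orders", "events", "products");
        long start = System.nanoTime();
        Map<String, Map<String, String>> schemas = discover(collections);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(schemas.keySet() + " discovered in " + elapsedMs + " ms");
    }
}
```

Swapping `parallelStream()` for `stream()` gives the sequential baseline; the speedup observed depends on the core count and on how much of each call is spent waiting on the server.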
Server response (timing screenshots: Before / After)