Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Azure Blob Storage: Added BufferedOutputStream to fix block count issue and improve performance #9190

Merged
merged 21 commits into from
Jan 12, 2022

Conversation

bmatticus
Copy link
Contributor

@bmatticus bmatticus commented Dec 29, 2021

What

Fix for issue 5980, azure blob storage destination crashing after 50,000 blocks.

How

Altered PrintWriter in both CSV/JSONL to disable autoflushing and wrapped the OutputStream for blob in BufferedOutputStream as recommended by Azure. Added a setting to the spec to make this adjustable, but it will default to 100MB for backwards compatibility. Also added title to account key that was missing in current version of the spec.

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

No impact expected, tested with existing connections on an instance to be sure it still worked as expected. There will be a slight memory footprint increase as the default buffer will be 100MB. This impact may vary with implementations and how many sources are being pulled at a time.

IntegrationTest Results

Execution optimizations have been disabled for 3 invalid unit(s) of work during this build to ensure correctness.
Please consult deprecation warnings for more details.

BUILD SUCCESSFUL in 7m 19s
85 actionable tasks: 5 executed, 80 up-to-date

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@CLAassistant
Copy link

CLAassistant commented Dec 29, 2021

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Dec 29, 2021
@bmatticus bmatticus force-pushed the 5980-azure-blob-buffering branch from 1e266f0 to 88f0311 Compare December 29, 2021 16:08
@bmatticus bmatticus force-pushed the 5980-azure-blob-buffering branch from 88f0311 to ddcd67c Compare December 29, 2021 16:11
@marcosmarxm marcosmarxm requested a review from misteryeo January 3, 2022 19:55
Copy link
Contributor

@misteryeo misteryeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @bmatticus! Just added a few comments to clarify.

cc: @marcosmarxm

@alafanechere alafanechere requested a review from etsybaev January 4, 2022 17:51
@alafanechere alafanechere self-assigned this Jan 4, 2022
@alafanechere
Copy link
Contributor

Hi @bmatticus thanks for this improvement! @etsybaev can I ask you to review this please?

@alafanechere alafanechere mentioned this pull request Jan 4, 2022
@alafanechere alafanechere temporarily deployed to more-secrets January 4, 2022 18:00 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets January 4, 2022 18:02 Inactive
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bmatticus could you please bump the connector version in these files too:

  • airbyte-config/init/src/main/resources/seed/destination_definitions.yaml
  • airbyte-config/init/src/main/resources/config/STANDARD_DESTINATION_DEFINITION/b4c5d105-31fd-4817-96b6-cb923bfc04cb.json

@alafanechere
Copy link
Contributor

I confirm acceptance tests are passing 👏

@etsybaev
Copy link
Contributor

etsybaev commented Jan 10, 2022

/test connector=connectors/destination-azure-blob-storage

🕑 connectors/destination-azure-blob-storage https://github.com/airbytehq/airbyte/actions/runs/1677897290
❌ connectors/destination-azure-blob-storage https://github.com/airbytehq/airbyte/actions/runs/1677897290
🐛

🕑 connectors/destination-azure-blob-storage https://github.com/airbytehq/airbyte/actions/runs/1677897290
❌ connectors/destination-azure-blob-storage https://github.com/airbytehq/airbyte/actions/runs/1677897290
🐛

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets January 10, 2022 14:41 Inactive
@etsybaev
Copy link
Contributor

I see that integration tests fail by some reason
Selection_367
n

@bmatticus
Copy link
Contributor Author

I see that integration tests fail by some reason
Selection_367
n

Odd, I did run an integration test local after latest changes and it was successful. I think I missed saving the defaults file in the IDE on one commit before this one ran, which may have caused a rollover in the INT for this run though.

@etsybaev
Copy link
Contributor

I thought that the issue may be caused due to the fact that we try to merge from the forked branch. So created the same branch, but in Airbyte repo and ran tests one more time.
Also failed. Seeing some NPE exceptions
Link for another PR
#9393

Link to tests run:
https://github.com/airbytehq/airbyte/runs/4765425170?check_suite_focus=true

@bmatticus
Copy link
Contributor Author

I thought that the issue may be caused due to the fact that we try to merge from the forked branch. So created the same branch, but in Airbyte repo and ran tests one more time. Also failed. Seeing some NPE exceptions Link for another PR #9393

Link to tests run: https://github.com/airbytehq/airbyte/runs/4765425170?check_suite_focus=true

Yea looking at the branch there it has the old default still. I'll double check but I did a fresh pull on my branch and tested it just a bit ago and it passed. I assume its a rollover here though still. The NPE appear to be associated with passed tests, I can only assume they may be expected without digging in further. They appear to be testing invalid keys/etc so it may be an expected behavior there.

@bmatticus
Copy link
Contributor Author

image

That top one is the one that I believe fixes the issue in the test now. It does pull down off my branch on my fork. Maybe some inconsistencies now with the local branch being there?

@bmatticus
Copy link
Contributor Author

Actually it appears the test cases may be invalid for those, it appears some of them have had a default storage config added for tests and these have not. This is not new but I'm happy to update them here.

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pulled the latest code + merge with master on my side branch (#9289). I confirm acceptance tests are passing. @etsybaev let me know if @bmatticus changes look good to you and I'll publish the connector and merge.

@etsybaev etsybaev self-requested a review January 11, 2022 15:17
Copy link
Contributor

@etsybaev etsybaev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a static code check
@alafanechere approved, thanks

@alafanechere
Copy link
Contributor

@misteryeo / @sherifnada, do you mind having a final review as this PR changes this connector's spec. On my side, I confirm acceptance tests are passing, and the image was published, ready to merge.

@alafanechere alafanechere temporarily deployed to more-secrets January 11, 2022 18:38 Inactive
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approval specifically for the spec.json modifications, didn't look at the code

…/src/main/resources/spec.json

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
@alafanechere alafanechere merged commit 80666cf into airbytehq:master Jan 12, 2022
@alafanechere
Copy link
Contributor

Thank you for your contribution @bmatticus, I published the connectors, it's now ready to use (airbyte/destination-azure-blob-storage:0.1.1)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

10 participants