🐛 Source Salesforce: increase CSV field_size_limit #10012

Merged
merged 11 commits into from
Feb 4, 2022

Conversation

grubberr
Contributor

@grubberr grubberr commented Feb 2, 2022

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

What

Some users hit a CSV exception (airbytehq/oncall#115):

_csv.Error: field larger than field limit (131072)

Increase the CSV field size limit from its default to the maximum value:

131072 -> 9223372036854775807 (max value)

This should eliminate the exception.

We need to keep an eye on this solution because it can potentially consume too much RAM.
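
The failure and the effect of a higher limit can be reproduced with the standard library alone. The following is a minimal sketch, not code from this PR: `RAISED_LIMIT`, `count_name_chars`, and the payload are illustrative assumptions, and the previous limit is restored at the end because `csv.field_size_limit` changes process-wide state.

```python
import csv
import io

DEFAULT_LIMIT = 131072      # default limit quoted in the error message above
RAISED_LIMIT = 1024 * 1024  # illustrative value only, not necessarily the merged setting


def count_name_chars(text):
    """Parse CSV text and return the length of the second column of the first data row."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows[1][1])


# One data row whose Name field is one character longer than the default limit.
payload = "Id,Name\n1," + "e" * (DEFAULT_LIMIT + 1) + "\n"

previous_limit = csv.field_size_limit()  # called without arguments, returns the current limit
error_message = None
try:
    csv.field_size_limit(DEFAULT_LIMIT)
    try:
        count_name_chars(payload)
    except csv.Error as exc:
        error_message = str(exc)  # the parser rejects the oversized field

    csv.field_size_limit(RAISED_LIMIT)
    parsed_length = count_name_chars(payload)  # the same payload now parses
finally:
    csv.field_size_limit(previous_limit)  # restore the process-wide setting

print(error_message)
print(parsed_length)
```

Because the whole field is held in memory while it is parsed, a very high limit trades this exception for potentially large allocations, which is the RAM concern noted above.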

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Feb 2, 2022
@grubberr grubberr temporarily deployed to more-secrets February 2, 2022 17:48 Inactive
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@grubberr grubberr temporarily deployed to more-secrets February 2, 2022 17:51 Inactive
@codecov

codecov bot commented Feb 2, 2022

Codecov Report

❗ No coverage uploaded for pull request base (master@b5b0976). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 88e88fa differs from pull request most recent head 85606c0. Consider uploading reports for the commit 85606c0 to get more accurate results

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #10012   +/-   ##
=========================================
  Coverage          ?   83.61%           
=========================================
  Files             ?        7           
  Lines             ?      476           
  Branches          ?        0           
=========================================
  Hits              ?      398           
  Misses            ?       78           
  Partials          ?        0           


@grubberr
Contributor Author

grubberr commented Feb 2, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1785399117
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1785399117
Python tests coverage:

	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        74      6    92%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              275    106    61%
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  70     17    76%
	 source_acceptance_test/utils/compare.py                 62     23    63%
	 source_acceptance_test/utils/connector_runner.py       110     48    56%
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  876    259    70%
	 Name                                 Stmts   Miss  Cover
	 --------------------------------------------------------
	 source_salesforce/__init__.py            2      0   100%
	 source_salesforce/api.py               126     32    75%
	 source_salesforce/exceptions.py          1      0   100%
	 source_salesforce/rate_limiting.py      22      6    73%
	 source_salesforce/source.py             75     34    55%
	 source_salesforce/streams.py           241    143    41%
	 source_salesforce/utils.py               8      7    12%
	 --------------------------------------------------------
	 TOTAL                                  475    222    53%
	 Name                                 Stmts   Miss  Cover
	 --------------------------------------------------------
	 source_salesforce/__init__.py            2      0   100%
	 source_salesforce/api.py               126     33    74%
	 source_salesforce/exceptions.py          1      0   100%
	 source_salesforce/rate_limiting.py      22      3    86%
	 source_salesforce/source.py             75     11    85%
	 source_salesforce/streams.py           241     31    87%
	 source_salesforce/utils.py               8      0   100%
	 --------------------------------------------------------
	 TOTAL                                  475     78    84%

@grubberr grubberr requested review from antixar and lazebnyi February 2, 2022 17:54
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 2, 2022 17:56 Inactive
Contributor

@antixar antixar left a comment

Please add more comments in the code explaining this CSV limitation.

Comment on lines 23 to 24
CSV_FIELD_SIZE_LIMIT = 1024 * 1024
csv.field_size_limit(CSV_FIELD_SIZE_LIMIT)
Contributor

Can we set the maximum available value here?
I also propose logging a message about this limit, because the maximum value can differ between operating systems.
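
One common way to do this is sketched below. It assumes that `csv.field_size_limit` raises `OverflowError` when the value does not fit into the platform's C `long`, which is why `sys.maxsize` is not accepted everywhere. The helper name `raise_csv_field_size_limit` and the logger name are hypothetical and are not taken from the merged code.

```python
import csv
import logging
import sys

logger = logging.getLogger("csv_limit_sketch")  # hypothetical logger name


def raise_csv_field_size_limit(start=sys.maxsize):
    """Set the largest CSV field size limit the platform accepts and return it.

    Starts from `start` and halves the candidate whenever the csv module
    rejects it with OverflowError (the value does not fit into a C long).
    """
    limit = start
    while True:
        try:
            csv.field_size_limit(limit)
            break
        except OverflowError:
            limit //= 2
    # Log the value actually applied, since it can differ per operating system.
    logger.info("CSV field size limit set to %d", limit)
    return limit
```

Calling `raise_csv_field_size_limit()` once at import time would apply the setting for the whole process. Halving is a design choice here: starting from `sys.maxsize` it steps through values of the form 2^k - 1, so it can land exactly on a smaller platform maximum.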

Comment on lines 557 to 563
data = [
{"Id": "1", "Name": '"first_name" "last_name"'},
{"Id": "2", "Name": "'" + 'first_name"\n' + "'" + 'last_name\n"'},
{"Id": "3", "Name": "first_name last_name"},
{"Id": "3", "Name": "first_name last_name" + (CSV_FIELD_SIZE_LIMIT - 100) * "e"},
{"Id": "4", "Name": "first_name last_name"},
]

Contributor

Please add a test that verifies that calling csv.field_size_limit actually raises this limit.
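
A possible shape for such a test, as a sketch only: it round-trips a record like the oversized one in the data above through `csv.DictWriter` and `csv.DictReader`, first under the default limit and then under `CSV_FIELD_SIZE_LIMIT`. The helpers `round_trip` and `check_limit_is_raised` are hypothetical names, not functions from the connector.

```python
import csv
import io

CSV_FIELD_SIZE_LIMIT = 1024 * 1024  # value shown in the reviewed diff
DEFAULT_LIMIT = 131072              # default named in the PR description


def round_trip(records):
    """Write records to an in-memory CSV and parse them back as dictionaries."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Id", "Name"])
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


def check_limit_is_raised():
    """Return (failed_under_default, parsed_equals_input) for an oversized record."""
    records = [
        {"Id": "1", "Name": "first_name last_name" + (CSV_FIELD_SIZE_LIMIT - 100) * "e"},
    ]
    previous = csv.field_size_limit()
    try:
        # Under the default limit the parser is expected to reject the record.
        csv.field_size_limit(DEFAULT_LIMIT)
        try:
            round_trip(records)
            failed_under_default = False
        except csv.Error:
            failed_under_default = True

        # With the raised limit the same record should survive the round trip.
        csv.field_size_limit(CSV_FIELD_SIZE_LIMIT)
        parsed = round_trip(records)
    finally:
        csv.field_size_limit(previous)  # do not leak the setting into other tests
    return failed_under_default, parsed == records
```

In a pytest suite, the two returned flags could each be asserted to be true.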

@grubberr grubberr self-assigned this Feb 3, 2022
@grubberr grubberr temporarily deployed to more-secrets February 3, 2022 13:35 Inactive
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@grubberr
Contributor Author

grubberr commented Feb 3, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1790537362
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1790537362
🐛 https://gradle.com/s/xl25dc2m6ugto

@grubberr grubberr temporarily deployed to more-secrets February 3, 2022 16:22 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 3, 2022 16:25 Inactive
@grubberr grubberr requested a review from antixar February 3, 2022 16:26
@grubberr
Contributor Author

grubberr commented Feb 3, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1791806255
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1791806255
🐛 https://gradle.com/s/jxjx6bzg4qbm6

@grubberr grubberr temporarily deployed to more-secrets February 3, 2022 21:15 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 3, 2022 21:16 Inactive
@grubberr
Contributor Author

grubberr commented Feb 4, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1793708010
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1793708010
🐛 https://gradle.com/s/lxuxwhzswmwbw

@grubberr grubberr temporarily deployed to more-secrets February 4, 2022 07:17 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 4, 2022 07:18 Inactive
@grubberr
Contributor Author

grubberr commented Feb 4, 2022

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1794546280
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1794546280
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/config.py                        74      6    92%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/tests/test_core.py              275    106    61%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/tests/test_incremental.py        69     38    45%
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/utils/common.py                  70     17    76%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/utils/connector_runner.py       110     48    56%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
------------------------------------------------------------------------
TOTAL                                                  876    259    70%
Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/__init__.py            2      0   100%
source_salesforce/api.py               126     32    75%
source_salesforce/exceptions.py          1      0   100%
source_salesforce/rate_limiting.py      22      6    73%
source_salesforce/source.py             75     34    55%
source_salesforce/streams.py           242    143    41%
source_salesforce/utils.py               8      7    12%
--------------------------------------------------------
TOTAL                                  476    222    53%
Name                                 Stmts   Miss  Cover
--------------------------------------------------------
source_salesforce/__init__.py            2      0   100%
source_salesforce/api.py               126     33    74%
source_salesforce/exceptions.py          1      0   100%
source_salesforce/rate_limiting.py      22      3    86%
source_salesforce/source.py             75     11    85%
source_salesforce/streams.py           242     31    87%
source_salesforce/utils.py               8      0   100%
--------------------------------------------------------
TOTAL                                  476     78    84%

@grubberr grubberr temporarily deployed to more-secrets February 4, 2022 10:59 Inactive
@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 4, 2022 11:00 Inactive
Contributor

@antixar antixar left a comment

LGTM

@grubberr
Contributor Author

grubberr commented Feb 4, 2022

/publish connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1796319445
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1796319445

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets February 4, 2022 17:42 Inactive
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
…:airbytehq/airbyte into grubberr/oncall-115-csv_field_size_limit
@grubberr grubberr merged commit 84d7323 into master Feb 4, 2022
@grubberr grubberr deleted the grubberr/oncall-115-csv_field_size_limit branch February 4, 2022 19:06
@grubberr grubberr temporarily deployed to more-secrets February 4, 2022 19:06 Inactive
@lazebnyi lazebnyi removed their request for review May 30, 2022 10:58
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation

3 participants