-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source File - add support for custom encoding #15293
Conversation
/test connector=connectors/source-file
Build FailedTest summary info:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested changes, but not critical.
I guess the unit_test should be moved into integration_tests
dir as well as the path the the test file should be the absolute path, not relative.
HERE = Path(__file__).parent.absolute()
is the option to use here.
self._url = url | ||
self._provider = provider | ||
self._file = None | ||
self.args = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: self._reader_args
/test connector=connectors/source-file
Build PassedTest summary info:
|
# Conflicts: # airbyte-integrations/connectors/source-file/source_file/client.py
/test connector=connectors/source-file
Build PassedTest summary info:
|
/publish connector=connectors/source-file
if you have connectors that successfully published but failed definition generation, follow step 4 here |
…ing' into midavadim/7916-source-file-encoding
commit 10fb1dc137175d09826cdfcf419698e3000cd418 Author: pmossman <parker@airbyte.io> Date: Wed Aug 10 13:37:00 2022 -0700 format and pmd commit 7c223ec2e0abfec864c11395d7310d679706a49c Author: pmossman <parker@airbyte.io> Date: Wed Aug 10 13:08:09 2022 -0700 update peristStateActivity test commit 763e9e2c5ca5f998ab49ffd0dcafb7ae81201b2b Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 15:24:10 2022 -0700 format commit c176e63c3841a1f08c7a43359d293b12297e03e4 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 15:18:03 2022 -0700 move converters to module that worker can access, convert statePersistence calls to API calls, convert statePersistence helper to local private method commit 1b583487b4ea7dd058944cdbce4de6197f967523 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 10:37:00 2022 -0700 add createOrUpdateState API endpoint commit d87eed6215ce451a3e126d433991967317839876 Author: pmossman <parker@airbyte.io> Date: Fri Aug 5 13:42:16 2022 -0700 add AirbyteApiClient to WorkerApp for data plane workers to use commit a65524a Author: Teal Larson <LARSON.TEAL@GMAIL.COM> Date: Wed Aug 10 16:03:59 2022 -0400 🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426) * wip diff modal test setup * starting storybook add * storybook working now * cleanup * aria labels * test syncmode string commit 2f17e99 Author: Liren Tu <tuliren.git@outlook.com> Date: Wed Aug 10 13:02:01 2022 -0700 🐞 Postgres source: fix bug in intermediate state emission (#15496) * Rename record counter * Rename method * Emit intermediate state after all cursor records * Emit intermediate state only when it is ready * Merge two checks * Add a testing message * Fix unit tests * Add one more testing record and add comments * Add test case for multiple records with the same cursor value * Revert irrelevant change * Add explanation in javadoc * Format code * Rename testing methods * Fix comment * Bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit f540499 Author: Alexandre Girard <alexandre@airbyte.io> Date: Wed Aug 10 11:37:07 2022 -0700 [low-code connectors]: Assert there are no custom top-level fields (#15489) * move components to definitions field * Also update the references * validate the top level fields and add version * raise exception on unknown fields * newline * unit tests * set version to 0.1.0 * newline commit f52bfb6 Author: Xiaohan Song <xiaohan@airbyte.io> Date: Wed Aug 10 11:16:17 2022 -0700 change query frequency to 1hour (#15499) commit f143c8f Author: midavadim <midavadim@yahoo.com> Date: Wed Aug 10 21:13:51 2022 +0300 :tada: Source File - add support for custom encoding (#15293) * added support for custom encoding * fixed unit test for utf16 * updated docs * bumped connector version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit bbf3584 Author: Alexandre Girard <alexandre@airbyte.io> Date: Wed Aug 10 10:58:22 2022 -0700 Remove unused field from JsonSchema (#15425) * few fixes from working with sendgrid * reset to master * only update the docstring * reset commit a280113 Author: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com> Date: Wed Aug 10 20:44:51 2022 +0300 Destination S3: add LZO compression support (#15394) * Fixed bucket naming for S3 * Destination S3: add LZO compression support for parquet files * Destination S3: add LZO compression support for parquet files * implemented logic for aarch64 * removed redundant logging * updated changelog * moved intstall of native-lzo lib to Dockerfile * removed redundant logging * add unit test for aarch64 * bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit 29c3426 Author: sivankumar86 <sivankumar86@users.noreply.github.com> Date: Thu Aug 11 02:34:47 2022 +1000 Source MSSQL: special character support in dbname for CDC method (#15430) * information schema included * special character handle * Revert "information schema included" This reverts commit f0aee6a. * version change * doc update * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit 959d862 Author: Baz <oleksandr.bazarnov@globallogic.com> Date: Wed Aug 10 19:20:22 2022 +0300 🐛 Source SalesForce: changed `DEFAULT_WAIT_TIMEOUT_SECONDS` to 24-hour limit (#15444) commit 0092712 Author: Subodh Kant Chaturvedi <subodh1810@gmail.com> Date: Wed Aug 10 21:48:57 2022 +0530 fix postgres data handling from WAL logs in CDC mode (#15481) * fix postgres data handling from WAL logs in CDC mode * format * use formatter for dates also (#15485) * format * change test structure * change log to debug Co-authored-by: Edward Gao <edward.gao@airbyte.io> commit fdb5eb9 Author: Evan Tahler <evan@airbyte.io> Date: Wed Aug 10 09:03:02 2022 -0700 Simplify the `MigrationAcceptanceTest` (#15497) * disable `testAutomaticMigration` * empty commit to retry tests * Simplify the MigrationAcceptanceTest * lint * Fix PMD. Reorder some calls to make clear what is happening. Co-authored-by: Davin Chia <davinchia@gmail.com> commit fd70913 Author: Augustin <augustin.lafanechere@gmail.com> Date: Wed Aug 10 17:42:07 2022 +0200 SAT: compatibility tests for catalogs (#15486) commit 1ad5152 Author: Evan Tahler <evan@airbyte.io> Date: Wed Aug 10 08:21:52 2022 -0700 Disable automaticMigrationAcceptanceTest (#15492) * disable `testAutomaticMigration` * empty commit to retry tests commit 1228451 Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Date: Wed Aug 10 11:11:45 2022 -0400 Fix styles impacting global ul, li in FieldSection component (#15484) commit 6aa08e0 Author: Jonathan Pearlin <jonathan@airbyte.io> Date: Wed Aug 10 10:55:46 2022 -0400 Add micronaut dependencies and bundles (#15459) * Add micronaut dependencies and bundles * Update Micronaut core commit 7662956 Author: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Date: Wed Aug 10 10:51:43 2022 -0400 🪟 🧹 Cleanup documentation panel components (#15455) * Add docs/ to frontend workspace * Migrate Markdown components to scss and cleanup when not found is rendered * Add white-space: break-spaces rule to markdown code blocks commit 1258ab4 Author: Topher Lubaway <asimplechris@gmail.com> Date: Wed Aug 10 09:05:14 2022 -0500 Revert "Adds PAT check to shared pr check (#15453)" (#15511) This reverts commit 06a18d4. commit 853b88a Author: Kyryl Skobylko <xpuska513@gmail.com> Date: Wed Aug 10 16:48:20 2022 +0300 fix: fix gcs-log creds secret name, add externaldb configuration for temporal, fix webapp ingress (#15510) commit c782303 Author: Yatsuk Bogdan <yatsukbogdan@gmail.com> Date: Wed Aug 10 15:57:26 2022 +0300 :window: :art: Increases GroupTitle followed divs width from 180px to 250px (#13956) * Increases GroupControls followed divs width from 180px to 250px * Increases min-width for GroupTitle * Change layout to flexbox Co-authored-by: Tim Roes <tim@airbyte.io> commit e28bc3a Author: Serhii Chvaliuk <grubberr@gmail.com> Date: Wed Aug 10 13:55:29 2022 +0300 🎉Source Harvest: Added `parent_id` for all streams which have parent stream (#15221) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> commit aaa3aae Author: Tuhai Maksym <kimerinn@gmail.com> Date: Wed Aug 10 12:43:55 2022 +0300 15310: Destination Scylla: Handle per-stream state (#15399) * 15310: Destination Scylla: Handle per-stream state * 15399: test fix * 15318: test fix * 15318: updating version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit c724630 Author: Yurii Bidiuk <35812734+yurii-bidiuk@users.noreply.github.com> Date: Wed Aug 10 10:17:23 2022 +0300 Add test case for new fields appearing in data (#15372) * add test case for new field(s) appearing in data * rework test to verify that sync at least not failed if new fields are present commit 6e1a76f Author: Serhii Chvaliuk <grubberr@gmail.com> Date: Wed Aug 10 09:24:40 2022 +0300 🐛 Source Amazon Ads: define primary_key for all report streams (#15469) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> commit c1a0cbc Author: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com> Date: Wed Aug 10 04:20:12 2022 +0200 Bump Airbyte version from 0.39.42-alpha to 0.40.0-alpha (#15493) Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com> commit f6766ee Author: Benoit Moriceau <benoit@airbyte.io> Date: Wed Aug 10 07:50:41 2022 +0800 Revert "Revert "Release per stream to the OSS project (#15008)" (#15177)" (#15401) This reverts commit 362fc4e. commit eab0013 Author: Edward Gao <edward.gao@airbyte.io> Date: Tue Aug 9 16:13:09 2022 -0700 🐛 Source snowflake: int columns should be discovered as ints (#15314) * snowflake discovers ints as ints * version bump+changelog * bump version+changelog * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> commit f506c60 Author: Anne <102554163+alovew@users.noreply.github.com> Date: Tue Aug 9 16:07:35 2022 -0700 Track number of streams in syncs (#15478) * Add number_of_streams to job sync tracking commit 6c5d1ff Author: Augustin <augustin.lafanechere@gmail.com> Date: Wed Aug 10 00:33:58 2022 +0200 SAT: measure unit test coverage (#15443) commit e9afa9b Author: Anne <102554163+alovew@users.noreply.github.com> Date: Tue Aug 9 15:30:48 2022 -0700 Error Prone PMD rules (#15010) * Implement ErrorProne PMD rules: AssignmentInOperand AvoidAccessibilityAlteration AvoidBranchingStatementAsLastInLoop AvoidCatchingNPE AvoidCatchingThrowable AvoidDuplicateLiterals rule commit c536e51 Author: Tim Roes <tim@airbyte.io> Date: Wed Aug 10 00:11:12 2022 +0200 Fix copy link to logs functionality (#15368) * Fix copy link to logs functionality * Update airbyte-webapp/src/components/JobItem/JobItem.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Fix scrolling * Remove smooth scrolling * Improve effect for better return statements * Better scroll Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> commit 62303a8 Author: Augustin <augustin.lafanechere@gmail.com> Date: Tue Aug 9 23:07:13 2022 +0200 SAT: check that previous config schema validates against current connector spec (#15367) commit 123705c Author: Stephen Wentling <stephen@swentling.com> Date: Tue Aug 9 21:30:14 2022 +0100 Source Jira: Added updates to include issue components and fixes to README files (#15135) * solve readme conflict * updated jira sources with open PR details * correct additionalProperties test discover Co-authored-by: marcosmarxm <marcosmarxm@gmail.com> commit 9e691d8 Author: Alex <109167606+alex-gron@users.noreply.github.com> Date: Tue Aug 9 14:28:38 2022 -0500 fix broken link (#15379) commit 36ed6ce Author: Denys Davydov <davydov.den18@gmail.com> Date: Tue Aug 9 21:58:52 2022 +0300 #15445 source typeform: integration tests (#15446) commit 06a18d4 Author: Topher Lubaway <asimplechris@gmail.com> Date: Tue Aug 9 13:33:20 2022 -0500 Adds PAT check to shared pr check (#15453) * Adds PAT check to shared pr check * Name change * Removes "safe_to_push" string * Adds OCTAVIA_PAT and uses the found PAT found PAT was not used in all locales, so this could have still failed on an expired OCTAVIA_PAT before this change
* added support for custom encoding * fixed unit test for utf16 * updated docs * bumped connector version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
* greenhouse minus pagination * jobs * first substream * rename field * applications_demographics_answers_stream * interviews * All streams are implemented * fix check * fix spec * disable backward compatibility tests * disable backward compatibility tests * unit tests * definitions * only use config.json * bump version * expected records * delete stream classes * Handle extracting no records from root * handle missing keys * Remove unused field from JsonSchema (#15425) * few fixes from working with sendgrid * reset to master * only update the docstring * reset * 🎉 Source File - add support for custom encoding (#15293) * added support for custom encoding * fixed unit test for utf16 * updated docs * bumped connector version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * change query frequency to 1hour (#15499) * [low-code connectors]: Assert there are no custom top-level fields (#15489) * move components to definitions field * Also update the references * validate the top level fields and add version * raise exception on unknown fields * newline * unit tests * set version to 0.1.0 * newline * 🐞 Postgres source: fix bug in intermediate state emission (#15496) * Rename record counter * Rename method * Emit intermediate state after all cursor records * Emit intermediate state only when it is ready * Merge two checks * Add a testing message * Fix unit tests * Add one more testing record and add comments * Add test case for multiple records with the same cursor value * Revert irrelevant change * Add explanation in javadoc * Format code * Rename testing methods * Fix comment * Bump version * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * 🪟 🔧 Add testing and storybook component for CatalogDiffModal (#15426) * wip diff modal test setup * starting storybook add * storybook working now * cleanup * aria labels * test syncmode string * 🎉 New Source: Hubplanner (#15521) * added hubplanner source connector * feat: added more streams to read from * cleaned up some unneeded integration tests * fix hubplanner schema * changes * update dockerfile * add seed and doc * update spec * run source spec seed file Co-authored-by: Ricky Renner <renner@amendllc.com> * Make it possible to specify normalization pod resources. (#15495) Today we are running into OOM exceptions with normalization. Normalization itself also inherits the destination's resource requirements. After work to bring destination memory usage down, this is no longer ideal, since most destinations use less memory than normalization needs. This PR makes it possible to specify the general resource the normalization pod is provided via env vars. Notes: - Add env vars. Default to the various job main container resources if these are not set. - Instead of using the destination's memory, use the normalization specify env vars. * [low-code connectors] Extract datetime parser and handle %s format directive (#15429) * fix parse * Revert "fix parse" This reverts commit 3c76c5a. * fix parse timestamp * extract datetime parser * remove print * use parser * top level docstring * rename variable * do not use timestamp() * Revert "do not use timestamp()" This reverts commit 016cb69. * update comment * bump cdk version * Update template * source-file-secure bump to 0.2.16 (#15528) * update Dockerfile version * update init to accept additional args * unit test sendgrid messages stream (#15331) * unit test sendgrid messages stream * reset * Update airbyte-integrations/connectors/source-sendgrid/unit_tests/unit_test.py Co-authored-by: Augustin <augustin.lafanechere@gmail.com> Co-authored-by: Augustin <augustin.lafanechere@gmail.com> * record extractor interface * dpath extractor * docstring * 🎉 Source File: cache binary stream to file (#15501) Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> * Docs: update posthog.md (#15541) * Source Stripe: implement slicing (#15292) * #45 oncall - source Stripe: implement slicing * #45 source stripe: upd changelog * #45 source stripe: upd changelog * #45 source stripe: make slice range configurable * #45 source stripe: move generating a single slice into a mixin * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> * fix: revert extraEnv delition in values.yaml for bootloader (#15548) * fix: revert extraEnv delition in values.yaml for bootloader * add newline * SAT: backward compatibility - check that cursor fields were not changed (#15520) * Replace twttr repo to the root build.gradle (#15544) * Fixed bucket naming for S3 * replaced twttr repo to the root build.gradle * replaced twttr repo to the root build.gradle Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com> * Generate separate server endpoints per domain (#15513) * 🐛 Backward compatibility test: Don't fail on updating additionalProperties (#15532) * Source Recurly: adds `state_checkpoint_interval` to streams (#13685) * Add `state_checkpoint_interval` to Recurly stream * Bumpg Recurly source version to `0.4.1` * reset * use dpath * enable backward compatibility test * infer types * Revert "infer types" This reverts commit b4de8d6. * infer some of the types * some drying * more drying * auto-bump connector version [ci skip] Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com> Co-authored-by: midavadim <midavadim@yahoo.com> Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com> Co-authored-by: Xiaohan Song <xiaohan@airbyte.io> Co-authored-by: Liren Tu <tuliren.git@outlook.com> Co-authored-by: Teal Larson <LARSON.TEAL@GMAIL.COM> Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com> Co-authored-by: Ricky Renner <renner@amendllc.com> Co-authored-by: Davin Chia <davinchia@gmail.com> Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> Co-authored-by: Augustin <augustin.lafanechere@gmail.com> Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com> Co-authored-by: juliatournant <39640564+juliatournant@users.noreply.github.com> Co-authored-by: Denys Davydov <davydov.den18@gmail.com> Co-authored-by: Kyryl Skobylko <xpuska513@gmail.com> Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com> Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com> Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io> Co-authored-by: Mohamed Magdy <mohamed.magdy@canary.is>
What
encoding should be passed to smartopen library as well.
How
Describe the solution
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
New Connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereUpdating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereConnector Generator
-scaffold
in their name) have been updated with the latest scaffold by running./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates
then checking in your changesTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.