Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change jdbc sources to discover more than standard schemas #1038

Merged
merged 7 commits into from
Nov 30, 2020

Conversation

ChristopheDuong
Copy link
Contributor

@ChristopheDuong ChristopheDuong commented Nov 20, 2020

What

In this thread from the #integration channel on slack:
https://airbytehq.slack.com/archives/C01AHCDHCKE/p1605795947045500

Sam Logan pointed out that we are not currently able to select tables from other schemas than the public one in the Postgres Source. Moreover, if possible, he would like to select multiple schemas at a time for that single postgres source.

How

In this PR, I was able to discover and list tables from multiples schemas.
Screenshot 2020-11-20 at 16 16 35

Unfortunately, I am hitting a road block as the usual syntax for such JDBC is to use "schema.tablename" which is not a permitted AIrbyteStream Name (as reported in #925)

This makes sense because then the destination wouldn't be able to create such a table name (for example in BigQuery)
Screenshot 2020-11-20 at 16 17 12

Alternative

  1. Another option that i have in mind is to produce a nested AirbyteCatalog with the top level being list of available schemas and the second level the list of tables within each Schema. However I think this wouldn't be handled by the UI yet...
  2. I am not sure if it is doable, but would it be possible to output an AirbyteStreamName of my_schema_name_my_table_name (usable name for destination) but add additional special fields in the stream to specify which parts are actually for the original schema and which are for table names? (ie my_schema_name & my_table_name) such that the source is still able to decode the stream name into proper 'my_schema_name.my_table_name'
  3. more ideas?

@cgardens
Copy link
Contributor

cgardens commented Nov 22, 2020

Discussed offline. The desired behavior is as follows:

  • pg source will emit stream names as <schema name>.<table name>
  • destinations will be in charge of converting that name into <schema name>_<table name> (or to whatever is appropriate for that destination).
  • all tables will still be output into the same schema.

@ChristopheDuong ChristopheDuong marked this pull request as ready for review November 23, 2020 14:15
@ChristopheDuong
Copy link
Contributor Author

Then this PR should be ready to be merged whenever (or we can wait until we made the proper changes in the destinations first)

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Couple questions around how this new configuration is going to be set and how it is used in discoverInternal. Once we clear that up we can move forward.

I am fine merging once have the above figured out or waiting for changes to the destinations. We already have sources that emit streams with . in them so it is probably fine to do this before we change destinations as they already somewhat handle this case. we're not really making anything worse.

@@ -2,6 +2,6 @@
"destinationDefinitionId": "8be1cf83-fde1-477f-a4ad-318d23c9f3c6",
"name": "Local CSV",
"dockerRepository": "airbyte/destination-csv",
"dockerImageTag": "0.1.3",
"dockerImageTag": "0.1.4",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to publish this destination along with the others in #1060 so acceptance tests that are using this are not able to complete without the proper version

Copy link
Contributor

@cgardens cgardens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question on implementation of getExcludedInternalSchemas. Once that is resolved, I will approve.

davydov-d added a commit that referenced this pull request Nov 23, 2022
davydov-d added a commit that referenced this pull request Nov 29, 2022
davydov-d added a commit that referenced this pull request Nov 30, 2022
* #1038 source freshdesk - fix schema

* #1038 source freshdesk: upd changelog

* #1038 source freshdesk - remove temp file

* #1038 source freshdesk: fix tests

* #1038 source freshdesk: update expected records again

* #1038 source freshdesk: rm unstable expected record

* #1038 source freshdesk: edit another expected record

* #1038 source freshdesk: edit another expected record

* #1038 source freshdesk: fix expected records

* #1038 source freshdesk: upd expected_records

* #1038 source freshdesk: resolve dependency conflict

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants