tap-google-sheets not syncing all columns in source sheet #449

Closed
henriblancke opened this issue Jun 18, 2020 · 6 comments
Labels
connector issue (Issue is in the connector (tap/target) and not in PipelineWise), enhancement (New feature or request)

Comments

@henriblancke
Contributor

Subject of the issue

I'm trying to integrate tap-google-sheets into pipelinewise, but not all columns in my sheet are propagated to the target. The logs show the tap identifying and selecting the correct range of columns in the sheet, and when I run the tap without pipelinewise against the standard csv or json target, the resulting file contains all the data in the sheet.

Your environment

  • pipelinewise==0.16.0
  • tap-google-sheets
  • pipelinewise-target-snowflake

Steps to reproduce

Follow the contribution guide on how to add a new tap, then rebuild the docker image so that tap-google-sheets is included. Add a yaml config with client_id, client_secret, refresh_token and spreadsheet_id under db_conn. Add the name of the sheet you want to move into your target as table_name. Import the yaml and run the tap.

Expected behaviour

I'd expect all columns in the sheet to show up in snowflake.

Actual behaviour

The tap runs, but the csv that gets uploaded to S3 by pipelinewise doesn't contain all columns in the source sheet (3 in my case).

Further notes

Am I missing something vital for this tap to integrate with pipelinewise? Thank you for your help!

@henriblancke changed the title from "tap-google-sheets not syncing all rows in source sheet" to "tap-google-sheets not syncing all columns in source sheet" Jun 18, 2020
@Samira-El
Contributor

Hey, have you tried running the tap in a standalone mode to verify if the streams it generates are accurate?

@henriblancke
Contributor Author

@Samira-El thanks for the response! Yep I did and things run as expected and are accurate.

@Samira-El
Contributor

Samira-El commented Jul 1, 2020

Ok, then it sounds like the issue is in the target, because it's the one that parses the tap messages and creates the csv file in S3.
Can you try feeding the tap streams to pipelinewise-target-snowflake and see if it processes every property in the SCHEMA type stream?
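To make that check concrete, here is a minimal sketch that lists the properties declared in each SCHEMA message of a tap's output, so they can be compared against the columns that end up in the csv. It assumes the tap output was saved as one JSON message per line, as Singer taps emit; the function name and the sample messages are made up for illustration and are not part of any connector.

```python
import json
from typing import Dict, Iterable, List


def schema_properties(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each stream name to the property names declared in its SCHEMA message."""
    found: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message, e.g. log lines
        if not isinstance(message, dict):
            continue
        if message.get("type") == "SCHEMA":
            properties = message.get("schema", {}).get("properties", {})
            found[message["stream"]] = list(properties)
    return found


# Hypothetical sample output: one SCHEMA message followed by one RECORD message.
sample = [
    '{"type": "SCHEMA", "stream": "Sheet1", "key_properties": [], '
    '"schema": {"properties": {"a": {"type": ["null", "string"]}, '
    '"b": {"anyOf": [{"type": "null"}, {"type": "number"}]}}}}',
    '{"type": "RECORD", "stream": "Sheet1", "record": {"a": "x", "b": 1}}',
]
print(schema_properties(sample))
```

Passing an open file of saved tap output instead of `sample` gives the full property list per stream, which can then be compared by hand with the header of the uploaded csv.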

@louis-pie added the "bug" (Something isn't working) label Aug 19, 2020
@henriblancke
Contributor Author

@Samira-El I think the problem here might be that tap-google-sheets has the following as typedefs for schemas and pipelinewise-target-snowflake doesn't know how to handle that:

"anyOf": [
    {
        "type": "null"
    },
    {
        "type": "number"
    },
    {
        "type": "string"
    }
]
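As an illustration only, one possible workaround is to collapse such an anyOf into a single "type" list before the schema reaches the target. The sketch below is an assumption about how that could be done, not code from tap-google-sheets or pipelinewise-target-snowflake, and `flatten_any_of` is a hypothetical helper name. It only handles options that are plain `{"type": ...}` typedefs and drops any other keywords (such as format) inside the options.

```python
from typing import Any, Dict, List


def flatten_any_of(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse an anyOf of simple typedefs into a single "type" list."""
    if "anyOf" not in prop:
        return prop
    types: List[str] = []
    for option in prop["anyOf"]:
        option_type = option.get("type", [])
        if isinstance(option_type, str):
            option_type = [option_type]
        for name in option_type:
            if name not in types:  # keep first-seen order, no duplicates
                types.append(name)
    # Keep every other key of the property and replace anyOf with the merged type list.
    flattened = {key: value for key, value in prop.items() if key != "anyOf"}
    flattened["type"] = types
    return flattened


typedef = {"anyOf": [{"type": "null"}, {"type": "number"}, {"type": "string"}]}
print(flatten_any_of(typedef))  # {'type': ['null', 'number', 'string']}
```

Which column type the target would then pick for a mixed number/string property is a separate question that this sketch does not address.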

@Samira-El
Contributor

Hey, glad to see that you've made progress on this issue!

Indeed, anyOf in a stream's schema is currently not supported by ppw-target-snowflake.

There is an issue similar to this in the ppw-target-snowflake repo: transferwise/pipelinewise-target-snowflake#88

What do you think would be the best way to handle these typedefs?

@Samira-El added the "connector issue" (Issue is in the connector (tap/target) and not in PipelineWise) and "enhancement" (New feature or request) labels and removed the "bug" (Something isn't working) label Jan 21, 2021
@Samira-El
Contributor

Closing as this is not a bug in Pipelinewise, but rather target-snowflake lacking support for anyOf.
