
normalization: handle records > 1MB for redshift SUPER type #14573

Closed
alafanechere opened this issue Jul 11, 2022 · 13 comments

@alafanechere
Contributor

alafanechere commented Jul 11, 2022

Tell us about the problem you're trying to solve

Redshift normalization generates SUPER objects whose size exceeds the 1 MB limit:

2022-07-06 06:17:18 normalization > 06:17:18      error:  Invalid input
2022-07-06 06:17:18 normalization > 06:17:18      code:      8001
2022-07-06 06:17:18 normalization > 06:17:18      context:   SUPER value exceeds export size.
2022-07-06 06:17:18 normalization > 06:17:18      query:     7075389
2022-07-06 06:17:18 normalization > 06:17:18      location:  partiql_export.cpp:9
2022-07-06 06:17:18 normalization > 06:17:18      process:   query0_91_7075389 [pid=32005]

Describe the solution you’d like

Normalization should either explicitly drop records larger than 1 MB or restructure those records so they fit under the 1 MB limit.
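A minimal sketch of the "drop oversized records" option (illustrative only — the function names are hypothetical, and Airbyte's actual normalization runs as dbt/SQL, not Python; the 1 MB threshold comes from the SUPER limit above):

```python
import json

# Redshift's SUPER type rejects values whose serialized size exceeds 1 MB,
# which is the limit hit in the error log above.
MAX_SUPER_BYTES = 1024 * 1024

def record_size_bytes(record: dict) -> int:
    """Size of the record once serialized as compact JSON (what gets loaded)."""
    return len(json.dumps(record, separators=(",", ":")).encode("utf-8"))

def drop_oversized(records):
    """Yield only records that fit under the SUPER limit; skip the rest.

    A real implementation should log or dead-letter the skipped records
    rather than silently discarding them.
    """
    for record in records:
        if record_size_bytes(record) <= MAX_SUPER_BYTES:
            yield record
```

The restructuring alternative would instead split or truncate the large nested fields before load, which preserves the row at the cost of losing part of its payload.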

A similar issue was seen on-call: airbytehq/alpha-beta-issues#697

SUPER type from Redshift docs

Related forum topic

@marcosmarxm
Member

Zendesk ticket #1473 has been linked to this issue.

@marcosmarxm
Member

Comment made from Zendesk by Augustin on 2022-07-11 at 13:03:

I created an issue (#14573) on our repo for this error. Please subscribe to receive updates.

@validumitru

This error is also happening while trying to sync engagements in the Hubspot connector.

@grishick grishick added the team/destinations Destinations team's backlog label Sep 27, 2022
@jan-benisek

I encountered the same issue today (Airbyte 0.40.18, connector version 0.2.3). Any idea when this will be fixed? 🙏

@jena-binay

jena-binay commented Jan 27, 2023

I'm on Airbyte 0.4.27 and getting the same error on the Jira connector (0.3.3).

@validumitru

This error just started breaking the Hubspot sync for us today :(

@cidraljunior

I am getting the same error. Is there any fix?

@pranasziaukas

pranasziaukas commented Apr 18, 2023

Running into this while syncing HubSpot Companies and Contacts into Redshift.

@josephbrownskilljar

josephbrownskilljar commented Apr 18, 2023 via email

@alexandrafetterman

I am getting this error while syncing Jira Issues into Redshift.

@evantahler
Contributor

Closing this issue, as normalization is going away: #26028

@pranasziaukas

normalization is going away

Could you expand on that a bit, @evantahler?

For example, we had issues with HubSpot (source) records that were flowing to Redshift (destination), and because those records were large JSON objects they'd exceed Redshift's SUPER limit (as far as I understand).

What does the end of normalization imply for the above?

@evantahler
Contributor

The problem of large source records that can't fit in the destination still remains, regardless of normalization, so we'll need to fix it more generally. We are discussing what to do about it in #28541.
