Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nested columns are not written by normalization #1896

Closed
sherifnada opened this issue Jan 29, 2021 · 6 comments
Closed

Nested columns are not written by normalization #1896

sherifnada opened this issue Jan 29, 2021 · 6 comments

Comments

@sherifnada
Copy link
Contributor

sherifnada commented Jan 29, 2021

Expected Behavior

I expect the Hubspot source connector to replicate contact properties from my Hubspot CRM.

Current Behavior

This was the table that was written to postgres with normalization:

hubspot=# \d contacts
                                 Table "public.contacts"
          Column          |            Type             | Collation | Nullable | Default
--------------------------+-----------------------------+-----------+----------+---------
 _airbyte_emitted_at      | timestamp with time zone    |           |          |
 _airbyte_normalized_at   | timestamp without time zone |           |          |
 vid                      | double precision            |           |          |
 portal_id                | double precision            |           |          |
 is_contact               | boolean                     |           |          |
 profile_url              | character varying           |           |          |
 canonical_vid            | double precision            |           |          |
 profile_token            | character varying           |           |          |
 _airbyte_contacts_hashid | text                        |           |          |

which is a very small subset of the properties available from the contacts stream. The full schema of the stream is listed in this gist (it's too big to display inline here).

Steps to Reproduce

  1. Sync from hubspot to postgres
  2. 💣
@sherifnada sherifnada added type/bug Something isn't working area/connectors Connector related issues labels Jan 29, 2021
@sherifnada sherifnada changed the title Nested columns are missing when reading from Hubspot source connector Nested columns are not written by normalization Jan 29, 2021
@sherifnada
Copy link
Contributor Author

When I run the source connector, here is one output record (it contains the missing fields):

{"type": "RECORD", "record": {"stream": "contacts", "data": {"vid": 151, "canonical-vid": 151, "merged-vids": [], "portal-id": 8727216, "is-contact": true, "profile-token": "AO_T-mNt4z6rtIKQdEW9S5wdZD2gjvxwHY7OzHtUaX20rbKO176VDbOVBqUSIsAb0vOQX1fKSp0bIx7pjaa1xIkzvK1sz7L-N1Wqj9KODpzKz3zV4xYL954lh_Tp1dNilBZqC7luiP97", "profile-url": "https://app.hubspot.com/contacts/8727216/contact/151", "properties": {"hs_is_unworked": {"value": true}, "firstname": {"value": "shrif"}, "associatedcompanyid": {"value": 5000526215.0}, "num_unique_conversion_events": {"value": 0.0}, "hs_analytics_revenue": {"value": 0.0}, "createdate": {"value": "2020-12-11T01:29:50.116000Z"}, "hs_analytics_num_visits": {"value": 0.0}, "hubspot_owner_id": {"value": "52550153"}, "hs_analytics_source": {"value": "OFFLINE"}, "hs_email_domain": {"value": "dataline.io"}, "hs_analytics_num_page_views": {"value": 0.0}, "hs_all_owner_ids": {"value": "52550153"}, "hs_count_is_worked": {"value": 0.0}, "email": {"value": "sherif+1@dataline.io"}, "lastmodifieddate": {"value": "2021-01-04T07:07:36.275000Z"}, "hs_analytics_first_timestamp": {"value": "2020-12-11T01:29:50.116000Z"}, "hs_lifecyclestage_subscriber_date": {"value": "2020-12-11T01:29:50.116000Z"}, "hs_user_ids_of_all_owners": {"value": "12282590"}, "hs_analytics_average_page_views": {"value": 0.0}, "hs_all_contact_vids": {"value": "151"}, "lastname": {"value": "nada"}, "hs_is_contact": {"value": true}, "hubspot_owner_assigneddate": {"value": "2020-12-11T01:29:50.093000Z"}, "num_conversion_events": {"value": 0.0}, "hs_object_id": {"value": 151.0}, "hs_analytics_num_event_completions": {"value": 0.0}, "hs_analytics_source_data_2": {"value": "CRM_UI"}, "hs_analytics_source_data_1": {"value": "CONTACTS"}, "lifecyclestage": {"value": "subscriber"}, "hs_count_is_unworked": {"value": 1.0}}, "form-submissions": [], "list-memberships": [], "identity-profiles": [{"vid": 151, "saved-at-timestamp": "2020-12-11T01:29:50.153000Z", "deleted-changed-timestamp": "1970-01-01T00:00:00.000000Z", "identities": [{"type": "EMAIL", "value": "sherif+1@dataline.io", "timestamp": "2020-12-11T01:29:50.116000Z"}, {"type": "LEAD_GUID", "value": "53526567-31d1-4973-8f67-7ccbed16fd30", "timestamp": "2020-12-11T01:29:50.148000Z"}]}], "merge-audits": [], "associated-company": {"company-id": 5000526215, "portal-id": 8727216, "properties": {"country": {"value": "United States"}, "num_associated_contacts": {"value": 1.0}, "city": {"value": "San Francisco"}, "timezone": {"value": "America/Los_Angeles"}, "description": {"value": "Airbyte is the turnkey open-source data integration platform that gives your infrastructure super powers to move data seamlessly."}, "createdate": {"value": "2020-12-11T01:27:40.002000Z"}, "web_technologies": {"value": "wordpress;slack;segment;google_tag_manager;cloud_flare;google_apps;hubspot;drift"}, "hs_num_blockers": {"value": 0.0}, "numberofemployees": {"value": 10.0}, "hs_total_deal_value": {"value": 60000.0}, "hs_analytics_num_visits": {"value": 0.0}, "hubspot_owner_id": {"value": "52550153"}, "hs_analytics_source": {"value": "OFFLINE"}, "first_deal_created_date": {"value": "2021-01-13T10:30:42.221000Z"}, "hs_created_by_user_id": {"value": 12282590.0}, "num_associated_deals": {"value": 1.0}, "hs_analytics_num_page_views": {"value": 0.0}, "state": {"value": "CA"}, "hs_all_owner_ids": {"value": "52550153"}, "linkedinbio": {"value": "Airbyte is the turnkey open-source data integration platform that gives your infrastructure super powers to move data seamlessly."}, "hs_num_open_deals": {"value": 1.0}, "website": {"value": "dataline.io"}, "hs_analytics_first_timestamp": {"value": "2020-12-11T01:29:50.116000Z"}, "hs_user_ids_of_all_owners": {"value": "12282590"}, "twitterhandle": {"value": "AirbyteHQ"}, "first_contact_createdate": {"value": "2020-12-11T01:29:50.116000Z"}, "hs_target_account_probability": {"value": 0.46257445216178894}, "hs_lastmodifieddate": {"value": "2021-01-13T10:31:26.530000Z"}, "hubspot_owner_assigneddate": {"value": "2020-12-11T01:27:40.002000Z"}, "hs_num_decision_makers": {"value": 0.0}, "domain": {"value": "dataline.io"}, "name": {"value": "Dataline"}, "is_public": {"value": false}, "hs_object_id": {"value": 5000526215.0}, "hs_num_contacts_with_buying_roles": {"value": 0.0}, "hs_num_child_companies": {"value": 0.0}, "hs_analytics_source_data_2": {"value": "CRM_UI"}, "hs_analytics_source_data_1": {"value": "CONTACTS"}, "lifecyclestage": {"value": "opportunity"}, "hs_updated_by_user_id": {"value": 12282590.0}}}, "property_hs_is_unworked": {"value": true}, "property_firstname": {"value": "shrif"}, "property_associatedcompanyid": {"value": 5000526215.0}, "property_num_unique_conversion_events": {"value": 0.0}, "property_hs_analytics_revenue": {"value": 0.0}, "property_createdate": {"value": "2020-12-11T01:29:50.116000Z"}, "property_hs_analytics_num_visits": {"value": 0.0}, "property_hubspot_owner_id": {"value": "52550153"}, "property_hs_analytics_source": {"value": "OFFLINE"}, "property_hs_email_domain": {"value": "dataline.io"}, "property_hs_analytics_num_page_views": {"value": 0.0}, "property_hs_all_owner_ids": {"value": "52550153"}, "property_hs_count_is_worked": {"value": 0.0}, "property_email": {"value": "sherif+1@dataline.io"}, "property_lastmodifieddate": {"value": "2021-01-04T07:07:36.275000Z"}, "property_hs_analytics_first_timestamp": {"value": "2020-12-11T01:29:50.116000Z"}, "property_hs_lifecyclestage_subscriber_date": {"value": "2020-12-11T01:29:50.116000Z"}, "property_hs_user_ids_of_all_owners": {"value": "12282590"}, "property_hs_analytics_average_page_views": {"value": 0.0}, "property_hs_all_contact_vids": {"value": "151"}, "property_lastname": {"value": "nada"}, "property_hs_is_contact": {"value": true}, "property_hubspot_owner_assigneddate": {"value": "2020-12-11T01:29:50.093000Z"}, "property_num_conversion_events": {"value": 0.0}, "property_hs_object_id": {"value": 151.0}, "property_hs_analytics_num_event_completions": {"value": 0.0}, "property_hs_analytics_source_data_2": {"value": "CRM_UI"}, "property_hs_analytics_source_data_1": {"value": "CONTACTS"}, "property_lifecyclestage": {"value": "subscriber"}, "property_hs_count_is_unworked": {"value": 1.0}}, "emitted_at": 1611960281000}}

And when I select * from _airbyte_raw_contacts in Postgres I see the following record (not the full output for clarity):

 {"vid": 201, "portal-id": 8727216, "is-contact": true, "properties": {"zip": {"value": "02139"}, "city": {"value": "Cambridge"}, "email": {"value": "testingapis@hubspot.com"}, "phone": {"value": "555-122-2323"}, "s
tate": {"value": "MA"}, "address": {"value": "25 First Street"}, "company": {"value": "HubSpot"}, "website": {"value": "http://hubspot.com"}, "lastname": {"value": "testerson"}, "firstname": {"value": "test"}, "createdate": {"value": "2021-01-14T14:26:17
.014000Z"}, "hs_object_id": {"value": 201.0}, "hs_is_contact": {"value": true}, "hs_is_unworked": {"value": true}, "lifecyclestage": {"value": "subscriber"}, "hs_email_domain": {"value": "hubspot.com"}, "lastmodifieddate": {"value": "2021-01-14T14:27:00.
279000Z"}, "hs_all_contact_vids": {"value": "201"}, "hs_analytics_source": {"value": "OFFLINE"}, "hs_analytics_revenue": {"value": 0.0}, "num_conversion_events": {"value": 0.0}, "hs_analytics_num_visits": {"value": 0.0}, "hs_analytics_source_data_1": {"v
alue": "API"}, "hs_analytics_num_page_views": {"value": 0.0}, "hs_analytics_first_timestamp": {"value": "2021-01-14T14:26:17.014000Z"}, "num_unique_conversion_events": {"value": 0.0}, "hs_analytics_average_page_views": {"value": 0.0}, "hs_email_hard_boun
ce_reason_enum": {"value": "OTHER"}, "hs_lifecyclestage_subscriber_date": {"value": "2021-01-14T14:26:17.014000Z"}, "hs_analytics_num_event_completions": {"value": 0.0}, "hs_searchable_calculated_phone_number": {"value": "5551222323"}}, "merged-vids": []
, "profile-url": "https://app.hubspot.com/contacts/8727216/contact/201", "merge-audits": [], "property_zip": {"value": "02139"}, "canonical-vid": 201, "profile-token": "AO_T-mNG0HPseyqzFOLRUiTs4R_1hq5GbKnrXhT9aSPgSbF8liL1m6vOKVpXuT8yu_4D6_cw8dkKjNEDyogFg
QlePbmL9ufiEPDZA3jLE-3_h4om39Fazo9bjI1fVI7gGslUfymFfdCv", "property_city": {"value": "Cambridge"}, "property_email": {"value": "testingapis@hubspot.com"}, "property_phone": {"value": "555-122-2323"}, "property_state": {"value": "MA"}, "form-submissions":
 [], "list-memberships": [], "property_address": {"value": "25 First Street"}, "property_company": {"value": "HubSpot"}, "property_website": {"value": "http://hubspot.com"}, "identity-profiles": [{"vid": 201, "identities": [{"type": "EMAIL", "value": "te
stingapis@hubspot.com", "timestamp": "2021-01-14T14:26:17.014000Z"}, {"type": "LEAD_GUID", "value": "b96dc8e3-5b11-42a9-a166-5a4631fc7228", "timestamp": "2021-01-14T14:26:17.073000Z"}], "saved-at-timestamp": "2021-01-14T14:26:17.081000Z", "deleted-change
d-timestamp": "1970-01-01T00:00:00.000000Z"}], "property_lastname": {"value": "testerson"}, "property_firstname": {"value": "test"}, "property_createdate": {"value": "2021-01-14T14:26:17.014000Z"}, "property_hs_object_id": {"value": 201.0}, "property_hs_
is_contact": {"value": true}, "property_hs_is_unworked": {"value": true}, "property_lifecyclestage": {"value": "subscriber"}, "property_hs_email_domain": {"value": "hubspot.com"}, "property_lastmodifieddate": {"value": "2021-01-14T14:27:00.279000Z"}, "pr
operty_hs_all_contact_vids": {"value": "201"}, "property_hs_analytics_source": {"value": "OFFLINE"}, "property_hs_analytics_revenue": {"value": 0.0}, "property_num_conversion_events": {"value": 0.0}, "property_hs_analytics_num_visits": {"value": 0.0}, "p
roperty_hs_analytics_source_data_1": {"value": "API"}, "property_hs_analytics_num_page_views": {"value": 0.0}, "property_hs_analytics_first_timestamp": {"value": "2021-01-14T14:26:17.014000Z"}, "property_num_unique_conversion_events": {"value": 0.0}, "pr
operty_hs_analytics_average_page_views": {"value": 0.0}, "property_hs_email_hard_bounce_reason_enum": {"value": "OTHER"}, "property_hs_lifecyclestage_subscriber_date": {"value": "2021-01-14T14:26:17.014000Z"}, "property_hs_analytics_num_event_completions
": {"value": 0.0}, "property_hs_searchable_calculated_phone_number": {"value": "5551222323"}}

this point to the issue being a normalization problem.

@sherifnada
Copy link
Contributor Author

This is blocked on #1491

@ChristopheDuong
Copy link
Contributor

Nested columns are not supported by normalization as reported here: #886

@jrhizor jrhizor added the priority/high High priority label Feb 11, 2021
@cgardens cgardens removed the type/bug Something isn't working label Feb 15, 2021
@cgardens
Copy link
Contributor

removing the bug tag on this one as it is not currently supported.

@ChristopheDuong i think the stuff you're work is going to add this feature, right should we close it as duplicate or is there any additional info on here that you need?

@ChristopheDuong
Copy link
Contributor

ChristopheDuong commented Feb 15, 2021

removing the bug tag on this one as it is not currently supported.

@ChristopheDuong i think the stuff you're work is going to add this feature, right should we close it as duplicate or is there any additional info on here that you need?

We probably should keep them open and review the test cases one by one to validate that normalization is indeed working well with various sources.

For example, at this point the unnesting normalization seems to pass successfully on Facebook and Stripe sources but is failing on Hubspot:

Screenshot 2021-02-15 at 18 19 05

Re-working the name collisions and making sure the table name is not too long may help in solving this one:
#2055

Upgrading DBT to version 19 will also give us more room on tables name length for postgres:
#2013

Screenshot 2021-02-15 at 18 26 38
https://discourse.getdbt.com/t/release-v0-19-0-kiyoshi-kuromiya/1951

@ChristopheDuong
Copy link
Contributor

This is now better handled in normalization, closing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants