Source Hubspot: check if it has a state on search streams #15110

Conversation

vladimir-remar
Contributor

What

The idea of CRMSearchStream is to use one endpoint or the other depending on whether the stream has a previous state:

return f"/crm/v3/objects/{self.entity}/search" if self.state else f"/crm/v3/objects/{self.entity}"

In other words, if there is no previous state, use the List endpoint for the CRM object (Contacts, Deals, ...); if there is a state, use the Search endpoint.

How

Check whether the stream has a previous state before setting the state at the beginning of the sync.
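
A minimal sketch of this idea, not the PR's actual diff: it records whether a state was passed in before the cursor is initialized, and picks the endpoint from that flag. The class `SearchStreamSketch`, the `_had_previous_state` attribute, the stand-in `SyncMode` enum, and the assumption that state and start date are ISO-8601 UTC strings are all hypothetical.

```python
from enum import Enum
from typing import Optional


class SyncMode(Enum):
    # Stand-in for the SyncMode enum used by the connector (assumption).
    full_refresh = "full_refresh"
    incremental = "incremental"


class SearchStreamSketch:
    """Hypothetical, stripped-down stand-in for CRMSearchStream."""

    def __init__(self, entity: str, start_date: str):
        self.entity = entity
        self._start_date = start_date  # assumed: ISO-8601 UTC string
        self._state: Optional[str] = None
        self._had_previous_state = False  # hypothetical flag
        self._sync_mode: Optional[SyncMode] = None

    def set_sync(self, sync_mode: SyncMode, stream_state: Optional[str] = None) -> None:
        self._sync_mode = sync_mode
        # Remember whether a state existed *before* the cursor is initialized.
        self._had_previous_state = bool(stream_state)
        if sync_mode == SyncMode.incremental:
            if stream_state:
                # Same-format ISO-8601 strings compare chronologically.
                self._state = self._start_date = max(stream_state, self._start_date)
            else:
                self._state = self._start_date

    @property
    def url(self) -> str:
        base = f"/crm/v3/objects/{self.entity}"
        return f"{base}/search" if self._had_previous_state else base


stream = SearchStreamSketch("contacts", "2022-01-01T00:00:00Z")
stream.set_sync(SyncMode.incremental, None)
first_sync_url = stream.url
stream.set_sync(SyncMode.incremental, "2022-06-01T00:00:00Z")
next_sync_url = stream.url
```

Under these assumptions, the first incremental sync (no state) would hit the List endpoint and later syncs would hit the Search endpoint.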


@marcosmarxm
Member

@grubberr can you check this contribution? I saw you created the other function, set_sync, for this connector.

@sajarin sajarin added the bounty-S Maintainer program: claimable small bounty PR label Aug 5, 2022
@davydov-d
Collaborator

davydov-d commented Aug 9, 2022

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2823229366
❌ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2823229366
🐛 https://gradle.com/s/khedxyvcdi77k

Build Failed

Test summary info:

Could not find result summary

@davydov-d
Collaborator

hey @vladimir-remar could you please merge the up-to-date master into your branch? otherwise we are not able to run acceptance tests

def set_sync(self, sync_mode: SyncMode, stream_state):
    self._sync_mode = sync_mode
    if self._sync_mode == SyncMode.incremental:
        if stream_state:
Collaborator

@vladimir-remar this is the main change here, right?

could you please provide an example of the use case for this? I don't quite get what's the point of this change and under what conditions it would come into action

btw, isn't stream_state value same as self._state?

Contributor Author

@vladimir-remar vladimir-remar Aug 9, 2022

Hi @davydov-d, thanks for your answer.

Indeed, it has the same value if the sync mode is incremental and the sync runs with a valid state.
Let's get into the main idea I have:

  • First sync: Incremental / no state, should use f"/crm/v3/objects/{self.entity}"
  • Following syncs: Incremental / state, should use f"/crm/v3/objects/{self.entity}/search"

Perhaps I missed something or got the wrong idea about this.
Anyway, thanks for your help.

Collaborator

@vladimir-remar thanks for your time.
the reason I'm asking is that, unfortunately, this connector's code is quite complicated, and this change does not bring in more clarity.

so, if stream_state value is same as self._state, can't it be simplified to:

if self._sync_mode == SyncMode.incremental:
    if self._state:
        if not self._state:
            self._state = self._start_date
        else:
            self._state = self._start_date = max(self._state, self._start_date)

?
If so, the `not` branch will never execute. Am I missing something?
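
The point about the dead branch can be checked with a small sketch that mirrors the simplified snippet as a pure function. The function name `init_state_simplified` and the use of ISO-8601 strings are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def init_state_simplified(state: Optional[str], start_date: str) -> Tuple[Optional[str], str]:
    """Mirror of the simplified snippet, assuming incremental mode."""
    if state:
        if not state:
            # Unreachable: the outer check already guarantees `state` is truthy.
            state = start_date
        else:
            state = start_date = max(state, start_date)
    return state, start_date


no_state = init_state_simplified(None, "2022-01-01T00:00:00Z")
with_state = init_state_simplified("2022-06-01T00:00:00Z", "2022-01-01T00:00:00Z")
```

With an empty state the function returns the inputs unchanged, so under this reading the state would never be initialized from the start date.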

Contributor Author

@davydov-d Thanks to you.
I am not referring to the current value of the state, but to whether there was a previous initial state before the first sync.
That is why my suggestion depends on whether it is the first sync or not: I check the initial value of stream_state and then set the initial value of self._state.

It is more about which endpoint the stream will use.

It depends on whether the state property was set before. With the current approach, the state is always set in incremental mode, which means the /search endpoint is always used.
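
To make that behavior concrete, here is a minimal sketch. It assumes the state initialization before this PR looks roughly like the inner branch quoted earlier in this thread and that the url property checks self.state. The class name `CurrentBehaviorSketch`, its constructor, and the method `set_sync_incremental` are hypothetical, not the connector's code.

```python
from typing import Optional


class CurrentBehaviorSketch:
    """Hypothetical stand-in for the stream; not the connector's actual class."""

    def __init__(self, entity: str, start_date: str, state: Optional[str] = None):
        self.entity = entity
        self._start_date = start_date  # assumed: ISO-8601 UTC string
        self.state = state

    def set_sync_incremental(self) -> None:
        # Assumed pre-PR logic: the state is always populated in incremental mode.
        if not self.state:
            self.state = self._start_date
        else:
            self.state = self._start_date = max(self.state, self._start_date)

    @property
    def url(self) -> str:
        base = f"/crm/v3/objects/{self.entity}"
        return f"{base}/search" if self.state else base


first_sync = CurrentBehaviorSketch("contacts", "2022-01-01T00:00:00Z")
url_before = first_sync.url
first_sync.set_sync_incremental()
url_after = first_sync.url
```

In this sketch, even a first sync with no saved state ends up on the /search endpoint as soon as the sync mode is set to incremental.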

Collaborator

@vladimir-remar I'm afraid you were misled by this if self._state statement. It's kind of a workaround with potential pitfalls. The idea is to use /search in incremental mode (or whenever you have a filtering criterion, as @grubberr mentioned above) as the main URL regardless of the state. This code should be refactored, I believe 😕

@grubberr
Contributor

grubberr commented Aug 9, 2022

@vladimir-remar

@property
def url(self):
    return f"/crm/v3/objects/{self.entity}/search" if self.state else f"/crm/v3/objects/{self.entity}"

As I remember, we select one endpoint or the other depending on whether we have a filtering parameter or not:

  1. if we have filtering param start_date we use /search endpoint
  2. if we don't have filtering param start_date we use /objects endpoint

filtering parameter start_date = max(config, state)
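
The rule described here could be sketched as two small functions. The names `filter_start_date` and `select_endpoint` are hypothetical, and the sketch assumes both dates are ISO-8601 UTC strings in the same format, so that string comparison matches chronological order.

```python
from typing import Optional


def filter_start_date(config_start_date: Optional[str], state: Optional[str]) -> Optional[str]:
    """Return max(config, state), ignoring whichever value is missing."""
    candidates = [value for value in (config_start_date, state) if value]
    return max(candidates) if candidates else None


def select_endpoint(entity: str, start_date: Optional[str]) -> str:
    """Use /search when there is a filtering parameter, the plain objects endpoint otherwise."""
    base = f"/crm/v3/objects/{entity}"
    return f"{base}/search" if start_date else base


with_filter = select_endpoint("deals", filter_start_date("2022-01-01T00:00:00Z", "2022-06-01T00:00:00Z"))
without_filter = select_endpoint("deals", filter_start_date(None, None))
```

Note that under this rule a configured start date alone is enough to select /search, independently of any saved state.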

Can you please explain your vision? Maybe I am missing something in the HubSpot API details.

Thank you

@vladimir-remar
Contributor Author

Hi @grubberr, thank you.
As I see it, the endpoint depends on whether the state property was set before: with the current approach, the state is always set in incremental mode, which means the /search endpoint is always used.
If the main intention was to use /search as the main URL in incremental mode, it makes sense to me now.

@davydov-d
Collaborator

davydov-d commented Aug 11, 2022

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2838623777
✅ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2838623777
Python tests coverage:

Name                                                 Stmts   Miss  Cover
------------------------------------------------------------------------
source_acceptance_test/utils/__init__.py                 6      0   100%
source_acceptance_test/tests/__init__.py                 4      0   100%
source_acceptance_test/__init__.py                       2      0   100%
source_acceptance_test/tests/test_full_refresh.py       52      2    96%
source_acceptance_test/utils/asserts.py                 37      2    95%
source_acceptance_test/config.py                        82      6    93%
source_acceptance_test/utils/json_schema_helper.py     105     13    88%
source_acceptance_test/tests/test_incremental.py       121     25    79%
source_acceptance_test/utils/common.py                  77     17    78%
source_acceptance_test/tests/test_core.py              355    107    70%
source_acceptance_test/utils/compare.py                 62     23    63%
source_acceptance_test/base.py                          10      4    60%
source_acceptance_test/utils/connector_runner.py       110     48    56%
------------------------------------------------------------------------
TOTAL                                                 1023    247    76%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       3      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      7    90%
source_hubspot/source.py         90     19    79%
source_hubspot/streams.py       818    199    76%
-------------------------------------------------
TOTAL                           989    225    77%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       3      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      3    96%
source_hubspot/streams.py       818     76    91%
source_hubspot/source.py         90     18    80%
-------------------------------------------------
TOTAL                           989     97    90%

Build Passed

Test summary info:

All Passed

@sajarin sajarin added internal and removed bounty bounty-S Maintainer program: claimable small bounty PR labels Aug 11, 2022
@marcosmarxm
Member

@davydov-d do you think we can merge this contribution? I can finish it if needed.

@davydov-d
Collaborator

@marcosmarxm I have strong doubts. Replied to Vladimir above

@marcosmarxm
Member

@davydov-d sorry, I missed that.

@vladimir-remar
Contributor Author

@davydov-d Thanks.
We are having problems with the synchronization of contacts, and this PR would solve them. We have about 314k records, but the incremental synchronization misbehaves, resulting in infinite loops or wrong readings of the total results (see the attached log and image). The current implementation may work fine for a few records, but with a lot of records the sync takes hours and the cursor gets messed up.

We think it would also solve this: https://discuss.airbyte.io/t/source-hubspot-contact-list-membership-contacts-extraction-performance-optimization/2219

I say it is related because the use of the search endpoint causes this behavior.
Screenshot 2022-08-09 at 14 06 07
hubspot_contacts.txt

@davydov-d
Collaborator

davydov-d commented Aug 17, 2022

@vladimir-remar are you sure about an infinite loop? The record count and the sync time are probably related to this change.
It reads associations from a specialized endpoint in incremental mode, since the /search endpoint does not support associations, and counts each association as a record. If you don't need those during the sync, perhaps we could introduce a new option in the input config or move this sync to another stream. Your thoughts on this are welcome.

p.s. could you please try running the sync locally with your change? Then we could know for sure whether it changes the way the sync works.

@vladimir-remar
Contributor Author

@davydov-d Thanks, I've made a custom connector with my changes, I attached the logs.
hubspot_contacts-first-sync.txt
hubspot_contacts-next-sync.txt

@davydov-d
Collaborator

@vladimir-remar looks good, makes sense now to accept this change

Collaborator

@davydov-d davydov-d left a comment

@marcosmarxm we need to bump version here and update the changelog please

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Aug 18, 2022
@lazebnyi lazebnyi removed their request for review August 19, 2022 02:57
@marcosmarxm
Member

marcosmarxm commented Aug 22, 2022

/test connector=connectors/source-hubspot

🕑 connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2901307539
✅ connectors/source-hubspot https://github.com/airbytehq/airbyte/actions/runs/2901307539
Python tests coverage:

Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       3      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      7    90%
source_hubspot/source.py         90     19    79%
source_hubspot/streams.py       818    198    76%
-------------------------------------------------
TOTAL                           989    224    77%
Name                          Stmts   Miss  Cover
-------------------------------------------------
source_hubspot/errors.py          6      0   100%
source_hubspot/constants.py       3      0   100%
source_hubspot/__init__.py        2      0   100%
source_hubspot/helpers.py        70      3    96%
source_hubspot/streams.py       818     76    91%
source_hubspot/source.py         90     18    80%
-------------------------------------------------
TOTAL                           989     97    90%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          10      4    60%   15-18
	 source_acceptance_test/config.py                        83      6    93%   78-80, 84-86
	 source_acceptance_test/conftest.py                     164    164     0%   6-282
	 source_acceptance_test/plugin.py                        48     48     0%   6-104
	 source_acceptance_test/tests/test_core.py              329    111    66%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 374-376, 379, 439-448, 477-478, 484, 487, 520-530, 543-568, 573-577
	 source_acceptance_test/tests/test_full_refresh.py       52      2    96%   34, 65
	 source_acceptance_test/tests/test_incremental.py       121     25    79%   21-23, 29-31, 36-43, 48-61, 208-216
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     17    78%   15-16, 24-30, 47-54, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/connector_runner.py       110     48    56%   23-26, 32, 36, 39-64, 67-69, 72-74, 77-79, 82-84, 87-89, 92-110, 144-146
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1321    463    65%

Build Passed

Test summary info:

All Passed

@marcosmarxm
Member

marcosmarxm commented Aug 22, 2022

/publish connector=connectors/source-hubspot

🕑 Publishing the following connectors:
connectors/source-hubspot
https://github.com/airbytehq/airbyte/actions/runs/2901378979


Connector Did it publish? Were definitions generated?
connectors/source-hubspot

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@marcosmarxm marcosmarxm merged commit eb2c7ad into airbytehq:master Aug 22, 2022
rodireich pushed a commit that referenced this pull request Aug 25, 2022
* check if it has a state

* update version in Dockerfile and docs

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation community connectors/source/hubspot internal
6 participants