-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Source Hubspot: check if it has a state on search streams #15110
Source Hubspot: check if it has a state on search streams #15110
Conversation
@grubberr can you check this contirbution I saw you created the other function |
/test connector=connectors/source-hubspot
Build FailedTest summary info:
|
hey @vladimir-remar could you please merge the up-to-date master into your branch? otherwise we are not able to run acceptance tests |
def set_sync(self, sync_mode: SyncMode, stream_state): | ||
self._sync_mode = sync_mode | ||
if self._sync_mode == SyncMode.incremental: | ||
if stream_state: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimir-remar this is the main change here, right?
could you please provide an example of the use case for this? I don't quite get what's the point of this change and under what conditions it would come into action
btw, isn't stream_state
value same as self._state
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @davydov-d, thanks for answer.
Indeed it has the same value if the sync mode is incremental and run with a valid state.
Lets get into the main idea I have:
- First sync: Incremental / no state, should use f"/crm/v3/objects/{self.entity}"
- Following syncs: Incremental / state, should use f"/crm/v3/objects/{self.entity}/search"
Probably I missed something or I got the wrong idea refer to this.
Anyways thanks for your help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimir-remar thanks for your time.
the reason I'm asking is that unfortunately, this connector's code is quite complicated and this change does not bring in more clarity..
so, if stream_state
value is same as self._state
, can't it be simplified to:
if self._sync_mode == SyncMode.incremental:
if self._state:
if not self._state:
self._state = self._start_date
else:
self._state = self._start_date = max(self._state, self._start_date)
?
if so, not
branch will never execute. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davydov-d Thanks to you.
I am not referring to the current value of the state, but to whether it had a previous initial state, before the first sync,
that is why my suggestion is oriented to whether it is the first sync or not. That's why I identify the initial value of stream_state
and then I set the initial value of self._state
It is more related in which endpoint will the stream use
It depends if state property was set before, the actual approach in incremental mode the state will be set always incurring use the /search endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimir-remar I'm afraid you were misled by this if self._state
statement - it's kind of a workaround with potential pitfalls. The idea is to use /search in Incremental mode (or whenever you have a filtering criteria as @grubberr mentioned above) as main url regardless of the state. This code should be refactored I believe 😕
as I remember we select one or another endpoint depending whether we have filtering parameter or not
filtering parameter Can you please explain your vision? maybe I am missing something in hubspot API details. Thank you |
Hi @grubberr thanks you. airbyte/airbyte-integrations/connectors/source-hubspot/source_hubspot/streams.py Line 892 in a54c0ef
It depends if state property was set before, the actual approach in incremental mode the state will be set always incurring use the /search endpoint.If the main intention was use /search in Incremental as main url it makes sense now for me.
|
/test connector=connectors/source-hubspot
Build PassedTest summary info:
|
@davydov-d do you think we can merge this contribution? I can finish it if needed. |
@marcosmarxm I have strong doubts. Replied to Vladimir above |
@davydov-d sorry I missed that, |
@davydov-d Thanks. We think they would also solve this https://discuss.airbyte.io/t/source-hubspot-contact-list-membership-contacts-extraction-performance-optimization/2219 I say that it is related because the use of the search endpoint causes this behavior. |
@vladimir-remar are you sure about an infinite loop? the records count and the sync time is probably related to this change. p.s. could you please try running the sync locally with your change? then we could know for sure if it changes the way sync works |
@davydov-d Thanks, I've made a custom connector with my changes, I attached the logs. |
@vladimir-remar looks good, makes sense now to accept this change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marcosmarxm we need to bump version here and update the changelog please
/test connector=connectors/source-hubspot
Build PassedTest summary info:
|
/publish connector=connectors/source-hubspot
if you have connectors that successfully published but failed definition generation, follow step 4 here |
* check if it has a state * update version in Dockerfile and docs * auto-bump connector version [ci skip] Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
What
The idea of CRMSearchStream is to use one endpoint or another depending on whether it has a previous state
airbyte/airbyte-integrations/connectors/source-hubspot/source_hubspot/streams.py
Line 761 in 65295b7
In other words, if it does not have a previous state, use the List endpoint for the CRM object (Contacts, Deals...) and if it has a state, use the search endpoint.
How
Check if has a previous state before setting the state at the beginning of the sync.
Recommended reading order
x.java
y.python
🚨 User Impact 🚨
Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
Updating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.