Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source Salesforce: NewValue and OldValue columns are not getting pulled for OpportunityFieldHistory #6886

Closed
sarunaskas opened this issue Oct 8, 2021 · 20 comments · Fixed by #7592 or #7778

Comments

@sarunaskas
Copy link

Enviroment

  • Airbyte version: 0.30.2-alpha
  • OS Version / Instance: GCP n1-standard-8
  • Deployment: Docker compose
  • Source Connector and version: Salesforce 0.1.2
  • Destination Connector and version: BigQuery 0.4.1
  • Severity: High
  • Step where error happened: Sync job

Current Behavior

"OldValue" and "NewValue" fields from OpportunityFieldHistory are not being pulled into BigQuery.
Checking BigQuery, none of the rows in _airbyte_raw_OpportunityFieldHistory have OldValue or NewValue in the _airbyte_data column.

In the connection settings page the Data types for these two columns are being considered as array which might be incorrect Salesforce ref.

Expected Behavior

Would expect to see those two columns fetched and stored as strings containing the values. These two columns are critical to our adoption of Airbyte.

Logs

Potentially relevant parts of the LOG
2021-10-08 08:02:45 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Syncing stream: OpportunityFieldHistory 
2021-10-08 08:02:45 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Created Job: 7504x000007iBROAA2 to sync OpportunityFieldHistory
2021-10-08 08:02:45 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Sleeping 2 seconds while waiting for Job: 7504x000007iBROAA2 to complete
2021-10-08 08:02:45 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):203 - Records read: 10000
2021-10-08 08:02:45 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):203 - Records read: 11000
2021-10-08 08:02:45 INFO () DefaultReplicationWorker(lambda$getReplicationRunnable$2):203 - Records read: 12000
2021-10-08 08:02:48 INFO () DefaultAirbyteStreamFactory(internalLog):90 - Read 3020 records from OpportunityFieldHistory stream

...

2021-10-08 08:03:27 INFO () DefaultAirbyteStreamFactory(lambda$create$0):53 - 2021-10-08 08:03:27 INFO i.a.i.d.b.BigQueryRecordConsumer(copyTable):339 - {} - successfully copied tmp table: GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=salesforce_airbyte, tableId=_airbyte_tmp_rot_OpportunityFieldHistory}} to final table: GenericData{classInfo=[datasetId, projectId, tableId], {datasetId=salesforce_airbyte, tableId=_airbyte_raw_OpportunityFieldHistory}}
2021-10-08 08:03:27 INFO () JsonSchemaValidator(test):56 - JSON schema validation failed. 
errors: $: null found, object expected
2021-10-08 08:03:27 ERROR () DefaultAirbyteStreamFactory(lambda$create$1):63 - Validation failed: null

...

2021-10-08 08:03:30 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'NewValue' from OpportunityFieldHistory/NewValue because properties list is empty
2021-10-08 08:03:30 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'OldValue' from OpportunityFieldHistory/OldValue because properties list is empty

Steps to Reproduce

  1. Add a Salesforce Source
  2. Set BigQuery Destination
  3. Select OpportunityFieldHistory with Full refresh | Overwrite
  4. Run the sync

Are you willing to submit a PR?

Not technically capable

@sarunaskas sarunaskas added the type/bug Something isn't working label Oct 8, 2021
@sarunaskas sarunaskas changed the title NewValue and OldValue columns are not getting pulled for OpportunityFieldHistory Source Salesforce: NewValue and OldValue columns are not getting pulled for OpportunityFieldHistory Oct 8, 2021
@marcosmarxm
Copy link
Member

I'm using the integration account and it's happening the problem too. The raw tables don't have any data from those two fields (are null)
image

@marcosmarxm
Copy link
Member

If you use a Python lib like simple_salesforce these columns return value.

Sorry to use a very limited image.
image

@VasylLazebnyk VasylLazebnyk added this to the Connectors 2021-11-12 milestone Oct 31, 2021
@yevhenii-ldv yevhenii-ldv self-assigned this Nov 1, 2021
@yevhenii-ldv
Copy link
Contributor

Hi @marcosmarxm
Could you help me with investigate this ticket, please, because I need a little more information about your investigate?
I am trying to reproduce this problem, but of all streams where there are potentially OldValue and NewValue fields, records exist only for the AssetHistory stream for our test account.
And I've tried various options:

  1. Using the Airbyte Salesforce connector (we are using the 52.0 version of the Salesforce API).
  2. Used library simple_salesforce for python (API version 42.0)
  3. Tried using regular requests
    image

In all cases, I got Null for the OldValue and NewValue fields. And I cannot be sure that these fields should have a different meaning, just like vice versa.
Please tell me with which stream you received the data for the OldValue and NewValue fields shown in the screenshot above, and what credentials you used.

@marcosmarxm
Copy link
Member

marcosmarxm commented Nov 2, 2021

Hello @yevhenii-ldv I enabled the LeadHistory option and edited a lead in the integration account.
I'm accessing https://workbench.developerforce.com/query.php with integration test credentials.


Query 1

Now when I apply the query to get the newValue and oldValue from leadHistory object I got:
SELECT CreatedById,CreatedDate,Field,Id,IsDeleted,LeadId,NewValue,OldValue FROM LeadHistory
image

No NewValue or OldValue column or values
If you change the method to Bulk CSV the csv file has the correct values.


Query 2

If I only use the newValue and oldValue column:
SELECT NewValue,OldValue FROM LeadHistory
image


Query 3

If I use the newValue and oldValue column at the beggining from the first query:
SELECT NewValue,OldValue,CreatedById,CreatedDate,Field,Id,IsDeleted,LeadId FROM LeadHistory
image

@marcosmarxm
Copy link
Member

This code use only request and should work to query Salesforce LeadHistory.

import requests

from urllib.parse import urlencode

instance_url = ''
consumer_key = ''
consumer_secret = ''
username = ''
password = ''

token_url = f'{instance_url}/services/oauth2/token'
data = {
    'grant_type': 'password',
    'client_id': f'{consumer_key}',
    'client_secret': f'{consumer_secret}',
    'username': f'{username}',
    'password': f'{password}'
}
r1 = requests.post(token_url, data=data)
d1 = r1.json()
token = d1['access_token']
headers = {'Authorization': f'Bearer {token}'}

api_versions_url = f'{instance_url}/services/data/'
r2 = requests.get(api_versions_url, headers=headers)
d2 = r2.json()
api_ver = d2[-1]['url']
api_url = f'{instance_url}{api_ver}'

query_url = f'{api_url}/query/'
query = """SELECT CreatedById,CreatedDate,DataType,Field,Id,IsDeleted,LeadId,NewValue,OldValue FROM LeadHistory ORDER BY NewValue ASC NULLS LAST"""
query_url_final = f'{query_url}?{urlencode({"q":query})}'
r3 = requests.get(query_url_final, headers=headers)
d3 = r3.json()

@marcosmarxm
Copy link
Member

@yevhenii-ldv I thinks can be closed right?

@yevhenii-ldv
Copy link
Contributor

@marcosmarxm I think not yet:
#7592 (comment)

@sarunaskas
Copy link
Author

@yevhenii-ldv Just tried out the new version, but I still see empty arrays in the destination for OldValue and NewValue fields. Am I missing something?

2021-11-08 07:28:47 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'NewValue' from OpportunityFieldHistory/NewValue because properties list is empty
2021-11-08 07:28:47 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'OldValue' from OpportunityFieldHistory/OldValue because properties list is empty

@yevhenii-ldv
Copy link
Contributor

@sarunaskas Could you please clarify which API Type you are using in your config: REST or BULK?

@sarunaskas
Copy link
Author

@yevhenii-ldv Thanks for the reply. I am using BULK.

@yevhenii-ldv
Copy link
Contributor

@sarunaskas I made a small update for the Salesforce connector, could you tried out the new version (0.1.4) and tell me the result of your sync?

@sarunaskas
Copy link
Author

@yevhenii-ldv Now I do see the OldValue and NewValue in raw _airbyte_raw_OpportunityFieldHistory table (I did not check whether those were populated before in 0.1.3), however, the default normalization still does not populate the values in BigQuery.

Still seeing the same warning:

2021-11-09 14:16:28 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'NewValue' from OpportunityFieldHistory/NewValue because properties list is empty
2021-11-09 14:16:28 INFO () LineGobbler(voidCall):65 -   Ignoring stream 'OldValue' from OpportunityFieldHistory/OldValue because properties list is empty

@yevhenii-ldv
Copy link
Contributor

@sarunaskas Are you trying to fill in the same already created table in BigQuery or are you creating a new one?
I changed the data type, and now for the OldValue and NewValue fields, fields with the "string" data type should be created in BigQuery.

@sarunaskas
Copy link
Author

@yevhenii-ldv I tried resetting the dataset, but that did not help. What may be interesting is that the BigQuery schema says those two are string repeated columns: is it expecting an array?

@yevhenii-ldv
Copy link
Contributor

yevhenii-ldv commented Nov 10, 2021

@sarunaskas These two fields should be expected as strings in the destinaton, arrays were expected before, but I corrected the configuration of these fields and now they should be expected as strings.
Please check what type of data fields are currently created in BigQuery dataset. If the type is array - then please, could you create a new connection for the Salesforce source (updated to the last version) and create a new connection for BitgQuery (to create a new dataset and initialize the tables of the new one) and check how it works?

@sarunaskas
Copy link
Author

@yevhenii-ldv deleted the connection and the dataset in BigQuery. While trying to establish the connection from scratch I am getting a completely white screen while "We are fetching the schema of your data source." step is happening. This is what I see in Console, but unsure if this is related. I am currently on 0.30.22-alpha.

image

@yevhenii-ldv
Copy link
Contributor

@sarunaskas
It is very likely that your system did not start.
You can go to Settings and see what version you currently have installed and what the latest exists. If you do not have the latest version for Salesforce, please update it by clicking on the button Change. For Big Query, this is not necessary, but it can be done as well.
And recreate the connection one more time please.
image
image

@yevhenii-ldv
Copy link
Contributor

Hi @sarunaskas I have good news for you :)
I made a small revision, and now if you update the Salesforce version to 0.1.5, then you should no longer have problems with OldValue and NewValue fields for any stream.

Note: in case of updating the version and starting data reading, please create a new Dataset for BigQuery destination.

@sarunaskas
Copy link
Author

Hey @yevhenii-ldv. Thanks for the good news, however, I am still unable to establish the connection anymore. We bumped up the airbyte version to 0.30.37-alpha, however, I am still getting the blank screen and the errors I listed earlier in the console upon schema fetching.

Other sources work, Salesforce seems to be a special case. I am now on 0.1.5 SF source version.

@sarunaskas
Copy link
Author

@yevhenii-ldv deleted the connection and the dataset in BigQuery. While trying to establish the connection from scratch I am getting a completely white screen while "We are fetching the schema of your data source." step is happening. This is what I see in Console, but unsure if this is related. I am currently on 0.30.22-alpha.

image

@yevhenii-ldv I am still stuck in the same situation. Maybe you have any ideas how to resolve this? Still can't get past the the schema fetch. Quite a big blocker for us. Maybe I should raise this as a separate issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment