Skip to content

degreed-data-engineering/tap-salesforce

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tap-salesforce

PyPI version CircleCI Build Status

Singer tap that extracts data from a Salesforce Account and produces JSON-formatted data following the Singer spec.

This is a forked version of tap-salesforce (v1.4.24) that maintained by the Meltano team.

Main differences from the original version:

  • Support for username/password/security_token authentication
  • Support for concurrent execution (8 threads by default) when accessing different API endpoints to speed up the extraction process

Quickstart

Install the tap

This version of tap-salesforce is not available on PyPi, so you have to fetch it directly from the Meltano maintained project:

python3 -m venv venv
source venv/bin/activate
pip install git+https://gitlab.com/meltano/tap-salesforce.git

Create a Config file

Required

{
  "api_type": "BULK",
  "select_fields_by_default": true,
}

Required for OAuth based authentication

{
  "client_id": "secret_client_id",
  "client_secret": "secret_client_secret",
  "refresh_token": "abc123",
}

Required for username/password based authentication

{
  "username": "Account Email",
  "password": "Account Password",
  "security_token": "Security Token",
}

Optional

{
  "start_date": "2017-11-02T00:00:00Z",
  "state_message_threshold": 1000,
  "max_workers": 8
}

The client_id and client_secret keys are your OAuth Salesforce App secrets. The refresh_token is a secret created during the OAuth flow. For more info on the Salesforce OAuth flow, visit the Salesforce documentation.

The start_date is used by the tap as a bound on SOQL queries when searching for records. This should be an RFC3339 formatted date-time, like "2018-01-08T00:00:00Z". For more details, see the Singer best practices for dates.

The api_type is used to switch the behavior of the tap between using Salesforce's "REST" and "BULK" APIs. When new fields are discovered in Salesforce objects, the select_fields_by_default key describes whether or not the tap will select those fields by default.

The state_message_threshold is used to throttle how often STATE messages are generated when the tap is using the "REST" API. This is a balance between not slowing down execution due to too many STATE messages produced and how many records must be fetched again if a tap fails unexpectedly. Defaults to 1000 (generate a STATE message every 1000 records).

The max_workers value is used to set the maximum number of threads used in order to concurrently extract data for streams. Defaults to 8 (extract data for 8 streams in paralel).

Run Discovery

To run discovery mode, execute the tap with the config file.

tap-salesforce --config config.json --discover > properties.json

Sync Data

To sync data, select fields in the properties.json output and run the tap.

tap-salesforce --config config.json --properties properties.json [--state state.json]

Copyright © 2017 Stitch

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages