Singer tap that extracts data from a Salesforce Account and produces JSON-formatted data following the Singer spec.
This is a forked version of tap-salesforce (v1.4.24) that maintained by the Meltano team.
Main differences from the original version:
- Support for
username/password/security_token
authentication - Support for concurrent execution (8 threads by default) when accessing different API endpoints to speed up the extraction process
This version of tap-salesforce
is not available on PyPi, so you have to fetch it directly from the Meltano maintained project:
python3 -m venv venv
source venv/bin/activate
pip install git+https://gitlab.com/meltano/tap-salesforce.git
Required
{
"api_type": "BULK",
"select_fields_by_default": true,
}
Required for OAuth based authentication
{
"client_id": "secret_client_id",
"client_secret": "secret_client_secret",
"refresh_token": "abc123",
}
Required for username/password based authentication
{
"username": "Account Email",
"password": "Account Password",
"security_token": "Security Token",
}
Optional
{
"start_date": "2017-11-02T00:00:00Z",
"state_message_threshold": 1000,
"max_workers": 8
}
The client_id
and client_secret
keys are your OAuth Salesforce App secrets. The refresh_token
is a secret created during the OAuth flow. For more info on the Salesforce OAuth flow, visit the Salesforce documentation.
The start_date
is used by the tap as a bound on SOQL queries when searching for records. This should be an RFC3339 formatted date-time, like "2018-01-08T00:00:00Z". For more details, see the Singer best practices for dates.
The api_type
is used to switch the behavior of the tap between using Salesforce's "REST" and "BULK" APIs. When new fields are discovered in Salesforce objects, the select_fields_by_default
key describes whether or not the tap will select those fields by default.
The state_message_threshold
is used to throttle how often STATE messages are generated when the tap is using the "REST" API. This is a balance between not slowing down execution due to too many STATE messages produced and how many records must be fetched again if a tap fails unexpectedly. Defaults to 1000 (generate a STATE message every 1000 records).
The max_workers
value is used to set the maximum number of threads used in order to concurrently extract data for streams. Defaults to 8 (extract data for 8 streams in paralel).
To run discovery mode, execute the tap with the config file.
tap-salesforce --config config.json --discover > properties.json
To sync data, select fields in the properties.json
output and run the tap.
tap-salesforce --config config.json --properties properties.json [--state state.json]
Copyright © 2017 Stitch