source-pendo: backfill a day at a time
For Pendo accounts with a significant amount of data, the strategy of
asking for all data up until the present does not work. Pendo's response
time is long enough to cause TimeoutErrors, and responses that are
received end up OOM-ing the connector.

To fix this, there is now a distinct backfill process for events and
aggregated events. Notable changes include:
- Separating backfills from ongoing incremental replication.
  - Each backfill request asks for at most one day's worth of data.
    Incremental replication continues to request all data up to the
    present.
- The cutoff between backfills and incremental replication is shifted
  backwards 12 hours due to delays between when an event occurred and
  when it is available in the API.
- The Pendo API response limit is increased to 50k. It may be possible
  to raise it further, but I've only done limited testing, which
  confirmed that 50k won't OOM the connector.
- API responses are now sorted first by timestamp and next by resource
  ID. This lets us more effectively "paginate" through documents if
  multiple occur at the same timestamp.
- If all the documents in a given API response have the same timestamp,
  the connector will fetch the remaining documents with that exact
  timestamp before incrementing the cursor by a millisecond.
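
The windowing and same-timestamp handling described above can be
sketched roughly as follows. This is a minimal illustration, not the
connector's actual code: the function names (backfill_windows,
fetch_page, fetch_same_timestamp, read_all) are hypothetical, the Pendo
API is replaced by an in-memory list of (timestamp_ms, resource_id)
tuples, and the handling of a full page that spans several timestamps
is an assumption that the commit does not spell out.

```python
from datetime import datetime, timedelta, timezone

CUTOFF_LAG = timedelta(hours=12)  # cutoff shift described in this commit
WINDOW = timedelta(days=1)        # at most one day per backfill request


def backfill_windows(start, now, window=WINDOW, lag=CUTOFF_LAG):
    """Yield (lower, upper) backfill ranges that stop at now - lag."""
    cutoff = now - lag
    lower = start
    while lower < cutoff:
        upper = min(lower + window, cutoff)
        yield lower, upper
        lower = upper


def fetch_page(records, since_ms, limit):
    """Stand-in for the API: records at or after since_ms, sorted by
    timestamp and then resource ID, truncated to the response limit."""
    return sorted(r for r in records if r[0] >= since_ms)[:limit]


def fetch_same_timestamp(records, ts_ms, after_id, limit):
    """Stand-in for the API: records at exactly ts_ms whose ID sorts
    after after_id."""
    return sorted(r for r in records if r[0] == ts_ms and r[1] > after_id)[:limit]


def read_all(records, start_ms, limit):
    """Read every record at or after start_ms without skipping ties."""
    out = []
    cursor = start_ms
    while True:
        page = fetch_page(records, cursor, limit)
        if len(page) < limit:
            # A short page means nothing is left past the cursor.
            out.extend(page)
            return out
        first_ts, last_ts = page[0][0], page[-1][0]
        if first_ts != last_ts:
            # Assumption: keep only timestamps known to be complete and
            # re-read the last (possibly truncated) one from its start.
            out.extend(r for r in page if r[0] < last_ts)
            cursor = last_ts
            continue
        # The whole page shares one timestamp: drain the rest by ID,
        # then move the cursor forward by one millisecond.
        out.extend(page)
        last_id = page[-1][1]
        while True:
            more = fetch_same_timestamp(records, last_ts, last_id, limit)
            out.extend(more)
            if len(more) < limit:
                break
            last_id = more[-1][1]
        cursor = last_ts + 1


if __name__ == "__main__":
    start = datetime(2024, 10, 1, tzinfo=timezone.utc)
    now = datetime(2024, 10, 3, 18, 0, tzinfo=timezone.utc)
    for lower, upper in backfill_windows(start, now):
        print(lower.isoformat(), "->", upper.isoformat())
```

Sorting by timestamp and then resource ID is what makes the drain step
safe: the last ID seen is a stable position within a single timestamp,
so the cursor only moves forward once that timestamp is exhausted.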
Alex-Bair committed Oct 7, 2024
1 parent a4a8e2c commit fa0070f
Showing 3 changed files with 413 additions and 94 deletions.