[rest_api source] can't detect pagination for github api and poke api #1915

Open
AstrakhantsevaAA opened this issue Oct 2, 2024 · 2 comments

@AstrakhantsevaAA
Contributor

dlt version

1.1.0

Describe the problem

The rest_api source appears unable to autodetect pagination for the GitHub API and the PokeAPI. This affects our tutorial.

If you run this pipeline, you get a list of "Fallback paginator" warnings for both the GitHub API and the PokeAPI.

The fallback also leads to a rate-limiting error: the rest_api source keeps sending requests to the GitHub API until the error occurs.

Expected behavior

According to our tutorial, the rest_api source should automatically detect such simple types of pagination.

Steps to reproduce

  1. run
dlt init rest_api duckdb
  2. run
python rest_api_pipeline.py

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

@burnash
Collaborator

burnash commented Oct 2, 2024

Thank you for the issue @AstrakhantsevaAA. I believe the rest_api source detects the paginator successfully. If I'm not mistaken, the message is related to "child" resources (single page) where there is no pagination present. In this case the source falls back to SinglePagePaginator.
Do you see any data loaded when you run the pipelines?
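One way to avoid relying on autodetection for such child resources is to name the paginator explicitly in the config. Below is a minimal sketch of what that could look like for the PokeAPI. It only builds the config dictionary and does not import dlt. The base URL, the resource names, and the paginator type strings (`json_link`, `single_page`) are assumptions taken from my reading of the rest_api docs, so check them against your dlt version before using this.

```python
# Sketch only: a rest_api-style config with paginators named explicitly.
# The resource names and the paginator type strings are assumptions;
# verify them against the rest_api documentation for your dlt version.
poke_config = {
    "client": {
        # Assumed base URL of the public PokeAPI.
        "base_url": "https://pokeapi.co/api/v2/",
    },
    "resources": [
        {
            # Parent resource: a paginated list whose JSON body carries
            # the URL of the next page under the "next" key.
            "name": "pokemon_list",
            "endpoint": {
                "path": "pokemon",
                "paginator": {"type": "json_link", "next_url_path": "next"},
            },
        },
        {
            # Child resource: one object per request, so no pagination.
            "name": "pokemon",
            "endpoint": {
                "path": "pokemon/{name}",
                "params": {
                    "name": {
                        "type": "resolve",
                        "resource": "pokemon_list",
                        "field": "name",
                    },
                },
                "paginator": "single_page",
            },
        },
    ],
}


def paginator_by_resource(config: dict) -> dict:
    """Map each resource name to its configured paginator (or None)."""
    return {
        res["name"]: res["endpoint"].get("paginator")
        for res in config["resources"]
    }


print(paginator_by_resource(poke_config))
```

If the type strings are right for your version, passing a dictionary like this to the rest_api source should mean the child resource no longer goes through detection, so the fallback warning should not appear for it.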

@burnash burnash self-assigned this Oct 2, 2024
@AstrakhantsevaAA
Contributor Author

@burnash yeah, I think you are right, but that's not clear from the warning message. Anyway, this part of the tutorial should be adjusted: by default we can't run this pipeline because of rate limits. I think we can reduce the amount of data for the issues endpoint:

"initial_value": pendulum.today().subtract(days=7).to_iso8601_string(),
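To show what that value evaluates to, here is a rough standard-library equivalent. It is a sketch, not the tutorial code: the helper name `initial_value` and its `now` parameter are made up for illustration, and it uses midnight UTC where `pendulum.today()` uses the local timezone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def initial_value(days_back: int, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 timestamp for midnight UTC, `days_back` days ago.

    `now` is a hypothetical hook for testing; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight - timedelta(days=days_back)).isoformat()


# A shorter window means fewer issues to fetch, and so fewer requests.
print(initial_value(7))
```

A smaller `days` value narrows the incremental window, which is why it should help with the rate limit.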

And these warnings scare our new users :D Can we log this warning once at the beginning instead of for each request?
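Until something like that exists in the source itself, a user-side workaround could be a logging filter that lets each distinct message through only once. This is a sketch using only the standard `logging` module, not how dlt handles it. The logger name `pagination_demo`, the `ListHandler` helper, and the warning text are placeholders; attaching the filter to dlt's own logger or handlers is an assumption I have not verified.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct (level, message) pair through only once."""

    def __init__(self) -> None:
        super().__init__()
        self._seen = set()

    def filter(self, record: logging.LogRecord) -> bool:
        key = (record.levelno, record.getMessage())
        if key in self._seen:
            return False  # drop repeats
        self._seen.add(key)
        return True


class ListHandler(logging.Handler):
    """Hypothetical handler that collects messages so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


# Placeholder logger; a real setup would target the library's logger instead.
logger = logging.getLogger("pagination_demo")
logger.setLevel(logging.WARNING)
logger.propagate = False

handler = ListHandler()
handler.addFilter(OncePerMessageFilter())
logger.addHandler(handler)

# Simulate the same warning being emitted for every request.
for _ in range(3):
    logger.warning("Fallback paginator used: SinglePagePaginator")

print(handler.messages)
```

The filter is attached to the handler rather than the logger so that it also applies to records that propagate up from child loggers.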
