Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pagination: Add paging to output #1047

Closed
aeneasr opened this issue Sep 24, 2018 · 18 comments
Closed

pagination: Add paging to output #1047

aeneasr opened this issue Sep 24, 2018 · 18 comments

Comments

@aeneasr
Copy link
Member

aeneasr commented Sep 24, 2018

Right now pagination is supported but it's impossible to know how many items there are in total and where to query next. There are two ways of sharing this information:

  1. Via HTTP-headers. This is what GitHub is doing by using the Link header. I think this is a brilliant idea and can greatly improve the API without actually breaking backwards compatibility.
  2. Via response payload. An example would be { "next": "...", "prev": "..", "last": "...", "first": "..", items: [...] }. This is a breaking change as all list endpoints return an array at the moment.

Feel free to weigh in on this issue!

@retendo
Copy link

retendo commented Sep 24, 2018

Pros for 2:
There seems to be a security concern for returning plain arrays in responses:
https://haacked.com/archive/2008/11/20/anatomy-of-a-subtle-json-vulnerability.aspx/

Also, the response would be better suited for future extension, for example if you would want to return a totalCount or something like that next to the actual data.

@aeneasr
Copy link
Member Author

aeneasr commented Sep 24, 2018 via email

@aeneasr
Copy link
Member Author

aeneasr commented Sep 25, 2018 via email

@condemil
Copy link
Contributor

condemil commented Sep 25, 2018

I propose to consider pagination tokens instead of offsets, this way you prevent the case when some data will be added/deleted between requests and will be returned twice or some rows will be skipped in between. The idea is that you specify amount of items you want in request and you receive token back. By using this token you will get next amount of items. Here is more info: https://use-the-index-luke.com/no-offset

@aeneasr
Copy link
Member Author

aeneasr commented Sep 25, 2018 via email

@someone1
Copy link
Contributor

I like the token over offset idea since I use a NoSQL datastore and so offsets are expensive compared to tokens (in my case, cursors). Luckily, even without ordering there's an implicit order in queries to create a token off of - maybe there can be a way to do this with MySQL? Or maybe introduce an auto increment ID column for each table and make that the PK (adding UNIQUE and INDEX where applicable on other columns)?

@aeneasr
Copy link
Member Author

aeneasr commented Sep 26, 2018

Changing the PKs has been on my mind for a while. I always backed off from it because of the policy module but as that is no longer relevant in Hydra it would be a smart move pre-1.0 stable

@condemil
Copy link
Contributor

Here is more info how to do that with SQL: https://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way

@someone1
Copy link
Contributor

someone1 commented Oct 2, 2018

Asking for clarification - only clients within hydra cannot be ordered and thus need some mechanism put in place if hydra uses token-based pagination, correct? Like a new auto-increment PK? Everything else can be sessioned by the creation/request date? Would it not also make sense to add a creation date to clients from an auditing perspective?

@condemil
Copy link
Contributor

condemil commented Oct 2, 2018

I think the indexed creation date is the way to go. Normally you expect the data paginated from newest to oldest (or alphabetically) in such cases.

@aeneasr
Copy link
Member Author

aeneasr commented Dec 7, 2018

We do have auto-increment PKs now which we will use as offset. This works really well with the memory adapter too as we'll just index the slice. I really like the GitHub approach as that will also not cause any BC breaks in the API:

Link: <https://hydra/clients?offset=123131&limit=10>; rel="next",
  <https://hydra/clients?offset=123121&limit=10>; rel="last"

@aeneasr
Copy link
Member Author

aeneasr commented Dec 7, 2018

Oh, and anyone interested in contributing?

@someone1
Copy link
Contributor

someone1 commented Dec 7, 2018

Ahhh - would it be possible to leave that as a string type in the code so I can shove my token in there? There's going to be some kind of int64 -> string and vice-versa conversion for it anyway.

@aeneasr
Copy link
Member Author

aeneasr commented Dec 7, 2018

Sure, we don't really care if a query param is string or int!

@kminehart
Copy link
Contributor

kminehart commented Apr 8, 2019

Sorry, my original question is irrelevant after doing some looking around.


To account for @someone1's scenario, we would have to support using a string as the offset value, right?

In a paginated handler:

	limit, _ := pagination.Parse(r, 100, 0, 500)
	offset := r.URL.Query().Get("offset")

And to account for using the ID as the offset, like in @aeneasr's example, Link: <https://hydra/clients?offset=123131&limit=10>; rel="next", ..., the SELECT queries will have to change to this:

m.DB.SelectContext(ctx, &d, m.DB.Rebind("SELECT * FROM hydra_client ORDER BY id LIMIT ? WHERE id > ?")

Is that correct?

The way I see pagination implemented currently in Hydra is that it is entirely based on integers, and the LIMIT and OFFSET keywords. I don't believe that in Postgres or MySQL there is any guarantee that autoincrementing columns will always increment by 1, starting at 0, so relying on OFFSET for this is unlikely to work.

@aeneasr
Copy link
Member Author

aeneasr commented Apr 8, 2019

But https://hydra/clients?offset=123131&limit=10 is still using an integer, so why would we need a string here?

I don't believe that in Postgres or MySQL there is any guarantee that autoincrementing columns will always increment by 1, starting at 0, so relying on OFFSET for this is unlikely to work.

I don't get that point, why would OFFSET be linked to the serial/auto_increment PKs?

In general, the status quo of database adapters has changed since we had this discussion and it is now quite clear that NoSQL is not the right backend for this project. We've added several key constraints which become more and more difficult to maintain with NoSQL databases. Not having those key constraints in place removes several important security safeguards and features. I think that we can safely say now that ORY Hydra requires a relational storage adapter to function properly. Therefore, we can, IMO, make the statement that pagination has to work with SQL databases as a first, and everything else is nice to have. Since we're not dealing with humongous amounts of data and I think a good tradeoff between complexity and getting this feature shipped is - for now - simply using LIMIT/OFFSET.

Feel free to chip in if you think that shouldn't be the case.

@kminehart
Copy link
Contributor

kminehart commented Apr 8, 2019

it was in response to this additional requirement:

from @someone1

Ahhh - would it be possible to leave that as a string type in the code so I can shove my token in there? There's going to be some kind of int64 -> string and vice-versa conversion for it anyway.

I assumed he was referring to the offset. In your example, offset=123121, is 123121 an ID or is it just "the 123121st result"?

Using OFFSET wouldn't really work if 123121 is an ID, and the request is essentially saying, "start at the element with an ID of 123121 and give me 10 items after". This was my original interpretation when I read the examples.

But I think your comment answers my overall question. :)

@kminehart
Copy link
Contributor

since ory/x#36 was merged, this will probably be done tonight.

@kminehart kminehart mentioned this issue Apr 14, 2019
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants