pagination: Add paging to output #1047

aeneasr · 2018-09-24T17:15:08Z

Right now pagination is supported but it's impossible to know how many items there are in total and where to query next. There are two ways of sharing this information:

1. Via HTTP headers. This is what GitHub is doing by using the Link header. I think this is a brilliant idea and can greatly improve the API without actually breaking backwards compatibility.
2. Via response payload. An example would be { "next": "...", "prev": "...", "last": "...", "first": "...", "items": [...] }. This is a breaking change as all list endpoints return an array at the moment.
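To illustrate why the payload option is a breaking change, here is a minimal sketch of a consumer that decodes a list response into a slice. The `client` type and `decodeAsArray` helper are hypothetical and only stand in for existing client code; they are not part of Hydra.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// client is a hypothetical, trimmed-down list item used only for illustration.
type client struct {
	ID string `json:"id"`
}

// decodeAsArray mimics an existing consumer that expects a bare JSON array.
func decodeAsArray(body string) ([]client, error) {
	var out []client
	err := json.Unmarshal([]byte(body), &out)
	return out, err
}

func main() {
	// Header-based paging keeps the body as a plain array.
	arrayBody := `[{"id":"a"},{"id":"b"}]`
	// Payload-based paging wraps the items in an object next to the links.
	envelopeBody := `{"next":"...","items":[{"id":"a"},{"id":"b"}]}`

	items, err := decodeAsArray(arrayBody)
	fmt.Println(len(items), err)

	// The same consumer fails on the envelope, because a JSON object
	// cannot be unmarshalled into a Go slice.
	_, err = decodeAsArray(envelopeBody)
	fmt.Println(err != nil)
}
```

A consumer written this way keeps working if paging data is only added to response headers.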

Feel free to weigh in on this issue!

retendo · 2018-09-24T18:21:59Z

Pros for 2:
There seems to be a security concern for returning plain arrays in responses:
https://haacked.com/archive/2008/11/20/anatomy-of-a-subtle-json-vulnerability.aspx/

Also, the response would be better suited for future extension, for example if you would want to return a totalCount or something like that next to the actual data.

aeneasr · 2018-09-24T21:10:20Z

This vulnerability was fixed by all browsers a long time ago, even in IE 10. It was possible not only with arrays (Array.prototype overloading) but with objects too (Object.prototype overloading), so both variants were "vulnerable"; well, the browsers were.

aeneasr · 2018-09-25T07:51:01Z

I agree. I think the GitHub Link header is a good direction. I will check if it is possible to include additional data (like a count) without modifying the payload.

On 24. Sep 2018, at 23:23, Amir Aslaminejad wrote: I opt for maintaining backwards compatibility in the response payload, as some people may have already plumbed together some form of administrative UI.

condemil · 2018-09-25T11:45:12Z

I propose considering pagination tokens instead of offsets. This prevents the case where data is added or deleted between requests, so that some rows are returned twice or skipped. The idea is that you specify the number of items you want in the request and receive a token back. Using this token, you get the next batch of items. Here is more info: https://use-the-index-luke.com/no-offset
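The skipped-row problem can be reproduced with an in-memory slice. This is only a sketch: `pageByOffset` and `pageAfter` are hypothetical helpers, and the "token" is assumed to be the last ID seen on the previous page, which requires an ordered, unique key.

```go
package main

import "fmt"

// pageByOffset returns up to limit IDs starting at position offset.
func pageByOffset(ids []int, offset, limit int) []int {
	if offset < 0 || offset >= len(ids) {
		return nil
	}
	end := offset + limit
	if end > len(ids) {
		end = len(ids)
	}
	return ids[offset:end]
}

// pageAfter returns up to limit IDs strictly greater than lastSeen.
// It assumes ids is sorted ascending, as an indexed column would be.
func pageAfter(ids []int, lastSeen, limit int) []int {
	var out []int
	for _, id := range ids {
		if id > lastSeen {
			out = append(out, id)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

func main() {
	ids := []int{1, 2, 3, 4, 5, 6}
	first := pageByOffset(ids, 0, 3) // first page: 1, 2, 3

	// Row 2 is deleted between the two requests.
	ids = []int{1, 3, 4, 5, 6}

	// Offset paging now starts one row too late, so row 4 is never returned.
	fmt.Println(pageByOffset(ids, 3, 3))
	// Token paging continues right after the last row the caller saw.
	fmt.Println(pageAfter(ids, first[2], 3))
}
```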

aeneasr · 2018-09-25T13:11:41Z

While the problem is definitely there, having a session-fixed cursor opens a different can of worms, especially because Hydra's IDs are currently user-defined and not ordered, so something like last_seen won't work. I'll take it into consideration but can't promise it will make it into the final result. Also, the impact is minimal: there usually aren't a ton of OAuth2 Clients around, nor are there a lot of JWKs. This may be an issue with sessions and tokens, but we're not exposing list capabilities there.

someone1 · 2018-09-26T18:44:05Z

I like the token over offset idea since I use a NoSQL datastore and so offsets are expensive compared to tokens (in my case, cursors). Luckily, even without ordering there's an implicit order in queries to create a token off of - maybe there can be a way to do this with MySQL? Or maybe introduce an auto increment ID column for each table and make that the PK (adding UNIQUE and INDEX where applicable on other columns)?

aeneasr · 2018-09-26T18:47:01Z

Changing the PKs has been on my mind for a while. I always backed off from it because of the policy module, but as that is no longer relevant in Hydra, it would be a smart move pre-1.0 stable.

condemil · 2018-09-27T11:39:49Z

Here is more info on how to do that with SQL: https://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way

someone1 · 2018-10-02T13:25:31Z

Asking for clarification - only clients within hydra cannot be ordered and thus need some mechanism put in place if hydra uses token-based pagination, correct? Like a new auto-increment PK? Everything else can be sessioned by the creation/request date? Would it not also make sense to add a creation date to clients from an auditing perspective?

condemil · 2018-10-02T14:32:04Z

I think the indexed creation date is the way to go. Normally you expect the data paginated from newest to oldest (or alphabetically) in such cases.

aeneasr · 2018-12-07T12:44:20Z

We do have auto-increment PKs now which we will use as offset. This works really well with the memory adapter too as we'll just index the slice. I really like the GitHub approach as that will also not cause any BC breaks in the API:

Link: <https://hydra/clients?offset=123131&limit=10>; rel="next",
  <https://hydra/clients?offset=123121&limit=10>; rel="last"
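A header like the one above could be assembled roughly as follows. This is a sketch under assumptions, not the code that ended up in Hydra or ory/x: `linkHeader` is a hypothetical helper, offsets are treated as row positions, and the total count is assumed to be known to the caller.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// linkHeader builds a GitHub-style Link header value for offset/limit paging.
// total is the overall item count; how the server obtains it is out of scope.
func linkHeader(base string, offset, limit, total int) string {
	u, err := url.Parse(base)
	if err != nil || limit <= 0 {
		return ""
	}
	rel := func(off int, name string) string {
		c := *u // work on a copy so the parsed base URL stays untouched
		q := c.Query()
		q.Set("offset", strconv.Itoa(off))
		q.Set("limit", strconv.Itoa(limit))
		c.RawQuery = q.Encode() // Encode sorts keys, so limit precedes offset
		return fmt.Sprintf("<%s>; rel=%q", c.String(), name)
	}

	var links []string
	if next := offset + limit; next < total {
		links = append(links, rel(next, "next"))
	}
	if total > 0 {
		// Offset of the final page, rounded down to a multiple of limit.
		last := (total - 1) / limit * limit
		links = append(links, rel(last, "last"))
	}
	return strings.Join(links, ", ")
}

func main() {
	fmt.Println(linkHeader("https://hydra/clients", 0, 10, 25))
}
```

Because the value only travels in a response header, the JSON body can stay a plain array.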

aeneasr · 2018-12-07T12:44:31Z

Oh, and anyone interested in contributing?

someone1 · 2018-12-07T14:14:54Z

Ahhh - would it be possible to leave that as a string type in the code so I can shove my token in there? There's going to be some kind of int64 -> string and vice-versa conversion for it anyway.

aeneasr · 2018-12-07T14:22:48Z

Sure, we don't really care if a query param is string or int!
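The int64 to string round trip mentioned above might look like this for a SQL-backed store. Both function names are hypothetical; a store with native cursors could pass the string through unchanged instead of parsing it.

```go
package main

import (
	"fmt"
	"strconv"
)

// offsetToToken renders a numeric offset as the string carried in the query.
func offsetToToken(offset int64) string {
	return strconv.FormatInt(offset, 10)
}

// tokenToOffset parses the query string value back into a numeric offset.
func tokenToOffset(token string) (int64, error) {
	if token == "" {
		return 0, nil // no token means "start from the beginning"
	}
	n, err := strconv.ParseInt(token, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid offset %q: %w", token, err)
	}
	if n < 0 {
		return 0, fmt.Errorf("invalid offset %q: must not be negative", token)
	}
	return n, nil
}

func main() {
	n, err := tokenToOffset(offsetToToken(123131))
	fmt.Println(n, err)

	_, err = tokenToOffset("not-a-number")
	fmt.Println(err != nil)
}
```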

kminehart · 2019-04-08T14:47:36Z

Sorry, my original question is irrelevant after doing some looking around.

To account for @someone1's scenario, we would have to support using a string as the offset value, right?

In a paginated handler:

	limit, _ := pagination.Parse(r, 100, 0, 500)
	offset := r.URL.Query().Get("offset")

And to account for using the ID as the offset, like in @aeneasr's example, Link: <https://hydra/clients?offset=123131&limit=10>; rel="next", ..., the SELECT queries will have to change to this:

	m.DB.SelectContext(ctx, &d, m.DB.Rebind("SELECT * FROM hydra_client WHERE id > ? ORDER BY id LIMIT ?"), offset, limit)

Is that correct?

The way I see pagination implemented currently in Hydra is that it is entirely based on integers, and the LIMIT and OFFSET keywords. I don't believe that in Postgres or MySQL there is any guarantee that autoincrementing columns will always increment by 1, starting at 0, so relying on OFFSET for this is unlikely to work.

aeneasr · 2019-04-08T16:56:45Z

But https://hydra/clients?offset=123131&limit=10 is still using an integer, so why would we need a string here?

I don't believe that in Postgres or MySQL there is any guarantee that autoincrementing columns will always increment by 1, starting at 0, so relying on OFFSET for this is unlikely to work.

I don't get that point, why would OFFSET be linked to the serial/auto_increment PKs?

In general, the status quo of database adapters has changed since we had this discussion, and it is now quite clear that NoSQL is not the right backend for this project. We've added several key constraints which become more and more difficult to maintain with NoSQL databases. Not having those key constraints in place removes several important security safeguards and features. I think that we can safely say now that ORY Hydra requires a relational storage adapter to function properly. Therefore, we can, IMO, make the statement that pagination has to work with SQL databases first, and everything else is nice to have. Since we're not dealing with humongous amounts of data, I think a good tradeoff between complexity and getting this feature shipped is, for now, simply using LIMIT/OFFSET.

Feel free to chip in if you think that shouldn't be the case.

kminehart · 2019-04-08T17:21:50Z

it was in response to this additional requirement:

from @someone1

Ahhh - would it be possible to leave that as a string type in the code so I can shove my token in there? There's going to be some kind of int64 -> string and vice-versa conversion for it anyway.

I assumed he was referring to the offset. In your example, offset=123121, is 123121 an ID or is it just "the 123121st result"?

Using OFFSET wouldn't really work if 123121 is an ID, and the request is essentially saying, "start at the element with an ID of 123121 and give me 10 items after". This was my original interpretation when I read the examples.
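The two readings of offset=123121 map to two different queries. The sketch below only builds the query text and arguments; `byPosition` and `afterID` are hypothetical names, and the `?` placeholders would still need rebinding for the target database.

```go
package main

import "fmt"

// byPosition treats n as "skip the first n rows" (plain LIMIT/OFFSET).
func byPosition(n, limit int) (string, []interface{}) {
	return "SELECT * FROM hydra_client ORDER BY id LIMIT ? OFFSET ?",
		[]interface{}{limit, n}
}

// afterID treats n as a primary key and selects the rows that follow it.
func afterID(n, limit int) (string, []interface{}) {
	return "SELECT * FROM hydra_client WHERE id > ? ORDER BY id LIMIT ?",
		[]interface{}{n, limit}
}

func main() {
	q, args := byPosition(123121, 10)
	fmt.Println(q, args)

	q, args = afterID(123121, 10)
	fmt.Println(q, args)
}
```

With gaps in the ID sequence, the two queries return different rows for the same number, which is why the meaning of the parameter matters.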

But I think your comment answers my overall question. :)

kminehart · 2019-04-12T16:40:03Z

Since ory/x#36 was merged, this will probably be done tonight.

aeneasr added feat New feature or request. package/client package/jwk package/oauth2 package/consent labels Sep 24, 2018

aeneasr added this to the v1.0.0-rc.1 milestone Sep 24, 2018

aeneasr mentioned this issue Sep 27, 2018

sql: Add auto-increment PKs #1059

Closed

someone1 mentioned this issue Oct 24, 2018

client: Track when clients are created #1120

Closed

aeneasr modified the milestones: v1.0.0-rc.1, v1.0.0 Nov 8, 2018

kminehart mentioned this issue Apr 10, 2019

add pagination functions for use in ory/hydra ory/x#36

Merged

5 tasks

aeneasr modified the milestones: v1.0.0, v1.0.1 Apr 11, 2019

kminehart mentioned this issue Apr 14, 2019

Pagination headers #1358

Merged

6 tasks

aeneasr closed this as completed in f1ee77c Apr 15, 2019
