Nearly all web APIs get paging wrong (vermorel.com)
Without state, a token can be a bookmark into a predefined ordered dataset. That's more reliable than an offset, but just as inflexible, and much more expensive on the server side.
Or, I'm missing something too.
A simple method would be to take the state information the server needs in order to continue the enumeration (e.g. sorting order, how far along it was in the enumeration, etc.), JSON-encode it, encrypt and sign it, and then base64-encode it.
Return that token to the client, and if the client wants more data it can pass that token back to the server, which can decode it into all the information it needs to resume the enumeration.
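As a rough sketch of that round trip, here is a version using only the Python standard library. It signs the token with HMAC-SHA256 but does not encrypt it, because the standard library has no authenticated encryption; the client can therefore read the state but not alter it undetected. The names SECRET_KEY, encode_token and decode_token are made up for illustration.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical server-side secret; a real deployment would load this from config.
SECRET_KEY = b"replace-with-a-real-secret"
SIGNATURE_LENGTH = 32  # size of a SHA-256 digest in bytes


def encode_token(state: dict) -> str:
    """JSON-encode the enumeration state, sign it, and base64-encode the result."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + payload).decode("ascii")


def decode_token(token: str) -> dict:
    """Verify the signature and return the state; raise ValueError if it does not match."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    signature, payload = raw[:SIGNATURE_LENGTH], raw[SIGNATURE_LENGTH:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid or tampered continuation token")
    return json.loads(payload.decode("utf-8"))
```

If the state must be hidden from the client as well, the payload would additionally need to be encrypted with a proper authenticated cipher from a cryptography library, which this sketch leaves out.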
A relatively simple approach that involves server side state is to periodically (once a minute?) generate the list of (for example) the 10000 top items.
(A high traffic site will most likely want to do this in any case, so that it has a cached list of items ready to serve to clients, instead of issuing a database query to find the top items for every request.)
Now, instead of overwriting the list of top items every time you regenerate it, keep multiple versions of the list. Then you can make the link to the next page specify the version of the list and the page number. That way, users will browse through one specific version of the list.
(This requires storing some state on the server, but the amount is relatively small. You control the size of the generated list, how often new lists are generated, and how long they are kept, so there is an easily calculated upper bound on the amount of state you need to store.)
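A minimal in-memory sketch of the versioned-list idea might look like the following. The SnapshotStore class, its method names, and the default sizes are all hypothetical; a real site would presumably keep the lists in a cache or database and run publish() from a periodic job.

```python
from collections import OrderedDict


class SnapshotStore:
    """Keeps the most recent `max_versions` generated lists (hypothetical helper)."""

    def __init__(self, max_versions: int = 10, page_size: int = 30):
        self.max_versions = max_versions
        self.page_size = page_size
        self._snapshots = OrderedDict()  # version number -> frozen list of items
        self._next_version = 1

    def publish(self, items) -> int:
        """Store a freshly generated list and evict the oldest one if over the limit."""
        version = self._next_version
        self._next_version += 1
        self._snapshots[version] = list(items)
        while len(self._snapshots) > self.max_versions:
            self._snapshots.popitem(last=False)  # drop the oldest version
        return version

    def page(self, version: int, page_number: int):
        """Return (items, next_link_params) for one page of one specific version.

        next_link_params is None on the last page.
        Raises KeyError if that version has already been evicted.
        """
        items = self._snapshots[version]
        start = page_number * self.page_size
        chunk = items[start:start + self.page_size]
        has_more = start + self.page_size < len(items)
        next_params = {"version": version, "page": page_number + 1} if has_more else None
        return chunk, next_params
```

The upper bound on stored state is then simply max_versions times the list size. A client holding a link to an evicted version gets an error and has to start over from the newest list.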
I can see that the continuation-token constructions in the other comments won't work for HN, assuming post scores are updated in place as upvotes come in.
This isn't what the author recommends, but I think this is a good approach.
You want to store enough information in the token that you can easily reconstruct and resume the enumeration.
For example, let us say that the user asked for all comments with a score >= 5 sorted by post time. In that case you could return 100 comments, and a token that encoded something like:
{
  "min_score": 5,
  "sort": "post_time",
  "resume_from_post_time": "2015-05-07T05:34:02Z"
}
To ensure that it is easy to resume the enumeration, the API can fudge the number of returned items so that the returned data always breaks at a nice "post_time" boundary. The goal here is to make it easy for clients to get all the data in the enumeration without implementing all this logic themselves.
True, it will only work efficiently for some types of queries, but a lot of the common queries can be reworked into something like that.
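The "post_time" boundary trick can be sketched as follows. This is only an illustration: an in-memory list stands in for the database, the function name page_by_post_time is invented, and it assumes all timestamps are ISO 8601 UTC strings in the same format, so that plain string comparison matches chronological order.

```python
def page_by_post_time(comments, min_score, resume_from_post_time=None, limit=100):
    """Return (page, token). The page may exceed `limit` so it ends on a post_time boundary."""
    matching = sorted(
        (
            c for c in comments
            if c["score"] >= min_score
            and (resume_from_post_time is None or c["post_time"] > resume_from_post_time)
        ),
        key=lambda c: c["post_time"],
    )
    if not matching:
        return [], None

    # Take `limit` items, then keep going while the next item shares the last
    # item's post_time, so a strict "post_time > X" resume never skips anything.
    end = min(limit, len(matching))
    while end < len(matching) and matching[end]["post_time"] == matching[end - 1]["post_time"]:
        end += 1

    page = matching[:end]
    if end == len(matching):
        return page, None  # nothing left to enumerate
    token = {
        "min_score": min_score,
        "sort": "post_time",
        "resume_from_post_time": page[-1]["post_time"],
    }
    return page, token
```

The client just passes the token back unchanged. If many items share one timestamp, the page can grow well beyond the limit, which is one of the cases where this approach stops being efficient.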
You suggest that the continuation token is basically an encoding of the query parameters needed to fetch results from the API. If you go with this approach, you don't have to store any state on the server. This is a good approach because it's simple, but it doesn't solve the issue where the response from the API changes while you are making the paging API calls. The example used in the article is an order being deleted while you are calling the API.
I was thinking of using a UUID as the continuation token and storing a copy of the results from the API. Subsequent calls that use the continuation token would take a subset of these results. The benefit of this approach is that the results you get back from paging are consistent, which solves the issue of results changing while you call the API multiple times. The downside is that you have to store and manage more state on the server. If you store the full results for all of these paging API calls, this could be quite large.
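Roughly, the UUID-plus-stored-results idea could be sketched like this. The ResultCache class and its parameters are hypothetical, the store is a plain in-memory dict, and having the client send a page number along with the token is an assumption of this sketch, not something specified above.

```python
import time
import uuid


class ResultCache:
    """Hypothetical in-memory store of frozen result sets, keyed by a UUID token."""

    def __init__(self, ttl_seconds: float = 300.0, page_size: int = 100):
        self.ttl_seconds = ttl_seconds
        self.page_size = page_size
        self._entries = {}  # token -> (expires_at, frozen results)

    def start(self, results, now=None) -> str:
        """Freeze a copy of the full result set and return its continuation token."""
        now = time.monotonic() if now is None else now
        self._evict_expired(now)
        token = str(uuid.uuid4())
        self._entries[token] = (now + self.ttl_seconds, tuple(results))
        return token

    def page(self, token: str, page_number: int, now=None):
        """Return one page of the frozen results; KeyError if the token is unknown or expired."""
        now = time.monotonic() if now is None else now
        self._evict_expired(now)
        _, results = self._entries[token]
        start = page_number * self.page_size
        return list(results[start:start + self.page_size])

    def _evict_expired(self, now):
        """Drop every entry whose time-to-live has run out."""
        expired = [t for t, (expires_at, _) in self._entries.items() if expires_at <= now]
        for t in expired:
            del self._entries[t]
```

The time-to-live bounds how long each copy is kept, but memory use still scales with the number of concurrent enumerations times the size of each result set, which is the cost mentioned above.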
The only case I can currently think of that cannot be solved using continuation tokens sent to the client is where the order of the items that are enumerated may change between calls to the API. For example, imagine that you are fetching items sorted by score, and somebody upvotes or downvotes an item while you are enumerating them. In that case it is very difficult to encode enough information in the continuation token. (I can think of complicated ways to make it work, but the resulting database queries would be horrible.)
But for simple stuff like deleted items and similar, it is easy. If you leave out sorting it is trivial to implement as well -- all you need is to enumerate the items based on an internal ID that is guaranteed to always increase, filtering them as required. The continuation token will simply be the ID of the last item you evaluated and the filter that is applied. On the next request you just resume from that ID. If that item ID happens to be deleted in the meantime it is no problem. You just resume from the next one available. I.e.:
SELECT * FROM items WHERE id > :last_returned_id AND [insert-filter-here] ORDER BY id LIMIT 100;
If the client stores the state on behalf of the server, the server will potentially be working with a modified dataset on the next request, and we're right back where we started with limit and offset.
You can cook up a scheme to make it work for a restricted range of requests, but the compromises are severe.
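For reference, the ID-based resume query quoted earlier in the thread can be tried out with SQLite. This is only a sketch: the table layout, the score filter, and the helper names make_demo_db and fetch_page are invented for the example, and it covers just the deleted-item case, not items whose sort order changes between calls.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory table with ids 1..10 and score = id % 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO items (id, score) VALUES (?, ?)",
        [(i, i % 10) for i in range(1, 11)],
    )
    return conn


def fetch_page(conn, last_returned_id, min_score, limit=100):
    """Return (rows, next_id); next_id is None once a short page signals the end."""
    rows = conn.execute(
        "SELECT id, score FROM items "
        "WHERE id > ? AND score >= ? "
        "ORDER BY id LIMIT ?",
        (last_returned_id, min_score, limit),
    ).fetchall()
    # A full page may be followed by more rows, so hand back the last id as the bookmark.
    next_id = rows[-1][0] if len(rows) == limit else None
    return rows, next_id
```

Deleting the bookmarked row between requests does not break the enumeration, since the next query only asks for ids greater than the bookmark. Rows inserted later with higher ids will show up in later pages, so the result is not a consistent snapshot.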