Fix large list truncation #127
@browniebroke I've fixed the bug and updated the tests (to check for item truncation where applicable), but I'm having trouble with the cassettes - How should I solve this? Apologies for my lack of knowledge - to say I'm rusty with |
OK, so I think I've fixed the issue of not being able to overwrite existing cassettes (by first running I added an extra test to I feel a bit conflicted regarding having this username data within this repository - the Deezer website doesn't provide functionality to view this data, it seems to only be available via the API. Additionally, while there might be a reasonable expectation that Deezer editors might have their track/album/artist/playlist details extracted from the API (given their roles), the 100 fans that the API spits out haven't had an opportunity to consent to their IDs & usernames being stored in a random repository. Because of this, I propose that, before I commit & push the changes, I manually edit the |
Hey, thanks for taking some time on this. Yes, sounds reasonable to anonymise the data before pushing it. 👍 |
Brilliant, will push now |
Only follows "next" URLs:
- when they are present in the API response,
- when the query is for "tracks", "fans", "albums", "artists" or "playlists", and
- when a limit has not been specified in kwargs (Deezer seems to follow this limit anyway, even if it is over their default limit)
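The three conditions in this commit message can be read as a single predicate. The sketch below is illustrative only: `should_follow_next` and `PAGINATED_RELATIONS` are hypothetical names, not taken from `deezer/client.py`, and the argument shapes (a decoded JSON dict, a relation string, a kwargs dict) are assumptions.

```python
# Hypothetical helper; names are illustrative, not the library's API.
PAGINATED_RELATIONS = {"tracks", "fans", "albums", "artists", "playlists"}


def should_follow_next(json_response, relation, kwargs):
    """Return True only when all three conditions from the commit message hold."""
    return (
        "next" in json_response              # the API returned a "next" URL
        and relation in PAGINATED_RELATIONS  # the relation is one of the five listed
        and "limit" not in kwargs            # the caller did not pass an explicit limit
    )
```

With this shape, a relation such as `top` is never followed, and passing `limit=...` disables following entirely.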
Looks good overall, I'm just wondering if this feature could trigger a large amount of hidden API calls. Did you hit any issues with that?
deezer/client.py
Outdated
relation == "tracks"
or relation == "fans"
or relation == "albums"
or relation == "artists"
or relation == "playlists"
Why add these restrictions on specific relations? Could we do it whenever there is a `next` in the response, regardless of the relation being requested?
I remember testing it without that and something going horribly wrong, but that was before `and "limit" not in kwargs` got inserted. I'll do some messing around and see if I can repro the same issue again; otherwise I'll take them out.
Still can't remember exactly why I left that in, but I think it had something to do with endpoints such as (and possibly only) `artist/<artist>/top`. I suppose I thought it was a different question whether artists' top songs should be listed beyond the API's default limit (without specifying a higher limit).
The benefit of following `next` URLs when requesting (e.g.) playlist tracks is fairly clear-cut in my opinion: if you ask for the tracks of a playlist, you expect all the tracks from that playlist to be returned. It's less clear whether all of an artist's top 100 songs (the API maximum) should be returned by default when making that call.
EDIT: I suppose in that case I could change it to just specifically exclude `relation == top`, but I didn't want to do that in case it breaks anything that is either (1) added to the API in the future or (2) not yet implemented by this library, and which also returns `next` URLs that perhaps shouldn't be followed by default.
deezer/client.py
Outdated
while "next" in new_json:
    response = self.session.get(new_json["next"])
    new_json = response.json()
    json["data"].extend(new_json["data"])
This is bypassing the error handling that we have above. What happens if we get an error in the middle? Could we get throttled by Deezer in this section?
I'll fix this. What do you suggest happens if we get throttled? My justification for fixing this in the first place was that "if you ask for `<playlist>`'s tracks, you want all the tracks", but what should happen if we get cut off halfway through (or whenever)?
I did think about that, but I'm not sure if there is a better (feasible) solution. I ran into this issue because I was requesting tracks from a large playlist and they weren't all appearing, which seemed contrary to the design of the library. The Deezer API's preferred way of doing things is seemingly to follow Just to say, when making changes to the tests I did originally have tests where 1000-1500 songs were requested and I don't remember hitting any rate limits (the API does have some but I can't find them formally documented, just this 2015 answer from someone who claims to be a Deezer developer) - I changed those tests to reduce the size of the cassettes. Arguably, if the API throws a rate limit for following its own |
To avoid duplicate code, I combined the requesting + error handling of the original API call, and, if applicable, any subsequent `next` API calls. However, the consequence of this is that the code probably makes no sense to the untrained eye. Hence, the absolute barrage of comments. (I promise you, the tests *do* pass!)
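For readers following along, here is a minimal sketch of the general idea of routing the first request and every `next` request through the same error check. It is not the code from this PR: `fetch_all_pages` and `get_json` are hypothetical names, and the assumption that an error payload carries a top-level `"error"` key should be checked against the actual API responses.

```python
def fetch_all_pages(get_json, url):
    """Collect "data" from every page, checking each response for errors.

    get_json is an assumed callable that takes a URL and returns the decoded
    JSON body as a dict (for example, a thin wrapper around session.get).
    """
    items = []
    while url is not None:
        page = get_json(url)
        # Assumption: an error payload carries a top-level "error" key.
        if "error" in page:
            raise RuntimeError(f"API error while fetching {url}: {page['error']}")
        items.extend(page.get("data", []))
        url = page.get("next")  # a missing "next" key ends the loop
    return items
```

Because the check runs inside the loop, an error on any page (including throttling partway through) raises instead of silently returning a partial list. Whether raising is the right behaviour is exactly the open question in the review comments above.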
Sourcery Code Quality Report (beta) ❌ Merging this PR will decrease code quality in the affected files by 0.04 out of 10.
Just discovered while working on #134 that deezer-python/deezer/resources.py Lines 65 to 72 in 99f01b7
I'm still of the opinion that if you ask for the tracks in a playlist (in list form), you expect all the tracks from that playlist. But now the code changes basically render any memory usage reduction in Hmm. I'm a bit stumped now as to what's the best way to implement this. Any help from anyone would be greatly appreciated! |
I haven't got around to review this again properly yet, but yes, the Do you think we could stick to the We might want to switch the I recently used PyGithub and I liked how they handle pagination, it works out of the box and transparently goes through the pages while also letting you get a specific page or the total count of elements in the result set. |
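As a rough illustration of pagination that "transparently goes through the pages", a generator can request each page only when the caller actually consumes that far. This is a sketch under assumptions, not PyGithub's implementation and not this library's API: `iter_items` and `get_json` are hypothetical names, and the `"data"`/`"next"` keys mirror the snippet reviewed earlier in this conversation.

```python
from itertools import islice


def iter_items(get_json, url):
    """Yield items one at a time, requesting the next page only when needed.

    get_json is an assumed callable mapping a URL to a decoded JSON dict.
    """
    while url is not None:
        page = get_json(url)
        yield from page.get("data", [])
        url = page.get("next")  # a missing "next" key ends the iteration


def first_n(get_json, url, n):
    """Return at most n items without fetching pages beyond those needed."""
    return list(islice(iter_items(get_json, url), n))
```

Calling `list(iter_items(...))` still gives the full list for callers who want everything, while iterating lazily keeps memory use and the number of API calls proportional to what is actually consumed.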
Yes, that’s probably a good idea. I’ll close this PR for now (should I leave #126 open?)
I think it could be closed yes. I'll write an issue to rework the pagination, the fact that you missed the |
Opened #137 |
Fixes #126