-
Notifications
You must be signed in to change notification settings - Fork 213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to how Redgifs temp auth tokens are handled #940
Changes to how Redgifs temp auth tokens are handled #940
Conversation
With respect to maintaining statelessness, I think a good compromise could be to instead cache the token in memory. Yes, bdfr would still be fetching a new token on each startup, but it would still result in far fewer requests than it is currently making fetching the token every time. |
I did consider caching the It would be a cleaner solution though. It would be good to look into. This seems to work quite decently though. I've tested downloads on a few other accounts and it seemed to prevent the rate limiting. Still not the biggest fan of my implementation though, since it skips the link that throws the |
Makes sense. I can confirm these changes also work on my machine (Lastest MacOS, Python 3.9). I was able to download about ~10 profiles in a row with many redgifs links when previously a single profile was failing. |
redgifs.Redgifs._get_token(redgifs.Redgifs, redgifs_url) | ||
|
||
# Attempted redownload of link that hit the 401 error. Don't think it works though. | ||
redgifs.Redgifs._get_link(redgifs_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the line that attempts to re-run the download with the skipped link from the 401
error.
I'm guessing the chain resolves somewhere else and the link doesn't get passed correctly.
Might be worth moving the entire error checking code into redgifs.py
and just handle it over there.
Would be easier to pass the url and re-run the download function.
Glad it works! I was a bit worried about the temp directory stuff, but I assumed that python's You can replicate the If we could figure out a way to pass that link back over and run it again with the new token, there would be no downsides to this change. |
Would you prefer me to make comments on things to change or pull and implement those changes myself? |
This appears to be based off the master branch? The dev branch should cache the auth tokens already unless things have changed since then. I can't say for sure though as my major usage has become almost nil since the imgur and reddit changes. |
As has mine. @remghoost can you confirm if this is still a problem with the development code? |
@Serene-Arc You can pull and edit the changes if you want. That's fine by me. I haven't tested it the changes on the And if it's true that the I wasn't exactly aware of the state of the repo when I started working on the project. |
All good. Note that when you pull the development branch, your own tests will work if you provide an updated token and secret from Reddit. No doubt there are some tests that will fail anyway (such is the life of testing a web scraper) but you'll get an idea of whether your changes are needed. |
tl;dr - I'll close this pull request since it was already fixed in the So after looking through the It caches the token for 22.8 hours, which falls in line with the 24 hour limit on tokens. The main difference is that it caches the token in memory instead of a file, which has its pros and cons. I like my solution a bit more, but the I'll close this pull request since the problem was technically not with the script, but with the inability to update the pip package. Also, still pick up an error on the
I fixed it on my end by changing Not sure why that's happening (as it doesn't seem to happen when installed via |
I was actually going to suggest that you change to use a caching and memoisation method instead of a temp file so it worked out fine. |
Hello.
I'm fairly new to programming (only about a year of python so far).
I'm not entirely comfortable with pull requests yet and I apologize if some aspects are incorrect.
Should I be attempting to merge into the
development
branch....?Constructive criticism is appreciated.
I also apologize that my request is wordy.
I prefer being thorough.
This pull request closes #939.
Foreward
I have read both
ARCHITECTURE.md
andCONTRIBUTING.md
.I believe this pull request fits in line with both of them, with one caveat.
This line declares that
bdfr
should be a stateless and my pull request technically is not.https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/7676ff06a3b0c31eb2af7e691412cd63e862deea/docs/ARCHITECTURE.md?plain=1#L7
Currently, temporary auth tokens are pulled from the Redgifs API every time a link is checked.
https://github.com/aliparlakci/bulk-downloader-for-reddit/blob/7676ff06a3b0c31eb2af7e691412cd63e862deea/bdfr/site_downloaders/redgifs.py#L41-L43
I'm not sure when Redgifs changed their rate limiting (as downloads were working fine a few weeks ago), but they did.
My changes cache the temporary token using
os.path.join(tempfile.gettempdir(), "redgifs_token.txt")
and retrieve/save thetoken
in three different instances.redgifs_token.txt
does not exist.401
error occurs, meaning the token is invalid.redgifs_token.txt
exists and is a workingauth_token
.These tokens are good for 24 hours, as stated in the Redgifs API docs,
Primary changes
I will give bullet points on what I changed in each script.
I'm apologize, for some reason I cannot comment on specific lines below. Still learning.
bdfr/__main__.py
__version__
import. I received an error related to being unable to find__version__
when attempting to usepip install -e .
, so the import was adjusted. I can remove this if it's not a problem on your end. This was just required to get development working on my end.bdfr/downloader.py
redgifs
viabdfr.site_downloaders
. I don't like doing this, but I couldn't figure out a way to only import/grabredgifs
when necessary. You'd probably have a better solution to this.Redgifs
related401 errors
. I'm usingstr(e).startswith("Server responded with 401")
. It's lame, but it was yelling at me when I triede.status_code
. No clue why. This works though.bdfr/site_downloaders/base_downloader.py
429
errors. It printsToo many requests, Try again in a little while
bdfr/site_downloaders/redgifs.py
os
andtempfile
, both necessary for creating/readingredgifs_token.txt
.TOKEN_FILE_PATH
to take care of caching theauth_token
._load_token()
,_save_token()
, and_get_token()
. These handle pretty much exactly what their names state. I tried to cut out the fluff and only include what was necessary. I'm sure more could be trimmed off though.Expected output, tests, and considerations
Expected output
If
redgifs_token.txt
is not found or a401
error is received,redgifs.py
will create the file and save thetoken
to it.I added formatting via print statements to let the user know what was going on.
You could remove the
print
statements if you'd like but I think they look nice....Redgifs temporary auth tokens are anonymous, read only, and only valid for 24 hours. I don't think there's any issue saving/displaying them. Though, I could be mistaken on that...?
I could remove the print statement that displays the
token
if you're not comfortable with it.I made them compact in the code as well, so they're not too intrusive.
-=-
If
redgifs_token.txt
is found, but is incorrect (giving a401
error code), the output is similar.BUT, the first attempted download does not go through.
I could not figure out how to fix this. If you could, I would greatly appreciate it.
This is the only bug that I could find.
This does not crash the script.
-=-
There is no output if the
token
is correct.Tests
Command:
My changes failed on
27
tests, but a freshgit clone
of the main branch failed on all of these tests as well.Many of the failures take place due to
404
errors with the provided links.Considerations
The Redgifs test fails as well, but due to the bug mentioned above.
One link will hit a
401
error before it notices and grabs thetoken
.I'm not sure how to adjust for that.
Once per day, the first download will fail.
But the current implementation is entirely broken due to the API changes.
The lesser of two evils, I suppose...? It could probably be fixed, but I am currently out of brain juice. lol.
This was only tested on a Windows 10 machine. I'm not sure how Linux/Mac would handle the
os.path.join(tempfile.gettempdir(), "redgifs_token.txt")
. It should work though.I could also put in code to remove the
redgifs_token.txt
after use (not entirely sure how though), if you wanted it to be truly stateless.