Add user agent string to feed requests #3099

Merged
merged 2 commits into from
Jun 24, 2024

mattbasta
Contributor

Hey folks, I work on Pinecast (https://pinecast.com). We recently had a customer write in that their audiobookshelf server stopped updating feeds from us. It turns out that we'd seen a huge uptick in requests from the default Axios user agent string, which caused requests with that user agent to get served a captcha. My suspicion is that companies have started scraping more aggressively.

Podcast apps have (by and large) done a good job of providing descriptive User-Agent headers. Identifying your app allows podcasters to get informed analytics data and helps hosting companies like mine to identify and fix problems more easily.


In this PR, I'm adding the User-Agent HTTP request header. I've chosen a fairly straightforward value: the name of the project, a link to the project, and "like iTMS" to indicate that the client should be treated similarly to Apple's podcast crawler. I do not have strong feelings about this UA string, but if you decide to go with something else, I'd suggest you avoid adding a version number. There's a balance to strike among user privacy (less information), transparency (more descriptive information), and compatibility (more information that looks like another user agent).
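
As a rough illustration of the three parts described above (project name, project link, compatibility token), here is a minimal sketch. The helper `buildUserAgent`, its option names, and the exact punctuation of the resulting string are assumptions made for this example, not the value used in the diff. The config object has the shape that axios accepts as the second argument to `axios.get(url, config)`.

```javascript
// Hypothetical helper: the option names and the string layout are
// assumptions for illustration, not the exact value from this PR.
function buildUserAgent({ name, url, like }) {
  const details = [`+${url}`];
  // The "like <token>" part is optional so callers can leave it out.
  if (like) details.push(`like ${like}`);
  // No version number is included, per the suggestion above.
  return `${name} (${details.join('; ')})`;
}

const feedUserAgent = buildUserAgent({
  name: 'audiobookshelf',
  url: 'https://github.com/advplyr/audiobookshelf',
  like: 'iTMS'
});

// Request options in the shape axios takes, e.g.
// axios.get(feedUrl, feedRequestConfig)
const feedRequestConfig = {
  headers: { 'User-Agent': feedUserAgent }
};

console.log(feedRequestConfig.headers['User-Agent']);
```

Keeping the string in one helper means a later change to the wording only has to happen in one place.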

This PR does not update the client calls to download media. I am not too familiar with the project and do not want to regress any features. However, I strongly suspect this function will need a similar update:

https://github.com/advplyr/audiobookshelf/blob/master/server/utils/fileUtils.js#L254

I'm glad to make that change as well, if you think it's appropriate.

@advplyr
Owner

advplyr commented Jun 23, 2024

Thanks! The fileUtils downloadFile is used for downloading podcast episodes, podcast cover images and audiobook cover images.
I'm not sure about the "like iTMS" part. Is that specific to Pinecast, and what would the difference be if it wasn't included?

@mattbasta
Contributor Author

It's not specific to Pinecast, no. iTMS is the UA string that Apple uses when crawling feeds and episodes. Many UAs follow that pattern (you'll see "like Gecko" for things that want to advertise compatibility with Firefox or "like FeedFetcher-Google" for things that pull RSS).

Like I said, I don't have strong feelings either way, but my intuition is that it'll help in cases where folks are doing naive UA detection. I'm glad to update it to your liking, and also update the second function.
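
To make "naive UA detection" concrete, here is a sketch of the kind of substring check a feed host might run. The function `classifyUserAgent`, its rules, and its labels are entirely hypothetical and do not describe how Pinecast or any other host actually works. It also assumes that the default axios user agent starts with `axios/`.

```javascript
// Hypothetical server-side check: the rules and labels are assumptions
// for illustration only, not any real host's logic.
function classifyUserAgent(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  // A plain substring match is why a "like iTMS" token can help:
  // the request is grouped with Apple's podcast crawler.
  if (ua.includes('itms')) return 'podcast-client';
  // Assumed default axios UA prefix; generic library UAs are the
  // ones most likely to be challenged.
  if (ua.startsWith('axios/')) return 'challenge';
  return 'unknown';
}

console.log(classifyUserAgent('axios/1.7.2'));
console.log(classifyUserAgent('audiobookshelf (+https://audiobookshelf.org; like iTMS)'));
```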

@advplyr
Owner

advplyr commented Jun 24, 2024

If "like iTMS" was included in the request for the RSS feed but not in the requests for the files, would that make sense? I'm not sure whether it is better to be consistent across all requests.

@mattbasta
Contributor Author

It could be excluded for the asset requests. That's sensible.

@advplyr
Owner

advplyr commented Jun 24, 2024

I updated the site to https://audiobookshelf.org instead of the repo because the repo is going to move to the audiobookshelf org eventually. Thanks for the help!
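
Putting the thread's decisions together (the "like iTMS" token on feed requests only, and the site URL pointing at https://audiobookshelf.org), the split could be sketched as below. The names `userAgentFor`, `PROJECT_NAME`, and `PROJECT_URL`, and the exact string layout, are assumptions for illustration. The merged string is not quoted in this conversation.

```javascript
// Hypothetical constants and helper: the layout of the string is an
// assumption, since the merged value is not quoted in this thread.
const PROJECT_NAME = 'audiobookshelf';
const PROJECT_URL = 'https://audiobookshelf.org';

function userAgentFor(requestKind) {
  const base = `${PROJECT_NAME} (+${PROJECT_URL}`;
  // Only RSS feed requests advertise compatibility with Apple's crawler;
  // episode and cover image downloads use the plain form.
  return requestKind === 'feed' ? `${base}; like iTMS)` : `${base})`;
}

// Header objects that could be passed as { headers } in an axios config.
const feedHeaders = { 'User-Agent': userAgentFor('feed') };
const assetHeaders = { 'User-Agent': userAgentFor('asset') };

console.log(feedHeaders['User-Agent']);
console.log(assetHeaders['User-Agent']);
```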

@advplyr advplyr merged commit 04a6564 into advplyr:master Jun 24, 2024
4 checks passed