
[ENHANCEMENTS] Robots.txt file option, crawler discouragement, or static file responses in virtual directories #431

Open
DerLeole opened this issue Dec 17, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@DerLeole

Is your feature request related to a problem? Please describe.
There is currently no easy way to inject a robots.txt file (or something similar) under a subpath or virtual directory, and many services do not ship one preconfigured, which risks them being listed on search engines and the like.
NPM offers this functionality with about four lines of custom code in its UI response editor, and it would be great to mirror something like that in Zoraxy.

Describe the solution you'd like
There are multiple ways to enable this, and several of them could coexist:

  • A simple option at proxy creation, like "Discourage Crawlers", that injects a simple response whenever the /robots.txt (or an equivalent) path is requested (see the sketch after this list).
    • This could also check whether the upstream responds with its own robots.txt and only inject one if the upstream does not.
  • A similar but more configurable option in the Access Control Panel.
  • New functionality in the virtual directories feature that allows users to configure a simple static plaintext or HTML response right in the editor, without having to set up any external services.
  • Full scripting functionality, similar to how NPM lets users inject Nginx configuration directly.
    • I'm fully aware that this is probably not something that is currently planned; I just threw it in here as a consideration.
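For the first option, here is a minimal Go sketch of what a "Discourage Crawlers" interception in front of a reverse proxy could look like. The handler name, ports, and upstream address are invented for illustration; this is not Zoraxy's actual code, just the general idea.

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Canned response used when the upstream has no robots.txt of its own.
const discourageRobotsTxt = "User-agent: *\nDisallow: /\n"

// withRobotsTxt wraps a handler (here a reverse proxy) and intercepts
// /robots.txt: it probes the upstream first and only injects the canned
// copy if the upstream does not serve one itself.
func withRobotsTxt(upstream *url.URL, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/robots.txt" {
			probe, err := http.Head(upstream.JoinPath("/robots.txt").String())
			if err == nil {
				probe.Body.Close()
			}
			if err != nil || probe.StatusCode != http.StatusOK {
				// Upstream has no robots.txt: inject the discouraging copy.
				w.Header().Set("Content-Type", "text/plain; charset=utf-8")
				w.Write([]byte(discourageRobotsTxt))
				return
			}
		}
		next.ServeHTTP(w, r) // everything else is proxied as usual
	})
}

func main() {
	// The upstream address and listen port are placeholders for the example.
	upstream, _ := url.Parse("http://127.0.0.1:8080")
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	http.ListenAndServe(":9000", withRobotsTxt(upstream, proxy))
}
```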

Describe alternatives you've considered
Set up a static web server externally (or internally) that serves an appropriate robots.txt file and manually add it to each host as a virtual directory. But this is super annoying and slow.
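For reference, that workaround boils down to running something like this tiny standalone server (a Go sketch; the port is arbitrary) and then mounting it on every host:

```go
package main

import "net/http"

// Minimal stand-in for the external "robots.txt server" workaround: its
// only job is to answer /robots.txt, and each host mounts it as a vdir.
func main() {
	http.HandleFunc("/robots.txt", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		w.Write([]byte("User-agent: *\nDisallow: /\n"))
	})
	http.ListenAndServe(":8081", nil) // port chosen arbitrarily for the example
}
```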

@DerLeole added the enhancement (New feature or request) label on Dec 17, 2024
@tobychui
Owner

@DerLeole I am not sure why so many people get virtual directories wrong: they are not intended to handle traffic for anything other than static resources. Vdirs are designed so that multiple of your sites can share the same resources folder, so you do not need to keep, for example, multiple copies of the same website OpenGraph image file across multiple web servers. I guess the design of Nginx configuration somehow creates the illusion that "all endpoints are virtual directories", though I understand why they did it that way for architectural reasons. The Apache web server's design is actually more technically correct in terms of how vdirs are supposed to work.

If you need a robots.txt, you should probably serve it at your domain (or subdomain) root (i.e. the / of your domain, like example.com instead of example.com/api). In that case you do not need a vdir, so the question does not apply. Assuming you are asking because you want to share one robots.txt across multiple of your sites (not an ideal way of doing it, but it is your homelab, so that is up to you): crawlers respect redirects, so you can simply keep one copy on one of your servers and set up a redirection rule that points /robots.txt on your other sites' hostnames to that copy.
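In standalone Go terms, that redirect approach amounts to something like the sketch below (the hostname and port are placeholders; in Zoraxy you would configure this as a redirection rule in the UI rather than writing code):

```go
package main

import "net/http"

// Each secondary site answers /robots.txt with a redirect to the single
// canonical copy; crawlers follow the redirect and read that copy.
func main() {
	http.Handle("/robots.txt",
		http.RedirectHandler("https://example.com/robots.txt", http.StatusMovedPermanently))
	http.ListenAndServe(":8082", nil) // placeholder port for the example
}
```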
