Add option to allow all URLs to be crawlable via robots.txt #2107

Merged
1 commit merged into shlinkio:develop from feature/robots-allow-all on Apr 22, 2024

Conversation

acelaya (Member) commented Apr 21, 2024

Closes #2108

As discussed in #2067 (reply in thread), this PR adds an option to make Shlink return a robots.txt that allows all URLs to be crawlable, except REST API ones.

The option is disabled by default, for backwards compatibility.
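
For illustration, a rough sketch of the intended effect on the served robots.txt when the option is enabled, assuming the REST API is the only path left disallowed (the exact directives below are an assumption, not taken from this PR's diff):

:~$ curl https://example.com/robots.txt # hypothetical output with the new option enabled
User-agent: *
Disallow: /rest/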

codecov bot commented Apr 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.11%. Comparing base (986f116) to head (163244f).
Report is 4 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #2107   +/-   ##
==========================================
  Coverage      96.10%   96.11%           
- Complexity      1423     1424    +1     
==========================================
  Files            263      263           
  Lines           5113     5116    +3     
==========================================
+ Hits            4914     4917    +3     
  Misses           199      199           

☔ View full report in Codecov by Sentry.

acelaya marked this pull request as ready for review April 22, 2024 07:17
acelaya merged commit 59fa088 into shlinkio:develop Apr 22, 2024
23 checks passed
acelaya deleted the feature/robots-allow-all branch April 22, 2024 07:23
dhow commented May 24, 2024

@acelaya -- I was reading your change and tried adding the env variables ROBOTS_ALLOW_ALL=TRUE and ROBOTS_ALLOW_ALL_SHORT_URLS=TRUE, but I didn't see a difference in the generated robots.txt. I'm running the 4.1.1 docker container. I might have missed something, but what is the expected behavior (in robots.txt)?

:~$ curl https://xxx/robots.txt # ROBOTS_ALLOW_ALL_SHORT_URLS=TRUE
User-agent: *
Disallow: /
:~$ curl https://xxx/robots.txt # ROBOTS_ALLOW_ALL_SHORT_URLS=FALSE
User-agent: *
Disallow: /
:~$ curl https://xxx/robots.txt # ROBOTS_ALLOW_ALL=TRUE
User-agent: *
Disallow: /
:~$ curl https://xxx/robots.txt # ROBOTS_ALLOW_ALL=FALSE
User-agent: *
Disallow: /
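
For reference, a minimal sketch of how such a variable would be passed to the container (assuming the shlinkio/shlink image and the variable name mentioned above; port and flags are illustrative, and whether the variable takes effect depends on the Shlink version, as clarified below):

:~$ docker run -d -p 8080:8080 \
      -e ROBOTS_ALLOW_ALL_SHORT_URLS=true \
      shlinkio/shlink:4.1.1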

FYI, for Facebook link validation (unfortunately you need a Facebook account to try... 😄 ) I'm currently using the following patch on /etc/shlink/module/Core/src/Action/RobotsAction.php :

--- RobotsAction.php    2024-04-14 16:13:41.000000000 +0900
+++ RobotsAction.php-modified   2024-04-19 21:11:37.891032753 +0900
@@ -33,6 +33,9 @@
         # For more information about the robots.txt standard, see:
         # https://www.robotstxt.org/orig.html
 
+        User-agent: facebookexternalhit
+        Disallow: 
+
         User-agent: *
 
         ROBOTS;
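
With that patch applied, the served robots.txt should look roughly like this (a sketch based on the diff above and the default output shown earlier; any header comments omitted):

:~$ curl https://xxx/robots.txt # after applying the patch above
User-agent: facebookexternalhit
Disallow:

User-agent: *
Disallow: /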

acelaya (Member, Author) commented May 24, 2024

This feature is not yet released. It will ship with v4.2.0.

dhow commented May 24, 2024

Roger @acelaya !!
