[3.0] Implements URL routing (i.e. better queryless/pretty URLs) #8421

Draft
wants to merge 36 commits into
base: release-3.0

Member

@Sesquipedalian Sesquipedalian commented Jan 24, 2025

The main thing this does is enable native SMF support for rewriting URLs as routes (i.e. virtual paths that are interpreted as well-structured queries). In other words, it implements a robust form of pretty URLs.

Examples:

| # | Standard URL | Routed URL |
|---|---|---|
| 1 | https://example.com/index.php?board=1.0 | https://example.com/boards/general-discussion-1 |
| 2 | https://example.com/index.php?board=1.20 | https://example.com/boards/general-discussion-1/20 |
| 3 | https://example.com/index.php?topic=123.15 | https://example.com/topics/my-great-topic-123/15 |
| 4 | https://example.com/index.php?action=moderate;area=modlog | https://example.com/moderate/modlog |
| 5 | https://example.com/index.php?action=calendar;year=2025;month=1;day=23 | https://example.com/calendar/2025/01/23/ |
| 6 | https://example.com/index.php?action=post;board=1.0 | https://example.com/boards/general-discussion-1/post |
| 7 | https://example.com/index.php?action=markasread;sa=board;board=1.0 | https://example.com/boards/general-discussion-1/markasread/ |
| 8 | https://example.com/index.php?action=profile;area=account;u=1 | https://example.com/members/sesquipedalian-1/account |

There are four types of routes:

  1. Routes to boards and topics, which use the form /boards/<slug>-<id> or /topics/<slug>-<id>, with an optional /<start> value appended for pagination purposes. See examples 1 through 3 above. There are also routes to individual posts using /msgs/<id>, but these redirect to the canonical URL just like ?msg=<id> does.
  2. Routes to typical actions, which use the form /<action>/<area>/<sa> (with the area and sub-action being optional). Example 4 above shows this. Some actions (see example 5) provide additional routing elements for commonly used query parameters that are specific to those particular actions.
  3. Routes to actions that apply to a specific board or topic, which use the form /boards/<slug>-<id>/<action>. See examples 6 and 7 above.
  4. Routes to members, which use the form /members/<slug>-<id>/<area>/<sa> (with the area and sub-action being optional). See example 8 above.
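As a rough illustration of these four forms, here is a minimal sketch of how a routed path could be mapped back to query parameters. This is not the code in this PR: the function name `sketch_parse_route` and its behaviour are assumptions made purely for illustration.

```php
<?php
// Hypothetical sketch (not the PR's code): turn a routed path into query parameters.
function sketch_parse_route(string $path): array
{
    // Split on slashes and drop empty segments.
    $parts = preg_split('~/+~', trim($path, '/'), -1, PREG_SPLIT_NO_EMPTY);

    if ($parts === []) {
        return [];
    }

    $type = array_shift($parts);

    // Types 1 and 3: /boards/<slug>-<id> or /topics/<slug>-<id>,
    // followed by either a numeric <start> or an <action>.
    if ($type === 'boards' || $type === 'topics') {
        $key = $type === 'boards' ? 'board' : 'topic';

        // Only the trailing ID matters; the slug itself is ignored.
        if (!preg_match('~(\d+)$~', $parts[0] ?? '', $m)) {
            return [];
        }

        $next = $parts[1] ?? '0';

        return ctype_digit($next)
            ? [$key => $m[1] . '.' . $next]
            : ['action' => $next, $key => $m[1] . '.0'];
    }

    if ($type === 'members') {
        // Type 4: /members/<slug>-<id>/<area>/<sa>
        if (!preg_match('~(\d+)$~', $parts[0] ?? '', $m)) {
            return [];
        }

        $params = ['action' => 'profile', 'u' => $m[1]];
        $parts = array_slice($parts, 1);
    } else {
        // Type 2: /<action>/<area>/<sa>
        $params = ['action' => $type];
    }

    if (isset($parts[0])) {
        $params['area'] = $parts[0];
    }

    if (isset($parts[1])) {
        $params['sa'] = $parts[1];
    }

    return $params;
}
```

The sketch covers examples 1 through 4, 6, and 8. It deliberately does not handle action-specific elements such as the calendar date in example 5 or the implied sa=board in example 7.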

Internally, routes are translated into query strings very early in the process (specifically, during QueryString::cleanRequest()), so that all other code works with the query string just like it always has. Similarly, forum URLs are rewritten as routes only near the very end of the process, during Utils::obExit(), Utils::redirectexit(), or Mail::send().


There are two settings that control this behaviour:

  1. The existing queryless_urls setting. This is the primary setting. As has always been the case, it is only supported on the Apache, LiteSpeed, and lighttpd web servers.
  2. A new hide_index_php setting. When this is enabled, /index.php will be removed from URLs pointing to pages within the forum. This setting is supported on any web server.

These two settings can operate independently. If queryless URLs are enabled but the option to hide index.php is disabled, then https://example.com/index.php?board=1.0 will become https://example.com/index.php/boards/general-discussion-1. If the option to hide index.php is enabled but queryless URLs are disabled, then https://example.com/index.php?board=1.0 will become https://example.com/?board=1.0.
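To make the four combinations concrete, here is a minimal sketch of how the two settings could shape the final URL. It is only an illustration, not the code in this PR; the function name `sketch_build_url` and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch (not the PR's code): combine the two settings into a final URL.
// $query is the standard query string; $route is its routed equivalent.
function sketch_build_url(
    string $base,
    string $query,
    string $route,
    bool $queryless_urls,
    bool $hide_index_php
): string {
    // hide_index_php drops "index.php" from the script part of the URL.
    $script = $base . ($hide_index_php ? '/' : '/index.php');

    // queryless_urls swaps the query string for the routed path.
    if ($queryless_urls) {
        return rtrim($script, '/') . '/' . $route;
    }

    return $script . '?' . $query;
}
```

Called with the base https://example.com, the query board=1.0, and the route boards/general-discussion-1, this yields the two mixed forms described above, plus the standard URL when both settings are off and the fully routed URL when both are on.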

If the admin enables both of these settings together, SMF writes a small mod_rewrite rule to the .htaccess file in order to make everything work correctly. This write operation performs safety checks; if the write operation fails, SMF will show an error message and will refuse to enable the settings.


Backward compatibility for the old-school queryless URLs has been maintained. Those forms of queryless URLs will no longer be generated, but they will still be recognized and parsed. This ensures that existing links from external sites, browser bookmarks, etc., will continue to work.


Slugs are generated on the fly for boards, topics, and members. Strictly speaking, these slugs are just fluff and don't matter for the functioning of the system. (Indeed, the slug can be left out or changed to any random string; all we need is the ID value that appears at the end). However, including a memorable slug in the URL allows the user to start typing the part of the URL that they remember into their browser's address bar and then have autocomplete suggest the remainder of the URL (including the harder-to-remember ID value) for them.

EDIT: Based on @live627's feedback, we now redirect to the correct URL if an incorrect slug is given.
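The idea behind that redirect can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `sketch_canonical_segment` is hypothetical, and the array of slugs stands in for whatever lookup the real code performs.

```php
<?php
// Hypothetical sketch (not the PR's code): decide whether a "<slug>-<id>" segment
// needs a redirect to its canonical form.
// $slugs_by_id is an assumed stand-in for the real lookup of canonical slugs.
function sketch_canonical_segment(string $segment, array $slugs_by_id): ?string
{
    // The ID is the run of digits at the end, either alone or after a hyphen.
    if (!preg_match('~(?:^|-)(\d+)$~', $segment, $m)) {
        return null;
    }

    $id = (int) $m[1];

    if (!isset($slugs_by_id[$id])) {
        return null;
    }

    $canonical = $slugs_by_id[$id] . '-' . $id;

    // Null means "already canonical, no redirect needed".
    return $segment === $canonical ? null : $canonical;
}
```

With a canonical slug of some-title for topic 28, the segments other-title-28 and 28 would both be redirected to some-title-28, while some-title-28 itself would be served directly.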


@VBGAMER45 will likely be particularly interested in the new integrate_rewrite_as_queryless and integrate_parse_route hooks. With these hooks, you should be able to create a hooks-only version of your Pretty URLs mod in order to support people who have used your mod with SMF 2.1 and below and/or who just prefer the format of the URLs that your mod produces.

Signed-off-by: Jon Stovell <jonstovell@gmail.com>
@live627
Contributor

live627 commented Jan 24, 2025

Backward compatibility for the old-school queryless URLs has been maintained. Those forms of queryless URLs will no longer be generated, but they will still be recognized and parsed. This ensures that existing links from external sites, browser bookmarks, etc., will continue to work.

Are these permanent redirects? I read somewhere that some browsers can silently update the bookmark if it targets a permanent redirect.

Strictly speaking, these slugs are just fluff and don't matter for the functioning of the system. (Indeed, the slug can be left out or changed to any random string

Slugs cannot be arbitrary, because users with a vendetta against the site will seed bad words into the address bar. I remember that over a decade ago, users did this to a news site, and it became a big story because URLs with f-bombs in them still led to the article.

Don't let example.com/member/nitwitt-1 point to Admin.

Signed-off-by: Jon Stovell <jonstovell@gmail.com>
@Sesquipedalian
Member Author

Sesquipedalian commented Jan 24, 2025

Missing character transliteration: /boards/новыи-раздел-3 => /boards/noviy-razdel-3. You could use ready-made libraries like behat/transliterator or symfony/string:

$slug = \Behat\Transliterator\Transliterator::transliterate($string, $this->spaceChar);

// or

$slug = (string) (new \Symfony\Component\String\Slugger\AsciiSlugger())->slug($string, $this->spaceChar)->lower();

The current behaviour is intentional. There's no reason not to support non-ASCII characters in URLs in 2025. In fact, I was considering the idea of localizing the boilerplate parts of the route, too. For example, on a forum that uses Russian as its default language, the route would be something like /разделы/новыи-раздел-3. (I don't speak or read Russian, so that might have been the wrong word or grammatically incorrect, but I'm sure you get the idea.) That would require adding more language strings, of course.
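For comparison with the transliteration snippets quoted above, here is a minimal sketch of a slug generator that keeps non-ASCII letters. It is not the code in this PR: the function name `sketch_unicode_slug` is hypothetical, it assumes the mbstring extension is available, and unlike the current behaviour it makes no attempt to strip diacritical marks.

```php
<?php
// Hypothetical sketch (not the PR's code): build a slug that keeps Unicode letters.
// Assumes the mbstring extension is loaded.
function sketch_unicode_slug(string $title, string $space_char = '-'): string
{
    $slug = mb_strtolower($title, 'UTF-8');

    // Collapse every run of characters that are not letters, combining marks,
    // or numbers into a single separator.
    $slug = preg_replace('~[^\p{L}\p{M}\p{N}]+~u', $space_char, $slug);

    // Drop separators left over at either end.
    return trim((string) $slug, $space_char);
}
```

A transliterating variant would add one more step before the replacement, which is where a library such as behat/transliterator or symfony/string would come in.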

To be clear, I would very much like feedback from @dragomano, @live627, @jdarwood007, @sbulen, @BrickOzp, @LexArma, @Kindred-999, @MissAllSunday, @Oldiesmann, @Arantor, or anyone else who wants to comment on this matter. I'm open to being persuaded to transliterate all slugs to ASCII, if given good arguments for it. I'm also open to considering other ideas about this, if someone wants to suggest something else.

@Oldiesmann
Contributor

The only issue I can see is that browsers will paste the URLs with percent-encoding even if they are displayed normally. See for instance the main page of the Russian version of Wikipedia, where "Заглавная_страница" becomes this when pasted:

%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0

I'm not sure how that will matter for international users though.

Sesquipedalian and others added 2 commits January 24, 2025 01:21
Signed-off-by: Jon Stovell <jonstovell@gmail.com>
Co-authored-by: John Rayes <live627@gmail.com>
@dragomano
Contributor

Users prefer to transliterate URLs rather than use national characters in URLs for several reasons:

  • Compatibility: Not all browsers and systems handle URLs with national characters correctly, which can lead to access issues.
  • Convenience: Transliteration helps avoid problems with keyboard layout, especially if the user doesn't know how to switch to the desired language.
  • Memorability: Addresses written in Latin characters can be easier to remember and type, especially for users who do not speak the language with national characters.
  • International Access: Transliterated addresses may be more understandable for an international audience that may not know how to read or input characters from another alphabet.
  • Historical Habits: Many users are accustomed to using Latin characters on the internet, and this has become a standard for many web resources.

Yes, there are “.рф” domains, those addresses are completely in Cyrillic. But many still hate such URLs and prefer to use solutions like the Pretty URLs mod.

Here’s an example with Japanese characters. Do you want such URLs on forums?

https://www.simplemachines.org/あ-う-ん-な-い-そ-ん-に-ち-は-ひ-が-て-ら-し-い

@Sesquipedalian
Member Author

The only issue I can see is that browsers will paste the URLs with percent-encoding even if they are displayed normally. See for instance the main page of the Russian version of Wikipedia, where "Заглавная_страница" becomes this when pasted:

%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0

I'm not sure how that will matter for international users though.

That seems to depend on the browser. In Safari, for example, copying the URL directly from the address bar copies the Unicode characters, whereas in Firefox the percent-encoded values are copied.
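Either way, the two forms identify the same path; the percent-encoded version is just the UTF-8 bytes of the Unicode version written out in hex. A small sketch using PHP's built-in rawurlencode() and rawurldecode() shows the round trip (the variable names are only for illustration):

```php
<?php
// The Unicode form of the Wikipedia page title mentioned above.
$unicode = 'Заглавная_страница';

// rawurlencode() percent-encodes every byte except unreserved characters,
// so the underscore is left as-is.
$encoded = rawurlencode($unicode);

// rawurldecode() reverses the encoding and restores the original string.
$decoded = rawurldecode($encoded);
```

Here $encoded is exactly the string quoted above, and $decoded is identical to $unicode.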

Based on Wikipedia's example, though, I'm now wondering whether I should actually be less aggressive in stripping out diacritical marks (accents and such). They leave all the diacritics in their URLs.

@LexArma
Member

LexArma commented Jan 24, 2025

I do like the idea of optional built-in pretty URLs functionality. I'm not sure localization is worth the effort, though, but I'll keep an eye on this discussion. I'm not for or against.

Signed-off-by: Jon Stovell <jonstovell@gmail.com>
@dragomano
Contributor

dragomano commented Jan 24, 2025

When entering the topic, an error occurs:

[Screenshot: sshot-8]

@live627
Contributor

live627 commented Jan 24, 2025

I use cocur/slugify to generate slugs from titles.

@dragomano
Contributor

dragomano commented Jan 24, 2025

I use cocur/slugify to generate slugs from titles.

Yes, it's a good library too.

@jdarwood007
Member

How is the URL consumed into the app? It looks like we just dump it into the query string and then parse it back out later?

I'm just checking because, while we can write .htaccess, I want to understand how we would support other web servers like nginx. If we just dump the request URI into the query string, that can be done easily for other web servers. It just won't work out of the box, since those typically require editing a server config rather than a local .htaccess.

@Sesquipedalian
Member Author

Sesquipedalian commented Jan 24, 2025

How is the URL consumed into the app? It looks like we just dump it into the query string and then parse it back out later?

Yes, that's exactly what it does. So extending support to other web servers is certainly possible, as you say.

@live627
Contributor

live627 commented Jan 26, 2025

I clicked a link to view a topic and got this error:

Fatal error: Uncaught Error: Typed static property SMF\User::$me must not be accessed before initialization in C:\wamp64\www\smf-dev\Sources\Topic.php:1632
Stack trace:
#0 C:\wamp64\www\smf-dev\Sources\Topic.php(602): SMF\Topic->loadTopicInfo()
#1 C:\wamp64\www\smf-dev\Sources\Topic.php(1492): SMF\Topic::load(606)
#2 [internal function]: SMF\Topic::buildRoute(Array)
#3 C:\wamp64\www\smf-dev\Sources\QueryString.php(609): call_user_func('SMF\\Topic::buil...', Array)
#4 C:\wamp64\www\smf-dev\Sources\QueryString.php(536): SMF\QueryString::buildRoute(Array)
#5 [internal function]: SMF\QueryString::SMF\{closure}(Array)
#6 C:\wamp64\www\smf-dev\Sources\QueryString.php(519): preg_replace_callback('~(?:(?:(?>\\b(?:...', Object(Closure), 'http://localhos...')
#7 C:\wamp64\www\smf-dev\Sources\Utils.php(2369): SMF\QueryString::rewriteAsQueryless('http://localhos...')
#8 C:\wamp64\www\smf-dev\Sources\QueryString.php(798): SMF\Utils::redirectexit('http://localhos...')
#9 C:\wamp64\www\smf-dev\Sources\QueryString.php(223): SMF\QueryString::redirectFromMsg()
#10 C:\wamp64\www\smf-dev\Sources\Forum.php(381): SMF\QueryString::cleanRequest()
#11 C:\wamp64\www\smf-dev\index.php(153): SMF\Forum->__construct()
#12 {main}
  thrown in C:\wamp64\www\smf-dev\Sources\Topic.php on line 1632

EDIT: This is redirected from the old format.

@Sesquipedalian
Member Author

Yeah, I believe that both @dragomano's error and @live627's error were introduced by the redirection code I added for dealing with non-canonical slugs. I'll try to fix it in the next couple of days.

Signed-off-by: Jon Stovell <jonstovell@gmail.com>
@Sesquipedalian Sesquipedalian marked this pull request as draft January 27, 2025 05:36
@Sesquipedalian
Member Author

I believe I have fixed those two bugs now.

I am still looking into options for ASCII slugs. I am marking this as a draft until that part has been dealt with.

@dragomano
Contributor

Because users can link to a topic using any phrase as the slug, the canonical address ends up matching those phrases, so different links to the same topic appear in search results:

/topics/some-title-28/
/topics/other-title-28/
/topics/title-28/
/topics/28/

Am I right?

@Sesquipedalian
Member Author

No. If the slug is incorrect, the redirection logic will always force a redirect to the correct canonical URL.

Signed-off-by: Jon Stovell <jonstovell@gmail.com>
@tyrsson
Collaborator

tyrsson commented Jan 27, 2025

I'm just going to point out the obvious. Do with it what you will.

Actions should not be aware of or take part in their own routing. I cannot imagine a way to couple them more tightly than that.

@Sesquipedalian
Member Author

Sesquipedalian commented Jan 27, 2025

Having them tightly coupled was an intentional choice. My first draft had a separate route map, but I quickly realized that it would be extremely easy for that to get out of sync with the expected parameters of the actual actions. Instead, my thinking was that each action knows best what its own expected parameters are, and therefore what its route should look like. That's why QueryString::buildRoute() and QueryString::parseRoute() ask the appropriate action how its route should be built/parsed.
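To illustrate the idea, here is a minimal sketch of an action that builds and parses its own route, using the calendar route from example 5. The interface and class names (`SketchRoutable`, `SketchCalendar`) and the method signatures are assumptions for illustration only; they are not the actual API in this PR.

```php
<?php
// Hypothetical sketch: names and signatures are assumptions, not the PR's actual API.
interface SketchRoutable
{
    // Turn query parameters into an ordered list of route segments.
    public static function buildRoute(array $params): array;

    // Turn route segments back into query parameters.
    public static function parseRoute(array $route): array;
}

class SketchCalendar implements SketchRoutable
{
    public static function buildRoute(array $params): array
    {
        $route = ['calendar'];

        // Append year, month, and day in order, stopping at the first one missing.
        foreach (['year' => '%04d', 'month' => '%02d', 'day' => '%02d'] as $key => $format) {
            if (!isset($params[$key])) {
                break;
            }

            $route[] = sprintf($format, (int) $params[$key]);
        }

        return $route;
    }

    public static function parseRoute(array $route): array
    {
        $params = ['action' => array_shift($route)];

        // Read back as many date parts as the route provides.
        foreach (['year', 'month', 'day'] as $key) {
            if ($route === []) {
                break;
            }

            $params[$key] = (int) array_shift($route);
        }

        return $params;
    }
}
```

Because the same class owns both directions, a change to the action's expected parameters and a change to its route happen in one place, which is the synchronization argument made above.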

But if you have a different way of thinking about it, please do explain. I'm entirely open to revamping this system.

@tyrsson
Collaborator

tyrsson commented Jan 28, 2025

Will post back my thoughts on it. I am currently working on something related, for SMF.
