URL Scheme Proposal #3520

ssddanbrown · 2022-06-22T14:22:35Z

History & Purpose

Currently BookStack uses a fixed system for URLs of content within the system, based on "slugs" generated from the name of content. On top of this there's also a fairly hidden ID-based system for pages. Examples of existing content URLs:

/books/my-awesome-book/pages/my-cool-page
/link/102 (Page permalink)
/books/my-awesome-book/chapter/my-brilliant-chapter
/books/my-awesome-book
/shelves/my-terriffic-shelf

The original idea behind using slugs is to provide the user an indication of the likely destination content from the URL alone. A user observing the URL /books/frogs would instantly know the link will likely lead to a book about frogs.

Over the years since building BookStack a number of cases have arisen which have indicated this scheme is not ideal in some scenarios. This proposal/discussion puts forward a new scheme, to address these scenarios, to use as default going forward.

This is an open discussion to gain feedback, details below are not at all final. Comments are very much welcomed. This is not assured to go ahead, especially if the impact looks to be greater than actual long-term benefits.

Targets

Achieve more "Permanent" URLs by default

Right now URLs use slugs which are generated based upon the name of an item.
Changes to name can cause changes to the URL which can break URL references. We do have a system to help handle these scenarios, by referring to the revisions system, but this does not cover every change and revisions may be pruned.

We do have the /link/<page_id> permalinks but these use the database incrementing integers which can leak detail regarding system content (gaps in IDs may indicate hidden pages).

There are additional ways we could improve the current URL handling on changes but I think it may be better to use a more reliable base system than apply patches.

Allow flexibility of the content URL

While the existing URLs provide good indication of content, this breaks down when other languages are used. The URL path sections, between slugs, are hard-coded to English which may not be understandable to the reader. In addition, we attempt some conversion to Latin characters for some slugs which can completely remove the original content name for some languages.

We've also seen both requests to have a minimal length URL, and a longer, more descriptive, URL.

Proposed Scheme

The proposed new URL scheme is that shown below. This reflects what would be the new default for a "Page" item URL. The default configuration is intended to closely align with the appearance of the existing default URLs. The components of this scheme are broken down in sections below.

/p-4b72a/:/books/my-awesome-book/pages/my-cool-page
|-------|-|----------------------------------------|
   ^     ^                    ^
  UID  Separator        Configurable Trail

Examples for existing item types

/p-4b72a/:/books/my-awesome-book/pages/my-cool-page
/c-e29bf/:/books/my-awesome-book/chapter/my-brilliant-chapter
/b-7itu3/:/books/my-awesome-book
/s-45da1/:/shelves/my-terriffic-shelf

UID

This acts as a unique identifier for an item within BookStack. It will be a flexible-length case-insensitive (defaulting to lower case) alpha-numeric string defaulting to minimum 5 characters. This is used instead of a UUID-like ID to keep this short and usable in URLs with little impact. It's case-insensitive for compatibility with case-insensitive systems.

The UID is prefixed with a type letter followed by a hyphen. The type letter represents the content type (p = page) which allows type identification from the UID alone while ensuring UIDs are unique across content types. The hyphen separates the type indicator while providing a pattern to match upon, to prevent confusion with other systems' URLs at the same path prefix.
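As a rough illustration of the UID format described above, here is a minimal Python sketch (not BookStack code). The alphabet, the `TYPE_LETTERS` mapping, the `generate_uid` name and the retry-then-grow collision strategy are all assumptions made for the example; an in-memory set stands in for whatever storage would hold existing UIDs.

```python
import secrets
import string
from typing import Set

# Assumed alphabet: lower-case letters and digits ("case-insensitive alpha-numeric").
ALPHABET = string.ascii_lowercase + string.digits

# Hypothetical mapping; the proposal's examples use the letters p, c, b and s.
TYPE_LETTERS = {"page": "p", "chapter": "c", "book": "b", "shelf": "s"}


def generate_uid(item_type: str, existing: Set[str], min_length: int = 5) -> str:
    """Return a new type-prefixed UID that is not already in `existing`."""
    prefix = TYPE_LETTERS[item_type]
    length = min_length
    attempts = 0
    while True:
        body = "".join(secrets.choice(ALPHABET) for _ in range(length))
        uid = f"{prefix}-{body}"
        if uid not in existing:
            existing.add(uid)
            return uid
        attempts += 1
        # After repeated collisions, grow the random part ("flexible-length").
        if attempts % 10 == 0:
            length += 1
```

Calling `generate_uid("page", set())` would yield something shaped like `p-4b72a`, with a different random part on each call.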

Separator

The separator simply exists to separate the UID from the configurable trail portion of the URL.
This allows us to still use the core ID-based URL for system endpoints (For example /b-7itu3/create-page) while having clear separation to the configurable trail to prevent conflicts.
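To make the role of the separator concrete, the following Python sketch splits a path into UID, trail and system endpoint. It is only an illustration under assumptions: the regular expression, the `ParsedUrl` type and the `parse_content_url` name are hypothetical, and the UID pattern is inferred from the examples in this proposal.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: a type letter, a hyphen, then at least 5 alphanumeric characters.
UID_PATTERN = re.compile(r"^/(?P<uid>[a-z]-[a-z0-9]{5,})(?P<rest>/.*)?$", re.IGNORECASE)


class ParsedUrl(NamedTuple):
    uid: str
    trail: Optional[str]     # configurable trail, set only when the separator is present
    endpoint: Optional[str]  # system endpoint such as "create-page", if any


def parse_content_url(path: str) -> Optional[ParsedUrl]:
    match = UID_PATTERN.match(path)
    if match is None:
        return None  # Not a UID-based URL, for example "/books"
    uid = match.group("uid").lower()  # UIDs are case-insensitive
    rest = match.group("rest") or ""
    # The ":" segment marks the start of the configurable trail.
    if rest == "/:" or rest.startswith("/:/"):
        return ParsedUrl(uid, trail=rest[2:], endpoint=None)
    # Anything else after the UID is treated as a system endpoint.
    endpoint = rest.strip("/") or None
    return ParsedUrl(uid, trail=None, endpoint=endpoint)
```

With this split, `/b-7itu3/create-page` is read as an endpoint while `/b-7itu3/:/create-page` is read as a trail, so a trail can never shadow a system route.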

Configurable Trail

This portion of the URL would be system-admin configurable as required. It would default to match the current BookStack slug scheme, but we would provide an interface to allow per-content-type (book/page/chapter/shelf) configuration of this trail using static and dynamic-placeholder elements.

The available dynamic-placeholder elements would initially be as follows:

{{name}} - URL encoded version of the item name.
{{slug}} - Name-generated slug, as we provide now.
{{book_slug}} - Name-generated slug of the parent book, as we provide now. Chapters and pages only.

This could be expanded on in the future, but the initial implementation goal would be to match existing options.

The trail could be made empty if desired, which would cleanly generate item URLs with no trail and no separator components.
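The placeholder substitution could look roughly like the Python sketch below. This is an illustration only: `slugify` is a simplified stand-in for BookStack's real slug generation, and the `build_url` name and template syntax handling are assumptions for the example.

```python
import re
import unicodedata
from urllib.parse import quote


def slugify(name: str) -> str:
    """Simplified stand-in for slug generation (an assumption, not BookStack's logic)."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def build_url(uid: str, template: str, name: str, book_name: str = "") -> str:
    """Fill the trail template and join it to the UID with the ':' separator."""
    placeholders = {
        "{{name}}": quote(name, safe=""),   # URL-encoded item name
        "{{slug}}": slugify(name),          # name-generated slug
        "{{book_slug}}": slugify(book_name),
    }
    trail = template
    for key, value in placeholders.items():
        trail = trail.replace(key, value)
    if not trail:
        # Empty trail: no trail and no separator in the URL.
        return f"/{uid}"
    return f"/{uid}/:/{trail.lstrip('/')}"
```

For example, the template `/books/{{book_slug}}/pages/{{slug}}` with a page named "My Cool Page" in the book "My Awesome Book" reproduces the default page URL shown earlier, while an empty template produces just `/p-4b72a`.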

Considerations

Some of these changes are based upon the feedback received via community channels, but the actual overall desire for such changes, across the whole existing BookStack user-base, is tremendously hard to judge.
Non-content URLs will remain the same. For example the /books endpoint would remain as-is, which may limit the flexibility benefits in scenarios where users want to get away from the default BookStack terms, although this isn't a core goal here.
This won't be an instant switch-over to new URLs. It's important we make any changes in a stable manner. We'd look to support existing URL schemes in parallel for a significant time afterwards.
The UIDs would effectively be replacing current integer-based IDs in purpose. We'll need to think about how we roll these out for the API, where IDs are used heavily. We could use both in parallel for some time but there are complications with regard to how relations are shown.

Personal Thoughts

I'm not fully sure on the proposed scheme. The separator especially makes it look a little ugly, but it's the closest I could get when thinking about the technical handling and attempting to have a default aligning with the current URL scheme.
After writing this out, it's very hard to assess if such a change would be worth it. The benefits right now are very edge-case based but these may amplify long-term.


Szwendacz99 · 2022-06-25T11:26:36Z

Do I understand correctly that with the new system, when someone changes the name of a book, chapter etc., BookStack will still be able to retrieve (or redirect?) to the proper address with the new trail, thanks to the unchangeable ID? If so then it seems reasonable, same as replacing the integer ID with a more unpredictable one. But then I wonder how such an ID will be generated, because it seems to be kind of too short for true pseudo-randomization, unless it will perform an additional check that it is not already taken.

The visual value of the separator is hard for me to assess, but such a binary question (separator, yay or nay?) could be investigated with a simple survey, or even a survey with multiple possible separators.

andysh-uk · 2022-07-08T10:23:43Z

I personally don't see an issue with the current URL scheme itself - I actually use something very similar myself in an open-source project.

The first URL component lists the item type, followed by the path to the item - for example:

/album/my-parent-album/my-child-album = an album called "My Child Album" within a parent album called "My Parent Album."
/photo/my-parent-album/my-child-album/DSCF0001.jpeg = a photo with the filename DSCF0001.jpeg in an album called "My Child Album" within a parent album called "My Parent Album."

My biggest pet peeve with the system in BookStack is that a change to a page name or book name changes the URLs used to access it. In a system designed for documentation, where links directly to content are pretty much expected, this is a big issue.

My suggestions would be to either:

a) keep the existing scheme but generate a separate "permalink" based on an identifier (not the DB ID.)

When a page/chapter/shelf is created, generate a pseudo-random token of suitable length (e.g. ta19en2uan) and allow users to link to this permalink - e.g. /permalink/ta19en2uan - but the rest of Bookstack would still use the current scheme.

The link would be a flat structure so you don't have to worry about nested pages/chapters etc.

When a browser visits a permalink, BookStack could retrieve the item based on the identifier (you could maybe include the item type in the link so you know what type it is - e.g. /permalink/book/ta19en2uan) and then redirect to whatever the current (non-permanent) URL of the item is, so search engines don't see it as duplicate content.

This should be a temporary redirect so browsers do not cache the non-permanent link in place of the permanent link.

E.g. 302 redirect /permalink/book/ta19en2uan --> /book/my-cool-book

b) Alternatively, keep the existing scheme but whenever a page title changes, store a history of the URLs it has been accessible under.

If accessing a URL would return a 404, check the history table to see if it was previously used and redirect to the current URL of the item it was used for.

E.g.

Create "My Cool Book" - results in a URL of /book/my-cool-book
Change title to "My Really Cool Book" - results in a "current" URL of /book/my-really-cool-book
Access /book/my-cool-book - would currently result in a 404. Proposed: sees that "My Really Cool Book" previously had a URL of /book/my-cool-book so redirect the browser to the current URL of /book/my-really-cool-book.
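A minimal Python sketch of option (b) follows, with in-memory dictionaries standing in for the history table. The `SlugHistory` class, its method names and the choice of a 302 status are assumptions for illustration; the suggestion above does not specify a status code for this case.

```python
from typing import Dict, Optional, Tuple


class SlugHistory:
    """Tracks the current URL of each item and every URL it used previously."""

    def __init__(self) -> None:
        self.current: Dict[int, str] = {}   # item id -> current URL
        self.previous: Dict[str, int] = {}  # old URL -> item id

    def set_url(self, item_id: int, url: str) -> None:
        old = self.current.get(item_id)
        if old is not None and old != url:
            self.previous[old] = item_id
        # A URL that is live again should no longer count as historical.
        self.previous.pop(url, None)
        self.current[item_id] = url

    def resolve(self, url: str) -> Tuple[int, Optional[str]]:
        """Return (status, redirect location)."""
        if url in self.current.values():
            return 200, None
        item_id = self.previous.get(url)
        if item_id is not None:
            # Would otherwise be a 404: redirect to the item's current URL.
            return 302, self.current[item_id]
        return 404, None
```

Following the example steps: after `set_url(1, "/book/my-cool-book")` and then `set_url(1, "/book/my-really-cool-book")`, resolving `/book/my-cool-book` gives a redirect to `/book/my-really-cool-book` rather than a 404.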

tcatlas · 2022-07-11T17:47:52Z

Personally, I really like the proposed change. Sure, the URLs may be slightly ugly but it solves the very real problem of breaking links if and when the name of a document is updated. It's a URL, not a work of art. It doesn't have to be beautiful - I mean have you seen SharePoint URLs? I think /p-4b72a/:/books/my-awesome-book/pages/my-cool-page is a good balance of form and function - though perhaps the ID could be placed at the end of the URL instead of the beginning since the end is more likely to be hidden in an overflow scenario.

andysh-uk · 2022-07-11T21:17:49Z

it solves the very real problem of breaking links if and when the name of a document is updated

Does it though? What happens to the URL “/p-4b72a/:/books/my-awesome-book/pages/my-cool-page” if you rename “My Cool Page” to “My Other Cool Page”? Yes the unique ID is still the same, but the page name is still part of the URL, so would that change to “/p-4b72a/:/books/my-awesome-book/pages/my-other-cool-page”, in which case you’ve got the same problem - the previous URL is now broken.

Or are we saying that Bookstack would only use the UID part of the URL to find the matching content? In which case, the URL “/p-4b72a/:/books/my-awesome-book/pages/my-cool-page” and “/p-4b72a/:/something-completely-random” would arrive at the same piece of content? In which case, what is the purpose/benefit of the trail?

EDIT: just seen Stack Overflow does exactly this. “/questions/1234/whatever” will get you to question ID 1234, whatever you put in place of “whatever”. However it does redirect to the correct URL of “/questions/1234/the-question-title” to avoid duplicate content issues arising with search engines.
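The same lookup-then-redirect behaviour, applied to the proposed scheme, might look like this Python sketch. It is purely illustrative: the `CANONICAL_TRAILS` table and `resolve` function are hypothetical, and the use of a permanent (301) redirect is an assumption.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table: UID -> current trail for that item.
CANONICAL_TRAILS: Dict[str, str] = {
    "p-4b72a": "/books/my-awesome-book/pages/my-other-cool-page",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Find content by UID only, then redirect if the trail is not the current one."""
    # Everything before the "/:" separator is treated as the UID.
    uid = path.lstrip("/").partition("/:")[0].lower()
    canonical_trail = CANONICAL_TRAILS.get(uid)
    if canonical_trail is None:
        return 404, None
    canonical = f"/{uid}/:{canonical_trail}"
    if path == canonical:
        return 200, None
    # Stale or arbitrary trail: send the browser to the canonical URL.
    return 301, canonical
```

Under these assumptions both the old trail and `/p-4b72a/:/something-completely-random` end up at the renamed page's canonical URL.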

I mean have you seen SharePoint URLs?

I sure have! They’re horrendous.

amelszg · 2022-07-12T11:04:02Z

Hi, I like the proposed scheme as well and would prefer a page/chapter lookup via UID.

In my use case I am consuming BookStack data exclusively via the API and handling the navigation between pages myself.

That means I'm parsing the target URLs from content and rewriting them with the page ID. This only works as long as the page is not renamed. As soon as that happens, the link gets broken, and since revision data is not available via the API I have no means to look up/fix those links automatically.

For this reason I would prefer the proposed URL scheme, using the UID by default in page content when cross-linking to another page.

Not sure why the API interface needs to change though; keeping the integer-based ID for all operations would still be fine for me, i.e. api/pages/{id}, as long as the new UID is returned with the page object.

And yes, the separator feels a little bit ugly in the URL; maybe it could be defined differently or removed altogether.

Just some suggestions:
/p-4b72a/books/my-awesome-book/pages/my-cool-page
/p/4b72a/books/my-awesome-book/pages/my-cool-page

To be honest, even keeping the int-based ID would be fine. The main issue for me is preventing broken links after a page is renamed, so some kind of fixed, convention-based and parsable identifier in the URL is needed, and for that case even the current int IDs would be all right, imho.
So this example, using the current {page_id}, would be fine as well:
/p/15/books/my-awesome-book/pages/my-cool-page

Thanks.

ssddanbrown · 2022-07-12T11:37:00Z

Thanks all for your input so far. Some feedback on the responses:

@Szwendacz99 Do I understand correctly that with the new system when someone changes name of book, chapter etc, BookStack will be still able to retrieve (or redirect?) to the proper address with the new trail, thanks to the unchangeable ID?

Yes, that is correct.

@Szwendacz99 But then I wonder how will be such ID generated, because it seems to be kind of too short for true pseudo randomization, unless it will perform additional check if it is untaken.

Yeah, we'd scan the DB to ensure uniqueness. We already do this with content slugs.

@andysh-uk My suggestions would be to either ... keep the existing scheme but generate a separate "permalink" based on an identifier (not the DB ID.)

We already have these available (albeit rather unused) for pages. The main thing I don't want is non-alignment between a permalink system and actual browser/resulting URLs. If a user can't have the same benefits from copying the URL from the browser URL bar I see that as a fail, and hence why I've been hesitant to expand the current ID-based system.

@andysh-uk Alternatively, keep the existing scheme but whenever a page title changes, store a history of the URLs it has been accessible under.

Yeah, this is another approach on the table, but I wanted to explore the changes to the URL system first to see if a wider set of issues could be solved, beyond just the linking aspect.

@DiscoveryOV though perhaps the ID could be placed at the end of the URL instead of the beginning since the end is more likely to be hidden in an overflow scenario.

End placement gets more complex to parse out, especially where we're allowing a lot of customizability in the trail part. Not impossible, but the move to the end is probably not worth the complexity it could add compared with having stable base URL patterns.

@andysh-uk Or are we saying that Bookstack would only use the UID part of the URL to find the matching content? ... In which case, what is the purpose/benefit of the trail?

Yes, BookStack would only use the UID part to identify the content. As per the existing slugs in content URLs, it allows context to be provided in the URL alone. This is mentioned in the "History & Purpose" section of the proposal. As reflected in the proposal, you could configure the trail empty to have clean UID-only urls if desired.

@amelszg To be honest, even keeping the int based Id would be fine

The change away from the current integer ids would be to help avoid current potential security considerations. Jumps in id, or lack of access to certain id, can indicate existence of hidden content. Not an issue in many environments but will be a consideration in some. In addition, these new ids could set-up for future migration to having the different content types in shared table-space in the database (Longer term thinking though).

Just to confirm though, if it helps, you can still currently use /link/<page_id> URLs if you need a page permalink at this time.

Personal Thoughts - Updated

I'm still unsure about this overall, probably now less convinced than before. I think this may be optimizing for too many problems that are rarely problematic in reality. Additional targeted handling of URL changes would likely solve 90% of the actual problems this whole proposal addresses, without causing a painful migration.

amelszg · 2022-07-12T11:58:08Z

Hi @ssddanbrown, thanks for the reply.

Just to confirm though, if it helps, you can still currently use /link/<page_id> URLs if you need a page permalink at this time.

The problem is that I don't have any control over the content. The editors/maintainers will simply use what's most convenient for them, meaning they will either use the built-in URL picker element, which doesn't produce the mentioned "permalink", or just copy the URL from the address bar; they won't bother selecting some text snippets to create permalinks.

If this were configurable (i.e. the URL picker would create permalinks instead of slug-based URLs), that would mitigate my current problem, although there would still be a small issue with manually pasted URLs.

c0shea · 2022-07-13T13:42:01Z

I'm not a fan of the proposed change. The only issue we've faced with the current URLs is that if you move a page, the pretty URL changes. However, we mitigated that by using the page's permalink by hovering over the page title and clicking the pop-over copy button to get the permalink.

A sizeable portion of our documentation in BookStack is linked to from an external system to specific pages via their permalink. Changing the unique ID from an integer as it is today to a different value would be a breaking change that would be a monumental effort to adjust all of our links or risk them not working at some undetermined time in the future.

I tend to think that using a UID instead of the plain old database incrementing IDs is security through obscurity and not worth the added complexity here. Yes, it's possible for someone to deduce that if the page ID in their URL is 123 that they could try to access page 122, but the real effort would be ensuring the permission system is working appropriately to deny them access if they shouldn't be able to see that page in the first place. Our instance of BookStack is for internal-only access for our documentation. What could someone possibly have to gain from knowing that we have thousands of pages based on the ID? To me, it's not that useful to know. I'd be curious to know how many people are actually using BookStack in a truly public manner (not just internal company documentation as we do) and how much obscurity would really benefit them.

milneauk · 2022-08-04T12:12:25Z

Would it be possible to review the permalink visibility when addressing this? Users currently have to highlight a portion of the page content to get the UID/permalink but I think there should be an option to display this on the page automatically from the settings. We find that users typically copy the URL from the browser's address bar but this breaks when pages are moved or renamed.

c0shea · 2022-08-05T01:52:54Z

@cbbaaron I agree. Even if it was another button with the other actions on the right side of the page that when clicked copied the permalink to the clipboard that would be helpful.

ghaberek · 2022-08-09T17:35:39Z

I think overall the URL scheme should be improved but the current proposed scheme feels awkward to me. You could use something like Hashids to generate reversible unique IDs from the database ID for each entity type, which would avoid having to run a large migration across the whole database. Then use each URL prefix (or just 'p', etc.) to encode the entity IDs uniquely by type, which helps ensure URL prefixes can't be easily swapped. You could take this a step further and combine the APP_KEY with the prefix to make the IDs unique to the system to help prevent leaking of IDs (unless that could expose the site's app key, I haven't dug into Hashids too deeply).

use Hashids\Hashids;

$hashids = new Hashids('', 10);
echo $hashids->encode(1); // 1 => 'VolejRejNm'

$shelfHash = new Hashids('s', 10);
echo $shelfHash->encode(1); // 1 => 'x9JgqK6emq'

$bookHash = new Hashids('b', 10);
echo $bookHash->encode(1); // 1 => 'WM6epYXdKv'

$pageHash = new Hashids('p', 10);
echo $pageHash->encode(1); // 1 => 'xm0kebQ7Vq'

Another suggestion I propose is reducing the URL prefixes for entities from plural to singular, while leaving the plural names (e.g. /books) for all entities of that type as well as for supporting the old URL scheme until it's retired. So you would browse to /books to see all books but those links would point you to /book/<uid>/<book-title>.

| Entity | Current | Permalink | Clean URL |
| --- | --- | --- | --- |
| Shelves | `/shelves/<slug>` | `/s/<uid>` | `/shelf/<uid>/<ignored-title-slug>` |
| Books | `/books/<slug>` | `/b/<uid>` | `/book/<uid>/<ignored-title-slug>` |
| Pages | `/books/<slug>/pages/<slug>` | `/p/<uid>` | `/page/<uid>/<ignored-title-slug>` |

ssddanbrown · 2022-09-19T22:46:24Z

As of v22.09 the main pain-point that this proposal would have addressed (breaking of internal cross-links) should now be much less of an issue. Therefore I think it'd be especially not worthwhile now to move ahead with this proposal, or a variation of it, as I don't think it'd address enough of a fundamental need for the cost and confusion it would cause.

Thanks everyone for your input, it has been very valuable to guide my thoughts and understand the actual needs & desires at play here.

ssddanbrown pinned this issue Jun 22, 2022

ssddanbrown added the ☕ Open to discussion label Jun 22, 2022

ssddanbrown mentioned this issue Jul 25, 2022

Chapter links are not preseved #3582

Closed

milneauk mentioned this issue Aug 11, 2022

Page permalink visibility #3641

Closed

1 task

ssddanbrown mentioned this issue Sep 5, 2022

[Feature request]Automatic update of cross-linking between pages/chapters/books #1969

Closed

ssddanbrown closed this as completed Sep 19, 2022

ssddanbrown unpinned this issue Sep 19, 2022

ssddanbrown mentioned this issue Jan 26, 2023

Provide a way for blind users to retrieve a page permalink #3975

Closed

1 task
