Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing path-segment sanitization rules #4092

Closed
wants to merge 1 commit into from
Closed

Implementing path-segment sanitization rules #4092

wants to merge 1 commit into from

Conversation

tsuereth
Copy link

Implementing path-segment sanitization rules, for things like taxonomy terms and titles in permalinks, so that slashes and pound signs get replaced (with hyphens) rather than considered as part of a filesystem or URL path

Includes a new template function urlizeSegment (and corresponding documentation) which can be used in content to emulate said path-segment sanitization

Fixes #4090

…y terms and titles in permalinks, so that slashes and pound signs get replaced (with hyphens) rather than considered as part of a filesystem or URL path

Includes a new template function urlizeSegment (and corresponding documentation) which can be used in content to emulate said path-segment sanitization

Fixes #4090
@bep
Copy link
Member

bep commented Nov 21, 2017

I understand the problem, but we cannot add yet another special thing with another set of special methods to handle this.

Either we add this to the existing Urlize func, or we don't. I'm not sure. Will have to think about it.

/cc @moorereason

@moorereason
Copy link
Contributor

I agree with @bep.

@tsuereth
Copy link
Author

Thanks @bep and @moorereason for your attention on this!

I agree with the sentiment of reducing special-case code paths -- that being said, the complication I ran into when beginning this work is that URL/path sanitization is applied somewhat inconsistently throughout Hugo's codebase.

In some places, path segments (which should not contain a slash) are sanitized before joining them together. But in other places, the entire path (which must contain slashes) is sanitized at once.

(This is not to mention the hypothetical backwards-incompatibility for templates that might currently use urlize and expect slashes to come back out of it.)

Phrasing the functions MakePathSegmentSanitized and URLizeSegment as I did was intended to start separating these concerns. If you believe the goal should be to make the formation of path segments into full paths (or URLs) follow more uniform rules, then there is substantially more code auditing to do. And I suppose that some kind of new interface for templates would be required, to avoid breaking existing content.

Would you like help with this? Per my examples, I'm somewhat invested in the cause.

@moorereason
Copy link
Contributor

You’re correct, which is why I said I agreed with @bep’s “will have to think about it” comment.

My recommendation at this point is to remove the template func stuff and just focus on fixing the internal path sanitizer issues for taxonomies and titles. Those are clear bugs.

How important is the template func for your use case?

@tsuereth
Copy link
Author

tsuereth commented Nov 23, 2017

I use the template function to implement shortcodes for linking to a specific term's page. I.e. in a post:

Text text text, {{% game "SHENZHEN I/O" %}}link to the game SHENZHEN I/O{{% /game %}} because more text ...

Then the shortcode, implemented in layouts/shortcodes/game.html:

<a href="{{ .Page.Site.BaseURL }}game/{{ index .Params 0 | urlizeSegment }}">{{ .Inner }}</a>

Is there a better way to generate a permalink to a specific term's page? If there is, then I wouldn't have any need for urlizeSegment.

@moorereason
Copy link
Contributor

I prefer to use “ref” and “relref” for page links.

@tsuereth
Copy link
Author

The ref/relref template functions don't seem to fit my use-case, given that they need a pre-encoded document path. If I were to hard-code my content with document paths for taxonomy terms, then all of my content would have to be revised by hand if the encoding rules for terms change. (For example, if the rule for slashes is changed, as in this pull request!)

This would get doubly frustrating for term hyperlinks that are generated dynamically in a layout template, such as when my layout for a post links to the term page for each game tagged in that post. The template needs both the "pretty" (for display) and sanitized (for hyperlink) forms of each term, so if there isn't a reliable way to translate one to the other, then content would need to provide both - perhaps in two separate lists? - to the template.

To be clearer, what I'm looking for in a template function is a way to reference or replicate the process by which Hugo determines where a taxonomy term "lives." Then my content doesn't have to know any of the rules of this string-to-filepath translation process.

What urlizeSegment proposes in this PR - and how I'm currently using urlize - is to "replicate" that translation process. A tactical alternative might be to "reference" it instead, i.e. if my content templates could search a map of friendly term names to their sanitized paths. (Digging into this a bit, however, it would seem to lead to the problems noted in #2835 .)

@bep
Copy link
Member

bep commented Nov 24, 2017

Is there a better way to generate a permalink to a specific term's page?

{{ .Site.GetPage "taxonomy" "game" "myterm" }}

@bep
Copy link
Member

bep commented Nov 24, 2017

That said. Let's move the discussion to #4090. I'm closing this, as this is not the proper solution for this.

@bep bep closed this Nov 24, 2017
@tsuereth
Copy link
Author

Aha! This had escaped my notice -- using GetPage will solve my content template problem.

For demonstration, here's my revised shortcode template:

{{ $term := $.Site.GetPage "taxonomyTerm" "game" (index .Params 0) }}
{{ if $term }}
<a href="{{ $term.Permalink }}">{{ .Inner }}</a>
{{ else }}
{{ .Inner }}
{{ end }}

This does change one behavior, compared to the urlize implementation -- if I use the shortcode on a term that doesn't exist, then GetPage will return nil for it, and I can't formulate a hyperlink (whereas urlize would make a hyperlink to a page that doesn't exist => 404 error).

At least in my case, this behavior isn't a problem.

Thanks @bep !

@bep
Copy link
Member

bep commented Nov 24, 2017

and I can't formulate a hyperlink

I think you can do something like:

{{ $term := $.Site.GetPage "taxonomyTerm" "game" (index .Params 0) }}
{{ $url := cond (ne $term nil) $term.RelPermalink "http://notfound/" }}

But the bottom line here I think is this: The old urlize constructs comes from the time where Hugo did not have a complete object graph -- so taxonomies was just files written to disk. We should fix the documentation, and if there still are examples that isn't possible to do without manual URL concatenation in the templates, we should fix that.

moorereason added a commit to moorereason/hugo that referenced this pull request Feb 8, 2018
This commit attempts to sanitize term keys and path segments to fix the
handling of taxonomy terms containing slashes.

Based upon gohugoio#4092 by @tsuereth.

Fixes gohugoio#4090
moorereason added a commit to moorereason/hugo that referenced this pull request Aug 21, 2018
This commit attempts to sanitize term keys and path segments to fix the
handling of taxonomy terms containing slashes.

Based upon gohugoio#4092 by @tsuereth.

Fixes gohugoio#4090
@github-actions
Copy link

github-actions bot commented Feb 8, 2022

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slashes in a taxonomy term break path resolution
3 participants