Support for string concatenation by the indexer #1006

Open
buuhuu opened this issue Oct 3, 2024 · 1 comment · May be fixed by #1007

buuhuu commented Oct 3, 2024

Is your feature request related to a problem? Please describe.
When customers decide to keep .html extensions for their site when moving to Edge Delivery, they can already add the extension to all locations in the sitemap: https://www.aem.live/developer/sitemap#adding-an-extension-to-all-locations-in-the-sitemap

It is not possible, however, to add the extension to the canonical. We also learned that appending the extension client-side can cause indexing issues: when and how often a site is crawled with JavaScript executed depends on its crawl budget. In the worst case the canonical is unstable and is sometimes picked up with the extension and sometimes without it.

In addition, the canonical link is a stronger canonicalization signal than the sitemap: https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls

One workaround is to use an index as an additional metadata sheet that sets the canonical metadata for each page, with the extension appended. This works with spreadsheets by using formulas, but not with BYOM, where the index is only stored as a JSON file.

This could also be useful for other use cases where content needs to be concatenated.
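To illustrate the workaround, the following TypeScript sketch computes what the spreadsheet formulas would produce: one metadata row per indexed page, with the extension appended to the canonical. It assumes that index rows expose a path property and that the metadata sheet has a URL column plus one column per metadata property; the function name toCanonicalMetadata and the origin https://www.example.com are hypothetical.

```typescript
// Hypothetical shapes: an index row with a `path` property, and a bulk
// metadata row with a `URL` column plus one column per metadata property.
interface IndexRow {
  path: string;
}

interface MetadataRow {
  URL: string;
  canonical: string;
}

// Builds one metadata row per indexed page, pointing its canonical at the
// .html variant. Folder-style paths ending in "/" get no extension.
function toCanonicalMetadata(index: IndexRow[], origin: string): MetadataRow[] {
  return index.map((row) => ({
    URL: row.path,
    canonical: origin + (/\/$/.test(row.path) ? row.path : row.path + ".html"),
  }));
}

// Example with a placeholder origin.
const rows = toCanonicalMetadata(
  [{ path: "/" }, { path: "/blog/post" }],
  "https://www.example.com",
);
console.log(JSON.stringify(rows));
```

With BYOM the index only exists as JSON, so this derivation has no place to run, which is the gap the feature request is about.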

Describe the solution you'd like
Ideally, the value expression would support a binary + operation to concatenate two strings:

select: main
value: replaceAll(path + '.html', '/.html', '/')
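A minimal sketch of how a + operator could be evaluated over a parsed value expression. The node shapes are a simplified subset of the AST that jsep produces (jsep also records fields such as raw, omitted here); the evaluate function and the function table holding replaceAll are hypothetical and not the indexer's actual implementation. The AST is built by hand for the expression above.

```typescript
// Simplified subset of jsep's AST node shapes.
type Node =
  | { type: "Literal"; value: string }
  | { type: "Identifier"; name: string }
  | { type: "BinaryExpression"; operator: string; left: Node; right: Node }
  | { type: "CallExpression"; callee: Node; arguments: Node[] };

// Hypothetical function table; the indexer's real helpers may differ.
const functions: { [name: string]: (...args: string[]) => string } = {
  // Literal (non-regex) replaceAll implemented with split/join.
  replaceAll: (s, search, replacement) => s.split(search).join(replacement),
};

function evaluate(node: Node, vars: { [name: string]: string }): string {
  if (node.type === "Literal") {
    return node.value;
  }
  if (node.type === "Identifier") {
    if (!Object.prototype.hasOwnProperty.call(vars, node.name)) {
      throw new Error(`unknown variable: ${node.name}`);
    }
    return vars[node.name];
  }
  if (node.type === "BinaryExpression") {
    // Only "+" is supported, and only as string concatenation.
    if (node.operator !== "+") {
      throw new Error(`unsupported operator: ${node.operator}`);
    }
    return evaluate(node.left, vars) + evaluate(node.right, vars);
  }
  // Remaining case: CallExpression on a known function name.
  const callee = node.callee;
  if (callee.type !== "Identifier" || !Object.prototype.hasOwnProperty.call(functions, callee.name)) {
    throw new Error("unsupported function call");
  }
  const args = node.arguments.map((arg) => evaluate(arg, vars));
  return functions[callee.name](...args);
}

// replaceAll(path + '.html', '/.html', '/')
const expression: Node = {
  type: "CallExpression",
  callee: { type: "Identifier", name: "replaceAll" },
  arguments: [
    {
      type: "BinaryExpression",
      operator: "+",
      left: { type: "Identifier", name: "path" },
      right: { type: "Literal", value: ".html" },
    },
    { type: "Literal", value: "/.html" },
    { type: "Literal", value: "/" },
  ],
};

for (const path of ["/", "/blog/", "/blog/post"]) {
  console.log(path, "->", evaluate(expression, { path }));
}
```

Restricting + to strings keeps the semantics simple: "/blog/post" becomes "/blog/post.html", while folder paths such as "/" and "/blog/" have the appended "/.html" collapsed back to "/".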

Describe alternatives you've considered
Alternatively, we could support regular expressions using jsep-plugin/regex and write:

select: main
value: replaceAll(replaceAll(path, /$/g, '.html'), /\/.html$/g, '/')
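For comparison, a sketch of what the regex-based expression computes, written directly in TypeScript. The function name canonicalPath is hypothetical. It uses String.prototype.replace with global regexes, which replaces every match just as replaceAll does for regex arguments. The dot in the second pattern is escaped so it only matches a literal "."; since the first step always leaves the string ending in ".html", the unescaped form in the proposal behaves the same.

```typescript
// Mirrors: replaceAll(replaceAll(path, /$/g, '.html'), /\/\.html$/g, '/')
function canonicalPath(path: string): string {
  // /$/ matches the empty position at the end of the string, so this appends ".html".
  const withExtension = path.replace(/$/g, ".html");
  // Strip the extension again for folder paths such as "/" or "/blog/".
  return withExtension.replace(/\/\.html$/g, "/");
}

for (const p of ["/", "/blog/", "/blog/post"]) {
  console.log(p, "->", canonicalPath(p));
}
```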

Or we could support adding extensions to canonicals (and, by extension, to any link) in the HTML pipeline.
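A rough sketch of the rewrite such a pipeline step might apply to a canonical href or any other absolute link, using the standard URL class. The function addExtension and its rules (skip folder paths and paths whose last segment already contains a dot) are assumptions for illustration, not existing pipeline behavior.

```typescript
// Hypothetical rewrite step: append an extension to an absolute link's path,
// leaving folder paths ("/...") and paths that already have an extension alone.
function addExtension(href: string, ext: string = ".html"): string {
  const url = new URL(href);
  const segments = url.pathname.split("/");
  const last = segments[segments.length - 1];
  if (last !== "" && last.indexOf(".") === -1) {
    // Query string and hash are preserved because only the pathname changes.
    url.pathname += ext;
  }
  return url.toString();
}

console.log(addExtension("https://www.example.com/blog/post"));
console.log(addExtension("https://www.example.com/blog/post?ref=nav"));
console.log(addExtension("https://www.example.com/"));
```

Doing this server-side would give crawlers a stable canonical regardless of whether JavaScript is executed.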

Additional context

https://adobe-dx-support.slack.com/archives/C06FA7MP684/p1727877335760009
https://cq-dev.slack.com/archives/C05QU7MMRNF/p1727126486278739

buuhuu linked a pull request on Oct 3, 2024 that will close this issue
tripodsan (Contributor) commented

I don't think adding this extra functionality solves the problem; the canonical will still be wrong in the metadata.
It would be better to find a way to correct the metadata, e.g. by introducing a placeholder language:

canonical: {{url}}.html
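A minimal sketch of how such a placeholder could be expanded when the metadata is rendered. The {{name}} syntax comes from the comment above; the expandPlaceholders function, the variable map, and where the url value comes from are hypothetical. Unknown placeholders are left untouched so mistakes stay visible.

```typescript
// Replaces {{name}} (optionally with inner spaces) with vars[name].
function expandPlaceholders(template: string, vars: { [name: string]: string }): string {
  return template.replace(/\{\{\s*([\w-]+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}

const canonical = expandPlaceholders("{{url}}.html", {
  url: "https://www.example.com/blog/post",
});
console.log(canonical);
```

A plain suffix like this would turn a folder URL such as https://www.example.com/ into https://www.example.com/.html, so a real placeholder language would still need a rule for folder paths.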
