Use escape sequences to denote special characters #7644

Merged: 34 commits merged on Nov 14, 2022
Changes from 29 commits
Commits
3a21202 [breaking] remove support for encoded directories (benmccann, Nov 11, 2022)
853901d readme-driven development (Rich-Harris, Nov 11, 2022)
98947b9 update (Rich-Harris, Nov 11, 2022)
1c2fd74 update changeset (Rich-Harris, Nov 11, 2022)
9e4638a reinstate and update test (Rich-Harris, Nov 11, 2022)
6835f30 change errors (Rich-Harris, Nov 11, 2022)
fb18292 add test (Rich-Harris, Nov 11, 2022)
4f11591 fix tests (Rich-Harris, Nov 11, 2022)
1a7a9c8 get some stuff to work (Rich-Harris, Nov 11, 2022)
bb42526 oops (Rich-Harris, Nov 11, 2022)
e5c15ca fix (Rich-Harris, Nov 11, 2022)
56b7077 add more tests (Rich-Harris, Nov 14, 2022)
c22a82a oops (Rich-Harris, Nov 14, 2022)
34a25c4 update tests to use escape sequences instead of HTML entities (Rich-Harris, Nov 14, 2022)
5f99253 rename (Rich-Harris, Nov 14, 2022)
16edfc4 make it work (Rich-Harris, Nov 14, 2022)
d23bb53 various (Rich-Harris, Nov 14, 2022)
96b5b98 fix unit test (Rich-Harris, Nov 14, 2022)
596ac50 update changeset (Rich-Harris, Nov 14, 2022)
2225cd3 docs (Rich-Harris, Nov 14, 2022)
ff649fe fix error (Rich-Harris, Nov 14, 2022)
4ce0aad Apply suggestions from code review (Rich-Harris, Nov 14, 2022)
f757ebd flesh out docs (Rich-Harris, Nov 14, 2022)
6effe68 Merge branch 'escape-sequences' of github.com:sveltejs/kit into escap… (Rich-Harris, Nov 14, 2022)
727d61f add characters to pages (Rich-Harris, Nov 14, 2022)
4968767 ffs (Rich-Harris, Nov 14, 2022)
85eeabe change should to must (Rich-Harris, Nov 14, 2022)
460b53f add % test and docs (Rich-Harris, Nov 14, 2022)
60ca550 fixes (Rich-Harris, Nov 14, 2022)
550af8c Merge branch 'master' into escape-sequences (Rich-Harris, Nov 14, 2022)
367fed0 fix weird completely unrelated typechecking errors, wtf (Rich-Harris, Nov 14, 2022)
bc8b26c pretty sure we can get rid of decode_pathname (Rich-Harris, Nov 14, 2022)
d3586b5 fucked around, find out. revert revert revert (Rich-Harris, Nov 14, 2022)
c697df7 Update packages/kit/src/utils/routing.js (Rich-Harris, Nov 14, 2022)
5 changes: 5 additions & 0 deletions .changeset/smooth-years-speak.md
@@ -0,0 +1,5 @@
---
'@sveltejs/kit': patch
---

[breaking] use hex/unicode escape sequences for encoding special characters in route directory names
45 changes: 30 additions & 15 deletions documentation/docs/30-advanced/10-advanced-routing.md
@@ -123,27 +123,42 @@ src/routes/[...catchall]/+page.svelte

### Encoding

-Directory names are URI-decoded, meaning that (for example) a directory like `%40[username]` would match characters beginning with `@`:
+Some characters can't be used on the filesystem — `/` on Linux and Mac, `\ / : * ? " < > |` on Windows. The `#` and `%` characters have special meaning in URLs, and the `[ ] ( )` characters have special meaning to SvelteKit, so these also can't be used directly as part of your route.

+To use these characters in your routes, you can use hexadecimal escape sequences, which have the format `[x+nn]` where `nn` is a hexadecimal character code:
+
+- `\` — `[x+5c]`
+- `/` — `[x+2f]`
+- `:` — `[x+3a]`
+- `*` — `[x+2a]`
+- `?` — `[x+3f]`
+- `"` — `[x+22]`
+- `<` — `[x+3c]`
+- `>` — `[x+3e]`
+- `|` — `[x+7c]`
+- `#` — `[x+23]`
+- `%` — `[x+25]`
+- `[` — `[x+5b]`
+- `]` — `[x+5d]`
+- `(` — `[x+28]`
+- `)` — `[x+29]`
+
+For example, to create a `/smileys/:-)` route, you would create a `src/routes/smileys/[x+3a]-[x+29]/+page.svelte` file.
+
+You can determine the hexadecimal code for a character with JavaScript:
+
```js
-// @filename: ambient.d.ts
-declare global {
-const assert: {
-equal: (a: any, b: any) => boolean;
-};
-}
+':'.charCodeAt(0).toString(16); // '3a', hence '[x+3a]'
+```

-export {};
+You can also use Unicode escape sequences. Generally you won't need to as you can use the unencoded character directly, but if — for some reason — you can't have a filename with an emoji in it, for example, then you can use the escaped characters. In other words, these are equivalent:

-// @filename: index.js
-// ---cut---
-assert.equal(
-decodeURIComponent('%40[username]'),
-'@[username]'
-);
```
+src/routes/[u+d83e][u+dd2a]/+page.svelte
+src/routes/🤪/+page.svelte
+```

-To express a `%` character, use `%25`, otherwise the result will be malformed.
+The format for a Unicode escape sequence is `[u+nnnn]` where `nnnn` is a valid value between `0000` and `10ffff`. (Unlike JavaScript string escaping, there's no need to use surrogate pairs to represent code points above `ffff`.) To learn more about Unicode encodings, consult [Programming with Unicode](https://unicodebook.readthedocs.io/unicode_encodings.html).

### Advanced layouts

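The escape-sequence rules in the documentation diff above are mechanical enough to automate. A minimal sketch, assuming you only want to escape the characters the docs list and that code units above `ff` should use the four-digit `[u+nnnn]` form; `SPECIAL`, `encode_unit`, `escape_segment` and `escape_all` are hypothetical helpers, not part of SvelteKit:

```javascript
// Hypothetical helpers (not part of SvelteKit) that produce the route
// directory escape sequences described in the documentation diff.

// Characters the docs say can't be used directly in a route directory name
const SPECIAL = new Set(['\\', '/', ':', '*', '?', '"', '<', '>', '|', '#', '%', '[', ']', '(', ')']);

/**
 * Encode one UTF-16 code unit: `[x+nn]` for values up to 0xff,
 * `[u+nnnn]` otherwise. Hex digits are lowercase, since the sync step
 * rejects uppercase escape sequences.
 * @param {number} unit
 */
function encode_unit(unit) {
	const hex = unit.toString(16);
	return unit <= 0xff ? `[x+${hex.padStart(2, '0')}]` : `[u+${hex.padStart(4, '0')}]`;
}

/**
 * Escape only the characters that can't appear in a route directory name.
 * @param {string} segment
 */
function escape_segment(segment) {
	let result = '';
	for (const char of segment) {
		// every entry in SPECIAL is a single code unit, so charCodeAt(0) suffices
		result += SPECIAL.has(char) ? encode_unit(char.charCodeAt(0)) : char;
	}
	return result;
}

/**
 * Escape every UTF-16 code unit, e.g. for a filesystem that can't store emoji.
 * Characters outside the BMP come out as two surrogate sequences.
 * @param {string} segment
 */
function escape_all(segment) {
	let result = '';
	for (let i = 0; i < segment.length; i += 1) {
		result += encode_unit(segment.charCodeAt(i));
	}
	return result;
}

console.log(escape_segment(':-)')); // [x+3a]-[x+29]
console.log(escape_all('🤪')); // [u+d83e][u+dd2a]
```

`escape_all` emits a character outside the Basic Multilingual Plane as two surrogate sequences, matching the `[u+d83e][u+dd2a]` example in the docs. A code-point-based variant could emit `[u+1f92a]` instead, but that only round-trips if the decoder treats the value as a full code point rather than a 16-bit code unit.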
37 changes: 36 additions & 1 deletion packages/kit/src/core/sync/create_manifest_data/index.js
@@ -102,14 +102,45 @@ function create_routes_and_nodes(cwd, config, fallback) {
* @param {import('types').RouteData | null} parent
*/
const walk = (depth, id, segment, parent) => {
-if (/\]\[/.test(id)) {
+const unescaped = id.replace(/\[([ux])\+([^\]]+)\]/gi, (match, type, code) => {
+if (match !== match.toLowerCase()) {
+throw new Error(`Character escape sequence in ${id} must be lowercase`);
+}
+
+if (!/[0-9a-f]+/.test(code)) {
+throw new Error(`Invalid character escape sequence in ${id}`);
+}
+
+if (type === 'x') {
+if (code.length !== 2) {
+throw new Error(`Hexadecimal escape sequence in ${id} must be two characters`);
+}
+
+return String.fromCharCode(parseInt(code, 16));
+} else {
+if (code.length < 4 || code.length > 6) {
+throw new Error(
+`Unicode escape sequence in ${id} must be between four and six characters`
+);
+}
+
+return String.fromCharCode(parseInt(code, 16));
+}
+});
+
+if (/\]\[/.test(unescaped)) {
throw new Error(`Invalid route ${id} — parameters must be separated`);
}

if (count_occurrences('[', id) !== count_occurrences(']', id)) {
throw new Error(`Invalid route ${id} — brackets are unbalanced`);
}

+if (/#/.test(segment)) {
+// Vite will barf on files with # in them
+throw new Error(`Route ${id} should be renamed to ${id.replace(/#/g, '[x+23]')}`);
+}
+
if (/\[\.\.\.\w+\]\/\[\[/.test(id)) {
throw new Error(
`Invalid route ${id} — an [[optional]] route segment cannot follow a [...rest] route segment`
@@ -464,6 +495,10 @@ function normalize_route_id(id) {
// remove groups
.replace(/(?<=^|\/)\(.+?\)(?=$|\/)/g, '')

+.replace(/\[[ux]\+([0-9a-f]+)\]/g, (_, x) =>
+String.fromCharCode(parseInt(x, 16)).replace(/\//g, '%2f')
+)
+
// replace `[param]` with `<*>`, `[param=x]` with `<x>`, and `[[param]]` with `<?*>`
.replace(
/\[(?:(\[)|(\.\.\.))?.+?(=.+?)?\]\]?/g,
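The unescaping added to `walk` can also be exercised on its own. Below is a simplified, standalone sketch of that step; `unescape_route_id` is a hypothetical name, and two details deliberately differ from the diff: the hex-digit check is anchored (`/^[0-9a-f]+$/`), whereas the unanchored `/[0-9a-f]+/` accepts any code containing at least one hex digit, and decoding uses `String.fromCodePoint`, because `String.fromCharCode` works on 16-bit code units and would truncate a five- or six-digit value such as `1f92a`.

```javascript
// Standalone sketch (not SvelteKit's code) of turning `[x+nn]` and
// `[u+nnnn]` escape sequences in a route id back into characters.

/** @param {string} id */
function unescape_route_id(id) {
	return id.replace(/\[([ux])\+([^\]]+)\]/gi, (match, type, code) => {
		if (match !== match.toLowerCase()) {
			throw new Error(`Character escape sequence in ${id} must be lowercase`);
		}

		// anchored, so every character of `code` must be a hex digit
		if (!/^[0-9a-f]+$/.test(code)) {
			throw new Error(`Invalid character escape sequence in ${id}`);
		}

		if (type === 'x' && code.length !== 2) {
			throw new Error(`Hexadecimal escape sequence in ${id} must be two characters`);
		}

		if (type === 'u' && (code.length < 4 || code.length > 6)) {
			throw new Error(`Unicode escape sequence in ${id} must be between four and six characters`);
		}

		const value = parseInt(code, 16);
		if (value > 0x10ffff) {
			throw new Error(`Unicode escape sequence in ${id} is out of range`);
		}

		// fromCodePoint accepts full code points (e.g. 1f92a) as well as lone
		// surrogates, so `[u+1f92a]` and `[u+d83e][u+dd2a]` yield the same emoji
		return String.fromCodePoint(value);
	});
}

console.log(unescape_route_id('/smileys/[x+3a]-[x+29]')); // /smileys/:-)
```

Ordinary parameters such as `[slug]` pass through untouched, because the pattern only matches brackets that start with `x+` or `u+`.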
24 changes: 8 additions & 16 deletions packages/kit/src/core/sync/create_manifest_data/index.spec.js
@@ -181,32 +181,24 @@ test('succeeds when routes does not exist', () => {
]);
});

-// TODO some characters will need to be URL-encoded in the filename
test('encodes invalid characters', () => {
const { nodes, routes } = create('samples/encoding');

-// had to remove ? and " because windows
-
-// const quote = 'samples/encoding/".svelte';
-const hash = { component: 'samples/encoding/%23/+page.svelte' };
-// const question_mark = 'samples/encoding/?.svelte';
+const quote = { component: 'samples/encoding/[x+22]/+page.svelte' };
+const hash = { component: 'samples/encoding/[x+23]/+page.svelte' };
+const question_mark = { component: 'samples/encoding/[x+3f]/+page.svelte' };

assert.equal(nodes.map(simplify_node), [
default_layout,
default_error,
-// quote,
-hash
-// question_mark
+quote,
+hash,
+question_mark
]);

assert.equal(
-routes.map((p) => p.pattern),
-[
-/^\/$/,
-// /^\/%22\/?$/,
-/^\/%23\/?$/
-// /^\/%3F\/?$/
-]
+routes.map((p) => p.pattern.toString()),
+[/^\/$/, /^\/%3[Ff]\/?$/, /^\/%23\/?$/, /^\/"\/?$/].map((pattern) => pattern.toString())
);
});

59 changes: 41 additions & 18 deletions packages/kit/src/utils/routing.js
@@ -1,6 +1,9 @@
const param_pattern = /^(\[)?(\.\.\.)?(\w+)(?:=(\w+))?(\])?$/;

-/** @param {string} id */
+/**
+* Creates the regex pattern, extracts parameter names, and generates types for a route
+* @param {string} id
+*/
export function parse_route_id(id) {
/** @type {string[]} */
const names = [];
@@ -21,17 +24,16 @@ export function parse_route_id(id) {
: new RegExp(
`^${get_route_segments(id)
.map((segment, i, segments) => {
-const decoded_segment = decodeURIComponent(segment);
// special case — /[...rest]/ could contain zero segments
-const rest_match = /^\[\.\.\.(\w+)(?:=(\w+))?\]$/.exec(decoded_segment);
+const rest_match = /^\[\.\.\.(\w+)(?:=(\w+))?\]$/.exec(segment);
if (rest_match) {
names.push(rest_match[1]);
types.push(rest_match[2]);
optional.push(false);
return '(?:/(.*))?';
}
// special case — /[[optional]]/ could contain zero segments
-const optional_match = /^\[\[(\w+)(?:=(\w+))?\]\]$/.exec(decoded_segment);
+const optional_match = /^\[\[(\w+)(?:=(\w+))?\]\]$/.exec(segment);
if (optional_match) {
names.push(optional_match[1]);
types.push(optional_match[2]);
@@ -41,14 +43,29 @@ export function parse_route_id(id) {

const is_last = i === segments.length - 1;

-if (!decoded_segment) {
+if (!segment) {
return;
}

-const parts = decoded_segment.split(/\[(.+?)\](?!\])/);
+const parts = segment.split(/\[(.+?)\](?!\])/);
const result = parts
.map((content, i) => {
if (i % 2) {
+if (content.startsWith('x+')) {
+return escape(String.fromCharCode(parseInt(content.slice(2), 16)));
+}
+
+if (content.startsWith('u+')) {
+return escape(
+String.fromCharCode(
+...content
+.slice(2)
+.split('-')
+.map((code) => parseInt(code, 16))
+)
+);
+}
+
const match = param_pattern.exec(content);
if (!match) {
throw new Error(
@@ -69,18 +86,7 @@ export function parse_route_id(id) {

if (is_last && content.includes('.')) add_trailing_slash = false;

-return (
-content // allow users to specify characters on the file system in an encoded manner
-.normalize()
-// '#', '/', and '?' can only appear in URL path segments in an encoded manner.
-// They will not be touched by decodeURI so need to be encoded here, so
-// that we can match against them.
-// We skip '/' since you can't create a file with it on any OS
-.replace(/#/g, '%23')
-.replace(/\?/g, '%3F')
-// escape characters that have special meaning in regex
-.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
-); // TODO handle encoding
+return escape(content);
})
.join('');

@@ -143,3 +149,20 @@ export function exec(match, { names, types, optional }, matchers) {

return params;
}
+
+/** @param {string} str */
+function escape(str) {
+return (
+str
+.normalize()
+// escape [ and ] before escaping other characters, since they are used in the replacements
+.replace(/[[\]]/g, '\\$&')
+// replace %, /, ? and # with their encoded versions
+.replace(/%/g, '%25')
+.replace(/\//g, '%2[Ff]')
+.replace(/\?/g, '%3[Ff]')
+.replace(/#/g, '%23')
+// escape characters that have special meaning in regex
+.replace(/[.*+?^${}()|\\]/g, '\\$&')
+);
+}
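To see how a literal segment ends up inside a route's `RegExp`, here is a per-character variant of the `escape` helper. It is a sketch with the same intent as the diff (percent-encode `%`, `/`, `?` and `#`, then make regex metacharacters literal), but `escape_literal` and `segment_pattern` are hypothetical names, and the one-character-at-a-time structure is a deliberate change: no replacement can be rewritten by a later one, so the brackets in `%2[Ff]` stay a character class and a literal `[` is escaped exactly once.

```javascript
// Hypothetical per-character variant of `escape` (not the code in this diff).
// Percent-encoded hex digits are case-insensitive, so a pathname may contain
// %2f or %2F (and %3f or %3F); the character classes accept both.

/** @param {string} str */
function escape_literal(str) {
	let result = '';
	for (const char of str.normalize()) {
		if (char === '%') result += '%25';
		else if (char === '/') result += '%2[Ff]';
		else if (char === '?') result += '%3[Ff]';
		else if (char === '#') result += '%23';
		// make regex metacharacters (including [ ] and \) match literally
		else if (/[.*+?^${}()|[\]\\]/.test(char)) result += '\\' + char;
		else result += char;
	}
	return result;
}

/**
 * Build a pattern for a single static segment, with an optional trailing slash.
 * @param {string} segment
 */
function segment_pattern(segment) {
	return new RegExp(`^/${escape_literal(segment)}/?$`);
}

console.log(segment_pattern('/').test('/%2F')); // true
console.log(segment_pattern('/').test('/%2f')); // true
```

Because each character is handled exactly once, the order of the branches only matters for `?`, which must be percent-encoded before the generic metacharacter branch would otherwise escape it.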
14 changes: 2 additions & 12 deletions packages/kit/src/utils/routing.spec.js
@@ -58,20 +58,10 @@ const tests = {
names: ['id'],
types: ['uuid']
},
-'/%23hash-encoded': {
-pattern: /^\/%23hash-encoded\/?$/,
-names: [],
-types: []
-},
-'/%40at-encoded/[id]': {
-pattern: /^\/@at-encoded\/([^/]+?)\/?$/,
+'/@-symbol/[id]': {
+pattern: /^\/@-symbol\/([^/]+?)\/?$/,
names: ['id'],
types: [undefined]
},
-'/%255bdoubly-encoded': {
-pattern: /^\/%5bdoubly-encoded\/?$/,
-names: [],
-types: []
-}
};


This file was deleted.

@@ -3,7 +3,6 @@
<a href="/encoded/反应">反应</a>
<a href="/encoded/redirect">Redirect</a>
<a href="/encoded/@svelte">@svelte</a>
-<a href="/encoded/$SVLT">$SVLT</a>
<a href="/encoded/test%2520me">test%20me</a>
<a href="/encoded/test%252fme">test%2fme</a>
<a href="/encoded/AC%2fDC">AC/DC</a>
@@ -0,0 +1,16 @@
<script>
import { page } from '$app/stores';
</script>

<h1>{decodeURIComponent($page.url.pathname.split('/').pop())}</h1>
<slot />

<a href="/encoded/escape-sequences/:-)">:-)</a>
<a href="/encoded/escape-sequences/%23">#</a>
<a href="/encoded/escape-sequences/%2F">/</a>
<a href="/encoded/escape-sequences/%3f">?</a>
<a href="/encoded/escape-sequences/苗">苗</a>
<a href="/encoded/escape-sequences/<">&lt;</a>
<a href="/encoded/escape-sequences/1<2">1&lt;2</a>
<a href="/encoded/escape-sequences/🤪">🤪</a>
<a href="/encoded/escape-sequences/%25">%</a>
@@ -0,0 +1 @@
@@ -0,0 +1 @@
🤪
@@ -0,0 +1 @@
#
@@ -0,0 +1 @@
%
@@ -0,0 +1 @@
/
@@ -0,0 +1 @@
:-)
@@ -0,0 +1 @@
&lt;
@@ -0,0 +1 @@
?
@@ -0,0 +1 @@
1&lt;2
39 changes: 32 additions & 7 deletions packages/kit/test/apps/basics/test/test.js
@@ -323,17 +323,42 @@ test.describe('Encoded paths', () => {
});
});

-test('allows %-encoded characters in directory names', async ({ page, clicknav }) => {
-await page.goto('/encoded');
-await clicknav('[href="/encoded/$SVLT"]');
-expect(await page.textContent('h1')).toBe('$SVLT');
-});
-
-test('allows %-encoded characters in filenames', async ({ page, clicknav }) => {
+test('allows non-ASCII character in parameterized route segment', async ({ page, clicknav }) => {
await page.goto('/encoded');
await clicknav('[href="/encoded/@svelte"]');
expect(await page.textContent('h1')).toBe('@svelte');
});
+
+test('allows characters to be represented as escape sequences', async ({ page, clicknav }) => {
+await page.goto('/encoded/escape-sequences');
+
+await clicknav('[href="/encoded/escape-sequences/:-)"]');
+expect(await page.textContent('h1')).toBe(':-)');
+
+await clicknav('[href="/encoded/escape-sequences/%23"]');
+expect(await page.textContent('h1')).toBe('#');
+
+await clicknav('[href="/encoded/escape-sequences/%2F"]');
+expect(await page.textContent('h1')).toBe('/');
+
+await clicknav('[href="/encoded/escape-sequences/%3f"]');
+expect(await page.textContent('h1')).toBe('?');
+
+await clicknav('[href="/encoded/escape-sequences/%25"]');
+expect(await page.textContent('h1')).toBe('%');
+
+await clicknav('[href="/encoded/escape-sequences/<"]');
+expect(await page.textContent('h1')).toBe('<');
+
+await clicknav('[href="/encoded/escape-sequences/1<2"]');
+expect(await page.textContent('h1')).toBe('1<2');
+
+await clicknav('[href="/encoded/escape-sequences/苗"]');
+expect(await page.textContent('h1')).toBe('苗');
+
+await clicknav('[href="/encoded/escape-sequences/🤪"]');
+expect(await page.textContent('h1')).toBe('🤪');
+});
});

test.describe('$env', () => {