Filenames are urlencoded when import from Notion #303

Open
and7ey opened this issue Oct 2, 2024 · 8 comments

@and7ey

and7ey commented Oct 2, 2024

Here is an example of how some filenames are imported:

image

Here is an example of a full filename:
%D0%A0%D1%83%D0%BA%D0%BE%D0%B2%D0%BE%D0%B4%D1%81%D1%82%D0%B2%D0%BE_%D0%BF%D0%BE_%D0%BF%D0%BE%D0%B4%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%B8%D1%8E_DK103M(W)(-4G)

@tgrosinger
Contributor

Would it be possible for you to provide us with a Notion export zip file that we could use to help test and resolve this issue? Thank you!

@and7ey
Author

and7ey commented Oct 10, 2024

@tgrosinger I cannot share the export zip since it contains confidential info.
But I can give you more details.

Here is a screenshot of the HTML exported from Notion:
image
I think the file is referenced here:
<figure id="8d1a494d-0a00-4958-b879-876fe051cf94"><div class="source"><a href="%D0%94%D0%BE%D0%BC%D0%BE%D1%84%D0%BE%D0%BD%20Beward%2016aaf973752a4cc89313e48964bc76a0/%25D0%25A0%25D1%2583%25D0%25BA%25D0%25BE%25D0%25B2%25D0%25BE%25D0%25B4%25D1%2581%25D1%2582%25D0%25B2%25D0%25BE_%25D0%25BF%25D0%25BE_%25D0%25BF%25D0%25BE%25D0%25B4%25D0%25BA%25D0%25BB%25D1%258E%25D1%2587%25D0%25B5%25D0%25BD%25D0%25B8%25D1%258E_DK103M(W)(-4G).pdf">Руководство по подключению DK103M(W)(-4G).pdf</a></div></figure>

The filename is actually URL encoded:
image

@felciabatta
Contributor

Hi @and7ey, was Руководство по подключению DK103M(W)(-4G).pdf the original filename of the PDF stored in your Notion workspace?

If so, then this means Notion is responsible for encoding the filename on export, and perhaps we can decode this with Importer...

@and7ey
Author

and7ey commented Dec 11, 2024

The decoded filename looks OK.

@felciabatta
Contributor

I just tested this in my Notion workspace and can confirm the same thing happens when exporting.

I'll look into decoding the names back into the original characters. The challenge will be updating every reference to the file across the Vault.
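
One way to approach the "update all references" problem is to build a rename map first and then rewrite note text against it. The sketch below is only an illustration: `buildRenameMap` and `rewriteReferences` are hypothetical helpers, not part of Importer, and it assumes that notes refer to attachments by their encoded on-disk name.

```typescript
// Hypothetical helper: map each encoded on-disk name to its decoded form.
// Names that are not encoded, or cannot be decoded, are left out of the map.
function buildRenameMap(names: string[]): Map<string, string> {
  const renames = new Map<string, string>();
  for (const name of names) {
    let decoded: string;
    try {
      decoded = decodeURIComponent(name);
    } catch {
      continue; // malformed percent sequence: keep the file as-is
    }
    if (decoded !== name) {
      renames.set(name, decoded);
    }
  }
  return renames;
}

// Hypothetical helper: replace every occurrence of an old name in a note's text.
function rewriteReferences(text: string, renames: Map<string, string>): string {
  let result = text;
  renames.forEach((to, from) => {
    result = result.split(from).join(to);
  });
  return result;
}

const renames = buildRenameMap(["%D0%BF%D0%BE.pdf", "plain.pdf"]);
console.log(rewriteReferences("![[%D0%BF%D0%BE.pdf]] and [[plain.pdf]]", renames));
```

Building the map before touching any note keeps the file renames and the link rewrites consistent with each other.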

@felciabatta
Contributor

> The filename is actually URL encoded:

It's actually even worse than that; this is the component of the href referring to the filename itself (not the full path):

%25D0%25A0%25D1%2583%25D0%25BA%25D0%25BE%25D0%25B2%25D0%25BE%25D0%25B4%25D1%2581%25D1%2582%25D0%25B2%25D0%25BE_%25D0%25BF%25D0%25BE_%25D0%25BF%25D0%25BE%25D0%25B4%25D0%25BA%25D0%25BB%25D1%258E%25D1%2587%25D0%25B5%25D0%25BD%25D0%25B8%25D1%258E_DK103M(W)(-4G).pdf

and compare this to the URL-encoded filename:

%D0%A0%D1%83%D0%BA%D0%BE%D0%B2%D0%BE%D0%B4%D1%81%D1%82%D0%B2%D0%BE_%D0%BF%D0%BE_%D0%BF%D0%BE%D0%B4%D0%BA%D0%BB%D1%8E%D1%87%D0%B5%D0%BD%D0%B8%D1%8E_DK103M(W)(-4G).pdf

They're not the same: the href is a double-encoded version of the original filename:

Руководство по подключению DK103M(W)(-4G).pdf

but not quite. When you fully decode the href (two passes) you get:

Руководство_по_подключению_DK103M(W)(-4G).pdf

so it also replaces spaces with underscores!

This leads me to believe this is actually a bug with Notion's export utility: the encoding should only be required for the href, not the actual file itself — it makes no sense to double-encode like this. 😭
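
The two decoding passes can be reproduced with the standard `decodeURIComponent` function. This is a minimal sketch using the href component quoted above; `decodePasses` is a hypothetical helper name introduced only for illustration.

```typescript
// Hypothetical helper: apply decodeURIComponent a fixed number of times.
function decodePasses(name: string, passes: number): string {
  let result = name;
  for (let i = 0; i < passes; i++) {
    result = decodeURIComponent(result);
  }
  return result;
}

// Filename component of the href, copied from the exported HTML.
const hrefName =
  "%25D0%25A0%25D1%2583%25D0%25BA%25D0%25BE%25D0%25B2%25D0%25BE%25D0%25B4" +
  "%25D1%2581%25D1%2582%25D0%25B2%25D0%25BE_%25D0%25BF%25D0%25BE_" +
  "%25D0%25BF%25D0%25BE%25D0%25B4%25D0%25BA%25D0%25BB%25D1%258E%25D1%2587" +
  "%25D0%25B5%25D0%25BD%25D0%25B8%25D1%258E_DK103M(W)(-4G).pdf";

// One pass turns every %25 back into %, giving the name of the file on disk.
console.log(decodePasses(hrefName, 1));
// A second pass gives the readable name, with underscores where spaces were.
console.log(decodePasses(hrefName, 2));
```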

@and7ey
Author

and7ey commented Dec 11, 2024

What if we follow the approach MS Excel takes when importing a CSV file: it shows the file content, suggests an encoding, and asks whether the user would like to change it. Could we do the same and let the user choose between single and double decoding?

@felciabatta
Contributor

I think it's probably simpler to default to decoding the filenames — we know how Notion encodes them so we can just apply the same decoder across the board.

For filenames that aren't encoded in the first place, applying the URL decoder should leave them unmodified anyway — although in theory someone could intentionally have %XX strings in their filename...
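
A minimal sketch of that default, assuming plain `decodeURIComponent` is an acceptable decoder here. `safeDecodeFilename` is a hypothetical name, not an existing Importer function. Note that `decodeURIComponent` throws a `URIError` on a stray `%`, so the sketch falls back to the original name in that case.

```typescript
// Hypothetical helper: decode a filename once, leaving it untouched when
// it is not valid percent-encoding (for example a literal '%' in the name).
function safeDecodeFilename(name: string): string {
  try {
    return decodeURIComponent(name);
  } catch {
    return name; // URIError: not decodable, keep the original name
  }
}

console.log(safeDecodeFilename("report.pdf"));        // unencoded name
console.log(safeDecodeFilename("%D0%BF%D0%BE.pdf"));  // encoded Cyrillic name
console.log(safeDecodeFilename("100%.pdf"));          // stray '%', kept as-is
console.log(safeDecodeFilename("50%25 off.pdf"));     // intentional %XX gets altered
```

The last call illustrates the caveat: a name that deliberately contains a valid `%XX` sequence is changed by the decoder, so that case cannot be distinguished from a Notion-encoded name by decoding alone.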
