If export mode should continue to be supported (#230), this is something to consider in order to deliver any meaningful outcome for filenames with non-Latin (non-ASCII) chars.
Possibly unicode handling would be needed even outside export mode, if URL keys might contain such chars.
Confirmed:
The name field at the end has a format dependent on the backend. It is always the last field, and is prefixed with "--". Unlike other fields, it may contain "-" in its content. It should not contain newline characters or "/"; otherwise nearly anything goes. The "E" variants of hash keys include a filename extension after the hash.
Unicode handling is needed uniformly.
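To make the quoted key format concrete, here is a minimal sketch of pulling the name field out of a key. The helper name `split_key_name` and the example key are made up for illustration and are not part of git-annex or this codebase. It relies only on what the quote states: the name is the last field, it is prefixed with "--", and it may itself contain "-".

```python
def split_key_name(key: str) -> tuple[str, str]:
    """Split an annex-style key into (leading fields, name).

    Hypothetical helper: the name field starts after the first "--",
    and everything after that belongs to the name, including any
    further "-" or "--" sequences.
    """
    head, sep, name = key.partition("--")
    if not sep:
        raise ValueError(f"no '--' name separator in key: {key!r}")
    return head, name


# Made-up key with a non-ASCII name that also contains "-" and "--"
head, name = split_key_name("URL-s42--my-file--\u00fc.dat")
print(head)  # leading fields
print(name)  # name field, non-ASCII content untouched
```

Since nearly anything goes in the name field, any path mangling applied to keys has to cope with non-ASCII content there as well.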
Given that the mangle/unmangle_path() function pair aims to provide a reversible mapping, and unicode->ascii cannot possibly be that, we need a solution on top.
In principle this should be possible, because we never actually unmangle a path, but only use forward-mangling to match against a state reported by dataverse (code confirms no usage of unmangle_path() outside tests).
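A small sketch of why unicode->ascii cannot be reversible. It does not use Unidecode itself; as a stand-in it uses the standard library's `unicodedata` to strip diacritics, which is lossy in the same way. The function name `fold_to_ascii` is hypothetical.

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Lossy ASCII folding (stand-in for a transliteration library).

    Decomposes characters (NFKD) and drops everything that is not
    ASCII, so accents and unsupported characters simply vanish.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


a = fold_to_ascii("caf\u00e9.txt")  # name with "e acute"
b = fold_to_ascii("cafe.txt")       # plain ASCII name
print(a == b)  # both inputs fold to the same ASCII name
```

Two different inputs end up with the same output, so there is no way to recover the original from the folded name. Forward-only matching sidesteps the need to decode, but it still has to deal with such collisions.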
mih changed the title from "Employ Unidecode for export mode path mangling" to "Employ Unidecode for path mangling" on Mar 13, 2023
I think this issue should be fixed by PR #240.
PR #240 encodes all characters that are not in the supported dataverse character set. This is done by an injective mapping, which means there are no collisions in encoded names, i.e. different un-encoded names will always be mapped to different encoded names.
A side effect of the injectivity of the mapping is that an encoded name can be decoded to yield the original name. As pointed out in #232 (comment), that currently has no application beyond the tests.
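For illustration, a minimal sketch of what an injective encoding of this kind can look like. This is not the scheme implemented in PR #240; the safe character set, the escape character and the function names `encode_name`/`decode_name` are all assumptions chosen for the example.

```python
import string

# Assumed "supported" character set; the real dataverse set differs.
SAFE = set(string.ascii_letters + string.digits + "._")
# Escape marker; deliberately not in SAFE, so it gets escaped itself.
ESC = "-"


def encode_name(name: str) -> str:
    """Replace every unsupported character by ESC + 6 hex digits."""
    return "".join(
        ch if ch in SAFE else f"{ESC}{ord(ch):06X}" for ch in name
    )


def decode_name(encoded: str) -> str:
    """Invert encode_name(); only needed to demonstrate reversibility."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == ESC:
            # Fixed-width escape: the next 6 chars are a hex code point
            out.append(chr(int(encoded[i + 1:i + 7], 16)))
            i += 7
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(encode_name("\u00e4.txt"))  # "a umlaut" becomes an escape sequence
print(decode_name(encode_name("a-b \u00e4.txt")) == "a-b \u00e4.txt")
```

Because the escape character is itself escaped and every escape sequence has a fixed width, two different names can never produce the same encoded name. Decoding falls out for free, even if only the tests use it.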
Rescuing #83 (comment)