This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

deploy content fails with non-ASCII file names on Windows #256

Closed
fsnow opened this issue Jun 20, 2014 · 8 comments

Comments

@fsnow
Contributor

fsnow commented Jun 20, 2014

With file "EntitéGénérique.xml" in my data directory, I get the following error. Paxton tested on Mac and did not get an error.

ERROR: 500 "Internal Server Error"
ERROR: <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:error="http://marklogic.com/xdmp/error" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <error:code>XDMP-URI</error:code>
  <error:name/>
  <error:xquery-version>1.0-ml</error:xquery-version>
  <error:message>Invalid URI format</error:message>
  <error:format-string>XDMP-URI: Invalid URI format: ""</error:format-string>
  <error:retryable>false</error:retryable>
  <error:expr/>
  <error:data>
    <error:datum>""</error:datum>
  </error:data>
  <error:stack>
    <error:frame>
      <error:xquery-version>1.0-ml</error:xquery-version>
    </error:frame>
  </error:stack>
</error:error>
@grtjn
Contributor

grtjn commented Jun 20, 2014

Interestingly, the error message refers to an empty string. I wonder whether this is a client-side or a server-side issue. Client-side sounds more likely, I guess.

@grtjn
Contributor

grtjn commented Aug 27, 2014

I get a different message, but quite likely from the same cause. On Mac the filename is uploaded as:

%2FEntite%CC%81Ge%CC%81ne%CC%81rique.xml

On Windows though it is sent as:

%2FEntit%E9G%E9n%E9rique.xml

Sounds like an encoding issue. @paxtonhare, do you know a trick to convert a string to UTF-8 before applying URI encoding?
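To illustrate the difference between the two escaped names above, here is a minimal Ruby sketch. It assumes the Mac reported the name in decomposed UTF-8 (e followed by U+0301) and that the Windows machine used a Western European code page such as Windows-1252; both are inferences from the byte sequences shown, not confirmed settings. `percent_encode_bytes` is a hypothetical helper written for this example, not Roxy code.

```ruby
# Percent-encode every byte outside the RFC 3986 unreserved set.
# Hypothetical helper for illustration only; not part of Roxy.
def percent_encode_bytes(str)
  str.bytes.map { |b|
    ch = b.chr
    ch.match?(/[A-Za-z0-9\-._~]/) ? ch : format("%%%02X", b)
  }.join
end

# The file name with precomposed é (U+00E9), written with escapes so the
# source file's own encoding cannot change the example.
name = "/Entit\u00E9G\u00E9n\u00E9rique.xml"

mac_name     = name.unicode_normalize(:nfd)  # assumed Mac form: e + U+0301
windows_name = name.encode("Windows-1252")   # assumed Windows form: one byte 0xE9 per é

puts percent_encode_bytes(mac_name)      # %2FEntite%CC%81Ge%CC%81ne%CC%81rique.xml
puts percent_encode_bytes(windows_name)  # %2FEntit%E9G%E9n%E9rique.xml
puts percent_encode_bytes(name)          # %2FEntit%C3%A9G%C3%A9n%C3%A9rique.xml
```

The Windows result contains bare `%E9` bytes, which are not valid UTF-8 on their own, so a server that decodes the URI as UTF-8 would be expected to reject it.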

@grtjn
Contributor

grtjn commented Jul 7, 2015

path.encode("UTF-8") seems to do the trick..
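A rough sketch of what that looks like in isolation, assuming the path string arrives tagged with the Windows code page (Windows-1252 here, which is an assumption). `utf8_escaped_uri` is a hypothetical name for this example and is not the function touched by the actual commit.

```ruby
# Hypothetical helper, not Roxy's actual code: transcode to UTF-8 first,
# then percent-encode each byte outside the RFC 3986 unreserved set.
def utf8_escaped_uri(path)
  utf8 = path.encode("UTF-8")  # transcodes according to the string's current encoding tag
  utf8.bytes.map { |b|
    ch = b.chr
    ch.match?(/[A-Za-z0-9\-._~]/) ? ch : format("%%%02X", b)
  }.join
end

# Simulate a path as Ruby might see it on Windows (assumed code page: Windows-1252).
windows_path = "/Entit\u00E9G\u00E9n\u00E9rique.xml".encode("Windows-1252")

puts utf8_escaped_uri(windows_path)  # %2FEntit%C3%A9G%C3%A9n%C3%A9rique.xml
```

This only helps when the string's encoding tag matches its actual bytes; if the tag is wrong, `encode` would transcode from the wrong source encoding. The result uses the precomposed é (`%C3%A9`), so it differs from the decomposed form the Mac sent, but it is valid UTF-8.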

grtjn added a commit to grtjn/roxy that referenced this issue Jul 7, 2015
@grtjn grtjn modified the milestone: 1.7.3 Jul 7, 2015
@hunterwilliams

I just came across the same problem on my US English Windows 10 machine when attempting to load files with Japanese kanji in the filename.

Solution:

  1. Type Region in the Start menu. If that doesn't work, navigate to Control Panel\All Control Panel Items\Language\Advanced settings and then press "Apply language settings to the welcome screen ...."
  2. Under "Language for non-Unicode programs", press "Change system locale..."
  3. Select the language of the filenames causing the ingestion issue. Note that this will require a restart, and afterwards the command prompt may show odd characters instead of the standard \
  4. Attempt to reload the content. If you get a 400 error about a blank file, you may have picked the wrong language. If the load fails for another reason, try starting the command prompt by running cmd /u

@dmcassel
Collaborator

dmcassel commented Aug 4, 2015

@hunterwilliams, thanks for providing the info

@grtjn
Contributor

grtjn commented Aug 17, 2015

Hunter suggests it might be worthwhile to add these notes in the Wiki somewhere. Maybe create a page dedicated to Windows issues?

dmcassel added a commit that referenced this issue Sep 10, 2015
Fixed #256: applied explicit UTF-8 encoding on file paths for Windows
@grtjn
Contributor

grtjn commented Sep 10, 2015

Fixed in dev. Leaving open for the wiki edits..

@grtjn
Contributor

grtjn commented Sep 17, 2015

@grtjn grtjn closed this as completed Sep 17, 2015
grtjn added a commit to grtjn/roxy that referenced this issue Jan 28, 2016