Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] URL Encoding to support Blob names with special characters #4324

Closed
alzimmermsft opened this issue Jul 8, 2019 · 11 comments
Closed

[bug] URL Encoding to support Blob names with special characters #4324

alzimmermsft opened this issue Jul 8, 2019 · 11 comments
Assignees
Labels
blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files)

Comments

@alzimmermsft
Copy link
Member

alzimmermsft commented Jul 8, 2019

Blobs are allowed to have special characters in their name which will need to be encoded when they are used as part of the service URL.

Questions stand on where the encoding should happen, there are a few places where a blob name is set

  • Setting the endpoint in the builder (this would likely be a URL)
  • Setting the blob name in the builder (this could be from a URL)
  • Getting a blob client from the container client (this could be from a URL)
@alzimmermsft alzimmermsft added Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) and removed triage labels Jul 9, 2019
@rickle-msft rickle-msft self-assigned this Jul 19, 2019
@alzimmermsft
Copy link
Member Author

A broader discussion will need to occur given the URL encoding issues found within Files.

@alzimmermsft
Copy link
Member Author

Plan is to add documentation to the blobName setter methods in our builders to indicate that blob names need to be encoded.

If the encoding process is more complicated add in a helper method to perform the encoding and use that in the documentation.

@rickle-msft
Copy link
Contributor

Documentation should also go in the ContainerClient methods that construct any form of blob client.

We've used the internal Utility method in the past to special case /+/%20 to do the right thing. It has caused problems in the past with SAS signatures in the query parameters, but I can't say off hand whether this will be strictly necessary for the blob names. I tend to think not because we didn't internally encode in v8 and just let the customer do it themselves.

@alzimmermsft alzimmermsft assigned vcolin7 and unassigned rickle-msft Oct 14, 2019
@joshfree
Copy link
Member

@vcolin7 please update this issue when a PR is ready so that the encoding related changes can be reviewed

@joshfree joshfree changed the title URL Encoding for Blob Name URL Encoding to support Blob names with special characters Oct 15, 2019
@vcolin7
Copy link
Member

vcolin7 commented Oct 17, 2019

I tested a few options including the Utility method Rick mentioned and that one seemed to be the best one for UTF-8 encoding while converting spaces into %20.

For testing I used all printable and extended ASCII characters. I also used a bunch of different Unicode characters for languages such as Chinese, Russian, Korean, Greek, Japanese and even more obscure ones like Esperanto. I found no issues when trying to upload or download blobs with names containing these characters while using Utility.urlEncode() to encode their names.

I also discovered a couple interesting things:

  • Azure Storage does not seem to encode the following ASCII characters when manually generating a SAS URL !'()*-./\_~.
  • Our utility method does not encode the following ASCII characters: *-._.
  • Forward slashes / and backslashes \ are both used for establishing virtual hierarchies by the Azure Storage service, which will finally parse them to /.
  • ASCII characters not allowed on blob names while using the web UI are: # and %.
  • Uploading files that contain an unencoded percentage symbol % is also not allowed through the Azure SDK.
  • However, you can use the Azure SDK to upload blob names that start with a pound/hashtag (#) with no encoding. The Azure Service would create a blob with no name in such a case.

Even though we incur in some over-encoding for 7 ASCII characters, I don't believe it should cause any issues and I think we could implicitly do the encoding for the user instead of leaving that responsibility to them. I have already discussed this possibility with @joshfree and @alzimmermsft and both seem to agree. What do you think @rickle-msft?

@rickle-msft
Copy link
Contributor

Thanks for doing such a thorough investigation! Since it seems quite safe to encode the names ourselves, it seems to make sense that we can just do it for the customer.

@joshfree joshfree changed the title URL Encoding to support Blob names with special characters [bug] URL Encoding to support Blob names with special characters Oct 22, 2019
@joshfree
Copy link
Member

@vcolin7 can you give an update on this bug?

@vcolin7
Copy link
Member

vcolin7 commented Oct 23, 2019

Hi @joshfree, I have addressed the comments and concerns raised by @rickle-msft, made sure all tests ran and added some of my own and I'm just waiting for his or @gapra-msft / @jaschrep-msft approval. In the meantime I'm just going to solve some merge conflicts. Also, I'm still not sure if we need to send this PR to adpship as well.

It would also be good if you or @alzimmermsft could review the latest changes.

@vcolin7
Copy link
Member

vcolin7 commented Oct 23, 2019

Sent an email to adpship after getting Rick's approval.

@vcolin7
Copy link
Member

vcolin7 commented Oct 23, 2019

Merged to master after adpship approval.

@vcolin7 vcolin7 closed this as completed Oct 23, 2019
@joshfree
Copy link
Member

Thanks @vcolin7 for fixing this issue

@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
blocking-release Blocks release Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

No branches or pull requests

5 participants