[C++][FS][Azure] Consider how and if to fix disallowed characters in file metadata #40057
Comments
I think this probably blocks #39069.
We should probably NOT USE an encoding that would make the keys look totally opaque (like base64). A custom escaper that replaces
I like the idea of disallowing
So, the question is: which convention do other Azure clients use? What is the recommended or official way to set a file's content type? Edit: it looks like using the corresponding HTTP headers is recommended when possible. This probably requires a conversion layer of some kind, as done for S3.
This is a good point. If we only care about setting specific things like
Resolved by #40671.
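The conversion layer suggested in the thread (routing well-known keys such as `Content-Type` to the blob's HTTP header properties and keeping everything else as user metadata) might look roughly like the C++ sketch below. `BlobHeaderProps`, `SplitMetadata` and `SplitHeaders` are hypothetical names, not Arrow or Azure SDK APIs, and the set of recognized header keys is an illustrative assumption rather than a complete list.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical holder for blob properties that Azure sets via HTTP headers
// instead of user metadata. This is NOT the Azure SDK's own type.
struct BlobHeaderProps {
  std::optional<std::string> content_type;
  std::optional<std::string> content_encoding;
  std::optional<std::string> content_language;
  std::optional<std::string> content_disposition;
  std::optional<std::string> cache_control;
};

struct SplitMetadata {
  BlobHeaderProps headers;                           // routed to HTTP headers
  std::map<std::string, std::string> user_metadata;  // left as user metadata
};

// HTTP header names are case-insensitive, so keys are compared in lowercase.
std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Pulls well-known header keys out of the metadata map; every other key
// (with its original spelling) is kept as user metadata.
SplitMetadata SplitHeaders(const std::map<std::string, std::string>& metadata) {
  SplitMetadata out;
  for (const auto& [key, value] : metadata) {
    const std::string k = ToLower(key);
    if (k == "content-type") {
      out.headers.content_type = value;
    } else if (k == "content-encoding") {
      out.headers.content_encoding = value;
    } else if (k == "content-language") {
      out.headers.content_language = value;
    } else if (k == "content-disposition") {
      out.headers.content_disposition = value;
    } else if (k == "cache-control") {
      out.headers.cache_control = value;
    } else {
      out.user_metadata.emplace(key, value);
    }
  }
  return out;
}

// The key that failed against real storage becomes a header property.
void ExampleSplitHeaders() {
  SplitMetadata split =
      SplitHeaders({{"Content-Type", "x-pyarrow/test"}, {"owner", "arrow"}});
  assert(split.headers.content_type == "x-pyarrow/test");
  assert(split.user_metadata.size() == 1);
}
```

On the read path, the same table would be applied in reverse so that callers see `Content-Type` in the returned metadata again.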
Describe the enhancement requested
Child of #18014
Manually modifying #40021 to run the Python tests against a real blob storage account caused a failure when attempting to write the metadata `{'Content-Type': 'x-pyarrow/test'}`. Looking at the generic C++ tests, it looks like we will run into the same issue there: arrow/cpp/src/arrow/filesystem/test_util.cc, lines 912 to 913 in c23a097.
It turns out real Azure storage doesn't allow `-`s in metadata keys. The behaviour is the same on flat and hierarchical namespace storage accounts, but Azurite accepts such keys without error. Apparently metadata keys (names) must conform to the naming rules for C# identifiers: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata#metadata-names
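A local check makes it easy to see which keys real Azure storage will reject, independent of Azurite's more lenient behaviour. This is a minimal sketch that assumes an ASCII-only approximation of the C# identifier rules (a letter or underscore first, then letters, digits or underscores); `IsValidMetadataName` is a hypothetical helper, not part of Arrow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASCII-only approximation of the C# identifier rule applied to metadata
// names. Assumption: the Unicode letters and '@' prefix that real C#
// identifiers permit, and the ban on reserved keywords, are ignored here.
bool IsAsciiLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
bool IsAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical helper: true if `name` would pass the simplified rule above.
bool IsValidMetadataName(const std::string& name) {
  if (name.empty()) return false;
  // The first character must be a letter or an underscore.
  if (!IsAsciiLetter(name[0]) && name[0] != '_') return false;
  // Every later character must be a letter, a digit or an underscore.
  for (std::size_t i = 1; i < name.size(); ++i) {
    const char c = name[i];
    if (!IsAsciiLetter(c) && !IsAsciiDigit(c) && c != '_') return false;
  }
  return true;
}

// The key from the failing Python test is rejected; an underscore variant passes.
void ExampleValidate() {
  assert(!IsValidMetadataName("Content-Type"));
  assert(IsValidMetadataName("Content_Type"));
}
```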
We need to decide what we want to do here. I think either we accept these limitations, or we encode the metadata keys before writing them to Azure and decode them when reading them back. A quick Google search came up with these options for potential encodings: https://stackoverflow.com/questions/32037525/encode-to-alphanumeric-in-javascript#:~:text=To%20encode%20to%20an%20alphanumeric,in%20a%20shorter%20encoded%20string.
The downside of encoding the metadata keys is that other Azure clients won't know to decode them.
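To make the encode/decode option concrete while keeping keys readable rather than opaque like base64 (as suggested in the comments), one possible scheme uses `_` as an escape character. The C++ sketch below is hypothetical: `EscapeMetadataKey` and `UnescapeMetadataKey` are illustrative names, and this is not necessarily what #40671 implemented. `_` becomes `__`, and any other disallowed byte (including a leading digit) becomes `_` followed by two uppercase hex digits, so the output contains only letters, digits and underscores and never starts with a digit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical reversible escaping for metadata keys:
//   '_'                      -> "__"
//   letter, non-leading digit -> unchanged
//   anything else            -> "_" + two uppercase hex digits of the byte
std::string EscapeMetadataKey(const std::string& key) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (std::size_t i = 0; i < key.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(key[i]);
    const bool letter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    const bool digit = c >= '0' && c <= '9';
    if (c == '_') {
      out += "__";
    } else if (letter || (digit && i > 0)) {
      out += static_cast<char>(c);
    } else {
      out += '_';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}

// Inverse of EscapeMetadataKey; returns std::nullopt for malformed input.
std::optional<std::string> UnescapeMetadataKey(const std::string& escaped) {
  auto hex_value = [](char h) -> int {
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    return -1;
  };
  std::string out;
  for (std::size_t i = 0; i < escaped.size(); ++i) {
    if (escaped[i] != '_') {
      out += escaped[i];
      continue;
    }
    // "__" decodes to a literal underscore.
    if (i + 1 < escaped.size() && escaped[i + 1] == '_') {
      out += '_';
      i += 1;
      continue;
    }
    // Otherwise exactly two hex digits must follow.
    if (i + 2 >= escaped.size()) return std::nullopt;
    const int hi = hex_value(escaped[i + 1]);
    const int lo = hex_value(escaped[i + 2]);
    if (hi < 0 || lo < 0) return std::nullopt;
    out += static_cast<char>(hi * 16 + lo);
    i += 2;
  }
  return out;
}

void ExampleEscape() {
  assert(EscapeMetadataKey("Content-Type") == "Content_2DType");
  assert(UnescapeMetadataKey("Content_2DType") == std::string("Content-Type"));
}
```

Because every escape sequence is `_` followed by either `_` or a hex digit, decoding left to right is unambiguous. The interoperability downside still applies: a client that does not decode sees `Content_2DType` rather than `Content-Type`, although the key at least stays recognizable.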
Component(s)
C++