[Data] partition_cols field is url encoded but not decoded when read back #57564

@lucaschadwicklam97

Description

What happened + What you expected to happen

When I pass partition_cols to Dataset.write_parquet, the partition column values are URL-encoded in the output directory names, which makes sense.

The expected behavior is that ray.data.read_parquet decodes those values when the data is read back, but it returns them still encoded.

Versions / Dependencies

Ray version: 2.49.2

Reproduction script

import ray
from pathlib import Path

data = [
    {"column_a": "string/with/slashes", "column_b": "hi", "column_c": 1},
    {"column_a": "string/with/slashes2", "column_b": "hi2", "column_c": 2},
]

ds = ray.data.from_items(data)

out_dir = Path.cwd() / "test_directory"

ds.write_parquet("file://" + str(out_dir), partition_cols=["column_a"])

data = ray.data.read_parquet("file://" + str(out_dir))

print("this is data: ", data.take(2))

assert (
    data.take(1)[0]["column_a"] == "string/with/slashes"
), "String value not decoded properly"

stdout

this is data:  [{'column_b': 'hi', 'column_c': 1, 'column_a': 'string%2Fwith%2Fslashes'}, {'column_b': 'hi', 'column_c': 1, 'column_a': 'string%2Fwith%2Fslashes'}]
...
AssertionError: String value not decoded properly
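
Until the read path decodes partition values itself, one possible workaround is to percent-decode the affected columns after reading. The sketch below is not Ray's implementation; `decode_partition_value` and `decode_row` are hypothetical helper names, and it assumes every value in the listed partition columns was percent-encoded on write.

```python
from urllib.parse import unquote


def decode_partition_value(value):
    """Percent-decode a string; pass non-string values through unchanged."""
    return unquote(value) if isinstance(value, str) else value


def decode_row(row, partition_cols=("column_a",)):
    """Return a copy of `row` with the given partition columns decoded.

    `partition_cols` defaults to the column used in the reproduction script;
    it is an assumption of this sketch, not something Ray provides.
    """
    return {
        key: decode_partition_value(val) if key in partition_cols else val
        for key, val in row.items()
    }


# A row shaped like the one printed in the output above.
row = {"column_b": "hi", "column_c": 1, "column_a": "string%2Fwith%2Fslashes"}
print(decode_row(row))
# {'column_b': 'hi', 'column_c': 1, 'column_a': 'string/with/slashes'}
```

In the reproduction script this could presumably be applied row by row with `data = data.map(decode_row)` before the assertion, at the cost of an extra pass over the dataset.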

Issue Severity

Low: It annoys or frustrates me.

Metadata

Labels

P1: Issue that should be fixed within a few weeks

bug: Something that is supposed to be working; but isn't

data: Ray Data-related issues
