Auto load json cols #444
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #444 +/- ##
=======================================
Coverage ? 86.78%
=======================================
Files ? 93
Lines ? 9782
Branches ? 2023
=======================================
Hits ? 8489
Misses ? 936
Partials ? 357
Looks like this will require a studio change also. Having trouble getting those tests to run, but I think the change will need to be here. |
Studio test failures should be covered by https://github.com/iterative/studio/pull/10656 |
LGTM, thanks for adding this! I would also recommend that https://github.com/iterative/studio/pull/10656 is merged in quick succession to this PR, to avoid test failures in the Studio tests.
Thanks @dtulga! I will leave it to you and the team to merge this and the companion PR so we don't end up with broken tests. |
Looks good to me, thank you! 🙏
@dtulga are you going to handle this?
I am working on this, yes, although it appears we don't support this feature anymore, as for this feature to work, a column has to be marked as the
Which means that the code to convert the JSON string back into a dict is never called. (This was found while working on fixing the tests for this PR.) |
Oh my. This looks like a bigger issue now than it was before 😢 Should we create a new GH issue for this and close these PRs? 🤔 |
That makes sense to me, but I'm not really sure what the plan was for this feature, or what the plan should be going forward. |
If it helps I moved Would the conversion be that This test currently passes:
@pytest.mark.parametrize(
    "cloud_type,version_aware",
    [("s3", True)],
    indirect=True,
)
def test_udf_different_types(cloud_test_catalog):
    obj = {"name": "John", "age": 30}

    def test_types():
        return {"a": 1}

    dc = (
        DataChain.from_storage(
            cloud_test_catalog.src_uri, session=cloud_test_catalog.session
        )
        .filter(C("file.path").glob("*cat1"))
        .map(
            test_types,
            params=[],
            output={
                "dict_col": dict,
            },
        )
    )
    results = dc.select("dict_col").results()
    assert results == [(json.dumps({"a": 1}),)]
|
I looked into this a bit further. After merging
this does not pass without this change. |
That's what I had in mind. AFAIK we used to require SQL types like JSON as output types but now expect Python types like dict. |
Yep, looks like |
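To illustrate the point above about output schemas now being declared with Python types such as dict rather than SQL types such as JSON, here is a rough sketch of how JSON-backed columns could be picked out of such a schema. The helper `json_column_names` and the `JSON_BACKED` tuple are hypothetical names invented for this example and are not part of datachain; the real project may decide this differently.

```python
from typing import get_origin

# Assumption for this sketch: only dict-like Python types are stored as JSON.
JSON_BACKED = (dict,)


def json_column_names(output):
    """Return the names of columns whose declared Python type is JSON-backed.

    `output` maps column names to Python types, e.g. {"dict_col": dict}.
    """
    names = set()
    for name, py_type in output.items():
        # dict[str, int] has origin dict; a bare dict or int has no origin.
        base = get_origin(py_type) or py_type
        if isinstance(base, type) and issubclass(base, JSON_BACKED):
            names.add(name)
    return names


output = {"dict_col": dict, "count": int, "meta": dict[str, int]}
json_cols = json_column_names(output)
```

With a set like `json_cols` available, a reader would know which string values need decoding and which should be left alone.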
Extracted from #441.
Python dict values are stored in JSON columns but are read back as strings rather than being parsed. This PR parses the JSON values before returning them, so that values saved as dicts are returned as dicts.
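The behaviour described above can be sketched outside datachain with only the standard library. This is a minimal illustration, assuming a SQLite table with a text column holding JSON; the helper `load_json_cols` and the table layout are made up for the example and do not reflect the code in this PR.

```python
import json
import sqlite3


def load_json_cols(row, json_indexes):
    """Return a copy of `row` with the given column positions decoded from JSON."""
    return tuple(
        json.loads(value) if i in json_indexes and isinstance(value, str) else value
        for i, value in enumerate(row)
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dict_col TEXT)")
# The dict is serialized on write, as happens for dict-typed columns.
conn.execute("INSERT INTO t VALUES (?, ?)", (1, json.dumps({"a": 1})))

# Without decoding, the value comes back as a JSON string.
raw_rows = conn.execute("SELECT id, dict_col FROM t").fetchall()

# Decoding column 1 before returning restores the original dict.
decoded_rows = [load_json_cols(row, {1}) for row in raw_rows]
```

Here `raw_rows` holds the JSON string, while `decoded_rows` holds the dict that was originally saved.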