-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: JSON data types #782
Comments
Good question! What are you trying to do with the data? I haven't thought through how we could incorporate schema-less JSON into Ibis's data model. For nested data in Hive, Impala, Spark SQL, etc., it's a little more clear what should be done. |
FWIW, Presto has a select json '[a"]' fails because it can't be parsed as valid json, whereas select json '["a"]' is converted to json. |
Additionally Presto uses a subset of JSONPath to let users pick out parts of the JSON blob that they want. An ibis-worthy API might look something like this: t.blob_of_json.extract('$.foo.bar')
t.blob_of_json.extract('$.foo.bar', type='int64')
t.blob_of_json.extract('$.foo').cast('array<int64>') |
We could also leverage jsonpath-rw which is a Python DSL that generates JSONPath expressions. |
Should we limit the scope here? E.g., arbitrary JSON is out of scope, but JSON with a well-defined, uniform schema per record (including arbitrary nesting) is in scope. The Apache Drill data model page is an excellent read and IMO a nice indication of reasonable expectations to set for JSON functionality. @wesm thoughts? |
Makes sense to me. I think one of the challenges will be supporting nested data generally given the diverging semantics of different SQL engines (for example: comparing Hive, Presto, and Impala). |
For the record it looks like each of the systems mentioned by @wesm have three different ways of unnesting for arrays (similar for maps): Given a table CREATE TABLE t (
elements ARRAY<STRING>
) Hive: SELECT element
FROM t
LATERAL VIEW explode(elements) s AS element; Presto: SELECT s.element
FROM t
CROSS JOIN unnest(elements) AS s (element) Impala: SELECT elements.item
FROM t, t.elements I think we can come up with a reasonable API for doing these operations. |
Cool thanks for looking into this. Might also be good to look at BigQuery for another data point. From the Ibis perspective, it might make the most sense to model this explicitly as a join. Expression analysis/validation may well become more complex, though. |
BigQuery: SELECT element
FROM t.elements or SELECT element
FROM UNNEST (t.elements) The |
I'll make a new issue for the array API. I've started work on it by way of getting the type system in place and also added an "array_agg" function (called |
Just upvoting that JSON/JSONB support would be a nice feature. |
We have JSON and JSONB types now, and with 3e2efb4 we have a minimal API for extracting object values or array elements. The remaining work is implement feature requests for additional APIs, so I'm going to close this out as completed. |
Hi, I was wondering what would be the best way to treat JSON/JSONB columns from Postgres in Ibis. Currently I'm trying to treat it as a string (although it gets converted to a
dict
whenever I send the data to a pandas dataframe), but I'm not sure if that's the right solution.The text was updated successfully, but these errors were encountered: