Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow cast from jsonb to numbers and some jsonb type coercion #39495

Closed
andreimatei opened this issue Aug 9, 2019 · 9 comments
Closed
Labels
C-wishlist A wishlist feature. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-queries SQL Queries Team

Comments

@andreimatei
Copy link
Contributor

andreimatei commented Aug 9, 2019

We allow casting string to int, but not jsonb to int. I think we should allow it (and error when the jsonb is not a trivial int).

This seems important to me because it takes away the need to understand the difference between the -> and ->> operators (the first returns jsonb, the former string). As things stand, I need to know about the ->> operator whenever I want to then further cast the value. And casting is needed if I'm ever going to join a number to some other column.

Which brings us to the the next peeve. It seems unfortunate that, if I want to get a number out of a jsonb, I need a cast at all. Perhaps our typing rules could allow comparisons of a jsonb with a numeric type (by doing an implicit cast). Is that feasible? I know we have some type coercion rules; would it be feasible to expand them in this way?
Once we start supporting JSON Schema for typing our JSON columns, presumably some of this becomes easier.

As a motivating example, here's a simple JSON and some hoops I've had to jump through to join it with a numeric column. Notice the ->> operator and the ::int cast. Together, they made JSONB a bit unfriendly.

{
  "zones":[
    {
      "zone_id":1,
    },
    {
      "zone_id":2,
    }
  ]
}

select zone_id, config_sql from 
   (select (jsonb_array_elements(report->'zones')->'zone_id')::int as zid from test) 
   inner join crdb_internal.zones on zid=zone_id;

cc @knz @andy-kimball

Jira issue: CRDB-5574

@andreimatei andreimatei added the C-wishlist A wishlist feature. label Aug 9, 2019
@awoods187 awoods187 added the S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. label Aug 9, 2019
@awoods187
Copy link
Contributor

I like this idea! We should continue to investigate here. cc @RaduBerinde

@jordanlewis
Copy link
Member

Does Postgres permit this? Such a change is not too difficult.

@jordanlewis
Copy link
Member

@awoods187 please check for us if you don't mind :)

@knz
Copy link
Contributor

knz commented Aug 10, 2019

Something to consider is that providing auto-casts will fool users into believing this is something they should do, whereas neither the CBO nor the execution machinery optimize well for JSONB scalars.

I personally like the fact that currently a CockroachDB user can reliably assume that if a cast is needed, chances are the query execution will be slow, and that instead they should seek to modify their SQL schema to ensure casts are not necessary.

@jordanlewis
Copy link
Member

Agreed. We can agree surely that implementing this as an explicit cast would be a good place to start! I'm generally happy with our current philosophy of providing implicit casts only when resolving constants - I don't think we need to consider changing that right now.

@andreimatei
Copy link
Contributor Author

I'm generally happy with our current philosophy of providing implicit casts only when resolving constants - I don't think we need to consider changing that right now.

One can argue that extracting fields from a JSON is analogous to untyped constants - you don't know the type you're getting, so it wouldn't be unreasonable to assume/coerce it to be what the context of its usage implies.

Something to consider is that providing auto-casts will fool users into believing this is something they should do, whereas neither the CBO nor the execution machinery optimize well for JSONB scalars.

You're referring to the fact that there's no support for joining the -> operator with the cast to avoid the creation of intermediary jsonb datums?
FWIW, in my opinion we should also prioritize this sort of optimization. JSON will eat the world!

@andy-kimball
Copy link
Contributor

I think @knz is pointing out that introducing implicit casts would mean we can't use an inverted index. For example, consider this hypothetical case (doesn't work today since JSON can't be compared with INT):

CREATE TABLE t (a INT PRIMARY KEY, b JSONB, INVERTED INDEX t_inv (b));
INSERT INTO t VALUES (1, '{"num": "2"}'), (2, '{"num": 2}');
SELECT * FROM t WHERE b->'num' = 2;

If we implicitly convert the JSON field into an INT, we would not be able to use the inverted index, because it can't return both the STRING and INT rows. They may be widely separated in the index, because they're stored using their native data types. Introducing an implicit cast would require the index be in a different order (i.e. ordered just as INT rather than STRING and INT).

That said, I think it'd be theoretically possible to cast the 2 into a JSON value (though there may be practical problems with doing that), and then do a lookup into the index. If so, then I'd expect:

-- Should return (1, '{"num": "2"}')
SELECT * FROM t WHERE b->'num' = "2"

-- Should return (2, '{"num": 2}')
SELECT * FROM t WHERE b->'num' = 2

Notice that there would be no way to return both rows at once using an equality condition, because we're purposefully preserving the data types of the JSON fields by not inserting a lossy cast on the left side of the equality. Instead we're inserting a non-lossy cast on the right side.

CC @justinj

@rafiss
Copy link
Collaborator

rafiss commented Feb 15, 2021

The support for explicit cast is covered by #41367

@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@jordanlewis
Copy link
Member

This is done now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-wishlist A wishlist feature. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. T-sql-queries SQL Queries Team
Projects
None yet
Development

No branches or pull requests

7 participants