Incorrect GROUP BY for JSON values #10467

Closed
breezewish opened this issue May 14, 2019 · 8 comments · Fixed by #21656
Labels: challenge-program, component/expression, component/json, help wanted, severity/major, sig/execution, type/bug

Comments

@breezewish
Member

breezewish commented May 14, 2019

Description

Bug Report

create table tx2 (col json);
insert into tx2 values (json_array(3.0));
insert into tx2 values (json_array(3));
select col, count(1) from tx2 group by col;

MySQL:

mysql> create table tx2 (col json);
Query OK, 0 rows affected (0.01 sec)

mysql> insert into tx2 values (json_array(3.0));
Query OK, 1 row affected (0.00 sec)

mysql> insert into tx2 values (json_array(3));
Query OK, 1 row affected (0.00 sec)

mysql> select col, count(1) from tx2 group by col;
+-------+----------+
| col   | count(1) |
+-------+----------+
| [3.0] |        2 |
+-------+----------+
1 row in set (0.00 sec)

TiDB:

mysql> create table tx2 (col json);
Query OK, 0 rows affected (0.13 sec)

mysql> insert into tx2 values (json_array(3.0));
Query OK, 1 row affected (0.02 sec)

mysql> insert into tx2 values (json_array(3));
Query OK, 1 row affected (0.01 sec)

mysql> select col, count(1) from tx2 group by col;
+------+----------+
| col  | count(1) |
+------+----------+
| [3]  |        1 |
| [3]  |        1 |
+------+----------+
2 rows in set (0.01 sec)

This indicates that generating the group key from the serialized value is incorrect. For JSON values, different in-memory or serialized representations (here [3.0] and [3]) can compare as equal, and equal values must be placed in the same group.
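To make the difference concrete, here is a minimal Python sketch, not TiDB's actual code. It assumes JSON text as a stand-in for the serialized value (TiDB uses its own binary JSON encoding), and the names serialized_key, semantic_key, canonical and group_counts are hypothetical helpers invented for this illustration. It groups the two documents from the report once by their serialized form and once by a key in which numbers are compared by value.

```python
import json
from collections import Counter
from decimal import Decimal


def serialized_key(doc: str) -> str:
    """Group key taken directly from the serialized form (the buggy approach)."""
    return doc


def canonical(value):
    """Build a hashable key in which numbers are compared by value.

    Each value is tagged with its JSON type so that, for example,
    true and 1 never end up with the same key.
    """
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, Decimal):
        # Decimal("3.0") == Decimal("3") and both hash identically.
        return ("num", value)
    if isinstance(value, str):
        return ("str", value)
    if value is None:
        return ("null",)
    if isinstance(value, list):
        return ("array", tuple(canonical(v) for v in value))
    if isinstance(value, dict):
        items = sorted(value.items(), key=lambda kv: kv[0])
        return ("object", tuple((k, canonical(v)) for k, v in items))
    raise TypeError(f"unsupported JSON value: {value!r}")


def semantic_key(doc: str):
    """Group key derived from the parsed value, with all numbers as Decimal."""
    parsed = json.loads(doc, parse_int=Decimal, parse_float=Decimal)
    return canonical(parsed)


def group_counts(docs, key_fn):
    """Count rows per group, like SELECT col, COUNT(1) ... GROUP BY col."""
    return Counter(key_fn(d) for d in docs)


docs = ["[3.0]", "[3]"]
by_serialized = group_counts(docs, serialized_key)  # two groups of one row each
by_value = group_counts(docs, semantic_key)         # one group of two rows
```

Grouping by the serialized form reproduces the two-row TiDB result above, while grouping by the value-based key gives the single group that MySQL returns.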

SIG slack channel

#sig-exec

Score

  • 300

Mentor

@XuHuaiyu added the help wanted and type/bug labels May 15, 2019
@ghost

ghost commented Jul 13, 2020

Confirming this issue still exists in master:

drop table if exists tx2;
create table tx2 (col json);
insert into tx2 values (json_array(3.0));
insert into tx2 values (json_array(3));
select col, count(1) from tx2 group by col;

..

mysql> select col, count(1) from tx2 group by col;
+-----+----------+
| col | count(1) |
+-----+----------+
| [3] |        1 |
| [3] |        1 |
+-----+----------+
2 rows in set (0.00 sec)

mysql> select tidb_version()\G
*************************** 1. row ***************************
tidb_version(): Release Version: v4.0.0-beta.2-750-g8a661044c
Edition: Community
Git Commit Hash: 8a661044cedf8daad1de4fbf79a390962b6f6c3b
Git Branch: master
UTC Build Time: 2020-07-10 10:52:37
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
1 row in set (0.00 sec)

@morgo
Contributor

morgo commented Dec 10, 2020

I think the best way to fix this is to add the JSON type of decimal (see: #9988 ).

There was a proposed PR for this that used float64 for the comparison, but that produces inaccurate results in some cases. If decimal is used when grouping JSON values, it should be safe in all cases.
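As a small illustration of why a float64 comparison can go wrong, the following Python sketch compares two integer literals just above 2^53 both ways. It is only an illustration under stated assumptions: Python's float is an IEEE 754 double, which stands in here for float64, and the function names equal_as_float64 and equal_as_decimal are hypothetical, not taken from the proposed PR.

```python
from decimal import Decimal


def equal_as_float64(a: str, b: str) -> bool:
    """Compare two numeric literals after converting them to float64."""
    return float(a) == float(b)


def equal_as_decimal(a: str, b: str) -> bool:
    """Compare two numeric literals exactly, using decimal arithmetic."""
    return Decimal(a) == Decimal(b)


big = "9007199254740992"       # 2**53
big_plus_one = "9007199254740993"  # 2**53 + 1, not representable as float64

# float64 rounds 2**53 + 1 down to 2**53, so two distinct values look equal.
float_says_equal = equal_as_float64(big, big_plus_one)

# Decimal keeps every digit, so the values stay distinct.
decimal_says_equal = equal_as_decimal(big, big_plus_one)

# Both approaches agree on the case from this issue: 3.0 and 3 are equal.
issue_case_float = equal_as_float64("3.0", "3")
issue_case_decimal = equal_as_decimal("3.0", "3")
```

With float64, rows holding these two distinct integers would be merged into one group. The decimal comparison keeps them apart while still treating 3.0 and 3 as equal.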

@ti-srebot
Contributor

Please edit this comment or add a new comment to complete the following information

Not a bug

  1. Remove the 'type/bug' label
  2. Add notes to indicate why it is not a bug

Duplicate bug

  1. Add the 'type/duplicate' label
  2. Add the link to the original bug

Bug

Note: Make Sure that 'component', and 'severity' labels are added
Example for how to fill out the template: #20100

1. Root Cause Analysis (RCA) (optional)

2. Symptom (optional)

3. All Trigger Conditions (optional)

4. Workaround (optional)

5. Affected versions

6. Fixed versions

@morgo
Contributor

morgo commented Dec 14, 2020

Re-opening for TiKV investigation.

@XuHuaiyu
Contributor

What does "for TiKV investigation" mean?

@XuHuaiyu
Contributor

If there is any development task that needs to be done,
I think we can create a new issue for it rather than keep this bug issue open.

@morgo
Contributor

morgo commented Dec 14, 2020

Sounds good to me. I will reclose.

@morgo morgo closed this as completed Dec 14, 2020
@breezewish
Member Author

Quoting @XuHuaiyu: "If there is any development task that needs to be done,
I think we can create a new issue for it rather than keep this bug issue open."

I think that while this PR closes one bug, it opens another, because the TiDB and TiKV implementations are not identical. A failing case could be constructed, so this might not be considered a complete fix.

7 participants