Skip to content

Conversation

@hubgeter
Copy link
Contributor

@hubgeter hubgeter commented Dec 2, 2024

bp #43469

Problem Summary:
Support reading json format hive table like:

mysql> show create table basic_json_table;
CREATE TABLE `basic_json_table`(
  `id` int,
  `name` string,
  `age` tinyint,
  `salary` float,
  `is_active` boolean,
  `join_date` date,
  `last_login` timestamp,
  `height` double,
  `profile` binary,
  `rating` decimal(10,2))
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'

Behavior changed:
To implement this feature, this pr modifies new_json_reader. Previously, new_json_reader could only insert data into columnString. In order to support inserting data into columns of other types, DataTypeSerDe is introduced to insert data into columns. To maintain compatibility with previous versions, changes to this pr are triggered only when reading hive json tables.

Limitation of Use:

  1. Currently, only query is supported, and writing is not supported.
  2. Currently, only the ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'; scenario is supported. For some properties specified in with serdeproperties, Doris does not take effect.
  3. Since Hive does not allow columns with the same name but different case when creating a table in Json format (including inside a Struct), we convert the field names in the Json data to lowercase when reading the Json data file, and then match according to the lowercase field names. For field names that are duplicated after being converted to lowercase in the data, the value of the last field is used (consistent with Hive behavior).
    example:
create table json_table(
    column int
)ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

a.json:
{"column":1,"COLumn",2,"COLUMN":3}
{"column":10,"COLumn",20}
{"column":100}
in Hive : load a.json to table json_table

in Doris query:
---
3
20
100
---

Todo(in next pr):
Merge serde and json_reader ,because they have logical conflicts.

Hive catalog support read json format table.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Copy link
Contributor Author

hubgeter commented Dec 2, 2024

run buildall

Problem Summary:
Support reading json format hive table like:
```mysql
mysql> show create table basic_json_table;
CREATE TABLE `basic_json_table`(
  `id` int,
  `name` string,
  `age` tinyint,
  `salary` float,
  `is_active` boolean,
  `join_date` date,
  `last_login` timestamp,
  `height` double,
  `profile` binary,
  `rating` decimal(10,2))
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
```

Behavior changed:
To implement this feature, this pr modifies `new_json_reader`.
Previously, `new_json_reader` could only insert data into columnString.
In order to support inserting data into columns of other types,
`DataTypeSerDe` is introduced to insert data into columns. To maintain
compatibility with previous versions, changes to this pr are triggered
only when reading hive json tables.

Limitation of Use:
1. Currently, only query is supported, and writing is not supported.
2. Currently, only the `ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe';` scenario is supported. For
some properties specified in `with serdeproperties`, Doris does not take
effect.
3. Since Hive does not allow columns with the same name but different
case when creating a table in Json format (including inside a Struct),
we convert the field names in the Json data to lowercase when reading
the Json data file, and then match according to the lowercase field
names. For field names that are duplicated after being converted to
lowercase in the data, the value of the last field is used (consistent
with Hive behavior).
example:
```
create table json_table(
    column int
)ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

a.json:
{"column":1,"COLumn",2,"COLUMN":3}
{"column":10,"COLumn",20}
{"column":100}
in Hive : load a.json to table json_table

in Doris query:
---
3
20
100
---
```

Todo(in next pr):
Merge `serde` and `json_reader` ,because they have logical conflicts.

Hive catalog support read json format table.
@hubgeter hubgeter force-pushed the pick_30_feature_hive_json_serde branch from cd62ee5 to a21926b Compare December 4, 2024 08:43
@hubgeter
Copy link
Contributor Author

hubgeter commented Dec 4, 2024

run buildall

@morningman morningman merged commit 8efb98e into apache:branch-3.0 Dec 4, 2024
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants