
Performance improvement in JSON Tree traversal #11919

Merged

Conversation

Contributor

@karthikeyann karthikeyann commented Oct 14, 2022

Description

This PR improves the performance of JSON tree traversal, mainly in the creation of column ids.

  • Replaced per-level processing with a two-level hash algorithm
  • Reduced memory usage of the hash map (reduced oversubscription)

Other changes:

  • Fail during tree generation if the token stream contains an error token
  • Created a device_span version of device_parse_nested_json

Reaches 2 GB/s on a GV100 for a 128 MB JSON input.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Reduces memory usage by ~35% (a 1 GB JSON input takes 10.951 GB instead of 16.957 GB).
This reduces peak memory usage, not total memory used.
Reordering node_range and node_cat, and scope-limiting token_levels:
10.957 GiB -> 9.91 GiB -> 9.774 GiB -> 9.403 GiB
Further reduced from 9.403 GiB to 8.487 GiB (for a 1 GB JSON input) by:
1. Using insert_and_find to insert each key and update it to its inserted unique key
2. Gathering these unique keys, sorting them, and using lower_bound to translate all
keys to a gap-free sequence
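The sort + lower_bound translation in step 2 can be sketched host-side. This is a minimal standard-C++ analogue of the thrust/cuco device code, with a hypothetical helper name (`to_dense_ids`), not the actual implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Translate arbitrary (gappy) unique keys to a dense 0..n-1 id sequence:
// sort the unique keys, then lower_bound each key against the sorted set.
std::vector<int> to_dense_ids(std::vector<int> const& keys) {
  // Collect and sort the unique keys (on device this would be
  // thrust::sort followed by thrust::unique).
  std::vector<int> uniq(keys);
  std::sort(uniq.begin(), uniq.end());
  uniq.erase(std::unique(uniq.begin(), uniq.end()), uniq.end());

  // lower_bound gives each key its rank in the sorted unique set,
  // i.e. a gap-free id in [0, num_unique).
  std::vector<int> dense(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    dense[i] = static_cast<int>(
        std::lower_bound(uniq.begin(), uniq.end(), keys[i]) - uniq.begin());
  }
  return dense;
}
```

For example, keys {7, 42, 7, 13, 42, 7} map to dense ids {0, 2, 0, 1, 2, 0}.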
@karthikeyann karthikeyann added the 2 - In Progress Currently a work in progress label Oct 14, 2022
@karthikeyann
Contributor Author

rerun tests

@ttnghia
Contributor

ttnghia commented Oct 26, 2022

Something seems to be wrong with the new test. Did you try running it locally?

@karthikeyann
Contributor Author

karthikeyann commented Oct 27, 2022

Did you try running it locally?

Yes. I did. Local run works.

01:50:54 NVIDIA/thrust#1 in /workspace/cpp/tests/utilities/identify_stream_usage/build/libidentify_stream_usage.so : cudaMemcpyAsync()+0x3b
This error is completely new to me. Any clue on what this is?

@upsj
Contributor

upsj commented Oct 27, 2022

@karthikeyann this is the new error from #11875, checking that you use the provided stream everywhere. Seems like there is still a default stream lurking somewhere. By LD_PRELOADing libidentify_stream_usage.so, you should be able to catch the issue with a catchpoint in gdb (catch throw) or maybe even identify the location from the backtrace.

Contributor

@upsj upsj left a comment


LGTM except for the stream issues

// Get the JSON's tree representation
CUDF_EXPECT_THROW_MESSAGE(
cuio_json::detail::get_tree_representation(tokens_gpu, token_indices_gpu, stream),
"JSON Parser encountered an invalid format at location 6");
Contributor

nit: Maybe we should be using terms we also use inside the algorithm?

Suggested change
"JSON Parser encountered an invalid format at location 6");
"JSON Parser encountered an invalid token at offset 6");

Contributor Author

I meant to give a more useful error that points the user back to the text input. I would prefer to report the location in the user's input text rather than internal tokens and offsets, so that the user can check the input for errors.

uq_node_id = col_id.begin()] __device__(auto node_id) mutable {
// typename hash_map_type::value_type const insert_pair{};
auto it = view.insert_and_find(cuco::make_pair(node_id, node_id), d_hashed_cache, d_equal);
uq_node_id[node_id] = (it.first)->first; // first.load(cuda::std::memory_order_relaxed);
Contributor

Just to make sure: What is the purpose of this comment? I would assume cuco ensures that the writes to (it.first)->first cannot be reordered/observed after the completion of insert_and_find?

Contributor Author

@PointKernel suggested using cuda::std::memory_order_relaxed with find at
https://github.com/karthikeyann/cudf/blob/ed1a3d5606991e23e3640df734dff1ba3ffabfd4/cpp/src/io/json/json_tree.cu#L446-L447

    auto const it = key_map.find(node_id, d_hasher, d_equal);
    return (it == key_map.end()) ? size_type{0} : it->second.load(cuda::std::memory_order_relaxed);

@PointKernel does it make sense to use it here?

Member

Yes, the memory order doesn't matter here, so we can use the least expensive one (relaxed memory order).
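For illustration, here is a minimal standard-C++ analogue (plain std::atomic, not cuco/CUDA, with a hypothetical helper name) of why a relaxed load suffices when no ordering with other memory operations is required, e.g. when the value was fully published before any reader starts:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// The value is published before the reader threads are created, and
// thread creation synchronizes-with the thread body, so each reader
// only needs the value itself, not any ordering guarantee.
int relaxed_sum(int num_threads) {
  std::atomic<int> value{42};  // published before any reader runs
  std::atomic<int> sum{0};
  std::vector<std::thread> readers;
  for (int i = 0; i < num_threads; ++i) {
    readers.emplace_back([&] {
      // No ordering with surrounding memory operations is needed,
      // so the cheapest memory order (relaxed) is sufficient.
      sum.fetch_add(value.load(std::memory_order_relaxed),
                    std::memory_order_relaxed);
    });
  }
  for (auto& t : readers) t.join();
  return sum.load();
}
```

Each of the num_threads readers adds 42, so relaxed_sum(n) returns 42 * n.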

cpp/src/io/json/json_column.cu (review thread resolved)
cpp/src/io/json/json_tree.cu (review thread resolved)
@karthikeyann
Contributor Author

rerun tests

@karthikeyann
Contributor Author

rerun tests

@karthikeyann karthikeyann added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuIO Reviewer 3 - Ready for Review Ready for review by team labels Oct 27, 2022
@karthikeyann
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aaf251d into rapidsai:branch-22.12 Oct 28, 2022
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
6 participants