Performance improvement in JSON Tree traversal #11919
Conversation
Reduces memory usage by 35% (a 1 GB JSON input takes 10.951 GB instead of 16.957 GB).
Reduce peak memory usage (not total memory used): reorder node_range, reorder node_cat, and scope-limit token_levels. Peak usage: 10.957 GiB -> 9.91 GiB -> 9.774 GiB -> 9.403 GiB.
Reduce peak usage from 9.403 GiB to 8.487 GiB (for a 1 GB JSON input).
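For reference, (16.957 - 10.951) / 16.957 ≈ 35.4%, consistent with the ~35% figure above; the later commits then bring peak usage down further, to 8.487 GiB.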
This reverts commit 5eefd64.
1. Use insert_and_find to insert each key, and update it to the inserted unique key. 2. Gather these unique keys, sort them, and use lower_bound to translate all keys to a gap-free sequence.
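A minimal host-side sketch of this two-step remapping, using STL containers as stand-ins for the device-side hash map and Thrust calls (all names here are illustrative, not code from this PR):

#include <algorithm>
#include <cstdio>
#include <vector>

int main()
{
  std::vector<int> keys{42, 7, 42, 13, 7, 42};  // per-node keys with duplicates

  // Step 1: collect the unique keys (the PR does this on the device with
  // cuco insert_and_find into a hash map).
  std::vector<int> unique_keys(keys);
  std::sort(unique_keys.begin(), unique_keys.end());
  unique_keys.erase(std::unique(unique_keys.begin(), unique_keys.end()), unique_keys.end());

  // Step 2: lower_bound against the sorted unique keys maps every key to a
  // gap-free id in [0, num_unique).
  for (int k : keys) {
    auto const it = std::lower_bound(unique_keys.begin(), unique_keys.end(), k);
    std::printf("key %d -> id %d\n", k, static_cast<int>(it - unique_keys.begin()));
  }
  return 0;
}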
…-perf-treetraversal1
rerun tests
Something seems to be wrong with the new test. Did you try running it locally?
Yes, I did. The local run works.
@karthikeyann this is the new error from #11875, which checks that you use the provided stream everywhere. It seems like there is still a default stream lurking somewhere.
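For illustration, this is the kind of pattern that check enforces; a minimal sketch (function and variable names are assumptions, not code from this PR):

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>
#include <thrust/sort.h>

void sort_keys(rmm::device_uvector<int>& keys, rmm::cuda_stream_view stream)
{
  // Correct: the caller-provided stream is forwarded to Thrust.
  thrust::sort(rmm::exec_policy(stream), keys.begin(), keys.end());

  // What the test flags: an execution policy like thrust::device would run
  // the work on the default stream, bypassing the provided one.
  // thrust::sort(thrust::device, keys.begin(), keys.end());
}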
LGTM except for the stream issues
// Get the JSON's tree representation
CUDF_EXPECT_THROW_MESSAGE(
  cuio_json::detail::get_tree_representation(tokens_gpu, token_indices_gpu, stream),
  "JSON Parser encountered an invalid format at location 6");
nit: Maybe we should be using terms we also use inside the algorithm?
"JSON Parser encountered an invalid format at location 6"); | |
"JSON Parser encountered an invalid token at offset 6"); |
I meant to give a more useful error that points the user at the input text. I would prefer to report the location in the user's input text rather than internal tokens and offsets, so that the user can check the input for errors.
cpp/src/io/json/json_tree.cu (Outdated)
 uq_node_id = col_id.begin()] __device__(auto node_id) mutable {
  // typename hash_map_type::value_type const insert_pair{};
  auto it = view.insert_and_find(cuco::make_pair(node_id, node_id), d_hashed_cache, d_equal);
  uq_node_id[node_id] = (it.first)->first;  // first.load(cuda::std::memory_order_relaxed);
Just to make sure: what is the purpose of this comment? I would assume cuco ensures that the writes to (it.first)->first cannot be reordered/observed after the completion of insert_and_find?
@PointKernel suggested using cuda::std::memory_order_relaxed with find, at https://github.com/karthikeyann/cudf/blob/ed1a3d5606991e23e3640df734dff1ba3ffabfd4/cpp/src/io/json/json_tree.cu#L446-L447
auto const it = key_map.find(node_id, d_hasher, d_equal);
return (it == key_map.end()) ? size_type{0} : it->second.load(cuda::std::memory_order_relaxed);
@PointKernel does it make sense to use it here?
Yes, the memory order doesn't matter here so we can use the least expensive one (which is the relaxed memory order).
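As a host-side analogue of that point, with std::atomic standing in for the atomic payload cuco stores in each map slot (a sketch, not code from this PR):

#include <atomic>
#include <cstdio>

int main()
{
  std::atomic<int> slot{42};  // stand-in for the hash map slot's mapped value

  // Only the value itself is needed, and no surrounding memory operations
  // have to be ordered against this read, so the cheapest (relaxed) order
  // suffices.
  int const v = slot.load(std::memory_order_relaxed);
  std::printf("%d\n", v);
  return 0;
}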
rerun tests
…-perf-treetraversal1
@gpucibot merge
Description
This PR improves the performance of JSON tree traversal, mainly in the creation of column ids.
Other changes are:
Hits 2 GB/s on a GV100 for a 128 MB JSON input.
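At that rate, and assuming the throughput is measured over the full input, a 128 MB input corresponds to roughly 64 ms of traversal time.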
Checklist