Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix operator trace caching #5580

Merged
merged 3 commits into from
Jul 26, 2024
Merged

Conversation

mzient
Copy link
Contributor

@mzient mzient commented Jul 25, 2024

Category:

Bug fix (non-breaking change which fixes an issue)

Description:

Operator traces have an optimization that caches the trace map for the current executor. This was done with an assumption that the access function is not used for reading trace values for multiple operators - and in C API it was used exactly this way, causing errors.
This PR fixes the problem and adds tests that didn't pass prior to the fix.
Additionally, AddArg overload is added to OpSpec, making it possible to pass const char *. This was bugged, causing weird run-time errors when attempting to add an argument this way.

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

mzient added 2 commits July 25, 2024 19:22
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
}


private:
std::string trace_name_;
Copy link
Contributor Author

@mzient mzient Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was passing, because the trace name was the same in all operators - now it's configurable per-operator.

const char *operator_name = operator_under_test_names[op];
const char *trace_name = operator_trace_names[op];
EXPECT_EQ(daliHasOperatorTrace(&h, operator_name, "this_trace_does_not_exist"), 0);
ASSERT_NE(daliHasOperatorTrace(&h, operator_name, trace_name), 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test was passing, because the trace name was the same in all operators. Now there are different trace names so we know that we're looking at the right operator's trace.

if (!operator_traces_)
operator_traces_ = &iter_data_->operator_traces.Get(operator_name);
return *operator_traces_;
return iter_data_->operator_traces.Get(operator_name);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actual fix: We must query the map directly here.

Comment on lines +599 to +600
if (!operator_traces_)
operator_traces_ = &iter_data_->operator_traces.Get(operator_instance_name_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix: The optimization should go here, not in the query by operator name.

@@ -186,11 +195,13 @@ class OperatorTraceTestExternalInput : public OperatorTraceTest {
pipeline_->AddOperator(OpSpec("PassthroughWithTraceOp")
.AddArg("device", "cpu")
.AddInput("OP_TRACE_IN_CPU", "cpu")
.AddArg("trace_name", operator_trace_names[0])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused a run-time (not compile-time!) error without the OpSpec fix.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [16923093]: BUILD STARTED

@@ -52,8 +52,12 @@ OperatorTraceTestParam operator_trace_test_params_pipelined_executor_separate_qu
{2, 2, true, false},
};

std::array<std::string, 2> operator_under_test_names = {
"PassthroughWithTraceOpCpu", "PassthroughWithTraceOpGpu"
const char *operator_under_test_names[] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really want std::string for C-API tests.

@@ -174,6 +174,11 @@ class DLL_PUBLIC OpSpec {
return this->SetArg<std::string>(name, c_str);
}

// Forward to string implementation
DLL_PUBLIC inline OpSpec& SetArg(const string &name, const char *c_str) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this overload, passing const char * (but not a literal, which is properly handled above) resulted in a run-time error - there was an attempt to set a pointer value to the argument, which was a bug.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
@mzient mzient force-pushed the fix_operator_trace_query branch from a46e996 to 12525af Compare July 25, 2024 19:53
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [16923393]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [16923393]: BUILD PASSED

@mzient mzient merged commit de81944 into NVIDIA:main Jul 26, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants