
Add event tracing and ETDumps to executor_runner #5027

Open
wants to merge 13 commits into base: main

Conversation

benkli01
Collaborator

@benkli01 benkli01 commented Sep 2, 2024

  • Enabled via EXECUTORCH_ENABLE_EVENT_TRACER
  • Add flag 'etdump_path' to specify the file path for the ETDump file
  • Add flag 'num_executions' for number of iterations to run
  • Create and pass event tracer 'ETDumpGen'
  • Save ETDump to disk
  • Update docs to reflect the changes

Re-upload of #4502 to discuss with @GregoryComer.


pytorch-bot bot commented Sep 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5027

Note: Links to docs will display an error until the docs builds have been completed.

❌ 31 New Failures

As of commit fde5862 with merge base 01d526f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Sep 2, 2024
@benkli01
Collaborator Author

benkli01 commented Sep 2, 2024

@pytorchbot label 'partner: arm'

@pytorch-bot added the partner: arm label (For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm) on Sep 2, 2024
@benkli01
Collaborator Author

benkli01 commented Sep 2, 2024

@pytorchbot label ciflow/trunk


pytorch-bot bot commented Sep 2, 2024

Can't add following labels to PR: ciflow/trunk. Please ping one of the reviewers for help.

@benkli01
Collaborator Author

benkli01 commented Sep 4, 2024

Hi @GregoryComer. Would it be possible to run the CI on your side to see if the issue from the previous PR is still occurring? I'm having a hard time understanding where this comes from.

@facebook-github-bot
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

2 similar comments

- Enabled via EXECUTORCH_ENABLE_EVENT_TRACER
- Add flag 'etdump_path' to specify the file path for the ETDump file
- Add flag 'num_executions' for number of iterations to run
- Create and pass event tracer 'ETDumpGen'
- Save ETDump to disk
- Update docs to reflect the changes

Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com>
Change-Id: I7e8e8b7f21453bb8d88fa2b9c2ef66c532f3ea46
@benkli01 force-pushed the add-profiling-to-xnn-executor-runner-2 branch from 3288eda to b09d09e on September 23, 2024
@benkli01
Collaborator Author

Hi @dbort. Sorry for dragging you into this, but I saw your comment on EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT in the code, so I thought you might be able to help with resolving the failing test here. Any idea how to fix this?

@digantdesai
Contributor

I don't see a CI failure anymore

@benkli01
Collaborator Author

I don't see a CI failure anymore

@digantdesai To me pull / test-llama-runner-qnn-linux (fp32, cmake, qnn) / linux-job (pull_request) is showing up as failing after my latest update. The CI run for the previous version you imported did not finish for me, i.e. I could not see any results, but it did not seem to have this test included anyway.

@facebook-github-bot
Contributor

@digantdesai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@digantdesai
Contributor

Yeah, I see: it comes from the main CMakeLists.txt, and the qnn build does have EXECUTORCH_ENABLE_EVENT_TRACER=ON

@digantdesai
Contributor

Any update on this?

@benkli01
Collaborator Author

Hi @digantdesai, I'm still hoping for some pointers from @dbort or you, as I'm struggling to reproduce it locally and can't really make sense of the error.

@benkli01 added the release notes: examples label (Changes to any of our example LLMs integrations, such as Llama3 and Llava) on Nov 25, 2024
@freddan80
Collaborator

@digantdesai, will you have a look at this one, since it touches code outside the Arm delegate? Thx!

@cccclai
Contributor

cccclai commented Nov 30, 2024

The error shows up when running this script https://github.com/pytorch/executorch/blob/main/backends/qualcomm/scripts/build.sh based on the log.

If you have a linux machine, can you follow https://pytorch.org/executorch/stable/build-run-qualcomm-ai-engine-direct-backend.html and see if the script fails?

@benkli01
Collaborator Author

benkli01 commented Dec 2, 2024

@cccclai I finally managed to reproduce the issue by running script backends/qualcomm/scripts/build.sh with the parameters from the CI script here. Interestingly, the issue seems to be caused by --job_number 2 (not my first guess). If I remove the parameter entirely, defaulting to --job_number 16, the issue disappears (not sure this is an acceptable solution and/or would work in the CI). I'm guessing that this is related to the TODO here. Any input on how to proceed would be much appreciated.

@cccclai
Contributor

cccclai commented Dec 4, 2024

@cccclai I finally managed to reproduce the issue by running script backends/qualcomm/scripts/build.sh with the parameters from the CI script here. Interestingly, the issue seems to be caused by --job_number 2 (not my first guess). If I remove the parameter entirely, defaulting to --job_number 16, the issue disappears (not sure this is an acceptable solution and/or would work in the CI). I'm guessing that this is related to the TODO here. Any input on how to proceed would be much appreciated.

Ah I remember that. @dbort and @Olivia-liu, any thought on this?

Contributor

@dbort left a comment


Sorry that I missed your mentions of me for so long! Thanks @cccclai for pointing me to this.

Just to double check, is this the error you're seeing?

https://github.com/pytorch/executorch/actions/runs/12161392244/job/33915905755#step:14:1039

gmake[2]: *** No rule to make target '../third-party/flatcc/lib/libflatccrt.a', needed by 'executor_runner'.  Stop.

I think I remember @Olivia-liu looking into this and working around it by building in release mode (hence -DCMAKE_BUILD_TYPE=Release), which is still something we need to figure out. Olivia, do you know if there was a GitHub issue tracking this problem?


Result<Method> method = program->load_method(method_name, &memory_manager);
EventTracer* event_tracer_ptr = nullptr;
#ifdef ET_EVENT_TRACER_ENABLED
Contributor


This function is already so long and complex, I'd like to factor out these ifdefs if possible.

You could create a class to encapsulate the event tracing, like

class TraceManager {
 public:
  TraceManager();
  EventTracer* get_event_tracer();
  Error write_etdump_to_file(const char* filename);
};

If tracing is enabled, the ctor could create the ETDump (a field), get_event_tracer can return a pointer to it, and write_etdump_to_file can open the file and write the contents. If tracing is disabled, the class is basically empty, returning a null tracer and just returning Error::NotSupported when asked to write.

Then main() can say

TraceManager tracer;
program->load_method(..., tracer.get_event_tracer());
...
if (tracer.get_event_tracer() != nullptr) {
  status = tracer.write_etdump_to_file(FLAGS_etdump_path.c_str());
  ET_CHECK_MSG(status == Error::Ok, ...);
}

Collaborator Author


Good idea. Let me know if the implementation looks ok.

Resolved (outdated) review threads on:
examples/portable/executor_runner/executor_runner.cpp (3 threads)
backends/xnnpack/CMakeLists.txt
CMakeLists.txt
docs/source/tutorial-xnnpack-delegate-lowering.md
@dbort
Contributor

dbort commented Dec 5, 2024

Also for what it's worth, I'm trying to merge dvidelabs/flatcc#306 into upstream flatcc to let us remove -DEXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT and similar hacks

@benkli01
Collaborator Author

benkli01 commented Dec 5, 2024

gmake[2]: *** No rule to make target '../third-party/flatcc/lib/libflatccrt.a', needed by 'executor_runner'.  Stop.

Thanks @dbort , this is the error exactly.

I'm not sure about the workaround to use release mode. The command used in the ci script is already using --release. As mentioned above, removing --num_jobs 2 seems to work locally, but it's a strange workaround and might not work in CI.

@dbort
Contributor

dbort commented Dec 7, 2024

I'm not sure about the workaround to use release mode. The command used in the ci script is already using --release.

Ok, thanks for looking into that.

As mentioned above, removing --num_jobs 2 seems to work locally, but it's a strange workaround and might not work in CI.

That means that there's some kind of race condition.

Based on your PR, it looks like executor_runner has a proper dependency on libflatccrt; otherwise I would have expected a use-without-dependency situation.

...except maybe it doesn't. This PR adds a dep on "${FLATCCRT_LIB}" from the top-level CMakeLists.txt, but when I look for a place that sets FLATCCRT_LIB I only see it in the cmake config file at

if(CMAKE_BUILD_TYPE MATCHES "Debug")
  set(FLATCCRT_LIB flatccrt_d)
else()
  set(FLATCCRT_LIB flatccrt)
endif()

afaik, that config isn't included in the top-level cmake system. That file is used to point to an already-built version of the core ET libs from external projects, like

find_package(executorch CONFIG REQUIRED)

If FLATCCRT_LIB is empty in this PR at https://github.com/pytorch/executorch/pull/5027/files#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR817-R819

list(APPEND _executor_runner_libs etdump ${FLATCCRT_LIB})

then executor_runner wouldn't properly depend on libflatccrt.a. A parallel build could cause that lib to be coincidentally built earlier with -j16, "fixing" the problem, while -j2 would be less likely to do so.

Could you try printing the value of FLATCCRT_LIB from the top-level CMakeLists.txt to see if it's empty?
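For reference, a one-line configure-time debug print (placed in the top-level CMakeLists.txt near where the variable is used) could look like this:

```cmake
# Prints the value, or an empty string if the variable is unset.
message(STATUS "FLATCCRT_LIB = '${FLATCCRT_LIB}'")
```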

@dbort
Contributor

dbort commented Dec 7, 2024

Though theoretically executor_runner shouldn't even need to know about libflatccrt: it should inherit the dep from the PUBLIC section of

target_link_libraries(
  etdump
  PUBLIC etdump_schema flatccrt
  PRIVATE executorch
)

But in this case, you could try updating this PR to use flatccrt as the dep instead of using ${FLATCCRT_LIB}. Even if FLATCCRT_LIB were defined, I think it's actually wrong to use it as the dep name -- I believe that the target is always called flatccrt even if, in debug mode, the file that it generates is called libflatccrt_d.a.
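Concretely, the suggested change to the PR's list in the top-level CMakeLists.txt would be something like the following sketch (_executor_runner_libs is the variable from the PR diff linked above):

```cmake
# Depend on the flatccrt target directly instead of the FLATCCRT_LIB
# variable, which may be empty outside the export config file.
list(APPEND _executor_runner_libs etdump flatccrt)
```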

@benkli01
Collaborator Author

Hi @dbort. I tried fixing the flatccrt dependency as suggested, but without any effect:

  1. Replace ${FLATCCRT_LIB} with flatccrt
  2. Remove flatccrt dependency completely and rely on inherited dependency from etdump

(I did this both in CMakeLists.txt and examples/qualcomm/executor_runner/CMakeLists.txt)

I did find a new workaround though, which should be more stable than just removing --num_jobs 2:
In .ci/scripts/build-qnn-sdk.sh, run the command to build the QNN SDK twice, first with a clean build and then without cleaning.

I feel like the flatccrt issue is not related to my change so I will open an issue for it. I can push the above workaround in a separate PR. I will be off from tomorrow until next year, but I really hope we can find a solution together to get this PR merged.

- Raise a CMake error if event tracing is enabled without the devtools
- Re-factoring of the changes in the portable executor_runner
- Minor fix in docs

Change-Id: Ia50fef8172f678f9cbe2b33e2178780ff983f335
Signed-off-by: Benjamin Klimczak <benjamin.klimczak@arm.com>
Collaborator Author

@benkli01 left a comment


Thanks for the review! All issues should be fixed now.

Resolved (outdated) review threads on:
examples/portable/executor_runner/executor_runner.cpp (3 threads)
docs/source/tutorial-xnnpack-delegate-lowering.md
CMakeLists.txt
backends/xnnpack/CMakeLists.txt