Conversation

@zejun-chen
Owner

@zejun-chen zejun-chen commented Jul 11, 2024

TODO:

  • fix build issues
  • verify correctness
  • review internally
  • rebase onto kineto master
  • send a new PR to Meta

Discuss:

  • Does kineto need to build successfully on its own with PTI? For now, kineto with PTI must be built together with torch.
  • Do we need to assert at build time on the Windows platform, since PTI doesn't support Windows?
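One way the Windows question could be handled (a hypothetical sketch, not code from this PR; the guard and message are illustrative) is to force-disable the plugin at configure time instead of failing late in the build:

```cmake
# Hypothetical guard: PTI does not support Windows, so force-disable
# the XPU profiler plugin there rather than letting the build fail later.
if(WIN32 AND NOT LIBKINETO_NOXPUPTI)
  message(WARNING "XPUPTI is not supported on Windows; disabling the XPU profiler plugin")
  set(LIBKINETO_NOXPUPTI ON CACHE BOOL "" FORCE)
endif()
```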

@gujinghui

Should we reserve a dedicated folder for all plugin, like libkineto/src/plugin/xpupti/ ?

@zejun-chen
Owner Author

Should we reserve a dedicated folder for all plugin, like libkineto/src/plugin/xpupti/ ?

We can, but the kineto build system is designed to pick up the needed source files per function: https://github.com/pytorch/kineto/blob/eeb4e9b44da82d09709c482dc2cde40f973cc0a1/libkineto/libkineto_defs.bzl#L12

@zejun-chen zejun-chen force-pushed the zejun/xpu_profiler_2 branch from d00d34b to e9e5a5e on July 12, 2024 12:53
@zejun-chen
Owner Author

zejun-chen commented Jul 12, 2024

Commit message:
[XPU profiler] Upstream XPU profiler into kineto.
The XPU Profiler follows the CUDA profiler in kineto and works on the Intel GPU platform.
It is based on the Intel tracing tool XPUPTI. In this PR, the XPU Profiler
backbone code is upstreamed into the libkineto/src/plugin/xpupti folder as a kineto plugin
and built into libkineto.a. A build flag named LIBKINETO_NOXPUPTI is defined to control
whether kineto builds with the XPU Profiler, matching the CUDA and ROCm flags.
The XPU Profiler is registered in the init function libkineto_init().

# Set LIBKINETO_NOXPUPTI to explicitly disable XPUPTI
# Otherwise, XPUPTI is disabled if not found
if(NOT LIBKINETO_NOXPUPTI)
if (NOT SYCL_INCLUDE_DIR OR NOT SYCL_LIBRARY)

@gujinghui gujinghui Jul 15, 2024


Why do we not follow IPEX, as below? Are you following torch_xpu_ops?
find_package(IntelSYCL REQUIRED)

Owner Author


ok


It's a question, not a request.

Owner Author

@zejun-chen zejun-chen Jul 15, 2024


Because that would introduce a new cmake file, FindIntelSYCL.cmake.
It seems a bit heavy. We just want to find the SYCL include and lib when building libkineto standalone, without torch.
When building with torch, the SYCL include and lib have already been found by torch.
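The lightweight detection described here could look roughly like the following (a sketch under the assumption that only the SYCL headers and library are needed for a standalone build; the hint variables and paths are illustrative, not the PR's actual code):

```cmake
# Minimal SYCL discovery for a standalone libkineto build.
# When building inside torch, SYCL_INCLUDE_DIR / SYCL_LIBRARY are
# expected to be provided by torch's own configuration.
if(NOT SYCL_INCLUDE_DIR)
  find_path(SYCL_INCLUDE_DIR
    NAMES sycl/sycl.hpp
    HINTS ENV CMPLR_ROOT
    PATH_SUFFIXES include)
endif()
if(NOT SYCL_LIBRARY)
  find_library(SYCL_LIBRARY
    NAMES sycl
    HINTS ENV CMPLR_ROOT
    PATH_SUFFIXES lib)
endif()
```

This avoids shipping a full FindIntelSYCL.cmake module for the standalone case while leaving the torch-integrated path untouched.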


IF (NOT SYCL_INCLUDE_DIR OR NOT XPU_xpupti_LIBRARY)
set(LIBKINETO_NOXPUPTI ON CACHE BOOL "" FORCE)
endif()


Can we move all of the above lines to src/plugin/xpupti?


Sounds like the better solution is to make xpupti a separate static .a?

Owner Author


ok


A .a is not a MUST. You can move these lines to a separate cmake file first.
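Moving the plugin logic out could be as simple as delegating to a per-plugin CMake directory from the top level (a hedged sketch; the directory layout matches the PR, but the exact wiring is an assumption):

```cmake
# Top-level CMakeLists.txt: delegate all XPUPTI-specific checks and
# sources to the plugin's own directory.
if(NOT LIBKINETO_NOXPUPTI)
  add_subdirectory(src/plugin/xpupti)
endif()
```

Whether the plugin then produces its own static library or just appends to the libkineto source list is an orthogonal choice, as noted above.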

set(XPUPTI_INCLUDE_DIR ${SYCL_INCLUDE_DIR} PARENT_SCOPE)

# find xpupti sdk
find_package(Pti)


Pti is a MUST component. Should we add the REQUIRED flag here?
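The requested change would presumably be (sketch; with REQUIRED, a missing PTI fails the configure step instead of silently setting LIBKINETO_NOXPUPTI):

```cmake
# PTI is mandatory when the XPU profiler plugin is enabled,
# so fail the configure step immediately if it is missing.
find_package(Pti REQUIRED)
```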

Owner Author


ok


Introduce XPU profiler by following kineto plugin design

As XPU became a PyTorch built-in device, profiler support is an indispensable part of functionality completeness. In this PR, the XPU profiler is introduced by following the kineto plugin design, under libkineto/src/plugin/xpupti. The XPU profiler plugin is built on the foundation of the Intel PTI toolkit (https://github.com/intel/pti-gpu) and the underlying SYCL runtime. The LIBKINETO_NOXPUPTI option is added to enable or disable the XPU profiler plugin during the kineto build stage.


Signed-off-by: Chen, Zejun <zejun.chen@intel.com>
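The LIBKINETO_NOXPUPTI option mentioned in the commit message would likely be declared alongside the existing CUDA/ROCm switches, roughly as follows (illustrative sketch; the default and description text are assumptions):

```cmake
# Mirror the existing CUDA/ROCm disable flags (e.g. LIBKINETO_NOCUPTI):
# ON disables the XPUPTI-based XPU profiler plugin at build time.
option(LIBKINETO_NOXPUPTI "Disable the XPUPTI-based XPU profiler plugin" OFF)
```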
@zejun-chen zejun-chen force-pushed the zejun/xpu_profiler_2 branch from f02c4f2 to a59b653 on July 16, 2024 04:49
3 participants