Update ptxla training #9864

Open · wants to merge 6 commits into main

Conversation

entrpn (Contributor) commented Nov 4, 2024

  • Updates TPU benchmark numbers.
  • Updates the ptxla training example code.
  • Adds flash attention to ptxla code running on TPUs.

@sayakpaul can you please review? This PR supersedes the one I had opened a while back, which I have now closed. Thank you.

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

sayakpaul (Member)

Cc: @yiyixuxu could you review the changes made to attention_processor.py?

yiyixuxu (Collaborator) commented Nov 5, 2024

@entrpn can you use a custom attention processor instead (without updating our default attention processor)?

zpcore commented Nov 5, 2024

> @entrpn can you use a custom attention processor instead (without updating our default attention processor)?

Hi @yiyixuxu, we wrapped the flash attention kernel call behind the XLA_AVAILABLE condition, so this shouldn't change the default attention processor behavior. Can you give more details about using a custom attention processor? Thanks.
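
For context, a minimal sketch of the gating pattern being described, assuming the torch_xla Pallas kernel at torch_xla.experimental.custom_kernel.flash_attention; this is an illustration, not the exact diff in this PR:

```python
import torch.nn.functional as F

# Import the TPU flash-attention kernel only when torch_xla is installed,
# so the stock SDPA path is untouched for non-XLA users.
try:
    from torch_xla.experimental.custom_kernel import flash_attention

    XLA_AVAILABLE = True
except ImportError:
    XLA_AVAILABLE = False


def sdpa_or_xla_flash(query, key, value):
    # query/key/value: [batch, heads, seq_len, head_dim]
    if XLA_AVAILABLE:
        # Pre-scale queries, assuming the Pallas kernel does not apply
        # SDPA's default 1/sqrt(head_dim) softmax scaling.
        query = query * query.shape[-1] ** -0.5
        return flash_attention(query, key, value, causal=False)
    return F.scaled_dot_product_attention(query, key, value)
```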

yiyixuxu (Collaborator) commented Nov 5, 2024

I'm just wondering whether it makes sense for flash attention to have its own attention processor, since this one is meant for SDPA.

cc @DN6 here too

entrpn (Contributor, Author) commented Nov 5, 2024

@yiyixuxu this makes sense.

@zpcore do you think you can implement it?

zpcore commented Nov 5, 2024

> @yiyixuxu this makes sense.
>
> @zpcore do you think you can implement it?

Yes, I can follow up with the code change.

zpcore commented Nov 5, 2024

Hi @yiyixuxu, what if we create another attention processor with flash attention, in parallel with AttnProcessor2_0? My concern is that the majority of the code will be the same as AttnProcessor2_0.

yiyixuxu (Collaborator) commented Nov 6, 2024

@zpcore that should not be a problem. A lot of our attention processors share the majority of their code, e.g. https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L732 and https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L2443.

This way users can explicitly choose to use flash attention if they want to.
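
For illustration, a minimal sketch of what such a parallel processor might look like; the class name FlashAttnProcessor is hypothetical, and it mirrors the projection/reshape structure of AttnProcessor2_0 while swapping in the torch_xla kernel (assuming the kernel's softmax scale defaults to 1.0):

```python
from torch_xla.experimental.custom_kernel import flash_attention


class FlashAttnProcessor:
    # Hypothetical processor that runs attention with the torch_xla
    # Pallas flash-attention kernel instead of torch SDPA.
    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        batch_size = hidden_states.shape[0]
        if encoder_hidden_states is None:
            encoder_hidden_states = hidden_states

        query = attn.to_q(hidden_states)
        key = attn.to_k(encoder_hidden_states)
        value = attn.to_v(encoder_hidden_states)

        # [batch, seq, heads * head_dim] -> [batch, heads, seq, head_dim]
        head_dim = query.shape[-1] // attn.heads
        query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
        value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)

        # Pre-scale queries since SDPA's default 1/sqrt(head_dim) scaling
        # is assumed not to be applied inside the kernel.
        query = query * head_dim**-0.5
        hidden_states = flash_attention(query, key, value, causal=False)

        hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
        hidden_states = attn.to_out[0](hidden_states)  # output projection
        hidden_states = attn.to_out[1](hidden_states)  # dropout
        return hidden_states
```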

miladm commented Nov 6, 2024

@yiyixuxu - to help me better understand, can you explain why wrapping the flash attention kernel call behind the XLA_AVAILABLE condition causes trouble? Do you want this functionality to be more generalized?

yiyixuxu (Collaborator) commented Nov 6, 2024

Is it not possible that XLA is available but the user does not want to use flash attention?

Our attention processors are designed to be very easy to switch, and each one corresponds to a very specific method: it could be xFormers, SDPA, or even a special method like fused attention, which has its own processor.
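
For example, with a dedicated processor the user opts in explicitly via the existing set_attn_processor hook (the FlashAttnProcessor name here is hypothetical, following the sketch above):

```python
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor2_0

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")

# Default path: PyTorch SDPA.
pipe.unet.set_attn_processor(AttnProcessor2_0())

# Explicit opt-in to TPU flash attention (hypothetical processor from the sketch above).
pipe.unet.set_attn_processor(FlashAttnProcessor())
```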
