Support for Timestamping #176
Conversation
Interesting, thanks for the PR! Can you show an example of how this works? Looking at the relationship between the command buffer and the timestamp pool, it seems more similar to the Tensor/Op or Algo/Op relationship than to something specific to a sequence. Is there a reason this isn't explored instead as something similar to an Algorithm or Tensor, where there is a Timestamp object, and then an OpTimestamp type that would call the Timestamp's record? That way you would also be able to set the timestamps explicitly:

```cpp
std::shared_ptr<kp::OpTimestamp> opTimestamp{ new kp::OpTimestamp(mgr.timestamp(...)) };

mgr.sequence()
    ->record(opTimestamp)
    ->record<kp::OpTensorSyncDevice>(...)
    ->record(opTimestamp)
    ->record<kp::AlgoDispatch>(...)
    ->record(opTimestamp)
    ->record<kp::OpTensorSyncLocal>(...)
    ->record(opTimestamp)
    ->eval();

std::vector<std::uint64_t> timestamps = opTimestamp->timestamps();
```

It would also be good to get better insight into the timestamps: how are you currently using them?
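As a rough sketch of how that proposed design could look (hypothetical; `Timestamp` and `OpTimestamp` are names suggested in this thread, not merged Kompute classes, and the wiring to `kp::OpBase` is omitted):

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <memory>

// Hypothetical sketch of the proposed design, not merged Kompute code:
// a Timestamp object owns the query pool, and an OpTimestamp operation
// writes into it each time it is recorded into a command buffer.
class Timestamp
{
  public:
    explicit Timestamp(vk::QueryPool queryPool)
      : mQueryPool(queryPool)
    {}

    // Write the next timestamp into the underlying query pool
    void record(vk::CommandBuffer commandBuffer)
    {
        commandBuffer.writeTimestamp(
          vk::PipelineStageFlagBits::eBottomOfPipe, mQueryPool, mNextIndex++);
    }

  private:
    vk::QueryPool mQueryPool;
    uint32_t mNextIndex = 0;
};

// The op simply delegates to the shared Timestamp when recorded, so the
// same instance can be recorded multiple times within one sequence
class OpTimestamp
{
  public:
    explicit OpTimestamp(std::shared_ptr<Timestamp> timestamp)
      : mTimestamp(std::move(timestamp))
    {}

    void record(vk::CommandBuffer commandBuffer)
    {
        mTimestamp->record(commandBuffer);
    }

  private:
    std::shared_ptr<Timestamp> mTimestamp;
};
```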
A simple example:

```python
seq = mgr.sequence(nrOfTimestamps=100)

(seq.record(kp.OpTensorSyncDevice([a, b]))
    .record(kp.OpAlgoDispatch(algo0))
    .record(kp.OpAlgoDispatch(algo1))
    .record(kp.OpAlgoDispatch(algo2))
    .record(kp.OpTensorSyncLocal([c]))
    .eval())

print(np.diff(seq.get_timestamps()))
```
I've just started using this, so I can't say much about it yet. Previously I created a sequence for each operation and measured the time it took to evaluate, as you suggested in #110, but this is slow and requires writing additional profiling code, so I was looking for alternatives and found that Vulkan has this built in.
Ok interesting, yes I see what you mean about potentially requiring more work from the user. Is there a reason we'd want a maximum number of timestamps? Would it not be possible to just allow the timestamps to be set between each operation when enabled, and have them cleared every time eval is called, without limits? Mainly as a re-record would clear them as well.
A maximum number is required by Vulkan when creating a QueryPool.
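For context, a minimal sketch of the Vulkan-Hpp call involved (illustrative, not Kompute's internals): the pool's `queryCount` is fixed at creation time and cannot grow afterwards, which is why a maximum has to be chosen up front.

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>

// Illustrative sketch: a timestamp query pool must be created with a
// fixed queryCount, so the sequence needs the maximum number up front.
vk::QueryPool
createTimestampQueryPool(vk::Device device, uint32_t maxTimestamps)
{
    vk::QueryPoolCreateInfo info(
      {}, vk::QueryType::eTimestamp, maxTimestamps);
    return device.createQueryPool(info);
}
```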
Edit: Did a first pass; this initial approach in the PR looks like a good way to go and can be explored from there.

Oh, I see what you mean now by the point above about not having the exact number of operations upfront. I agree that re-record could help, but it would require adding manual timestamps. It feels a bit awkward to expose the exact number of timestamps the user needs to set. Perhaps what could be exposed instead is a function to trigger a re-record; that way the query pool would be created with the right size and would only live for as long as the recorded commands request it explicitly. It could then be used as:

```cpp
mgr.sequence()
    ->record(...)
    ->record(...)
    ->record(...)
    ->record(...)
    ->rerecord(true)  // creates query pool and adds timestamps
    ->eval()
    ->record(...)     // deletes query pool and records new command buffer
    ->rerecord(false) // re-records but doesn't add timestamps
    ->rerecord(true)  // re-records and adds timestamps
    ->eval();
```

One thing to ask is whether the simple … There are still a couple of things that are not clear:
Just had another look and the current approach does seem to make sense. I did an initial pass; it would be good to also dive into some of these points:

- When calling record after eval, is it still expected to use the same query pool? It seems like it makes more sense to have one pool specific to the recorded batch.
- It would be good to also implement `rerecord(uint32_t totalTimestamps)` to allow for resizing.

Overall this looks like a great addition. It's probably good to add tests, and then we can use a test to add an example in the `advanced-examples.rst` documentation.
It's not the exact number but the maximum number. The user can set it to a high number like one million if they are ok with allocating that additional memory.
I don't quite understand what you mean by that. (I believe) the query pool is simply a buffer that timestamps are written into. If I call `rerecord()` the buffer stays untouched and will simply be reused on the next `eval()`.

What's the point of `rerecord` anyway? I can't see a use case. Personally, I would rather create a new sequence instead.
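To illustrate that "buffer of timestamps" view, here is a sketch in plain Vulkan-Hpp (illustrative, not Kompute's actual code): recording writes one timestamp per slot, and after eval the whole range can be read back in a single call.

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <vector>

// Sketch: each recorded step writes one timestamp into the pool...
void
writeTimestamp(vk::CommandBuffer commandBuffer,
               vk::QueryPool queryPool,
               uint32_t index)
{
    commandBuffer.writeTimestamp(
      vk::PipelineStageFlagBits::eBottomOfPipe, queryPool, index);
}

// ...and after eval the whole range is read back as 64-bit values
std::vector<uint64_t>
readTimestamps(vk::Device device, vk::QueryPool queryPool, uint32_t count)
{
    std::vector<uint64_t> results(count);
    (void)device.getQueryPoolResults(queryPool,
                                     0,
                                     count,
                                     results.size() * sizeof(uint64_t),
                                     results.data(),
                                     sizeof(uint64_t),
                                     vk::QueryResultFlagBits::e64 |
                                       vk::QueryResultFlagBits::eWait);
    return results;
}
```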
Looks good. I only added a minor comment, but it should be a quick change. It would also be great if we could add a test to make sure the expected timestamps can be retrieved. Here are also some thoughts on the points above:
> It's not the exact number but the maximum number. The user can set it to a high number like one million if they are ok with allocating that additional memory.
I agree it makes sense to initially allow the user to provide a default value. The ability to let the user update it later could also make sense, and could be another main purpose of the re-record command; I mention further thoughts below.
> I don't quite understand what you mean by that. (I believe) the query pool is simply a buffer that timestamps are written into. If I call `rerecord()` the buffer stays untouched and will simply be reused on the next `eval()`.
I was thinking that it would be useful to be able to "clear" and "resize" the timestamp pool, and this could be a useful purpose of re-record. By extending it to receive `rerecord(uint32_t totalTimestamps)`, it becomes possible to allow either re-creation of the pool with a different size, and/or clearing of the currently logged timestamps by recreating the pool with the same size.
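A sketch of what such a `rerecord(uint32_t totalTimestamps)` could do internally (hypothetical, not merged code): recreating the pool both resizes it and clears previously logged timestamps.

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdint>

// Hypothetical internals for rerecord(totalTimestamps): destroying and
// recreating the query pool resizes it and clears any logged timestamps.
void
resizeTimestampPool(vk::Device device,
                    vk::QueryPool& queryPool,
                    uint32_t totalTimestamps)
{
    if (queryPool) {
        device.destroyQueryPool(queryPool); // old timestamps are dropped
    }
    vk::QueryPoolCreateInfo info(
      {}, vk::QueryType::eTimestamp, totalTimestamps);
    queryPool = device.createQueryPool(info);
}
```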
> What's the point of `rerecord` anyway? I can't see a use case. Personally, I would rather create a new sequence instead.
Tensors and algorithms can be rebuilt with different values / configurations, which means that for a sequence to be "refreshed" it needs to be re-recorded. You can certainly go through all the record commands manually, but `rerecord` provides a simpler way to trigger a "reload" that takes into account any changes to the tensors. There is a test in TestSequence that shows an example.
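As a usage sketch following the snippets earlier in this thread (`mgr`, `tensor` and `algo` are assumed to already exist, and the exact Kompute signatures may differ):

```cpp
// Sketch following this thread's examples: after a tensor is rebuilt,
// rerecord() replays the previously recorded ops so the command buffer
// picks up the new state, instead of recording everything again by hand.
auto seq = mgr.sequence();
seq->record<kp::OpTensorSyncDevice>({ tensor })
   ->record<kp::OpAlgoDispatch>(algo)
   ->eval();

// ... tensor is rebuilt with new data / configuration ...

seq->rerecord(); // triggers the "reload" with the updated tensor
seq->eval();
```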
By the way, if you get a chance it would be good to get your overall thoughts on #177, as I'm adding support for multiple tensor types (uint, int, float, double & bool).
Added a test case, but filtering it out in the Makefile because SwiftShader does not support timestamps.
Ok, I've just tested locally and didn't get any failures when running all the tests; it also passes when running only that test. As you say, SwiftShader doesn't seem to support timestamps, but it has limitations in other areas as well, like support for double, so I think skipping it would be ok.
The PR looks good. It would be good to confirm whether the addition of `rerecord` is something we'd want as part of this PR, or something that can be explored later on; the latter probably makes more sense, but it may be worth creating an issue for it. Let me know and I can merge or hold accordingly.
I would merge it as it is. I will take a closer look at re-recording a bit later.
Sounds good 👍
Added the option to latch timestamps after each operation with `vkCmdWriteTimestamp()`.

Notes:

- Timestamps could also be latched inside `OpAlgoDispatch`, e.g. after the memory barrier.
- One could pass a `timestamp_name="banana"` to `record()` for easier identification of which operation took how much time.