Fiber support #106
Comments
I don't use fibers. You will need to explain to me how fibers operate and how they relate to threads. I'd like to know exactly what we're dealing with here before deciding how to proceed further. |
Thanks for the reply. Fibers are like threads in that they have their own stack, registers, etc. But different in that the OS does not schedule them. Instead, they're more like co-routines (lua, python, etc.) where a thread must explicitly switch to them, and explicitly switch away from them. When a task is queued to our task system, a Fiber is allocated for it. Once the task is finished, the Fiber is then returned to a pool waiting for a new task. The task threads basically do the following:
Currently with Tracy, if a Thread needs to do 2.a. we end all of the current zones but remember them for later. When a Thread resumes a waiting Fiber, we begin again all of the zones that we remembered. This makes it look like all of the functions finished, and were called again later, and all of that time spent waiting isn't represented visually (which isn't ideal). We'd like to be able to treat Fibers like Threads so that if a Thread isn't running a Fiber we can keep all of the zones pending and show that the Fiber is waiting. |
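The workaround described above can be sketched without any real fibers or profiler. This is only an illustration: `ZoneLog`, `FiberZones`, `suspend_fiber` and `resume_fiber` are hypothetical names, not part of Tracy or of the task system being discussed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler's begin/end zone events; not Tracy's API.
struct ZoneLog {
    std::vector<std::string> events;
    void begin(const std::string& name) { events.push_back("begin " + name); }
    void end(const std::string& name)   { events.push_back("end " + name); }
};

// The zones a fiber currently has open, outermost first.
struct FiberZones {
    std::vector<std::string> open;
};

void enter_zone(ZoneLog& log, FiberZones& f, const std::string& name) {
    log.begin(name);
    f.open.push_back(name);
}

void leave_zone(ZoneLog& log, FiberZones& f) {
    assert(!f.open.empty());
    log.end(f.open.back());
    f.open.pop_back();
}

// The fiber is about to wait: end every open zone, innermost first,
// but keep the list so the zones can be reopened later.
void suspend_fiber(ZoneLog& log, const FiberZones& f) {
    for (auto it = f.open.rbegin(); it != f.open.rend(); ++it) log.end(*it);
}

// The fiber is resumed: begin the remembered zones again, outermost first.
void resume_fiber(ZoneLog& log, const FiberZones& f) {
    for (const auto& name : f.open) log.begin(name);
}

// One task with two nested zones that waits once in the middle.
std::vector<std::string> demo_workaround() {
    ZoneLog log;
    FiberZones fiber;
    enter_zone(log, fiber, "Task");
    enter_zone(log, fiber, "LoadFile");
    suspend_fiber(log, fiber);   // time spent waiting is not covered by any zone
    resume_fiber(log, fiber);
    leave_zone(log, fiber);
    leave_zone(log, fiber);
    return log.events;
}
```

In the resulting log, `Task` and `LoadFile` each appear twice, as if both functions had returned and been called again, which is the visual problem being reported.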
Thanks for the explanation, now I get the gist of it. I don't think fibers can be aliased to threads, due to the following reasons:
What would be needed instead is:
This should be enough for proper support. Can you provide an example application, which would replicate your task system, with some mock jobs that represent your usage patterns? |
I agree with what you state is needed. Since fibers can be created or destroyed at any point, and are basically just memory until switched to, point no. 1 isn't necessarily needed. Our task system already has an event notification for starting/stopping a fiber on a thread, so your point no. 2 would be ideal. On Windows, Fibers are supported as part of the OS (i.e. CreateFiber, etc.), but on Linux we use boost fibers. I'll see if I can produce a simple sample app at some point. |
There has been some progress on this. The interface and needed changes will be minimal, e.g.:
void SwitchToFiber(Fiber *const fiber) {
TracyFiberStart(fiber->m_name);
boost_context::jump_fcontext(&m_context, fiber->m_context, fiber->m_arg);
TracyFiberEnd;
}
With the However, there were also unforeseen consequences of these changes. For fiber tracking to work, zone collection within fibers will have to be serialized. I have to think how to make it work efficiently. |
Awesome, thanks for the update. Let me know when you have a release that I can test with. I still haven't had any time to make a sample application. Minor point: On Windows we're using |
This would also be very useful for supporting Haskell threads! I think that the proposed API of
would work very well. |
Just wanted to ask what the current state of this issue is, since I accidentally opened a duplicate before finding this one. The only thing I'd add is that "execution context" or something like that is probably a better terminology for the API functions than "fibers", because this can be used for not just fibers, but also coroutines, job systems and schedulers for example. |
This is pretty much blocked by the lack of a reliable job-scheduler-type-of-thing. The examples I was provided were of some help, but ultimately I got too tired having to deal with CMake bullshittery, or having to figure out the hackeries involved in how the production libraries do work. So, I need something simple that I can reason about. I need to be able to know when the fiber (execution context) execution is started and when it is stopped (by the fiber controller library). RichieSams/FiberTaskingLib#126 seems to provide some kind of a support for what I need, but again, half of that PR is some unrelated variable type changes, which makes me not want to take a look at what this does. |
Wait, so we are waiting on an example application? Just some application that uses fibers/coroutines and creates a bit of load, that you can instrument with Tracy, so that you can test the feature and iron out the bugs? If that's the problem, I should be able to just throw together some dummy application in a cpp file over the weekend. Or in the next couple hours tbh. |
Basically, yes. I would prefer something that doesn't necessarily use fibers, but rather simulates their usage. The simpler the better. (Previously I have encountered races, which were hard to trigger and debug. At the same time I had some synchro issues with Vulkan to figure out at work. Things added up.) |
@wolfpld here you go. I used real fibers though, since I don't really know how one would simulate their behavior without the real thing. They swap the registers and stack and so on, so there is no way to do this in code trivially. This went surprisingly well, I have never used fibers in windows before, but the api is actually pretty nice (good job windows!) :D. The application has some workers that pretend to do networking. Let me know if you need anything. |
Thanks, it seems to be simple enough. I'll see what I can do with this. In the meantime, can you prepare a multithreaded version, with concurrent execution of tasks? |
I committed the multi-threaded version where each thread tries to take fibers from a global pool to execute. I actually messed up the terminology in the first version: Fibers are now jobs, and threads are workers. |
Btw, I'd also be willing to test the feature as soon as you have a working version up and running. I have an application at hand that I want to profile that uses co-routines. |
I've been thinking about this for a while recently, as there are other general programming patterns that Tracy doesn't support well, and fibers just happen to be one of them. Pipelined processing of data is perhaps the one that interests me the most – here zones can span multiple threads as well, though for somewhat more straightforward reasons. I think it may be worth thinking about decoupling zone data from the thread as an execution context. Perhaps just giving the user the ability to specify to Tracy what the user thinks the "thread" or in this case a "task" is for each zone / message / etc. could be a viable solution that does not require much effort to implement? As far as fibers are concerned in particular, each fiber could become a "thread" in Tracy's current visualization (and the user could store the identifier that they share between calls to Tracy as a fiber-local variable or something). The sampling profiler would still have to work on a per-thread basis, however, but I don't think that's avoidable in the general case. Here's an example visualization that I made which demonstrates what the visualization could look like. I used colour coding for the threads, but I don't think it's strictly necessary: |
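The idea of tagging each zone with a user-chosen context, instead of the OS thread it happened to run on, can be sketched as follows. The `Zone` record and `group_by_context` are hypothetical and only illustrate how a viewer could build one timeline row per context; nothing here is Tracy's data model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical zone record. 'context' is chosen by the user (a fiber, a task,
// a pipeline stage), while 'thread' is whichever OS thread ran the code.
struct Zone {
    std::string context;
    uint32_t thread;
    int64_t start;
    int64_t end;
    std::string name;
};

// Build one timeline row per context instead of one row per thread.
std::map<std::string, std::vector<Zone>> group_by_context(const std::vector<Zone>& zones) {
    std::map<std::string, std::vector<Zone>> rows;
    for (const auto& z : zones) {
        assert(z.start <= z.end);
        rows[z.context].push_back(z);
    }
    return rows;
}

// "fiber A" runs on thread 1 first and is later resumed on thread 2.
std::map<std::string, std::vector<Zone>> demo_rows() {
    const std::vector<Zone> zones = {
        {"fiber A", 1, 0, 10, "Parse"},
        {"fiber B", 1, 10, 20, "Parse"},
        {"fiber A", 2, 20, 35, "Upload"},
    };
    return group_by_context(zones);
}
```

The thread is kept in each record, so a viewer could still colour zones by the thread that executed them, as in the mock-up mentioned above.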
This is exactly how fiber (task/job/parallel whatever) support will be implemented. And it requires some effort :)
Here's what I have in mind: |
That is so awesome, I'm glad we got things going again on this! |
I was also thinking: Maybe there should be a job-context-category? If my application uses e.g. a job system with fibers, but I also want to track some different kind of pipelined data processing like @nagisa suggested, then I'd use the same API setting the job-context for the zones. But I have essentially two sets of zones with job-contexts. It would be nice to be able to name one of the groups "fibers" and the other "pipelined data" or something like that, and mark the zones as belonging to one of those groups. And then having the ability to toggle their visibility like threads. That is a theoretical thing ATM, I don't have a specific use case in mind, just throwing it out there. Not sure if the need for this would actually come up in practice very often. |
This looks awesome, can't wait to get to try it out! |
Oh, this would be lovely. I was just looking into using Tracy with fiber/jobs today. |
Traditionally, each thread in Tracy writes its events to a separate queue that doesn't need to be synchronized or locked in the process. The per-thread async queues are then sent to the server in a random order. This works great as long as the events in question are relevant only to a single thread. Things become problematic when there are interactions between threads.
Sometimes the solution is simple, for example when multiple threads produce values on the same plot. All that's needed here to have a coherent view is to sort the plot values by their timestamp. Old versions of Tracy did the same with lock events. There was much work put into reconstructing the lock timeline when some past lock events eventually arrived from a forgotten thread. While it seemed to be mostly working, it never really could. With lock events you need to know the exact ordering, and any two (or more) events can have the same timestamp, which makes it impossible to know which one truly happened first.
The advent of multicore not only makes this more apparent due to a larger number of threads running at the same time, but also by making the timestamp readings more granular, due to difficulties at the hardware level. Providing a consistent clock across the system is not an easy task when you have many cores, many on-chip dies, or even multiple CPU sockets. The software solution here is to serialize all lock events, which is not ideal, as now you have a lock, and you have contention, and things are not running as smoothly as before. But you can't do this in any other way.
The same is true for fibers, coroutines or any other such technique. Zone events, previously isolated to a single thread, can now hop from one to another and you need to know the exact order across all threads. Hence the need for serialization of even more events. In practice you won't be able to say "this function is only used by fibers, so serialize this, and not the other parts of the code". 
Your assumption would break sooner or later and you would suddenly be in a very sad place. So, all zones have to be serialized, even the ones that are isolated within a single thread. This is why fiber support will need to be explicitly enabled by adding a define.
It may be interesting to know how much impact this serialization may have on execution times. Well, it of course depends on many factors, which basically boil down to how much queue contention there is at any given time. The raytracer example is an extremely pessimistic case, because you would never be measuring 30 threads generating 150 million zones in total in a time span of one second. But that's the application I have data for.
With the async per-thread queues the application needs 1.7 seconds to execute (the one second figure above is true, as it excludes the initialization and shutdown routines) and transfers 731 MB of data to the server. Below you can find a histogram of a short-lived function which is executed 50 million times.
When zones are stored in a synchronized queue, the application finishes in one minute and 41 seconds, and needs to send a bit over 2 gigabytes of data. This increase in data size is due to much more frequent thread context switching (caused by interleaving of events), which requires sending context switch notifications, and which also invalidates the thread time delta, forcing transfer of a full timestamp, instead of the nicely compressible mostly-zeros time difference from the previous event. The histogram for the same function as above looks dramatically different. All the extra time is of course spent waiting for the lock to become available, as you would expect in case of high contention. I will repeat that this is not your typical use case. |
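The data-size effect described above can be illustrated with a toy encoder. This is a rough model only: the byte sizes are assumptions picked for illustration and do not reflect Tracy's actual wire format. The point is that a stream which changes thread on every event pays for a thread-switch notice and a full timestamp each time, while per-thread batches mostly send small deltas.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Event {
    uint32_t thread;
    int64_t time;
};

// Assumed sizes for this toy model only; not Tracy's real wire format.
constexpr std::size_t kDeltaBytes = 1;         // small time delta from the previous event
constexpr std::size_t kFullTimeBytes = 8;      // full timestamp
constexpr std::size_t kThreadSwitchBytes = 5;  // "following events belong to thread N"

// Size of the encoded stream: a thread change costs a switch notice plus a
// full timestamp, an event from the same thread costs only a delta.
std::size_t encoded_size(const std::vector<Event>& stream) {
    std::size_t bytes = 0;
    bool first = true;
    uint32_t current_thread = 0;
    int64_t last_time = 0;
    for (const auto& e : stream) {
        if (first || e.thread != current_thread) {
            bytes += kThreadSwitchBytes + kFullTimeBytes;
            current_thread = e.thread;
            first = false;
        } else {
            assert(e.time >= last_time);  // deltas within one thread never go backwards
            bytes += kDeltaBytes;
        }
        last_time = e.time;
    }
    return bytes;
}

// Events delivered in per-thread batches, as with independent per-thread queues.
std::vector<Event> batched(uint32_t threads, uint32_t per_thread) {
    std::vector<Event> out;
    for (uint32_t t = 0; t < threads; ++t)
        for (uint32_t i = 0; i < per_thread; ++i)
            out.push_back({t, static_cast<int64_t>(i) * threads + t});
    return out;
}

// The same events in global time order, as a single serialized queue would deliver them.
std::vector<Event> interleaved(uint32_t threads, uint32_t per_thread) {
    std::vector<Event> out;
    for (uint32_t i = 0; i < per_thread; ++i)
        for (uint32_t t = 0; t < threads; ++t)
            out.push_back({t, static_cast<int64_t>(i) * threads + t});
    return out;
}
```

With 4 threads and 100 events each, the batched stream has 4 thread changes and the interleaved one has 400, so under these assumed sizes the interleaved stream is more than ten times larger. The real ratio reported above is much smaller (731 MB versus a bit over 2 GB), since real events carry other data as well.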
You can now test the serialization of events by checking out the current master branch and adding the The affected areas are:
Each of the available APIs (C++, C, Lua) should be supported. |
Awesome, thanks @wolfpld. Is there a new API to call to notify Tracy that a thread is switching contexts? |
|
The reason why I asked is because such a function could be used to switch which |
I guess I have not considered such an approach, because it would require some hackery on the concurrentqueue side. It certainly makes sense to do things in such a way in the end, but to minimize the number of moving parts which can break, the serialization approach will be used for the time being. Right now the path is: serialize zones, and then implement fiber-to-thread mapping. These are two separate tasks which you can reason about without needing to think about the impact of the other one. With the concurrentqueue approach you describe, it would only make sense to implement everything in one go. |
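The second step, fiber-to-thread mapping, might look roughly like the sketch below. It assumes the zone events are already delivered in one serialized order and only shows the attribution rule: while a fiber is active on a thread, zones are credited to the fiber rather than to the thread. All names (`fiber_enter`, `fiber_leave`, `record_zone`) are hypothetical and are not Tracy's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ZoneEvent {
    std::string owner;  // fiber name if one is active on this thread, else the thread name
    std::string zone;
};

// Hypothetical per-thread state, assumed for illustration only.
thread_local const char* t_current_fiber = nullptr;

// Called by the scheduler right before switching to a fiber.
void fiber_enter(const char* name) { t_current_fiber = name; }

// Called by the scheduler right after control comes back from the fiber.
void fiber_leave() { t_current_fiber = nullptr; }

// Attribute the zone to the active fiber when there is one.
void record_zone(std::vector<ZoneEvent>& out, const std::string& thread_name,
                 const std::string& zone) {
    out.push_back({t_current_fiber ? std::string(t_current_fiber) : thread_name, zone});
}

// A worker runs scheduler code, then a job inside a fiber, then scheduler code again.
std::vector<ZoneEvent> demo_mapping() {
    std::vector<ZoneEvent> out;
    record_zone(out, "worker 0", "Scheduler");
    fiber_enter("job 7");
    record_zone(out, "worker 0", "DoWork");
    fiber_leave();
    record_zone(out, "worker 0", "Scheduler");
    return out;
}
```

Keeping the mapping separate from the queueing is what lets the two tasks be reasoned about independently, as argued above.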
There is now a minimal implementation of fiber data collection on the master branch. To enable, define
void schedule_job(Job_Data* job_data) {
TracyFiberEnter( job_data->name );
SwitchToFiber( job_data->fiber );
TracyFiberLeave;
}
void job_yield(Job_Data* job_data) {
SwitchToFiber(worker_data->base_fiber);
}
Make sure that zones are able to complete, e.g. by adding a separate scope:
void job_main(Job_Data* job_data) {
{
ZoneScoped;
job_data->has_job_started = true;
// ...
job_data->is_job_done = true;
}
job_yield(job_data);
}
Should there be only the function scope, the zone destructor would never be called, as control would never return to |
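The scoping point can be illustrated without real fibers. In this sketch, `ScopedZone` is a hypothetical RAII stand-in for a scoped profiling zone and `fake_yield` stands in for the final yield. Unlike a real final yield, `fake_yield` returns, so the example only shows the ordering of events.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of events. A hypothetical stand-in for a profiler, not Tracy's API.
using Log = std::vector<std::string>;

// RAII zone: logs when it is created and when it is destroyed.
struct ScopedZone {
    explicit ScopedZone(Log& log) : m_log(log) { m_log.push_back("zone begin"); }
    ~ScopedZone() { m_log.push_back("zone end"); }
    ScopedZone(const ScopedZone&) = delete;
    ScopedZone& operator=(const ScopedZone&) = delete;
    Log& m_log;
};

// Stand-in for the job's last yield. With real fibers, control may never come
// back from this call once the job is done.
void fake_yield(Log& log) { log.push_back("yield"); }

// The zone lives in its own scope, so it ends before the yield.
void job_with_inner_scope(Log& log) {
    {
        ScopedZone zone(log);
        // ... work ...
    }  // zone destructor runs here
    fake_yield(log);
}

// The zone lives in the function scope, so it can only end after the yield returns.
void job_with_function_scope(Log& log) {
    ScopedZone zone(log);
    // ... work ...
    fake_yield(log);
}  // zone destructor runs here, which a real final yield would never reach
```

In the first variant "zone end" is logged before "yield". In the second it is logged after, which with a yield that never returns means it would not be logged at all.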
The requirement to go job -> scheduler -> job has been relaxed in 4c77413. |
This is huge! I am very happy with how well it works. |
Being able to filter the message stream on fibers would be nice. It seems the UI is there, but it doesn't work properly. |
A different approach for internal processing was applied, which should fix issues with messages, or crashes as reported on Discord by @Xenonic. Fibers should now be considered ready to use. Fiber activity regions are now displayed using context switch data, as was previously described at #106 (comment). This activity data is not integrated with the running thread context switch data. Such functionality may be added later, but it is unlikely to work during a live capture, and will require a save-load of the trace. Worker threads won't be automatically indicating when they are running in the fiber context. This can be easily added on the client, just as another zone. |
Closing, as this is now implemented. Performance improvements will arrive at a later time and won't be tracked with this ticket. |
Our task system uses Fibers. Since Tracy appears to get its own thread information, this makes profiling with fibers problematic. Ideally we'd like to show a fiber as a thread. In order for this to work, we would need a way to override the current Thread ID that Tracy uses.
A few ideas:
TracyCZone functions that take a "context identifier" that could either be a thread ID or a fiber ID
TracyCSetThreadIDOverride function that overrides the current thread's ID with a given fiber ID until called again with 0.