Metadata handling #1563
I think we should have a new function that replaces
Some details may be further influenced by @henrypinkard's various plans (especially around the sequence buffer). What to use instead of … (Although the primary way we remove overhead might be to avoid copying the system state cache, the state cache itself can also be optimized for copying, e.g., by storing it as a pair of (sorted) …)
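To illustrate the "state cache optimized for copying" idea, here is a minimal Python sketch (the real cache lives in C++ MMCore; the class and method names here are hypothetical, and sorted parallel lists are only one possible layout): keeping the cache as flat, sorted key/value sequences makes a per-image snapshot two shallow list copies instead of a per-entry map rebuild.

```python
import bisect

class StateCacheSketch:
    """Hypothetical sketch: system state cache held as parallel sorted lists,
    so snapshotting is two flat list copies rather than rebuilding a map."""

    def __init__(self):
        self._keys = []    # sorted "Device-Property" names
        self._values = []  # values aligned with _keys

    def set(self, key, value):
        # Binary search keeps the key list sorted on insert/update.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def snapshot(self):
        # O(n) flat copies; no hashing or tree rebuilding per entry.
        return list(self._keys), list(self._values)
```

Whether this actually beats copying a map depends on the container used in MMCore today; the point is only that the copy cost can be reduced independently of whether the copy happens at all.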
I do understand that that is the better long-term solution. In the interim, we are running more and more into performance issues that seem to have their roots in the huge amount of metadata. My proposal here seems relatively easy to implement, and would allow projects that need performance to move forward. Maybe I am wrong and your proposal is also easy to implement soon.
If we just want a short-term solution, it could be as simple as a flag in MMCoreJ that disables the addition of the system state cache data. The only downside is that such a flag will be global state, so there could be issues if one plugin disables it and another plugin expects it to not be disabled (for example; not sure if this is a real concern).

To prevent this, we could make the call to disable device metadata apply only to the current acquisition (and be ignored if no acquisition is running). Then application code can start an acquisition, disable metadata addition, optionally get the system state cache contents once, and then keep popping images with no metadata added. If another plugin starts an acquisition later, it will get metadata as expected.

But one could also argue that it would be simpler to just make it a global setting and give users control (e.g., put it in Tools > Options). This might make more sense because the need for speed probably doesn't change from acquisition to acquisition on the same microscope. Maybe I prefer this just for its simplicity.
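The per-acquisition variant of the flag can be pinned down with a toy model (a Python sketch only; MMCoreJ is Java, and every name below is hypothetical, not an existing Core API): disabling applies only while an acquisition is running, is ignored otherwise, and a later acquisition starts with metadata enabled again.

```python
class CoreSketch:
    """Toy model of the proposed per-acquisition flag (all names hypothetical):
    disabling metadata applies only to the running acquisition."""

    def __init__(self):
        self._acquiring = False
        self._metadata_enabled = True

    def start_sequence_acquisition(self):
        self._acquiring = True
        self._metadata_enabled = True  # each acquisition starts with metadata on

    def stop_sequence_acquisition(self):
        self._acquiring = False
        self._metadata_enabled = True

    def disable_image_metadata(self):
        if self._acquiring:            # ignored outside an acquisition
            self._metadata_enabled = False

    def pop_next_tagged_image(self):
        # Essential tags are always present; the full state cache is opt-out.
        tags = {"Width": 64, "Height": 64, "PixelSizeUm": 0.1}
        if self._metadata_enabled:
            tags["Camera-Exposure"] = "10"  # stand-in for the full state cache
        return tags
```

Under these semantics the "one plugin disables, another expects metadata" conflict cannot outlive a single acquisition, at the cost of each performance-sensitive caller having to disable metadata every time it starts acquiring.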
I'm not sure that this is true. Pycromanager + NDTiff can save at multiple GB/s seemingly indefinitely on NVMe drives (see here). This is with the full system state cache in each image. While I'm sure there are situations where metadata can be a problem (e.g. tiny ROIs), it's worth having more data on how problematic this actually is in practice before trying to solve it. Maybe a short-term solution is not needed and this can be deferred to the larger remaking of the buffers.
Yes, agreed. I think there are a couple of fields of "essential metadata" that should always be present with an image: pixel size, width, height. Will write some more about this in the buffer proposal.
I think the JSON schema is helpful for being able to organize and group metadata. AcqEngJ makes use of the recursive structure at present, and it seems like there might be other backwards-compatibility issues if we tried to switch from a recursive schema to a flat one (though maybe not within the core itself).
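The recursive-vs-flat trade-off can be made concrete with a small converter sketch (Python; the `"-"` separator and the function are hypothetical, not an existing Micro-Manager convention): flattening nested metadata into single-level keys is mechanical, but it is lossy whenever a key can itself contain the separator, which is one flavor of the compatibility problem.

```python
def flatten(meta, prefix="", sep="-"):
    """Flatten recursive JSON-style metadata into a single-level dict.
    Hypothetical converter to illustrate the recursive-vs-flat discussion."""
    flat = {}
    for key, value in meta.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))  # recurse into groups
        else:
            flat[full_key] = value
    return flat
```

Note the ambiguity: `{"Camera": {"Exposure": "10"}}` and `{"Camera-Exposure": "10"}` flatten to the same thing, so a round trip back to the recursive schema cannot always be reconstructed without extra conventions.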
Count me confused. I remember you saying during the LightSheetManager meeting that tagged images cannot go in a performant way into the MM Datastore because of the slowness in metadata handling. Reducing the metadata (significantly) should take care of that. What am I missing?
mmcorej.json is relatively slow at serializing a JSON object, so in order to be performant that serialization needs to be threaded intelligently. Using Pycromanager to go directly from AcqEngJ --> NDTiff has this optimized, which is how the high performance was achieved. When going through Studio, the JSON metadata of a TaggedImage gets converted to a Metadata object and then back to JSON before going into NDTiff via the Datastore API. This involves serialization/deserialization somewhere along the way. I don't know that this is causing a performance hit, but it introduces at least one extra serialization, so it could be. It should be fairly easy to test with fast HDs.
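The suspected extra hop is easy to model and time in isolation (a Python sketch using the standard `json` module as a stand-in; the real path goes through mmcorej.json and Studio's Metadata class, and the 400-property payload size below is a guess, not a measurement):

```python
import json
import time

def roundtrip(tags_json: str) -> str:
    """One extra serialize/deserialize hop, analogous to
    TaggedImage JSON -> Metadata object -> JSON on the way into the Datastore."""
    obj = json.loads(tags_json)   # deserialize (JSON text -> object)
    return json.dumps(obj)        # re-serialize (object -> JSON text)

# Stand-in state cache of 400 device properties (size is an assumption).
tags = {f"Device{i}-Property{j}": str(i * j) for i in range(20) for j in range(20)}
payload = json.dumps(tags)

start = time.perf_counter()
N = 1000
for _ in range(N):
    roundtrip(payload)
per_image_s = (time.perf_counter() - start) / N  # cost of the extra hop per image
```

Multiplying `per_image_s` by the target frame rate shows directly whether this one hop alone can eat the per-frame time budget at, say, 1 kHz.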
Then does it not make sense to have a way of getting images without the overhead of metadata that we have no use for? It seems a cheap way to speed things up, and the metadata could well be a bottleneck for using the Micro-Manager Datastore and Viewer. Seems worth figuring out, especially since it is easy to do.
I thought we were talking about the performance of … This is not about GB/s of data transfer or saving, because the MMCoreJ overhead we are talking about here does not scale with GB. It needs to be tested with small image sizes (e.g. 64x64) at high frame rates (≥ 1 kHz) and a reasonably large (and known, standardized) number of device properties (the demo config might be on the small side compared to many real microscopes). Perhaps it would be good to measure what the actual bottleneck(s) are if the goal is to support a particular use case with particular downstream handling. We know that …

I do agree with @nicost that simply avoiding the per-frame copy of the system state cache does seem like a valid and simple fix. It has the huge advantage of not requiring optimization of all downstream handling. The fact that downstream code is handed a huge …

This is especially true when you realize that these device property values are not temporally linked to the image frame with any accuracy: they are the cached values at the time of popping the image (and the cached values are not always valid, but that's a whole other story). So any application that wants to monitor device property changes during a sequence acquisition could just as well monitor the system state cache separately from the image popping, at whatever interval is desired. (Although then there is no longer an expedient place to save the values that are not tied to image frames. I suspect that was a motivation for artificially bundling device state with image frames.)
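The "monitor the cache separately" alternative amounts to polling snapshots and recording only the deltas. A minimal sketch (Python; snapshots are plain dicts here, and the function name is hypothetical):

```python
def diff_snapshots(prev: dict, curr: dict) -> dict:
    """Return only the properties that changed (or appeared) between two
    state cache snapshots, so a monitor can log deltas at its own interval
    instead of receiving the full cache with every popped image."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}
```

An application would poll the cache at whatever interval it cares about, timestamp each non-empty delta, and leave image popping entirely metadata-free; the temporal linkage to frames is no worse than today, since the per-image copy is also just the cache at pop time.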
Yes, absolutely. I've never doubted this. And all the reasons @marktsuchida mentions support not being especially attached to the current way things are done. Separately, the issue being discussed re LightSheetManager was how we can get a prototype plugin with performant file saving as soon as possible. Given the testing that has been done thus far, there doesn't seem to be any evidence that this change to metadata needs to happen first.
Currently, each "tagged" image taken from MMCoreJ has all the metadata available in the SystemCache attached to it. In many cases this is uninteresting, since none or only very few of these metadata change between images. Processing these metadata can take up to 1 ms (data needed), potentially making it the biggest hurdle to reaching speeds higher than 1 kfps.
The SystemCache is added in the function "CreateTaggedImage()" in MMCoreJ.i. Would it be possible to take this code out, and provide a separate function to add the system cache to the image metadata? We can find all instances in our code where CreateTaggedImage is called, and take appropriate action (i.e. evaluate whether or not to add the system cache metadata). This change seems relatively low impact, and confined to the issue at hand. I guess this would mean that we need to bump the MMCoreJ version. Curious about your thoughts.
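The proposed split could look like the following outline (a Python sketch only; the actual change would be in the SWIG interface file MMCoreJ.i, and both function names and the essential-tag list here are assumptions): CreateTaggedImage attaches only the always-present essentials, and attaching the SystemCache becomes a separate, opt-in step.

```python
# Assumed "essential metadata" set, per the discussion above (hypothetical).
ESSENTIAL_TAGS = ("Width", "Height", "PixelSizeUm")

def create_tagged_image(pixels, essential):
    """Sketch of the proposed split: attach only essential tags here,
    never the full system state cache."""
    tags = {k: essential[k] for k in ESSENTIAL_TAGS}
    return {"pix": pixels, "tags": tags}

def add_system_state_cache(image, state_cache):
    """Separate, opt-in step: merge the (copied) state cache into the tags.
    Callers that need speed simply skip this call."""
    image["tags"].update(state_cache)
```

Call sites that want today's behavior would call both functions back to back; performance-critical paths would call only the first, which is why the change stays confined to the places CreateTaggedImage is already invoked.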