Currently, we have no serious stress test for our record IO in place - the examples all write structs of standard data types, which are really simple to deal with.
This potentially changes with the merge of #103, which adds a parameter struct to the records. That is absolutely necessary for keeping track of what actually went into the benchmarks, but it implicitly sets us up for a serious problem: how do we deal with results for benchmarks that take non-standard data types like models, functions, algorithms, etc.?
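Consider the following example (a minimal sketch of the situation; `MyModel`, the `accuracy` benchmark, and the record layout are hypothetical stand-ins, not actual nnbench code):

```python
import json
from dataclasses import dataclass


@dataclass
class MyModel:
    """Stand-in for a trained model object."""
    weights: list[float]


def accuracy(model: MyModel) -> float:
    """A benchmark that takes a non-standard data type as a parameter."""
    ...  # evaluation logic omitted


# A record as it might come out of a benchmark run: the parameter struct
# now contains the model object itself, not just standard data types.
record = {
    "name": "accuracy",
    "value": 0.93,
    "parameters": {"model": MyModel(weights=[0.1, 0.2])},
}

json.dumps(record)  # TypeError: Object of type MyModel is not JSON serializable
```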
Serializing a record coming out of a benchmark run including `accuracy` can potentially be really challenging, since it is unclear how to represent `MyModel` in a written record.
There are multiple ways around this. First, there is the option of requiring the user to make their records conform (i.e. contain only standard data types), but this means extra work for them, and can break reproducibility if the chosen representation is not itself reproducible. Adding serializer hooks for custom data types is another option, but it is convoluted and a lot of work.
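A serializer hook along those lines could look roughly like this (continuing the sketch above; the `default=` hook of `json.dumps` is used purely for illustration, and the digest representation is made up):

```python
import json


def encode_custom(obj):
    """Hypothetical serializer hook: map custom types onto standard ones."""
    if isinstance(obj, MyModel):
        # One possible convention: type name plus a digest of the contents.
        return {"__type__": "MyModel", "digest": hash(tuple(obj.weights))}
    raise TypeError(f"cannot serialize object of type {type(obj).__name__}")


serialized = json.dumps(record, default=encode_custom)
```

Every such hook has to be written, registered, and kept consistent between writers and readers of the records, which is where the extra work comes from.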
Then there is the option of making the benchmarks take a unique identifier for the model instead, which is a standard data type (e.g. a hash, a remote URI, etc.), and loading the artifact just in time for the user to access it. This should mean easier reading/writing of records, but requires more code for the benchmark setup. It also requires us to change our story for the model artifact benchmarks, where we would need to come up with a way to efficiently instantiate models based on such an identifier.
We should be able to use a setup task for the benchmarks to accomplish this: a cache lookup for the proper artifact, loaded before the start of all benchmarks, seems like a good approach, as sketched below.
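A rough sketch of what that could look like (continuing the sketch above; the cache, the `load_model` loader, and the setup/benchmark function names are all hypothetical, not existing nnbench APIs):

```python
# Hypothetical artifact cache, populated once by a setup task before any
# benchmark runs: identifier (e.g. a hash or remote URI) -> loaded model.
_ARTIFACT_CACHE: dict[str, MyModel] = {}


def load_model(identifier: str) -> MyModel:
    """Hypothetical loader: fetch and instantiate the model behind an identifier."""
    ...


def setup_artifacts(identifiers: list[str]) -> None:
    """Setup task: resolve and load all required artifacts up front."""
    for ident in identifiers:
        if ident not in _ARTIFACT_CACHE:
            _ARTIFACT_CACHE[ident] = load_model(ident)


def accuracy(model_id: str) -> float:
    """The benchmark now takes a standard data type (a string identifier)."""
    model = _ARTIFACT_CACHE[model_id]  # just-in-time access to the artifact
    ...  # evaluate the model as before
```

The record's parameter struct then only contains the identifier, which is trivially serializable.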
Summary of developments that have since happened to address this issue:
Regarding serialization of parameters, a `nnbench.transform` submodule has been checked in with "Add transform submodule, parameter compression transform" (#124), which contains a barebones parameter serialization transform. If it is not sufficient for advanced use cases or custom types, users can follow the spirit of the transform class to implement their own, as sketched below.
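In that spirit, a custom transform for the hypothetical `MyModel` from above might look roughly like this (a sketch only; the actual interface of the transform class in `nnbench.transform` may differ, and the `apply` method name and record layout are assumptions):

```python
class ModelDigestTransform:
    """Sketch of a record transform: replace model objects in the parameter
    struct with a serializable digest before the record is written."""

    def apply(self, record: dict) -> dict:
        params = dict(record.get("parameters", {}))
        for key, value in params.items():
            if isinstance(value, MyModel):
                # Same made-up convention as in the serializer hook above.
                params[key] = {"__type__": "MyModel", "digest": hash(tuple(value.weights))}
        return {**record, "parameters": params}
```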