Support strings in metrics #7960
I have a couple more thoughts on this. If I commit a file to the It would be helpful to be able to diff files from different branches stored in I thought that
You can already use git to version text-based metrics files by using Regarding diffing, DVC does not know anything about the type of file that it is tracking. Everything is treated as arbitrary binary data, so we don't provide any kind of contextual diffing (which depends on handling specific file types). There are some existing feature requests regarding diff behavior (like #7657), but essentially you would need to implement something that wraps
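A minimal sketch of such a wrapper, assuming the metrics are flat JSON files committed to Git and that `git` is on the PATH. The helpers `load_metrics_at_rev` and `diff_metrics` are hypothetical and not part of DVC. Numeric values get a delta; any other value (such as a string with units) is only reported as changed or unchanged.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, but True/False should not be subtracted.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def load_metrics_at_rev(rev: str, path: str) -> dict[str, Any]:
    """Read a JSON metrics file as committed at a Git revision (hypothetical helper)."""
    # `git show <rev>:<path>` prints the file's contents at that revision.
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def diff_metrics(old: dict[str, Any], new: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Compare two flat metrics dicts key by key."""
    rows: dict[str, dict[str, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        a, b = old.get(key), new.get(key)
        if _is_number(a) and _is_number(b):
            change: Any = b - a
        else:
            # Strings and missing keys cannot be subtracted; only flag a change.
            change = "changed" if a != b else "-"
        rows[key] = {"old": a, "new": b, "diff": change}
    return rows
```

For example, `diff_metrics(load_metrics_at_rev("branch-a", "metrics.json"), load_metrics_at_rev("branch-b", "metrics.json"))`, where the branch names and file path are placeholders.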
Hi @pmrowla @shortcipher3 @daavoo @codito @tizoc, I would like to join the Iterative team and contribute to the development of the project. Please let me know where I can start contributing to the open-source project until I join the team. I am a Python developer with 4 years of experience.
Hi @shortcipher3, could you provide an example of what the current version does and what the requirements are, so that I can make the changes accordingly?
I created an example repo here. Essentially, I have two branches, one with a metrics file for a tiny model:
and one with a metrics file for a large model:
When I run a diff, I get the following:
I would love to get something more like:
That way I get a nice table of results, and I can easily compare metrics on completely different scales (GB/MB and seconds/milliseconds). If I converted the GB and MB values to bytes, the table would be hard to read, because I would have to slow down to count the number of digits. I can also add meaningful data to help the reader understand the difference.
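One way to keep such values human-readable while still comparing them is to store strings like `"1.2 GB"` and parse them only at diff time. A minimal sketch, assuming a small hand-written unit table with decimal prefixes (1 GB = 1000 MB); `UNITS`, `parse_quantity`, and `compare` are hypothetical names, and DVC does not do this today.

```python
from __future__ import annotations

import re

# Hypothetical unit table: unit -> (dimension, factor to the base unit).
UNITS = {
    "B": ("bytes", 1),
    "KB": ("bytes", 1e3),
    "MB": ("bytes", 1e6),
    "GB": ("bytes", 1e9),
    "ms": ("seconds", 1e-3),
    "s": ("seconds", 1),
}

# A number followed by a unit, e.g. "1.2 GB" or "45ms".
_QUANTITY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_quantity(text: str) -> tuple[float, str]:
    """Turn '1.2 GB' into (value in base units, dimension)."""
    match = _QUANTITY.match(text)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognised quantity: {text!r}")
    dimension, factor = UNITS[match.group(2)]
    return float(match.group(1)) * factor, dimension


def compare(old: str, new: str) -> str:
    """Describe `new` relative to `old` as a ratio, keeping the original strings."""
    old_value, old_dim = parse_quantity(old)
    new_value, new_dim = parse_quantity(new)
    if old_dim != new_dim:
        raise ValueError(f"cannot compare {old_dim} with {new_dim}")
    return f"{new} vs {old} ({new_value / old_value:.2f}x)"
```

Reporting a ratio rather than a raw difference keeps the output readable when the two values differ by orders of magnitude; the original strings are shown unchanged.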
As for being able to do a local diff: a lot of state-of-the-art research produces a family of models rather than a single model, so I would love to have a metrics file for each model and be able to diff them. An example is DINO v2. They actually have a table comparing the models on a few metrics, one of which uses a string for units. Some other models with multiple sizes are:
I think we could automatically generate some useful tables for understanding these parameters, making it easier for data scientists to make decisions.
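A sketch of generating such a table automatically, assuming each model's flat JSON metrics file has already been loaded into a dict. `metrics_table` is a hypothetical helper that renders one column per model as a Markdown table and leaves string values (units, dates, hashes) untouched.

```python
from __future__ import annotations

from typing import Any


def metrics_table(models: dict[str, dict[str, Any]]) -> str:
    """Render one column per model and one row per metric as a Markdown table."""
    names = list(models)
    # Collect metric keys across all models, keeping first-seen order.
    keys: list[str] = []
    for metrics in models.values():
        for key in metrics:
            if key not in keys:
                keys.append(key)
    lines = [
        "| metric | " + " | ".join(names) + " |",
        "|---" * (len(names) + 1) + "|",
    ]
    for key in keys:
        # A model missing a metric gets an empty cell.
        cells = [str(models[name].get(key, "")) for name in names]
        lines.append(f"| {key} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```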
Hello @shortcipher3 and @daavoo, I had a look at this issue and I might have a suitable solution. I am new to the community, so I am not sure of the best way to proceed. Should the issue be assigned to me before I open a pull request? Thanks.
Hey @paulourbano! Feel free to open the PR.
DVC metrics represent scalar numbers.
This is nice for finding the difference in a metric between two models; however, a couple of metrics I'm interested in would be more human-readable with units. Specifically:
There are other reasons beyond units to support strings. For example, we use Vertex AI's training service, which can and does change without warning, so storing the training date would be useful. I would also like to store the model SHA, so that given a model binary I can quickly verify which row corresponds to it.
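A minimal sketch of recording such string-valued fields next to the numeric ones, assuming a flat JSON metrics file. The function names `sha256_of` and `write_metrics`, the field names `trained_at` and `model_sha256`, and the choice of SHA-256 are illustrative assumptions, not DVC conventions.

```python
from __future__ import annotations

import datetime
import hashlib
import json
from pathlib import Path
from typing import Any


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) model binary in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_metrics(metrics_path: Path, model_path: Path, metrics: dict[str, Any]) -> dict[str, Any]:
    """Add a UTC training timestamp and the model hash, then write the metrics as JSON."""
    record = dict(metrics)
    record["trained_at"] = datetime.datetime.now(datetime.timezone.utc).isoformat()
    record["model_sha256"] = sha256_of(model_path)
    metrics_path.write_text(json.dumps(record, indent=2))
    return record
```

Storing the timestamp as an ISO 8601 string in UTC keeps it unambiguous and lets rows sort chronologically as plain strings; hashing in chunks avoids loading a large model binary into memory at once.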