Add dataset/pipeline columns to the table #1657
No changes were made to Dataset and Pipeline file/s.
On click: shows info about what was changed in the Dataset file/s. The textual content is just an example.
On click showing that no changes were made in Pipeline file/s.
Icon concept: If no changes were made, the situation is the same as before. In other words, it's equal.
Notes
|
@shcheklein Following our discussion, I have designed a first draft of a possible solution. Please let me know what you think. |
@maxagin looks good, a few comments:
|
Would you like to move information from this task to one you mentioned?
We are crafting the information design. The styles will be outlined at later stages. I have used the VS Code in-browser styles simply because we have discussed VS Code.
VS Code has very specific GUI requirements. That is why it looks very different from Studio. However, this can be discussed when we work on the styles.
At this stage I have not focused on the naming and textual content (as I have mentioned in the previous comment). In the coming iterations I will include all the ideas related to the modals content on a very basic level, so we will have something to start with for the further discussions.
I have in mind a few examples of how to handle this. We will talk about it later. Thank you for your comments @shcheklein |
I am working on it. I can see a few possible solutions. Will share all when ready |
@shcheklein here is an update. Please share your comments
Useful only if no BG, just lines. I like this idea very much 🥊
Showing all dataset files
🥁 More elegant solution and my favorite
Dataset and Pipeline modals have the same structure, which is more user friendly. Modal info hierarchy:
Dataset and Pipeline modals structure |
Just to clarify - how is the favorite one different from the initial one? (I just can't get the difference, am I missing something?) I'm lost in all the other options a bit. I see the one |
Same table, but with the modal. In fact there are only 2 options; all the rest are intended to explain the benefits. I do not understand why we would show all Dataset files (say 10 or more) and not just one Dataset cell. I thought that we would like to show users that the Dataset was changed, and afterwards (optionally) show the details of where exactly the changes were made. Also, if we display all Dataset files in the table, why don't we display all Source code files? This is why I have proposed only 2 cells (Dataset and Source code) |
Primarily because usually there will be fewer than 10 datasets, but more than 10 source code files.
we want to show that dataset(s) have changed, yes. How it's done (one column, multiple columns, a special mark near the experiment name, etc.) is what we need to decide on. One column, for example, "hides" a lot of information. In some cases data file size is a good signal to see. To summarize: I would probably initially go with a simple approach, one column per item for everything (data, source code, etc). We can just decide to toggle off certain columns (e.g. source code). We can even show a message saying how many columns got hidden by default. Do we need that aggregated signal? It sounds good, but can we do it w/o allocating a whole column (it takes space)? |
So the requirements for the dataset amount is
The easiest way to see that changes were made is to show
From my comment above:
This is why I propose “Only 2 columns: Dataset and Source Code”. This way, on click we can show all files and some signals about the changes in every file! The space is unlimited (scroll, unfolding and more), meaning we can cover all the cases using just 2 columns.
Commented just above + At this point we can not know what information we want to pull. But we know for sure that the easiest way to see that changes were made is to show: DATASET+Changes Made indicator. |
Domain-specific details matter here. They are not unlimited. And we can expect a certain distribution, e.g. in 80% of projects it's a single dataset and 5 source files.
then why not take it a step further and combine metrics and params too? let's reflect on what the difference is between all of those.
In Studio we show size + a yellow dot, or the size number can be colored yellow. Is that what you mean? Also, if you have sizes you can sort and filter by this field, which also gives you a tool to navigate.
This sounds like an optimization and an addition (a "show all columns of this type" element is still needed). We can do this in one way or another, but the first step is still to show all the columns. wdyt?
I'm not sure I understand this. |
Yes, they matter, but if 20% of users can have more than 5 dataset or source files, that means a lot.
Good point, but I think that
But the main reason, as I see it, is that Dataset and Pipeline are 2 different entities. I also have a feeling that while experimenting it could be useful to see 2 signals in the table. Don't you think that in our specific domain users will be very familiar with the term Pipeline? I feel it describes the nature of the related files better than Source Code, which can be any files in the project.
I am not sure that the size is the best signal in this situation,
Another argument:
The easiest way to see that changes were made is to show a cell with a symbol. If, for example, we use data file size as a signal, finding differences in the table becomes difficult, especially when the table displays multiple DATASET files and PIPELINE files. And if we want to show a symbol together with the size, I do not see any benefit in showing the size: at this point we only alert the user to the fact that the file changed, and additional information beside the symbol (the size) makes the table harder to work with. From what I understand, there are a lot of things you can do inside the files. Today you do A, tomorrow B. A symbol can describe any possible change. The file size may be misleading |
@maxagin we are making too many assumptions (size is misleading or not, 5 datasets or fewer, etc.) too early. Actual life is more complicated, and DVC is not an opinionated tool. Thus simple, general, customizable solutions are better than complex opinionated ones. In this case the simplest is to show everything and let people decide what to do. I still consider the introduction of these compound columns (or showing aggregated changes in some other way) an optimization and a convenience rather than a solution. We need to have columns per dataset and per source file, the ability to show different signals in them, and a highlight if there were changes. Everything else comes secondary to that. And it is probably better not to take a whole column for that (they are expensive). |
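The "one column per item, toggle some off by default" approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `DepKind`, `DepColumn`, `buildColumns` and `hiddenCount` names and the example file paths are hypothetical and are not taken from the extension's code.

```typescript
// Hypothetical types for illustration; not the extension's actual data model.
type DepKind = 'dataset' | 'source'

interface DepColumn {
  path: string // file path, used as the column header
  kind: DepKind
  hidden: boolean // toggled off by default for some kinds
}

// Build one column per dependency. Kinds listed in hiddenKinds start hidden.
function buildColumns(
  deps: { path: string; kind: DepKind }[],
  hiddenKinds: DepKind[] = ['source']
): DepColumn[] {
  return deps.map(dep => ({
    path: dep.path,
    kind: dep.kind,
    hidden: hiddenKinds.indexOf(dep.kind) !== -1
  }))
}

// Count hidden columns so the table could show an "N columns hidden" message.
function hiddenCount(columns: DepColumn[]): number {
  return columns.filter(column => column.hidden).length
}

// Example file paths are made up.
const columns = buildColumns([
  { path: 'data/train.csv', kind: 'dataset' },
  { path: 'src/train.py', kind: 'source' },
  { path: 'src/evaluate.py', kind: 'source' }
])
```

With these example inputs the dataset column stays visible and the two source columns start hidden, so the table could report two hidden columns.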
@shcheklein here are two examples. Please let me know if this is what you had in mind. |
@maxagin yes, right, something like this. Datasets and Source code will be changed to the actual file paths though.
But we can start here I think. I like the idea of being able to collapse columns of each type into a single "signal" in some way, but would do this as a next step. |
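As a rough sketch of that possible next step, per-file change flags could be collapsed into one signal per type. The `DepChange` shape and the `collapseByKind` name are hypothetical, chosen only for illustration.

```typescript
// Hypothetical types for illustration; not the extension's actual data model.
type DepKind = 'dataset' | 'source'

interface DepChange {
  path: string
  kind: DepKind
  changed: boolean
}

// Collapse per-file flags into a single "something changed" signal per kind.
function collapseByKind(deps: DepChange[]): Record<DepKind, boolean> {
  const signal: Record<DepKind, boolean> = { dataset: false, source: false }
  for (const dep of deps) {
    if (dep.changed) {
      signal[dep.kind] = true
    }
  }
  return signal
}

// Example file paths are made up.
const signal = collapseByKind([
  { path: 'data/train.csv', kind: 'dataset', changed: false },
  { path: 'src/train.py', kind: 'source', changed: true },
  { path: 'src/evaluate.py', kind: 'source', changed: false }
])
```

Here `signal.source` is true because one source file changed, while `signal.dataset` stays false.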
Please share example |
@shcheklein is this something we want to prioritize over other "improvements" tickets? |
@maxagin we will be implementing this as per #1183 (comment), updates to the current design can go into the spec that you create for #1562. |
Needs to take into account #1536. |
#1830 will close #1183. Take this comment into account when starting with follow-up under this ticket. |
@shcheklein @maxagin Can you confirm/correct the above please? |
@wolmir I think this is good. Just a question: for how long will we display the yellow color (changes made)? How will I know what has been changed if I change train.py again? Thank you |
Thanks @maxagin. I believe there was an agreement to first just show that there was a change because the commit number of the dependency is different. |
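A minimal sketch of that first step, assuming each table row exposes a map from dependency path to some revision identifier (the `DepRevisions` type, the `changedDeps` name, and the example identifiers are hypothetical):

```typescript
// Hypothetical shape: dependency path -> revision identifier (e.g. a hash).
type DepRevisions = Record<string, string>

// Return the paths whose identifier differs from the baseline row.
// A dependency that is missing from the baseline also counts as changed.
// Dependencies removed from `current` are not reported in this sketch.
function changedDeps(baseline: DepRevisions, current: DepRevisions): string[] {
  const changed: string[] = []
  for (const path of Object.keys(current)) {
    if (baseline[path] !== current[path]) {
      changed.push(path)
    }
  }
  return changed
}

// Example identifiers are made up.
const changed = changedDeps(
  { 'train.py': 'a1', 'data/train.csv': 'b2' },
  { 'train.py': 'a9', 'data/train.csv': 'b2' }
)
```

The comparison is always relative to whichever row is picked as the baseline, so what gets highlighted depends on that choice rather than on time.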
Wolmir, that is what we agreed for the next step. |
I opened a draft PR here: #2029 I just have to add unit tests for the modifications, but it should all be working. |
@wolmir can we close this one as done? Thanks |
I believe so, thanks @maxagin ! |
Description: If changes were made in files, we want to see some kind of alert in the table view. For now we will add only dataset/pipeline columns, but more may be added later.