Predefined Filters - ability to group them and also to toggle on/off #555

Closed
proxseas opened this issue Aug 19, 2021 · 12 comments

Comments

@proxseas

proxseas commented Aug 19, 2021

I've been enjoying Custom Filters and Predefined Filters quite a bit, but I often find myself simply copying and pasting saved queries into the Custom Filters dialog. This is not optimal, but for now it is the most powerful way for me to use this plugin. I think it could be enhanced easily with some of the following changes (in approximate order of how easy they would be to implement):

  1. The ability to display the "Custom Filter" dialog by default, when you just launch dtale (ideally configurable as a startup param).
  2. The ability to display the "Predefined Filter" dialog by default, when you just launch dtale (ideally configurable as a startup param).
  3. The ability to set a default value for predefined filters. Right now, I'm fairly certain that the user is required to enter some value before a predefined filter takes effect.
  4. An 'enabled'/'disabled' default state for predefined filters. This could just be an "initial_state" value in the call to predefined_filters.set_filters(). Also, in the predefined filters dialog, this should be a checkbox next to each predefined filter. This would make it so easy to filter data. You could set up a bunch of very useful filters, but some of them you may want to use less often than others, so you would set their default state to 'disabled'.
  5. Grouping of predefined filters. This might be somewhat more tricky.
    Essentially, functionality 3-5 is what I achieve by copying & pasting saved queries.
  6. Coloring of rows based on predefined filters.
  7. Somewhat related - is it possible to apply a default sorting of data?
@aschonfeld
Collaborator

@proxseas thanks so much for the suggestions. With the exception of #5 these all seem doable. For the grouping, do you mean you just want to assign a classifier to each filter and then display them in groups?

@proxseas
Author

@aschonfeld What I had in mind for #5 was something more akin to compound parameters for a query. E.g. let's say your data is related to animals and you initially have filters A and B, where A is df[df["Animal_Category"] == 'bird'] and B is df[df["Animal_Category"] == 'fish']. With group-related functionality, you could potentially do something like this:
A = 'bird'
B = 'fish'
my_categories = [A, B]
df[df["Animal_Category"].isin(my_categories)]
So you would essentially be grouping parameters and checking/unchecking one of them would add or remove it from the filter list in the 'isin()' call. Meanwhile, checking the checkbox for the overall group would enable or disable all of them under that grouping.
All that said, I haven't given a lot of thought to what the grouping should really look like. What I proposed here is the first thing that came to mind. I just think that being able to group queries in some way would be quite useful.
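
The grouping idea above can be sketched in plain pandas, independent of D-Tale. This is only an illustration under stated assumptions: `build_group_filter` and the `enabled` mapping are hypothetical names and are not part of D-Tale's API. Each group member has a checkbox state, and only the checked members end up in the `isin()` list.

```python
import pandas as pd


def build_group_filter(df, column, enabled):
    """Keep rows whose `column` value belongs to a checked group member.

    `enabled` maps each member of the group to a checkbox state (True/False).
    Assumption: if every member is unchecked, the group counts as disabled
    and the frame is returned unfiltered.
    """
    active = [value for value, is_on in enabled.items() if is_on]
    if not active:
        return df
    return df[df[column].isin(active)]


df = pd.DataFrame({
    "Animal_Category": ["bird", "fish", "mammal", "bird"],
    "Name": ["sparrow", "trout", "otter", "heron"],
})

# Both members checked: equivalent to isin(["bird", "fish"]).
both = build_group_filter(df, "Animal_Category", {"bird": True, "fish": True})

# Unchecking "fish" removes it from the isin() list.
birds_only = build_group_filter(df, "Animal_Category", {"bird": True, "fish": False})
```

Treating an all-unchecked group as "no filter" is a design choice made for this sketch; returning an empty frame would be an equally defensible reading.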

@aschonfeld
Collaborator

@proxseas just wanted to give you an update on this. I've got the first 3 working locally right now.

@proxseas
Author

Awesome, thanks for the update.

@aschonfeld
Collaborator

@proxseas as for the default sort (7), you could just add it to whatever piece of code loads your data. Otherwise, I can certainly add the ability to specify a sort when calling dtale.show by passing in a list of tuples, [('col1', 'ASC'), ('col2', 'DESC'), ...]. When using a global config file it would probably have to be a comma/pipe-delimited string: col1|ASC,col2|DESC,...

Not sure how valuable that is, but it could certainly be done.
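
As a rough illustration of how the two sort formats relate, the delimited string can be parsed into the list-of-tuples form and then translated into arguments for pandas' `DataFrame.sort_values`. The helper names `parse_sort_spec` and `to_sort_values_kwargs` are hypothetical, and defaulting to ASC when no direction is given is an assumption of this sketch, not documented D-Tale behavior.

```python
def parse_sort_spec(spec):
    """Turn 'col1|ASC,col2|DESC' into [('col1', 'ASC'), ('col2', 'DESC')]."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        column, _, direction = item.partition("|")
        # Assumption: a missing direction means ascending.
        direction = (direction or "ASC").strip().upper()
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort direction: {direction!r}")
        pairs.append((column.strip(), direction))
    return pairs


def to_sort_values_kwargs(pairs):
    """Translate the tuples into keyword arguments for DataFrame.sort_values."""
    return {
        "by": [column for column, _ in pairs],
        "ascending": [direction == "ASC" for _, direction in pairs],
    }


pairs = parse_sort_spec("col1|ASC,col2|DESC")
kwargs = to_sort_values_kwargs(pairs)
# With a DataFrame at hand this would be applied as: df.sort_values(**kwargs)
```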

Now as for your grouping filters (5), I'm not sure what the difference is between what you're asking for and what already exists for the multiselect filter. For example, you define a filter like this:

import dtale.predefined_filters as predefined_filters

predefined_filters.set_filters([
    {
        "name": "A in values and (B % 2) == 0",
        "column": "A",
        "description": "A is within a group of values and B mod 2 equals zero (is even)",
        "handler": lambda df, val: df[df["A"].isin(val) & (df["B"] % 2 == 0)],
        "input_type": "multiselect",
        "default": [3, 4]
    }
])

Ignore the "default" key; that's part of the changes I'm making for your third request.

Then here's what it looks like in the app:
https://user-images.githubusercontent.com/11547371/131273674-60e136e0-ea05-42b7-8546-c54387f131f3.mov

Let me know if there's something else missing.
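
To see what the multiselect handler from the filter definition does outside the app, it can be called directly on a small frame. The sample data here is made up for illustration; the handler expression is the one from the definition, and `[3, 4]` mirrors its "default" value.

```python
import pandas as pd

# Same expression as the "handler" entry in the filter definition.
handler = lambda df, val: df[df["A"].isin(val) & (df["B"] % 2 == 0)]

# Made-up sample data.
df = pd.DataFrame({"A": [1, 3, 3, 4, 5], "B": [2, 4, 5, 6, 8]})

# Keeps rows where A is one of the selected values AND B is even.
result = handler(df, [3, 4])
```

Only the rows with (A=3, B=4) and (A=4, B=6) survive: the row with (A=3, B=5) fails the even check, and the rows with A=1 or A=5 are not in the selected values.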

@aschonfeld
Collaborator

I'm not sure how valuable "coloring of rows" (6) will be, since filters are concatenated together using "and", not "or". So if a row is displayed, that means it has to meet the criteria of all the active filters. All of your filters are displayed in the header bar.

If we allowed for "or" (which would be a whole new can of worms) then I could see this being valuable, but that's not the case at the moment.
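
The difference between the two ways of combining filters can be shown with plain pandas boolean masks. This is a sketch of the general idea only and makes no claim about how D-Tale combines filters internally; the sample data is made up.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"A": [1, 3, 3, 4, 5], "B": [2, 4, 5, 7, 9]})

# Each active filter contributes one boolean mask.
masks = [
    df["A"].isin([3, 4]),
    df["B"] % 2 == 0,
]

# "and": a row must satisfy every active filter.
all_match = df[reduce(operator.and_, masks)]

# "or": a row only needs to satisfy at least one filter.
any_match = df[reduce(operator.or_, masks)]
```

Under "and", every displayed row matches every filter, so coloring by filter would paint all rows the same. Under "or", the displayed rows can match different filters, which is the case where per-filter coloring would carry information.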

aschonfeld added a commit that referenced this issue Aug 30, 2021
@proxseas
Author

@aschonfeld That video is looking good. Do you think it would be possible to persist some of the filter settings to a text/JSON file? For example, the enabled/disabled toggle for a predefined filter. More importantly, I think it would be super neat if it were possible to edit/define some of the filters in an external text file (probably located in the current working directory), then hit a button in the current dtale instance to have the changes loaded. Initially, it makes more sense to set up a filter within a Jupyter Notebook (that's how I use pandas, generally), but once you have more of them, it might be nice to have that separation of concerns.

> @proxseas as for the default sort (7), you could just add it to whatever piece of code loads your data. Otherwise, I can certainly add the ability to specify a sort when calling dtale.show by passing in a list of tuples, [('col1', 'ASC'), ('col2', 'DESC'), ...]. When using a global config file it would probably have to be a comma/pipe-delimited string: col1|ASC,col2|DESC,...

You're right - there is little point in adding this to dtale when it's so simple to do in pandas. That was a silly request.

@aschonfeld
Collaborator

Haha, unfortunately I just finished adding the ability to specify a default sort (as described in my comment from last night). At least it will be done regardless of whether anyone uses it.

So the only reason a static file will not work for predefined filters is the handler property, which has to be a Python lambda function. My best suggestion is that you keep a running tally of your predefined filters in a separate Python module which you can reference when starting up D-Tale. I could try doing something similar to the custom CLI loaders, where you set an environment variable that maps to a path containing Python modules that follow a specific structure, and they are then automatically made available for you.

So the idea is that you'd set some environment variable like export DTALE_FILTERS=/home/dtale_filters.py and then my code can try loading that module and look for a specific variable (maybe named PREDEFINED_FILTERS) containing a list of filter definitions in your dtale_filters.py file. Then it would be loaded into D-Tale on startup. But you would still have to maintain that file manually.

Maybe this could be part of the next release...
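
A minimal sketch of that loading idea, using only the standard library. The names DTALE_FILTERS and PREDEFINED_FILTERS are the tentative ones floated in this comment, not a confirmed D-Tale feature, and `load_predefined_filters` is a hypothetical helper written for illustration.

```python
import importlib.util
import os
import tempfile


def load_predefined_filters(env_var="DTALE_FILTERS", attr="PREDEFINED_FILTERS"):
    """Load a list of filter definitions from the module file named by `env_var`.

    Returns an empty list when the variable is unset or the module does not
    define `attr`. Both names are assumptions taken from the proposal.
    """
    path = os.environ.get(env_var)
    if not path:
        return []
    spec = importlib.util.spec_from_file_location("user_dtale_filters", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return list(getattr(module, attr, []))


# Demo: write a throwaway filters module and point the variable at it.
source = (
    "PREDEFINED_FILTERS = [\n"
    "    {\n"
    "        'name': 'A in values',\n"
    "        'column': 'A',\n"
    "        'handler': lambda df, val: df[df['A'].isin(val)],\n"
    "        'input_type': 'multiselect',\n"
    "    }\n"
    "]\n"
)
with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, "dtale_filters.py")
    with open(module_path, "w") as fh:
        fh.write(source)
    os.environ["DTALE_FILTERS"] = module_path
    filters = load_predefined_filters()
```

The resulting list could then be passed to predefined_filters.set_filters(filters) at startup. Because the file is executed as Python, the handler lambdas survive, which is exactly what a static text/JSON file cannot offer.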

@proxseas
Author

> Haha, unfortunately I just finished adding the ability to specify a default sort (as described in my comment from last night). At least it will be done regardless of whether anyone uses it.

No worries - it definitely won't hurt. I think sorting is directly related to the presentation of data, rather than its transformation, so it will fit in nicely with dtale's functionality.

> So the idea is that you'd set some environment variable like export DTALE_FILTERS=/home/dtale_filters.py and then my code can try loading that module and look for a specific variable (maybe named PREDEFINED_FILTERS) containing a list of filter definitions in your dtale_filters.py file. Then it would be loaded into D-Tale on startup. But you would still have to maintain that file manually.
>
> Maybe this could be part of the next release...

Sounds a bit involved. Maybe I'll set aside time later on to try this approach :)

@proxseas
Author

Also, I was wondering if you could advise on something relating to categories. In one of my Jupyter Notebooks, I have a column called "tags" and there I have a Python list that is converted to a string. E.g. one row's "tags" column could have the value "['burger','fast food','American']" (note that the whole thing is a string - it's no longer a list). Then, I would query for rows containing specific tags using queries such as categories.str.contains('fast food') and categories.str.contains('American'). This is silly, but it does the job until I figure out the proper way to set up this kind of functionality - whether in pandas or in dtale. But if you have some ideas on this, I'd love to hear about them.
Relatedly, when I was mentioning coloring based on predefined filters, I was thinking of some way of coloring a row (or perhaps just a cell) that meets certain criteria. E.g. the food is American vs Italian. Or 'American' + 'fast food'. But the coloring is probably less important than being able to [less awkwardly] filter based on a combination of tags/attributes.
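
One possible way to make the tag queries less awkward in pandas is to parse the stringified lists back into real lists and test membership exactly, rather than with substring matching. This is a sketch with made-up sample data; `has_all_tags` is a hypothetical helper.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "name": ["Burger Shack", "Trattoria", "Diner"],
    "tags": [
        "['burger','fast food','American']",
        "['pasta','Italian']",
        "['breakfast','American']",
    ],
})

# Turn each stringified list back into a real Python list.
df["tag_list"] = df["tags"].apply(ast.literal_eval)


def has_all_tags(tags, required):
    """True when every required tag is present in the row's tag list."""
    return set(required).issubset(tags)


american_fast_food = df[
    df["tag_list"].apply(lambda tags: has_all_tags(tags, ["fast food", "American"]))
]
american = df[df["tag_list"].apply(lambda tags: has_all_tags(tags, ["American"]))]
```

Exact membership also avoids a pitfall of `str.contains`, where a query for one tag can accidentally match a longer tag that merely contains it.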

@aschonfeld
Collaborator

aschonfeld commented Aug 31, 2021

Wow, yea I've definitely come across some instances of people using complex objects as values for cells in a pandas dataframe (for example lists). I'm not sure how well D-Tale would handle this, but I think it will at least render it in a readable fashion (I hope).

But anyways, here's a nice article detailing someone's struggles with list data in pandas dataframes. Personally I like their "Method 2" approach, which will eventually convert all the list data into a bunch of boolean columns describing whether a row contains that value or not. That translates pretty well to D-Tale, I think. But even if it doesn't, that is how we would have handled it back in data science days.
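
The boolean-column idea can be sketched as follows. This is one way to get there and is not necessarily the article's exact code; the sample data and the `tag_` column prefix are assumptions made for illustration.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "name": ["Burger Shack", "Trattoria"],
    "tags": ["['burger','fast food','American']", "['pasta','Italian']"],
})

# Parse the stringified lists into real lists.
tag_lists = df["tags"].apply(ast.literal_eval)

# One column per distinct tag, True where the row carries that tag.
all_tags = sorted({tag for tags in tag_lists for tag in tags})
for tag in all_tags:
    df[f"tag_{tag}"] = tag_lists.apply(lambda tags, tag=tag: tag in tags)
```

Once the tags are separate boolean columns, each one can be filtered like any other column, and combinations such as 'American' + 'fast food' become ordinary multi-column filters.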

@aschonfeld
Collaborator

added in v1.56.0
