Metavariables are not excluded in feature selection methods #7009

Open
lareooreal opened this issue Jan 30, 2025 · 4 comments
Labels
bug report: Bug is reported by user, not yet confirmed by the core team
needs discussion: Core developers need to discuss the issue

Comments

lareooreal commented Jan 30, 2025

A longstanding issue is that metavariables are not excluded from methods. For example, in "find informative projections" for scatter plots, they appear as suggestions. Also, in feature suggestions, the metas are included. If there are many, the automatic feature selection breaks down. This is a nuisance, as metas often contain the solution to a classification problem. "Find informative mosaics" has the same issue, as does the violin plot where ordering by relevance also includes metas. Tree prediction does ignore them, though.

I am currently using version 3.38 on a Mac, and this error is present in the PC version as well.

This issue has existed in every version of Orange that I can recall.

Best larerooreal

@lareooreal lareooreal added the bug report Bug is reported by user, not yet confirmed by the core team label Jan 30, 2025
@markotoplak
Member

Now visualization scoring (VizRank) uses all features that a user can choose manually.

But I do agree: in some cases there can be lots of combinations that we are surely not interested in, and having some optional filtering would be nice.

I remember the development team discussing this long ago. I am not quite sure, but I think we used to have the opposite problem: metas were ignored even when we wanted to have them considered.

@lareooreal
Author

lareooreal commented Jan 30, 2025 via email

@markotoplak markotoplak added the needs discussion Core developers need to discuss the issue label Jan 31, 2025
@ajdapretnar
Contributor

Coming from digital humanities, we often don't do predictions but data discovery, text mining and so on. There are many interesting variables that are not used for text mining (for example), but we still wish to explore them. Imagine having a corpus of parliamentary speeches, where you'd wish to also explore the structure of the parliament (through the metadata on the speakers). Of course, this could be achieved by shuffling data back and forth with Select Columns, but that leads to more work, just like the issue above.

I suggest having a checkbox in VizRank that excludes metas. On by default.
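To make the suggestion concrete, here is a minimal sketch of what such an option could do to the list of candidates. It is not Orange's actual VizRank code: `ToyDomain`, `candidate_variables`, `candidate_pairs` and the `exclude_metas` flag are hypothetical names used only for illustration, and the variable names are made up.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class ToyDomain:
    """Hypothetical stand-in for a data domain; not Orange's Domain class."""
    attributes: list = field(default_factory=list)
    class_vars: list = field(default_factory=list)
    metas: list = field(default_factory=list)


def candidate_variables(domain, exclude_metas=True):
    """Return the variables a ranking method would consider.

    With exclude_metas=True (the proposed default), only regular
    attributes are candidates; otherwise metas are appended.
    """
    candidates = list(domain.attributes)
    if not exclude_metas:
        candidates += list(domain.metas)
    return candidates


def candidate_pairs(domain, exclude_metas=True):
    """Enumerate the variable pairs a scatter plot ranking would score."""
    return list(combinations(candidate_variables(domain, exclude_metas), 2))


# Made-up example in the spirit of the parliamentary speeches corpus.
domain = ToyDomain(
    attributes=["word_count", "sentiment"],
    class_vars=["party"],
    metas=["speaker", "session_date", "constituency"],
)

pairs_default = candidate_pairs(domain)
pairs_with_metas = candidate_pairs(domain, exclude_metas=False)
print(len(pairs_default), len(pairs_with_metas))
```

Even in this tiny example the number of pairs to score grows from 1 to 10 once the three metas are included, which is the kind of blow-up described in the original report.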

@janezd
Contributor

janezd commented Feb 19, 2025

I agree with all sides. @markotoplak remembers correctly. This is indeed a "longstanding issue" in the sense that it used to be different (many) years ago. There are arguments for both sides; at this very moment I agree that visualizations are basically models, so meta variables should appear only as decoration. But I know I'll change my mind again. Without having a convincing argument (or a solution that satisfies both) we shouldn't switch back and forth every few years.

I would prefer not to add any check boxes. Once we open the floodgates, their number will rise quickly.

Furthermore, a checkbox like this would be error-prone because the implementation would be too difficult: the state of the checkbox changes the content of the model and thus affects context matching. On the other hand, the check box state is a part of the context. Even if we can decide how to resolve this in principle, I predict we'll never resolve all the edge cases in the code.
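The circular dependency can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Orange's real context handling: `ToyContext`, `model_variables` and `matches` are hypothetical names, and "matching" here just means that every saved selection is still available in the variable model.

```python
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Hypothetical saved settings; not Orange's context classes."""
    exclude_metas: bool   # the checkbox state stored with the context
    selected: tuple       # variables the user had selected


def model_variables(attributes, metas, exclude_metas):
    """Variables offered by the widget; the checkbox changes this list."""
    if exclude_metas:
        return list(attributes)
    return list(attributes) + list(metas)


def matches(context, available):
    """A context matches if all of its saved selections are available."""
    return all(name in available for name in context.selected)


attributes = ["word_count", "sentiment"]
metas = ["speaker"]

# Saved while metas were allowed, with a meta among the selected variables.
saved = ToyContext(exclude_metas=False, selected=("word_count", "speaker"))

# Order A: match against the model built from the widget's current
# (default, metas excluded) checkbox state.
match_current = matches(
    saved, model_variables(attributes, metas, exclude_metas=True))

# Order B: apply the saved checkbox state first, then match.
match_restored = matches(
    saved, model_variables(attributes, metas, saved.exclude_metas))

print(match_current, match_restored)  # False True
```

The same saved context is rejected or accepted depending on whether the checkbox is restored before or after matching, which is one of the edge cases the code would have to settle.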
