Metavariables are not excluded in feature selection methods #7009

lareooreal · 2025-01-30T13:21:18Z

A longstanding issue is that metavariables are not excluded from methods. For example, in "find informative projections" for scatter plots, they appear as suggestions. Also, in feature suggestions, the metas are included. If there are many, the automatic feature selection breaks down. This is a nuisance, as metas often contain the solution to a classification problem. "Find informative mosaics" has the same issue, as does the violin plot where ordering by relevance also includes metas. Tree prediction does ignore them, though.

I am currently using version 338 on a Mac, and this error is present in the PC version as well.

This issue has existed in every version of Orange that I can recall.

Best larerooreal

markotoplak · 2025-01-30T13:57:00Z

Now visualization scoring (VizRank) uses all features that a user can choose manually.

But I do agree: in some cases there can be lots of combinations that we are surely not interested in, and having some optional filtering would be nice.

I remember the development team discussing this long ago. I am not quite sure, but I think we used to have the opposite problem: metas were ignored even when we wanted to have them considered.

lareooreal · 2025-01-30T15:57:11Z

Selecting metadata for visualization is useful, but having them permuted in the model suggestions is not good! Metadata should remain as metadata, serving solely for interpretation purposes. This includes decorating plots with colors, labels, and annotations. Relevant predictors should be included in the X-block. It might be worth considering placing variables either in the X-block or in the metadata bucket. This approach would allow any variable to be used for modeling or proposing features, or for decorative purposes.

ajdapretnar · 2025-02-19T08:38:06Z

Coming from digital humanities, we often don't do predictions but data discovery, text mining and so on. There are many interesting variables that are not used for text mining (for example), but we still wish to explore them. Imagine having a corpus of parliamentary speeches, where you'd wish to also explore the structure of the parliament (through the metadata on the speakers). Of course, this could be achieved by shuffling data back and forth with Select Columns, but it leads to more work just like the above issue.

I suggest having a checkbox in VizRank that excludes metas. On by default.
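For illustration, the effect of such an option can be sketched in plain Python. This is not Orange code: `candidate_pairs` and `exclude_metas` are hypothetical names, and the sketch only assumes that the search scores every pair of candidate variables.

```python
from itertools import combinations


def candidate_pairs(attributes, metas, exclude_metas=True):
    """Return the variable pairs a VizRank-like search would score.

    Hypothetical sketch: with exclude_metas on (the suggested default),
    only regular attributes are candidates; with it off, metas are
    candidates as well.
    """
    if exclude_metas:
        candidates = list(attributes)
    else:
        candidates = list(attributes) + list(metas)
    return list(combinations(candidates, 2))


attributes = ["sepal length", "sepal width", "petal length"]
metas = ["sample id", "collector"]

print(len(candidate_pairs(attributes, metas)))                       # 3
print(len(candidate_pairs(attributes, metas, exclude_metas=False)))  # 10
```

Even in this tiny case, two metas more than triple the number of pairs to score, which is why many metas can swamp the suggestions.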

janezd · 2025-02-19T10:01:26Z

I agree with all sides. @markotoplak remembers correctly. This is indeed a "longstanding issue" in the sense that it used to be different (many) years ago. There are arguments for both sides; at this very moment I agree that visualizations are basically models, so meta variables should appear only as decoration. But I know I'll change my mind again. Without having a convincing argument (or a solution that satisfies both) we shouldn't switch back and forth every few years.

I would prefer not to add any check boxes. Once we open the floodgates, their number will rise quickly.

Furthermore, a checkbox like this would be error-prone because the implementation would be too difficult: the state of the checkbox changes the content of the model and thus affects context matching. On the other hand, the check box state is a part of the context. Even if we can decide how to resolve this in principle, I predict we'll never resolve all the edge cases in the code.
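A toy illustration of the circularity, in plain Python. This is not Orange's actual context-handling code; all names (`available_variables`, `context_matches`, the dictionary keys) are hypothetical, and the sketch assumes a saved context matches only if its selected variable is still offered by the widget.

```python
def available_variables(attributes, metas, exclude_metas):
    """Variables offered in the widget, depending on the checkbox state."""
    if exclude_metas:
        return list(attributes)
    return list(attributes) + list(metas)


def context_matches(context, attributes, metas, exclude_metas):
    """A saved context matches only if its selected variable is offered."""
    offered = available_variables(attributes, metas, exclude_metas)
    return context["selected"] in offered


attributes = ["age", "income"]
metas = ["speaker"]

# The saved context stores both the selection and the checkbox state.
saved = {"selected": "speaker", "exclude_metas": False}

# Matching against the widget's current state (metas excluded) fails ...
print(context_matches(saved, attributes, metas, exclude_metas=True))
# ... while matching with the state stored in the context succeeds.
print(context_matches(saved, attributes, metas, saved["exclude_metas"]))
```

The outcome of matching depends on which checkbox state is assumed, yet that state is itself part of the context being matched.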

lareooreal added the "bug report" label (Bug is reported by user, not yet confirmed by the core team) on Jan 30, 2025

markotoplak added the "needs discussion" label (Core developers need to discuss the issue) on Jan 31, 2025
